Senior Software Engineer, Data Processing

Technical Requirements

Python, AWS, Data Pipelines, Distributed Systems, SQL, Airflow, Dagster

About the Role

Protege is hiring a Senior Software Engineer to own the data processing layer at ingestion: the part of the platform that takes large-scale source data and turns it into clean, structured, enriched, validated, AI-ready datasets. This is a hands-on, backend- and data-heavy role with end-to-end ownership of the pipelines that move and process data at volume.

Ingestion & Processing Systems

  • Design, build, and operate the ingestion systems that process large volumes of multimodal data into usable, well-structured datasets
  • Own the ingestion path end to end, from how data lands to how it is validated, processed, tracked, and made available downstream
  • Build modality-specific processing steps for real-world source data, such as medical imaging processing, audio and video metadata extraction, quality validation, and notes processing
  • Build parsers, validators, and normalization logic that can systematically handle messy, non-standard, and high-variance source formats
  • Turn repeated one-off data handling work into reusable processing patterns, internal tooling, and platform capabilities

Scale, Performance & Reliability

  • Build for high volume and high throughput, optimizing systems for reliability, cost, and speed
  • Work across distributed and parallel compute systems to process workloads that do not fit well on a single machine
  • Choose the right execution model for the workload, including batch processing, distributed execution, and modern compute patterns for unstructured data and inference-heavy processing
  • Diagnose and resolve bottlenecks across ingestion and processing systems, and keep performance from degrading as volume and modality complexity grow

Data Quality, Security & Compliance

  • Build validation and quality checks that catch bad, incomplete, or malformed data before it propagates downstream
  • Handle sensitive and regulated data, including PHI, with the security and care the domain demands, applying de-identification where required
  • Track provenance, metadata, and usage constraints through the ingestion path so downstream use remains compliant and auditable
  • Raise the quality bar for observability, debuggability, and operational reliability across the ingestion layer

Cross-Functional Partnership

  • Partner with product and Data Lab to support new modalities, new partner requirements, and non-standard source data
  • Work directly with partner engineering teams when needed to translate source-system realities into robust ingestion and processing design
  • Surface recurring patterns that are worth standardizing into reusable transforms, validators, and internal tooling
  • Help shape how Protege handles new data types as the platform expands into more complex data environments

What You Bring

  • 5+ years building and operating production backend or data systems, with real experience in data processing at scale
  • Hands-on experience designing and running large-scale data pipelines
  • Strong programming skills in Python
  • Experience with distributed data processing
  • Strong proficiency with AWS