Role at a glance
As Rally's founding Platform / Infrastructure Engineer, you'll take full ownership of our cloud infrastructure, CI/CD platform, and developer experience, building the foundation that lets our engineering team move fast, ship reliably, and scale into an AI-native future.
What you'll do
- CI/CD: Evaluate, select, and own Rally's CI/CD platform. Define and track DORA metrics to drive continuous improvement in delivery velocity.
- Developer Experience: Build on-demand ephemeral preview environments for a large service footprint that can't run locally. Improve inner-loop developer workflows: build times, local tooling, and service scaffolding.
- Infrastructure ownership: Own Rally's full AWS stack (ECS/Fargate, Aurora PostgreSQL, MSK, DynamoDB) and mature our Terraform IaC through modularization, drift detection, and CI for infrastructure changes. Own cost optimization and per-team cost visibility.
- Observability & reliability: Maintain and evolve Rally's Datadog observability stack. Build automation tools and runbooks to reduce operational toil and accelerate incident recovery. Drive post-incident reviews and translate findings into systemic improvements.
- Security & compliance: Own container scanning, secrets management, and IAM least-privilege enforcement. Support SOC 2 audit requirements as needed.
What you'll bring
- 5+ years in infrastructure, platform engineering, or SRE
- Deep AWS experience: ECS/Fargate, RDS/Aurora, MSK, DynamoDB
- Production Terraform with a track record of improving IaC maturity
- CI/CD platform experience
- Strong observability fundamentals
Nice to have
- Early/founding platform team experience
- Node.js/TypeScript familiarity (Prisma, GraphQL)
- Kafka or equivalent event streaming operations
- Temporal or similar workflow orchestration
- Experience deploying ephemeral/preview environments for complex distributed-monolith architectures
- FinOps practices and cost tooling
- Edge/CDN deployment (CloudFront, Cloudflare Workers, Vercel)