About the role:
We are seeking an exceptional Platform Engineering Manager to lead our Cloud Infrastructure Squad within the Engineering organization. This role will report to the VP of Platform Engineering. As the owner of Garner’s cloud infrastructure platform, you will be responsible for the reliability, scalability, and security of the foundational systems and services that every engineering team depends on.
What you will do:
- Own the reliability, performance, and security of Garner’s cloud infrastructure on AWS, serving as the accountable party for platform uptime, incident response, and operational excellence
- Define and execute the technical roadmap for the Cloud Infrastructure Squad, prioritizing investments across Kubernetes, Terraform, Istio, Postgres, NATS, and observability tooling
- Lead and develop a high-performing team of infrastructure engineers. Hire exceptional talent, set clear expectations, provide rigorous feedback, and hold the team accountable to Garner’s high standards
- Partner closely with product engineering, data engineering, and security teams to ensure the platform enables fast, safe, and scalable delivery across the organization
- Establish and monitor key reliability and performance metrics (SLOs, SLAs, error budgets); use data to drive decisions and continuously improve platform quality
- Manage quarterly planning and delivery for the squad: translate Garner’s engineering priorities into a clear roadmap, coordinate cross-team dependencies, and escalate tradeoffs proactively and appropriately
The ideal candidate has:
- 5+ years of hands-on cloud infrastructure or platform engineering experience, with at least 2 years in an engineering management role leading cloud-based infrastructure teams
- Deep expertise with AWS and modern infrastructure tooling, including Kubernetes and Terraform, plus at least a working familiarity with Istio, Postgres, and NATS
- A track record of building and scaling reliable, secure, and cost-efficient cloud platforms, including experience setting and managing SLOs/SLAs and driving incident response processes
- Strong people management skills with the ability to hire great engineers, deliver candid developmental feedback, and build a culture of ownership and high standards
- Excellent cross-functional communication skills, with the ability to articulate technical tradeoffs clearly to engineering peers and non-technical stakeholders alike and to hold stakeholders accountable with directness
- Sound judgment in balancing engineering velocity with reliability and security, including knowing when to slow down and when to speed up, and the ability to cut through complexity and ambiguity
- A desire to be part of a high-performing, mission-driven team that operates with intense urgency, a strong sense of individual accountability, and a commitment to authentic feedback
Technologies we use:
- Python, Go, Terraform, Kubernetes, Istio, Postgres, Elasticsearch, NATS, AWS, Claude