Roles & Responsibilities:
- Act as the primary Product Owner for the Continuous Integration (CI) system, managing how it centrally runs builds and tests by hooking into source control and the build system and integrating tightly with Buildkite and GitHub.
- Design and implement scalable short- and long-term integrations that connect Bazel with the CI/CD pipelines to automate workflows, integrate with AI/LLM/agent tools, trigger specific jobs, manage build artifacts, and ensure proper handoffs.
- Build, scale, and manage the CI infrastructure on AWS, overseeing capacity planning and optimizing performance by reducing job execution times.
- Develop comprehensive observability (including logs, metrics, traces, and alerts) into the CI infrastructure to accurately track system efficiency, identify errors, and focus reliability efforts.
- Maintain the CI system as a reliable, high-performing platform, providing day-to-day on-call support, incident management, and troubleshooting for the wider software engineering organization.
Knowledge, Skills and Abilities:
- Strong knowledge of CI/CD best practices and principles. Direct experience with Buildkite is a plus.
- Experience working with infrastructure-as-code technologies such as Terraform or equivalent.
- Experience with observability technologies such as Prometheus, OpenTelemetry, or similar.
- Experience with Linux operating systems in both day-to-day development and operational environments.
- Strong knowledge of AWS or other similar cloud platforms.
- Bonus: experience building services and tools in Go, Python, or similar languages.