🚀 About Us
We’re Bolt.new by StackBlitz! We’re the team that brought you WebContainers, the first-of-its-kind technology that made it possible to run Node.js right inside your browser. We built Bolt.new — the fastest way to go from idea to production without writing traditional code. It’s a next-gen, AI-powered app builder that helps you create, edit, and deploy full-stack web and mobile apps instantly.
✨ About This Opportunity
As a Staff Site Reliability Engineer, you'll be the reliability conscience of our engineering organization, embedding with product and platform teams to shape designs and ensure systems are observable, scalable, and operable. You will set technical direction, define standards, and drive initiatives that span multiple teams. Although this is a high-influence individual-contributor role, you will also respond to incidents and share the on-call rotation with the team, keeping the platform reliable for the millions of developers who build on it.
🛠️ How You'll Contribute
- Embed With Teams Early: Partner with development teams throughout the project lifecycle, from design and architecture reviews through launch readiness.
- Define Production-Readiness Standards: Establish and evolve design reviews, launch checklists, and operational acceptance criteria.
- Make Reliability Measurable: Define meaningful SLIs, SLOs, and error budgets to drive prioritization decisions.
- Build the Paved Roads: Create frameworks and tooling across AWS, GCP, and Azure, using Terraform as the common backbone.
- Lead Across Teams: Influence roadmaps, resolve technical disagreements, and identify process and technical debt across the organization.
- Mature Our Incident Practice: Lead by influence on incident management and blameless postmortems.
- Represent Us Externally: Build relationships with cloud providers and represent the company in customer trust conversations.
- Share the On-Call Rotation: Participate in the on-call rotation, currently one week per month.
💡 Qualifications
- Multi-Cloud Fluency: Proficiency across AWS, GCP, and Azure with a strong focus on Terraform.
- Our Stack: Comfort supporting and contributing to TypeScript (frontend/backend) and Ruby on Rails services.
- SRE / Production Engineering Experience: Significant experience operating at scale in SRE or platform engineering roles.
- Software Engineering Excellence: Strong fundamentals with the ability to write production-quality code.
- Technical Leadership & Influence: Demonstrated ability to lead across team boundaries without formal authority.
- Strategic Execution & Systems Thinking: Ability to drive ambiguous, high-scope problems to completion and identify systemic technical debt.
- Data-Driven Leadership: Experience building measurement frameworks and translating operational data into improvements.