Job Overview
We are looking for a Senior/Staff Platform Engineer to help improve the reliability, performance, and scalability of our production platform.
This role focuses on operating reliable infrastructure, improving observability, driving incident response, and using data-driven reliability practices such as SLIs, SLOs, SLAs, error budgets, and DORA metrics. Database experience with MongoDB, Elasticsearch, or Redis is a must.
Help us run and secure the platform that lets our users connect and create their part of the VRChat universe. If you’re interested in keeping the machinery behind the scenes humming and finely tuned, this role could be right up your alley.
The role reports to the Head of Platform at VRChat. The engineer will work closely with the IT and Engineering teams, as well as the heads of various functions, to plan and deploy infrastructure.
Duties & Responsibilities
Operate and improve production infrastructure with a focus on reliability, security, performance, and cost efficiency.
Define, measure, and improve reliability using SLIs, SLOs, SLAs, error budgets, and DORA metrics.
Build and improve monitoring, alerting, dashboards, logging, and incident response processes.
Participate in incident management, root cause analysis, postmortems, and follow-up remediation.
Automate infrastructure and operational workflows using modern IaC and scripting tools.
Work closely with engineering teams to improve service reliability, deployment quality, and operational readiness.
Turn ambiguous infrastructure, reliability, and operational problems into clear, scalable, and measurable solutions.
Engage with backend codebases through code reviews, pull requests, and occasional feature or tooling work to build shared context with product engineering teams.
Experience, Skills & Qualifications
8+ years of experience in SRE, DevOps, Platform Engineering, or Infrastructure Engineering.
Strong experience operating high-availability production systems.
Experience with cloud or hybrid cloud environments and tools such as Terraform or OpenTofu.
Strong knowledge of Linux, networking, automation, observability, and incident management.
Strong communication skills and ability to work with technical and non-technical stakeholders.
Operational knowledge of databases such as MongoDB, Elasticsearch, or Redis.
Nice to Have
Experience with AWS, including core infrastructure services, cost optimization, and multi-account architecture.
Experience with Kubernetes, including networking, service discovery, ingress, and workload reliability.
Experience with Cilium or other Kubernetes networking/security solutions.
Experience supporting large-scale storage systems.
Experience with CDNs, caching, distributed systems, or real-time platforms.
Benefits
Work from anywhere! VRChat is a 100% remote company
Health Benefits
401(k) for US employees and RRSP for Canadian employees
Stock Options
Generous paid holiday schedule
Unlimited/Flexible vacation time
Paid parental leave benefits