Senior Platform Engineer — AI Agent Infrastructure

Technical Requirements

Event-driven Architecture, AWS, PostgreSQL, MongoDB, Docker, Terraform, Datadog, Go

Who We Are

At Yuno, we are building the payment infrastructure that allows all companies to participate in the global market. Founded by seasoned experts from the payments and tech industries, our technology provides access to leading payment capabilities, enabling companies to engage customers confidently and maintain global operations through seamless integrations.

We empower high-performing teams at brands like InDrive, McDonald’s, Rappi, and Viva Aerobus to integrate over 1,000 payment methods via a single API. By leveraging advanced AI and the latest technologies, we orchestrate smart routing and fraud prevention across 80+ countries.

About The Role

Yuno is building a platform that provisions, deploys, and manages AI agents at scale on AWS. The platform is in production and growing. We need someone to own the infrastructure, evolve the architecture, and make sure the system is reliable, observable, and ready to scale.

This is not a "maintain what exists" role. You'll drive architectural decisions — designing event-driven communication, improving streaming reliability, building observability, and shaping the platform's infrastructure as it grows.

Your Contribution Will Be

  • Messaging and event-driven architecture — design and implement the messaging layer for inter-service communication, replacing synchronous patterns with durable, reliable async messaging.
  • Infrastructure and deployment — own the cloud infrastructure, automate provisioning with IaC, and ensure the platform scales reliably.
  • Observability and reliability — build the monitoring, tracing, and alerting that keeps the platform healthy. When something breaks at 3am, your dashboards and alerts should explain why before anyone has to dig.
  • Platform evolution — evaluate and drive architectural decisions as the platform matures. You'll have a strong voice in choosing technologies, designing systems, and deciding when to evolve the infrastructure.

Skills You Need

Minimum Qualifications

  • Event-driven architecture and messaging systems — you've designed systems around message queues (Kafka, NATS, RabbitMQ, or similar). You understand at-least-once delivery, consumer groups, dead letters, backpressure, and ideally have migrated a system from synchronous to async messaging.
  • AWS — deep experience with EC2, VPC, IAM, S3, RDS. You understand networking, because inter-service communication runs over the internal VPC.
  • Databases — solid knowledge of both SQL (PostgreSQL) and NoSQL (MongoDB, Redis). You understand when to use each, indexing strategies, replication, and performance tuning.
  • Docker — container lifecycle, resource limits, health checks, bind mounts, multi-stage builds.
  • Distributed systems debugging — you've debugged async flows and cascading failures across services in production, and can explain what failed and how you fixed it.
  • Infrastructure as Code — Terraform or Pulumi. You believe infrastructure should be reviewed in PRs, not clicked in consoles.
  • Observability — Datadog fluency or equivalent (dashboards, monitors, APM, log pipelines, distributed tracing).
  • Tech Stack — hands-on experience with Go, AWS (EC2, S3, VPC, RDS PostgreSQL), Docker, PostgreSQL, MongoDB, Redis, and Datadog.

Preferred Qualifications

  • AI / MLOps infrastructure — experience running AI workloads in production (model serving, LLM inference, GPU/resource management, agent evaluation and observability tools like LangFuse, LangSmith, Braintrust, MLflow).
  • Multi-tenant container platforms — experience with platforms that run customer/user workloads in containers (Replit, Railway, Fly.io, or internal PaaS systems).
  • Kubernetes — you've done the migration from "Docker on bare EC2" to K8s at least once and know what breaks during the transition.
  • Data pipelines and orchestration — Airflow, Prefect, or similar. Knowledge of data warehouses (Databricks, Snowflake, BigQuery) is a plus.

Nice to Have

  • ECS experience.
  • s6-overlay for container process supervision.
  • Experience with AI agent framework ecosystems.

What We Offer at Yuno

  • Competitive Compensation.
  • Remote Work – You can work from anywhere!
  • Home Office Bonus – A one-time allowance to help you create your ideal home office.
  • Work Equipment.
  • Stock Options.
  • Health Plan wherever you are.
  • Flexible Days Off.
  • Language, Professional, and Personal Growth courses.