Personalized agent runtime: Agentic workflows that adapt over time to a user’s preferences, data, and behavior.
Memory & retrieval systems: Short/long-term memory, durable state, and retrieval pipelines across vector DBs and relational data.
Voice experiences (real-time + async): Speech-to-speech/voice agents, streaming audio pipelines, turn-taking, interruption handling, latency tuning, and QA for natural conversations.
Agent evaluation + reliability: Offline/online evals, regression suites, red-teaming, monitoring, and rollout controls so agents are trustworthy in production.
Production agent infrastructure: Scalable orchestration patterns for multi-step jobs, background tasks, and user-facing interactions (sync + async), with clear SLAs/SLOs.
Tooling + developer experience: Libraries and primitives that make it easy for the team to build new agent capabilities quickly and safely.

Ship user-facing agent experiences end-to-end: prototype → production → iteration based on real usage.
Architect and implement stateful agent systems (workflows, tool calling, memory, retrieval, and human-in-the-loop where needed).
Build voice features end-to-end where they unlock value: real-time speech agents, voice UI/UX, prompt/audio routing, and guardrails for safe tool execution.
Build and own an evaluation harness:
- curated test sets + scenario suites
- automated scoring / rubric-based graders
- prompt/model/version tracking
- canary + A/B experimentation and safe rollout patterns
Design data + retrieval pipelines:
- chunking, enrichment, metadata strategy
- hybrid retrieval (vector + keyword + structured filters)
- re-ranking, caching, and latency optimization
- multi-tenant safety and data isolation
Integrate with and extend our platform primitives:
- Django/DRF/ASGI services
- async execution + queues + workflow orchestration
- PostgreSQL + pgvector
- Kubernetes deployments, autoscaling, and cost controls
Establish engineering rigor for agents:
- observability (traces, spans, structured logs)
- reliability patterns (timeouts, retries, circuit breakers, graceful degradation)
- security/privacy controls for data access and tool execution

Strong software engineering fundamentals (design, testing, code quality, performance, security).
Production experience deploying AI systems in front of users (not just notebooks/demos).
Experience building agentic or LLM-powered systems with memory and tool use.
Comfort working across application + infrastructure layers: APIs, background jobs, data stores, and deployment.
Hands-on experience with at least one agent framework (or equivalent custom implementation), such as:
- LangChain / LangGraph
- LlamaIndex
- AutoGen / CrewAI-style multi-agent patterns
Strong understanding of retrieval and vector search concepts: embeddings, indexing, filtering, evaluation.

Experience with vector databases and/or search stacks (e.g., Pinecone, Chroma, Weaviate, Qdrant, pgvector).
Experience designing evaluation systems (offline eval, human eval loops, production monitoring, prompt/model regression).
Experience building voice/real-time systems (streaming, WebRTC or similar), and/or integrating speech (STT/TTS) into production applications.
Experience building durable, long-running workflows (Temporal or similar orchestration engines).
Familiarity with observability tooling (OpenTelemetry, Datadog, or similar).
Experience shipping multi-tenant SaaS systems with strong privacy boundaries.

System design for agentic applications (state, memory, evaluation, failure modes).
Practical retrieval/RAG design (data modeling, indexing, relevance, latency).
Production engineering practices (testing strategy, observability, rollouts).
Ability to communicate tradeoffs and make good technical decisions under uncertainty.

Compensation: Competitive salary commensurate with experience (Senior level)
Location: Remote
Type: Full-time
Requirements: Overlap with Americas time zones for collaboration; reliable high-speed internet

Senior AI Engineer (Core)