Position Overview
We are looking for a Senior Platform Architect to own the platform-level architecture of our scale-up and scale-out systems as we grow from single-accelerator products to server- and rack-level deployments. In this role, you will define how our current and next-generation accelerators come together as high-performance platforms, specifying the system topology, host integration, memory organization and platform-level requirements needed to achieve our performance, scalability and operability goals.
Key responsibilities:
- Define the platform-level architecture of scale-up and scale-out fabrics across our accelerator product line, from single-node systems through to rack-level deployments.
- Specify the physical interconnect infrastructure, defining architectural requirements and trade-offs, including how interconnect characteristics affect system-level workload performance.
- Help translate workload and performance requirements into concrete system-level architecture and interconnect specifications.
- Define host integration (PCIe, CXL and memory attachment) and the system-level memory organization across the platform.
- Own platform performance, power, cost and scalability, ensuring scale-up and scale-out designs sustain target performance as systems grow from single nodes to large clusters.
- Define operability and RAS (reliability, availability, serviceability) requirements so platforms are manageable and dependable in production.
- Take a leading technical role in the system architecture team, interfacing directly with key partners and internal stakeholders to align architecture decisions.
- Drive methodology and best practices for platform-level design as the team scales.
Qualifications:
- Experience: Significant experience (5+ years) in system, platform, or hardware architecture, with a strong track record at server and/or rack scale.
- Core knowledge: Scale-up and scale-out system design, distributed workload mapping, functional partitioning, and interconnect/fabric architecture.
- Fabrics & interconnect: Deep, hands-on understanding of Ethernet, UALink, optical, and switched-fabric technologies, and the ability to reason about their system-level performance trade-offs.
- Systems thinking: Ability to connect workload characteristics to hardware architecture and to quantify the impact of design choices on end-to-end performance.
- Leadership and collaboration: Demonstrated ability to lead and collaborate across multidisciplinary teams and to interface effectively with partners and senior stakeholders.
- Bonus: Experience architecting AI/HPC accelerator systems, familiarity with distributed training/inference workloads, and exposure to data-center-scale deployment.