Who We Are

Lightning AI is the company behind PyTorch Lightning. Founded in 2019, we build an end-to-end platform for developing, training, and deploying AI systems—designed to take ideas from research to production with less friction.

Through our merger with Voltage Park, a neocloud and AI Factory, Lightning AI combines developer-first software with cost-efficient, large-scale compute. Teams get the tools they need for experimentation, training, and production inference, with security, observability, and control built in.

What We’re Looking For

Lightning AI is seeking a Senior Network Engineer with hands-on Cumulus Linux expertise to build and scale the network backbone behind our AI infrastructure platform. You’ll play a critical role in designing highly reliable, automated data center networks that support some of the most demanding AI workloads in the world.

What You'll Do

Design and deploy scalable spine/leaf network architectures for AI data centers
Engineer high-performance Ethernet fabrics supporting GPU clusters and AI workloads
Build and maintain EVPN/VXLAN, BGP, and high-speed routing environments
Optimize east-west traffic flows for AI training and inference operations
Support RoCE/RDMA networking and low-latency transport technologies
Support backbone, DCI, WAN, and edge connectivity solutions
Collaborate with compute, storage, AI platform, and operations teams to deliver integrated infrastructure solutions
Develop automation and Infrastructure-as-Code (IaC) solutions for network provisioning and operations
Troubleshoot complex network, performance, and congestion issues across distributed environments
Improve network observability, telemetry, and operational visibility

Required Qualifications

Experience with Cumulus NOS
5+ years of experience in large-scale data center networking
Experience in spine-leaf architectures and L3 fabrics
Experience with BGP, EVPN, VXLAN
Experience operating high-performance computing (HPC) or GPU-dense environments
Experience designing networks for hyperscalers, neoclouds, or high-scale SaaS infrastructure
Experience in automation with Python, Ansible, or Terraform
Experience with network observability tooling and telemetry pipelines

Ideal Experience

Familiarity with NVIDIA networking (Spectrum, Quantum, BlueField, etc.)
Familiarity with RDMA, RoCE, or InfiniBand fabrics
Experience with multi-region backbone design
Exposure to bare-metal provisioning systems
