Crusoe
AI & HPC managed services
Building managed infrastructure for GPU-intensive AI training and inference workloads — Kubernetes orchestration across NVIDIA and AMD accelerators, Slurm scheduling, and the operational tooling around them.
// Senior Staff Software Engineer
AI infrastructure · Distributed systems · Crusoe
I design and operate the cloud-native orchestration platforms that abstract away the infrastructure complexity of GPU-intensive AI workloads. Two decades across platform engineering, distributed systems, and DevOps, solving the problems no one else wants to own. If I do my job right, you have no idea I exist.
AI & HPC infrastructure
Managed services for GPU-intensive training and inference — NVIDIA and AMD accelerators on Kubernetes, Slurm scheduling, and heterogeneous compute at scale.
Distributed systems at scale
Architecting the data, compute, and control planes that run production — and keeping them running when they're under load.
Operating at production scale
On-call, incidents, capacity, observability — the operational discipline that turns a working system into a reliable one.
DevOps & GitOps
Self-service CI/CD, infrastructure-as-code, container delivery, and the deployment automation behind two decades of production systems.
Workday
Two-phase architecture for the DataLake migration to AWS: DataSync for transfer, EMR Spark for transformation. Decoupling the data copy from the transformation logic allowed each phase to be validated thoroughly and the copied data to be reused downstream. Delivered without disrupting production workloads.
Workday
Architected and operated the EKS-based telemetry platform that replaced a sprawl of per-system tooling — single pane of glass across every environment, lower capex, and the operational signal engineers actually trusted.
Symantec
Led the production deployment of Symantec Endpoint Protection Cloud — the company's first SaaS product — and built the self-service CI/CD pipeline behind it from scratch.
| Years | Where | What |
|---|---|---|
| 2025 – now | Crusoe | AI & HPC infrastructure. Managed services for GPU workloads: NVIDIA and AMD accelerators on Kubernetes at scale. |
| 2019 – 2024 | Workday | Distributed infrastructure, DevOps tooling, and fleet-wide observability. DataLake migration to AWS. Kubernetes platform for public-cloud delivery with zero-downtime deploys. |
| 2014 – 2019 | Symantec | Cloud security. First SaaS product to production. Established in-house DevOps practice — self-service CI/CD, IaC, and microservice containerization with Docker & Kubernetes. |
| 2008 – 2014 | USC · NASA JPL | MS & PhD coursework. Earth-science data systems at JPL. Built a git-based assignment-delivery and grading pipeline as a TA: early DevOps instincts. |
| 2004 – 2008 | KFUPM | BS, Computer Engineering. Hardware-software fundamentals. |
I've spent two decades building infrastructure that has to work — Kubernetes platforms, data pipelines, observability systems, and the operational discipline behind all of them. Currently at Crusoe, building managed AI/HPC infrastructure for GPU workloads. Previously at Workday (distributed systems and DevOps) and Symantec (cloud security).
I care about systems that are operable, not just designed. The interesting work is in the failure modes, the migrations, the incidents — the parts that don't make it into architecture diagrams. Patient with detail, allergic to drama, comfortable on the bridge when production is on fire.
// elsewhere
Open to conversations about AI infrastructure, platform engineering, and hard problems at scale.