Software Engineer, Cloud Infrastructure

OpenAI (Headquarters: San Francisco)

Location: San Francisco, Seattle   |   Full-Time
Kubernetes, Envoy, Istio, Networking, Cloud Infrastructure, Distributed Systems, Cloud Native, Azure, Go, Python, ChatGPT, OpenAI API, AI Engineer, Back End Engineer
Join the Cloud Infrastructure team at OpenAI Applied and build the foundational platform for ChatGPT and the OpenAI API. OpenAI is an AI research and deployment company focused on ensuring AGI benefits humanity.

As a Software Engineer on this team, you will design, build, and operate the core infrastructure that powers OpenAI's large-scale AI models and products such as ChatGPT and the API. The work spans distributed systems, networking, and container orchestration, with the goal of keeping the platform scalable, reliable, and performant.

Responsibilities:
- Design, build, and operate large-scale Kubernetes clusters.
- Implement and manage service mesh solutions using Istio and Envoy.
- Develop and maintain networking infrastructure to support high-traffic services.
- Automate infrastructure deployment, scaling, and management (Infrastructure as Code).
- Monitor, troubleshoot, and optimize system performance and reliability.
- Collaborate closely with research and product engineering teams to deliver foundational infrastructure.

Technical Skills Required:
- Deep expertise in Kubernetes architecture and operations.
- Strong understanding and hands-on experience with Envoy proxy.
- Experience deploying and managing Istio service mesh.
- Solid grasp of networking fundamentals (TCP/IP, DNS, load balancing, BGP).
- Proficiency in cloud platforms (Azure preferred, AWS/GCP relevant).
- Strong coding skills (e.g., Go, Python) for automation and infrastructure development.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana).

Ideal Candidate:
We are looking for engineers with significant experience in building and managing complex, distributed cloud infrastructure. You should be comfortable working with container orchestration technologies, service meshes, and advanced networking concepts. A passion for building reliable systems at scale and enabling cutting-edge AI research and products is essential. This role requires working on-site in either San Francisco or Seattle.
Post Date: April 21, 2025