Staff Infrastructure Engineer

LiveKit (Headquarters: Remote)

Location: Remote, U.S.   |   Full-Time   |   $120,000 - $250,000
Company: LiveKit builds open-source APIs to power the future of computing. We are a company of engineers building software stacks for other engineers, with a passion for building something truly impactful. We are a remote company with a global presence, and we work from first principles. LiveKit is on a mission to help developers create and scale real-time experiences.

Role: We are hiring a Site Reliability Engineer to help manage and scale the core components of the LiveKit infrastructure. Visibility, performance, and reliability of our globally distributed architecture are critical and a top priority.

Responsibilities:
- Build and own the foundational infrastructure that our products run upon.
- Work directly on our products' Golang codebase to implement SRE-related objectives.
- Take a data-driven approach to quantifying system performance and reliability, and use it to drive project priorities.
- Participate in the on-call rotation, including leading incident management for complex situations.
- Work on automation and advanced configuration management to allow our team to manage large numbers of clusters distributed across the world running various products.
- Work with infrastructure vendors when their solutions aren't meeting our real-time performance and reliability needs.

Technical Skills:
- Experience managing complex multi-region distributed systems running on top of container orchestration systems like Kubernetes.
- Experience with Linux networking, overlay networks, and Kubernetes CNIs.
- Low-level knowledge for troubleshooting and tuning latency-sensitive workloads.
- Golang proficiency.

Ideal Candidate:
- A balance of strengths in both software engineering and large-scale system administration.
- Passionate about maintainability and keeping system complexity at bay, but able to balance this with meeting launch deadlines.
- Incident management training and experience being an Incident Commander (Bonus).

Post Date: April 24, 2025