Infrastructure Engineer
Location: San Francisco | Full-Time | $200,000 - $300,000
infra
infrastructure
kubernetes
docker
aws
gcp
azure
terraform
ci/cd
monitoring
mlops
music generation
ai
generative ai
equity
Riffusion is developing frontier music generation models and building the most powerful and fun music creation product in the world. We are pioneers in the generative music space, pushing the boundaries of what's possible with AI and music.

We are seeking a world-class Infrastructure Engineer to join our team in San Francisco (in person). This role is critical to our success, as you will be responsible for designing, building, scaling, and maintaining the core infrastructure that powers our large-scale model training and product deployment.

Responsibilities:
- Design, implement, and manage robust, scalable, and cost-effective cloud infrastructure (primarily on AWS, GCP, or Azure).
- Build, operate, and optimize container orchestration systems (Kubernetes).
- Develop and maintain CI/CD pipelines for automated testing, integration, and deployment of our models and applications.
- Implement and manage comprehensive monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK stack, Datadog) to ensure high availability, performance, and reliability.
- Manage infrastructure for GPU-intensive workloads, ensuring efficient resource utilization for model training and inference.
- Collaborate closely with machine learning researchers and software engineers to understand infrastructure requirements and provide reliable solutions.
- Ensure security best practices are implemented across the infrastructure.
- Automate infrastructure provisioning and management using tools like Terraform or Pulumi.

Technical Skills Required:
- Deep expertise with at least one major cloud provider (AWS, GCP, Azure).
- Strong proficiency with containerization (Docker) and orchestration (Kubernetes).
- Proven experience with Infrastructure as Code (IaC) tools like Terraform or Pulumi.
- Solid understanding of networking concepts (VPCs, load balancing, DNS, firewalls).
- Experience building and maintaining CI/CD pipelines (e.g., Jenkins, GitLab CI, GitHub Actions).
- Expertise in monitoring, logging, and alerting systems and practices.
- Scripting skills (e.g., Python, Bash).
- Familiarity with managing infrastructure for ML/AI workloads, including GPU management, is highly desirable.

Ideal Candidate:
- You are a highly skilled infrastructure expert passionate about building robust, scalable systems for demanding AI/ML workloads.
- You thrive in a fast-paced startup environment and are comfortable with ambiguity.
- You possess strong problem-solving skills and a proactive approach to identifying and resolving infrastructure challenges.
- You have excellent communication skills and can collaborate effectively with technical teams.
- You are excited about the potential of generative AI in music creation.
Post Date: April 17, 2025