MLOps / Infrastructure Engineer
Location: Remote (US) | Full-Time
Python
LLMs
transformers
cybersec
MLOps
Infrastructure
DevOps
CI/CD
Kubernetes
Docker
Cloud
AWS
GCP
Azure
Monitoring
AI Engineer
Data Engineer
**About 10a Labs:**

10a Labs is building next-gen tools for LLM safety, red teaming, and AI threat detection. We are a terrific team with an incredible mission focused on making AI safer.

**Role: MLOps / Infrastructure Engineer**

We are looking for an MLOps / Infrastructure Engineer to build and manage the platform that enables the efficient training, deployment, and operation of our machine learning models, especially LLMs. You will bridge the gap between ML development and production operations, ensuring our systems are reliable, scalable, and automated.

**Responsibilities:**

* Design, build, and maintain scalable infrastructure for ML model training, evaluation, and deployment.
* Develop and manage CI/CD pipelines for ML models and applications.
* Implement monitoring, logging, and alerting for ML systems in production.
* Optimize ML workflows for performance, cost, and reliability.
* Manage cloud infrastructure (e.g., AWS, GCP, Azure) and container orchestration (e.g., Kubernetes).
* Collaborate with ML engineers and data scientists to streamline the path from research to production.
* Automate infrastructure provisioning and management (Infrastructure as Code).
* Ensure the security and compliance of the ML infrastructure.

**Ideal Candidate:**

* Strong experience in MLOps, DevOps, or Infrastructure Engineering, preferably in an ML context.
* Proficiency in scripting languages like Python or Bash.
* Hands-on experience with cloud platforms (AWS, GCP, or Azure).
* Experience with containerization (Docker) and orchestration (Kubernetes).
* Familiarity with CI/CD tools (e.g., Jenkins, GitLab CI, Argo CD).
* Experience with ML frameworks (e.g., PyTorch, TensorFlow) and MLOps tools (e.g., MLflow, Kubeflow).
* Understanding of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
* Interest in LLMs, AI safety, or cybersecurity is a plus.
* Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
Post Date: April 15, 2025