Location: Boston or NYC (Hybrid) | Full-Time
Company: Layer Health, founded by MIT/Harvard researchers, is building an AI layer for healthcare that uses LLMs to synthesize medical records, reducing friction and improving patient care. We've raised a $21M Series A.

Role: Staff+ MLE/Infra Engineer

Location: Hybrid in Boston (Back Bay) or NYC (Grand Central). No remote option available.

Join our ~20-person team as a senior engineer focused on the infrastructure that supports our machine learning models, particularly Large Language Models (LLMs). You will design, build, and manage the systems for training, evaluating, deploying, and monitoring ML models at scale on GCP, so that our research and product teams can iterate quickly and reliably.

Responsibilities:
* Design, build, and maintain scalable infrastructure for ML model training, deployment, and inference (MLOps).
* Optimize ML workflows for performance, cost, and reliability on GCP.
* Develop tools and automation for model versioning, experiment tracking, and monitoring.
* Collaborate with Research Scientists and MLEs to productionize models, including LLMs.
* Work with backend and data platform teams to ensure seamless integration of ML models.
* Stay current with the latest advancements in MLOps, LLM infrastructure, and cloud technologies.
* Ensure security and compliance of ML systems handling sensitive medical data.

Technical Skills:
* Strong software engineering background, proficient in Python.
* Proven experience building and managing ML infrastructure (MLOps).
* Deep knowledge of cloud platforms, preferably GCP (Vertex AI, the legacy AI Platform, Google Kubernetes Engine).
* Experience with containerization (Docker) and container orchestration (Kubernetes).
* Familiarity with ML frameworks (e.g., PyTorch, TensorFlow) and LLM ecosystem tools.
* Experience with data processing pipelines and tools.
* Understanding of infrastructure-as-code (Terraform) and CI/CD practices.

Ideal Candidate:
* At least 4 years of professional software engineering experience focused on ML infrastructure or MLOps; at the Staff level, significantly more experience is likely preferred.
* Experience deploying and scaling machine learning models, especially LLMs, in production.
* Strong problem-solving skills applied to complex infrastructure and ML challenges.
* Excellent communication and collaboration skills.
* Passion for applying AI/ML to solve real-world healthcare problems.
Post Date: May 21, 2025