Location: San Francisco, CA   |   Full-Time
Tags: AI, ML Inference, CUDA, PyTorch, Python, Distributed Systems, AI Engineer

About Krea: Krea is an innovative AI startup specializing in browser-based creative tools powered by advanced AI systems. We focus on delivering unparalleled personalization, controllability, and aesthetic capabilities through state-of-the-art AI models. Our clients include industry giants like Pixar, Shopify, Fox News, and Amazon Studios. We’re backed by prominent investors including a cofounder of Meta/Facebook AI Research and a founding member of OpenAI.

About the Role: As an AI/ML Inference Engineer at Krea, you’ll play a critical role in deploying and optimizing cutting-edge AI models for our creative tools. You’ll work on scaling inference across our distributed systems, ensuring our AI capabilities deliver exceptional performance and reliability.

Key Responsibilities:

  • Design, develop, and deploy scalable AI inference systems
  • Optimize ML models for real-time performance in browser environments
  • Collaborate with research teams to translate cutting-edge models into production-ready solutions
  • Implement monitoring for AI services and maintain them in production
  • Troubleshoot and resolve technical issues in AI systems

Required Skills:

  • Proficiency in Python, C++, CUDA, and PyTorch
  • Experience with model optimization and quantization techniques
  • Understanding of distributed systems and GPU acceleration
  • Strong background in machine learning inference deployment
  • Excellent problem-solving skills with a focus on system performance
  • Experience working with cloud platforms (AWS, GCP, Azure)

Ideal Candidate:

  • 5+ years of experience in AI/ML engineering
  • Proven track record in deploying ML models at scale
  • Passion for creative applications of AI
  • Ability to work in a fast-paced startup environment
  • Strong communication skills for cross-functional collaboration
  • Experience with containerization and orchestration tools

Post Date: July 23, 2025