Forward Deployed Engineer
Location: London/SF/NYC or remote | Full-Time
GPU
Compute
Networking
DevOps
SRE
SWE
Infrastructure
Full Stack Engineer
Back End Engineer
Staff Engineer
**About Fluidstack:**

Fluidstack builds and operates GPU supercomputers for top AI labs, governments, and enterprises. Our customers include Mistral, Poolside, Black Forest Labs, Meta, and more. We specialize in deploying clusters of 1,000+ GPUs and optimizing compute, storage, and networking infrastructure for demanding workloads. Fluidstack combines expertise in high-performance computing with modern cloud technologies to deliver scalable GPU solutions for research, enterprise, and government clients across industries. We operate globally, with offices in London, San Francisco, and New York City and a distributed remote team. Our team combines expertise in data center operations, software engineering, networking, and customer success to deliver reliable, efficient GPU infrastructure for our diverse clientele.

**About The Role:**

As a Forward Deployed Engineer at Fluidstack, you'll be on the front lines of deploying and operating our GPU clusters for customers. You'll work directly with clients to ensure their infrastructure is deployed correctly and runs efficiently. You'll validate the performance and correctness of our compute, storage, and networking systems, collaborate with providers to optimize these subsystems, and build internal tooling to automate deployments and improve reliability.

**Responsibilities:**

- Deploying and managing clusters of 1,000+ GPUs for various customers
- Validating and optimizing the performance of compute, storage, and networking infrastructure
- Migrating petabytes of data from public cloud platforms to local storage efficiently and cost-effectively
- Debugging issues at any level of the stack, from hardware failures ("this server's fan is blocked by a plastic bag") to optimizing S3 dataloaders pulling from buckets in different regions
- Building internal tooling to decrease deployment time and increase cluster reliability
- Working with providers to optimize underlying subsystems
- Collaborating with customers to understand their needs and deliver tailored solutions
- Continuously improving infrastructure automation and monitoring systems

**Requirements:**

- Proven ability to learn quickly and master new technologies
- Strong problem-solving skills with a focus on customer success
- Experience in software engineering, systems administration, or a related field
- Willingness to work hard and tackle challenging technical problems
- Excellent communication and collaboration skills
- Ability to debug issues at any level of the stack
- Prior experience with GPUs or AI/ML is not required

**What We Offer:**

- The opportunity to work on cutting-edge GPU infrastructure
- Competitive salary and benefits package
- Remote-first work environment with flexible locations
- A collaborative team with diverse expertise
- Impactful work that powers AI research and enterprise applications
- Opportunities for professional growth and learning

Post Date: June 20, 2025