Conversational Interaction Specialist

Fauna Robotics (Headquarters: New York, NY)

Location: New York, NY   |   Full-Time   |   $100,000 - $200,000
Keywords: Conversational AI, Speech Recognition, ASR, Text-to-Speech, TTS, Voice Interaction, HCI, ML, Audio Signal Processing, Python, C++, Noise Reduction, Echo Cancellation, LLM, Robotics, ROS, IoT, UX, AI Engineer, Staff Engineer
Company: Fauna Robotics is a New York-based startup whose mission is to develop safe, intelligent, human-centric robots that live, work, and play by our sides. We offer equity ownership, health benefits (Medical, Dental, Vision), and the opportunity to work on groundbreaking robotics technology in a collaborative and innovative environment.

Role: We are looking for a talented Conversational Interaction Specialist to deliver cutting-edge voice-driven interactions on Fauna’s robots. This role sits at the intersection of machine learning, human-computer interaction, and software engineering, requiring both deep technical expertise in speech technologies and a strong sense of user experience design. Building world-class speech interactions for robots demands both rapid prototyping and a mastery of state-of-the-art techniques. You will work with automatic speech recognition (ASR), text-to-speech (TTS), and conversational AI systems to create seamless, expressive, and intelligent voice-driven interfaces on our robotic platform.

Key Responsibilities:
- Design and build voice-driven experiences that enable natural, engaging, and intuitive human-machine interactions.
- Prototype speech-first interactive experiences, integrating ASR, TTS, and conversation management systems.
- Research and implement state-of-the-art voice and conversational AI techniques, working at the intersection of machine learning and human-computer interaction.
- Collaborate with engineers, designers, and researchers to improve speech UX, including latency and accuracy.
- Evaluate and improve the usability and performance of speech-driven systems through user testing and iterative development.

Required Skills & Qualifications:
- Work Experience: 4+ years of professional software development experience, or PhD-level research experience.
- Education: Bachelor’s, Master’s, or PhD in Computer Science, Computational Linguistics, or a related field – or equivalent practical experience.
- Technical Expertise: Expertise in developing and tuning ASR and TTS models for real-time applications. Ability to develop and characterize audio signal processing techniques, such as noise reduction and echo cancellation. Deep understanding of design factors for conversational systems, including turn-taking, prosody, and intent recognition. Strong programming skills in Python, C++, or a similar language, with experience in signal processing and/or machine learning.

Nice-to-have Skills:
- Expertise in training end-to-end ML models for speech.
- Familiarity with large language models (LLMs) for conversational AI.
- Experience with robotics, ROS/ROS2, IoT, or other physical computing platforms (microcomputers, microcontrollers).
- Experience conducting research evaluations with human participants.

Post Date: April 17, 2025