Excited to grow your career?
We value our talented employees and, whenever possible, strive to help our associates grow professionally before recruiting new talent for our open positions. If you think this open position is right for you, we encourage you to apply!
Our people make all the difference in our success.
What you can expect
As an Audio AI Engineer, you will research and develop algorithms for accent conversion, voice conversion, speech synthesis, and speech recognition on low-latency streaming architectures. You’ll prototype and refine end-to-end audio models that enhance intelligibility and naturalness while maintaining speaker identity. Working closely with product and platform teams, you’ll help bring these models into real-time communication systems. You will also evaluate and optimize model performance across dimensions such as quality, latency, and scalability. Staying current with advances in speech processing, you’ll contribute to innovation through patents and internal knowledge sharing.
About the Team
Zoom's Audio team develops real-time audio features based on AI algorithms. Team members are located worldwide, including in the U.S., China, and Singapore.
What we’re looking for
- Hold a PhD, or equivalent experience, in a field relevant to streaming, accent conversion, voice conversion, TTS, or ASR. More than 2 years of relevant industry experience is considered a plus.
- Show proficiency in deep learning frameworks like PyTorch or TensorFlow.
- Demonstrate effective programming skills in Python, C/C++, or similar languages.
- Have an understanding of sequence modeling architectures (Transformers, RNNs, diffusion models, or conformers).
- Demonstrate experience developing and deploying low-latency, real-time speech or audio models with streaming architectures and optimized pipelines.
- Show familiarity with model compression and acceleration techniques, including quantization, pruning, and distillation.
- Exhibit experience working with real-time audio systems in networked communication environments.
- Have publications at top-tier conferences such as ICASSP, INTERSPEECH, NeurIPS, or ICLR.











