Dr. Dimitri Palaz

Researcher in Speech Technology

Principal Research Scientist at Speech Graphics

Game Designer

News and announcements

Awards

September 2023

I am honored to have received two prestigious best paper awards recently, both for the same paper. Check out the Awards page for more info. The paper can be found here.

Blog post: Interspeech 2023

February 2024

I recently wrote a blog post about my latest paper, published at Interspeech 2023, which studies the use of LLMs to represent emotion labels for emotion recognition in speech audio. Check out the blog post here and the paper here.

GameFace Interview

November 2023

I was recently interviewed about my journey in speech technology, machine learning and the video game industry. Check it out here.

About me

I am a research scientist with 14 years of experience in speech technologies, deep neural networks and representation learning. My research focuses on detecting patterns in speech, including phoneme recognition, continuous speech recognition and voice activity detection. I have also worked on paralinguistic detection tasks, specifically speech emotion recognition and non-verbal vocalisation detection, with a focus on the generalisation capability of such models. Lately, I have been interested in self-supervision and acoustic-to-articulatory inversion.

I have been Principal Research Scientist and Tech Lead at Speech Graphics since 2017. In my research, I work with deep learning models to improve audio-driven facial animation. My role also involves strategic planning, project management, MLOps, product management, software development (C++) and infrastructure.

I obtained a Ph.D. from EPFL in Switzerland in 2016, where I worked on applying deep learning methods to speech recognition, focusing on using the raw speech signal as input. I carried out my doctoral studies at Idiap Research Institute under the supervision of Ronan Collobert, Mathew Magimai Doss and Hervé Bourlard.

Research interests

Deep learning and representation learning
Pattern recognition in speech
Generalisation capability and biases of deep learning models
Speech processing
Automatic speech recognition
Signal processing