NTR Zoomed a webinar about audio processing

On May 26, Dmitry Menshikov, a Data Scientist at NTR, led the webinar “Audio preprocessing for speech recognition systems.”

PPT slide: Graphic illustration of the Fourier transform

The webinar was attended by employees, partners and student interns.

Sound modeling is an essential building block of human speech processing systems such as speech-to-text and speaker diarization. Most of these systems are not designed to work with “raw” audio signals and require prior feature extraction, i.e., a way of representing the sound.

The current industry standard is spectral analysis, which represents the “raw” audio signal as filter banks and Mel-frequency cepstral coefficients (MFCC).
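To make the filter bank and MFCC idea concrete, here is a minimal sketch for a single audio frame, written in plain Python so it runs without extra packages. It is not the code shown in the webinar. The mel formula (2595 * log10(1 + f / 700)), the Hamming window, the naive DFT, and all parameter values (8 kHz sample rate, 256-sample frame, 10 filters, 5 coefficients) are assumptions chosen for illustration.

```python
import cmath
import math

def hz_to_mel(f):
    # One common mel-scale convention (an assumption, other variants exist)
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window a frame and return |DFT|^2 / N for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    mel_max = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising slope
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling slope
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def dct2(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]

def mfcc_frame(frame, sample_rate, n_filters=10, n_coeffs=5):
    """Return (log filter bank energies, MFCCs) for one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                    for row in bank]
    return log_energies, dct2(log_energies, n_coeffs)

# Usage: a pure 1000 Hz tone sampled at 8 kHz
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(256)]
log_energies, coeffs = mfcc_frame(frame, sample_rate)
peak_filter = max(range(len(log_energies)), key=log_energies.__getitem__)
print("strongest filter index:", peak_filter)
print("MFCCs:", [round(c, 2) for c in coeffs])
```

The log filter bank energies are themselves a usable feature set; the DCT step that turns them into MFCCs mainly decorrelates them. In practice one would process many overlapping frames and use a library with a fast FFT rather than the naive DFT used here.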

The webinar covered some of the math behind these techniques and included step-by-step examples in a Jupyter Notebook. We also discussed an alternative method called Linear Predictive Coding (LPC), which is based on human anatomy and the physical processes behind speech and could potentially compete with the industry standard.
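As a rough illustration of LPC, the sketch below estimates predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion, one common formulation that may differ from what was presented in the webinar. The idea is to model each sample as a linear combination of the previous ones, which corresponds to treating the vocal tract as an all-pole filter. The function names and the synthetic test signal are assumptions made for this example.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations.

    Returns (a, error) with a[0] == 1, so that the prediction is
    x[n] ~ -sum(a[k] * x[n - k] for k in 1..order),
    and error is the remaining prediction error energy.
    """
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error                   # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= (1.0 - k * k)
    return a, error

def lpc(signal, order):
    return levinson_durbin(autocorrelation(signal, order), order)

# Usage: impulse response of a known all-pole filter,
# x[n] = 1.3 * x[n-1] - 0.6 * x[n-2] + impulse[n]
signal = []
for n in range(200):
    x = 1.0 if n == 0 else 0.0
    if n >= 1:
        x += 1.3 * signal[n - 1]
    if n >= 2:
        x -= 0.6 * signal[n - 2]
    signal.append(x)

a, error = lpc(signal, order=2)
print([round(c, 4) for c in a])
```

For this synthetic signal the recovered coefficients should be close to 1, -1.3, and 0.6, matching the filter that generated it. Real speech frames would typically be windowed first and modeled with a higher order.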
