NTR organizes and hosts scientific webinars on neural networks, inviting speakers from all over the world to present their recent work.
On March 2, Pavel Denisov, Institute for Natural Language Processing, University of Stuttgart, Germany, led a technical Zoom webinar on an end-to-end Spoken Language Understanding model using cross-modal Teacher-Student learning.

About the webinar:
A Spoken Language Understanding (SLU) task is traditionally performed in two steps:
- a speech recognition system transforms speech to text;
- a Natural Language Understanding (NLU) model extracts the meaning from that text.
These two steps can be replaced with a single end-to-end neural network model, as sketched below. The end-to-end approach has a number of advantages, but it also faces a problem: relatively little annotated training data is available for it, which various transfer learning methods can alleviate.
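For concreteness, here is a minimal sketch of the traditional two-step pipeline using Hugging Face pipelines as stand-ins; the default checkpoints, and the generic text classifier standing in for the NLU step, are illustrative assumptions, not the systems discussed in the webinar.

```python
# Two-step SLU sketch: an ASR system transcribes speech, then an NLU model
# labels the transcript. The Hugging Face default checkpoints are illustrative
# stand-ins, not the models from the webinar.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition")  # step 1: speech -> text
nlu = pipeline("text-classification")           # step 2: text -> label (NLU stand-in)

def two_step_slu(audio_path):
    text = asr(audio_path)["text"]  # transcribe the recording
    return nlu(text)[0]             # classify the transcript, e.g. into an intent
```

An end-to-end model collapses these two calls into a single network that maps the recording directly to its meaning.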
Pavel talked about an approach to end-to-end SLU modeling based on parameter transfer from speech recognition and NLU models, followed by fine-tuning with cross-modal Teacher-Student learning.
He said they used the Sentence-BERT semantic sentence embedding model as a Teacher and fine-tuned their SLU model to produce similar semantic representations of speech recordings.
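The cross-modal Teacher-Student idea can be sketched roughly as follows, assuming paired (audio, transcript) training data: a frozen Sentence-BERT teacher embeds the transcript, and a speech-encoder student is trained to produce a matching embedding from the audio alone. The hypothetical SpeechEncoder module, the 768-dimensional embeddings, and the MSE distance are assumptions for illustration, not necessarily the exact setup from the talk.

```python
# Cross-modal Teacher-Student sketch: a frozen Sentence-BERT teacher embeds
# transcripts; a speech-encoder student learns to match those embeddings.
# Assumptions (not from the talk): MSE loss, 768-dim embeddings, and a
# hypothetical SpeechEncoder architecture.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

teacher = SentenceTransformer("all-mpnet-base-v2")  # any Sentence-BERT model

class SpeechEncoder(nn.Module):
    """Hypothetical student: maps raw audio to a sentence-level embedding."""
    def __init__(self, dim=768):
        super().__init__()
        self.conv = nn.Conv1d(1, 64, kernel_size=10, stride=5)
        self.rnn = nn.GRU(64, dim, batch_first=True)

    def forward(self, audio):               # audio: (batch, samples)
        x = self.conv(audio.unsqueeze(1))   # (batch, 64, frames)
        _, h = self.rnn(x.transpose(1, 2))  # run over time frames
        return h[-1]                        # (batch, dim) utterance embedding

student = SpeechEncoder()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(audio_batch, transcripts):
    # Teacher embeddings come from text and stay fixed (no gradient).
    with torch.no_grad():
        target = teacher.encode(transcripts, convert_to_tensor=True)
    pred = student(audio_batch)  # speech -> semantic embedding
    loss = loss_fn(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the supervision signal is the teacher's text embedding rather than a task label, this step needs only transcribed speech, not speech annotated for the downstream task.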
Their experiments with Dialog Act classification and Intent recognition showed that the accuracy of the end-to-end model is comparable to the accuracy of the traditional two-step approach, without training on any speech samples labeled for the downstream task. They also showed that the results of the end-to-end model can be improved further by fine-tuning on a few labeled speech samples per class.
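A rough sketch of such few-shot fine-tuning, building on the `student` model from the sketch above: attach a classification head to the speech embeddings and train on a handful of labeled recordings per class. The linear head, the class count, and the learning rate are illustrative assumptions.

```python
# Few-shot fine-tuning sketch: a linear classifier on top of the speech
# embeddings from the `student` model above, trained on a few labeled
# recordings per class. All hyperparameters are assumptions.
import torch
import torch.nn as nn

num_classes = 5  # e.g. the number of dialog acts or intents (assumed)
head = nn.Linear(768, num_classes)
optimizer = torch.optim.Adam(
    list(student.parameters()) + list(head.parameters()), lr=1e-5
)
ce = nn.CrossEntropyLoss()

def few_shot_step(audio_batch, labels):
    logits = head(student(audio_batch))  # speech -> embedding -> class scores
    loss = ce(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```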
Materials available:
Webinar presentation.
Moderator and contact:
NTR CEO Nick Mikhailovsky: nickm@ntrlab.com.