NTR webinar: End-to-end Spoken Language Understanding model using cross-modal Teacher-Student learning

NTR organizes and hosts technical webinars about neural networks and invites speakers from all over the world to lead them. Webinars are held on Tuesdays and most are in Russian. 

On March 2, Pavel Denisov of the Institute for Natural Language Processing, University of Stuttgart, Germany, led a technical Zoom webinar on an end-to-end Spoken Language Understanding model using cross-modal Teacher-Student learning.

About the webinar: 

A Spoken Language Understanding (SLU) task is traditionally performed in two steps: 

  1. a speech recognition system transforms speech to text;
  2. a Natural Language Understanding (NLU) model extracts the meaning from that text. 

These two steps can be replaced with a single end-to-end neural network based model. Such a model has a number of advantages, but it suffers from the relatively small amount of annotated training data available, a problem that can be alleviated by various transfer learning methods. 
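Schematically, the contrast between the two approaches looks like this; all three components below are hypothetical stubs standing in for real models, not code from the talk:

```python
def asr_transcribe(audio):           # stub for a speech recognition system
    return "turn on the lights"

def nlu_extract(text):               # stub for an NLU model
    return {"intent": "lights_on"}

def two_step_slu(audio):
    text = asr_transcribe(audio)     # step 1: speech -> text
    return nlu_extract(text)         # step 2: text -> meaning

def end_to_end_slu(audio, slu_model):
    return slu_model(audio)          # one network maps speech directly to meaning
```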

Pavel talked about an approach to end-to-end SLU modeling based on transferring parameters from speech recognition and NLU models, followed by fine-tuning with cross-modal Teacher-Student learning. 
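As a rough illustration of the parameter-transfer idea, the sketch below copies a pretrained encoder's weights into a new SLU model before fine-tuning. The module shapes and names are placeholder assumptions, not the architecture used in the talk:

```python
import torch.nn as nn

# Hypothetical stand-ins: in the real system these would be the pretrained
# ASR encoder and the matching encoder inside the end-to-end SLU model.
asr_encoder = nn.LSTM(input_size=80, hidden_size=256, num_layers=4, batch_first=True)
slu_encoder = nn.LSTM(input_size=80, hidden_size=256, num_layers=4, batch_first=True)

# Parameter transfer: initialize the SLU encoder from the ASR weights
# instead of training it from a random initialization.
slu_encoder.load_state_dict(asr_encoder.state_dict())
```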

He said they used the Sentence-BERT semantic sentence embedding model as the Teacher and fine-tuned their SLU model to produce similar semantic representations of speech recordings. 
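A minimal sketch of the cross-modal Teacher-Student step, assuming the sentence-transformers library for the teacher; the toy SpeechEncoder, the feature dimensions, the model name, and the MSE objective are illustrative assumptions rather than the exact setup from the talk:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

class SpeechEncoder(nn.Module):
    """Toy student standing in for the ASR-initialized SLU encoder:
    mean-pools acoustic features and projects to the teacher's size."""
    def __init__(self, feat_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(feat_dim, out_dim)

    def forward(self, feats):                  # feats: (batch, time, feat_dim)
        return self.proj(feats.mean(dim=1))    # one embedding per recording

teacher = SentenceTransformer("all-MiniLM-L6-v2")  # frozen Sentence-BERT teacher
student = SpeechEncoder(feat_dim=80,
                        out_dim=teacher.get_sentence_embedding_dimension())
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)

# One toy step on paired data: filterbank features plus their transcripts.
feats = torch.randn(2, 100, 80)
transcripts = ["set an alarm for seven", "what is the weather today"]

with torch.no_grad():                          # teacher targets stay fixed
    target = teacher.encode(transcripts, convert_to_tensor=True)
pred = student(feats)                          # speech -> embedding
loss = F.mse_loss(pred, target)                # pull speech toward the text space
loss.backward()
optimizer.step()
```

Because the speech embeddings are trained to land in the teacher's semantic space, a classifier trained on text embeddings can in principle be applied to them directly, which is consistent with the zero-shot results described below.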

Their experiments with Dialog Act classification and Intent recognition showed that the accuracy of the end-to-end model is comparable to that of the traditional two-step approach, without training on any speech samples labeled for the downstream task. They also showed that the results of the end-to-end model can be further improved by fine-tuning on a few labeled speech samples per class.


Moderator and contact:

NTR CEO Nick Mikhailovsky: nickm@ntrlab.com.
