NTR webinar: End-to-end Spoken Language Understanding model using cross-modal Teacher-Student learning

NTR organizes and hosts scientific webinars on neural networks, inviting speakers from all over the world to present their recent work.

On March 2, Pavel Denisov of the Institute for Natural Language Processing, University of Stuttgart, Germany, led a technical Zoom webinar on an End-to-end Spoken Language Understanding model using cross-modal Teacher-Student learning.

About the webinar: 

A Spoken Language Understanding (SLU) task is traditionally performed in two steps: 

  1. a speech recognition system transforms speech to text;
  2. a Natural Language Understanding (NLU) model extracts the meaning from that text.

These two steps can be replaced with a single end-to-end neural network model. Such a model has a number of advantages, but it also suffers from the relatively small amount of annotated training data, a problem that can be alleviated by various transfer learning methods.
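For illustration, here is a minimal sketch of the traditional two-step pipeline using off-the-shelf Hugging Face pipelines. The checkpoint names are assumptions for the sake of the example (the second one is a pure placeholder), not the systems from the talk; an end-to-end model would collapse both steps into a single network.

```python
from transformers import pipeline

# Step 1: a speech recognition system transforms speech to text.
asr = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h")

# Step 2: an NLU model extracts the meaning (here, an intent label) from that text.
# "some-org/intent-classifier" is a placeholder; substitute a text-classification
# checkpoint trained for the target task.
nlu = pipeline("text-classification", model="some-org/intent-classifier")

transcript = asr("recording.wav")["text"]   # path to a speech recording
intent = nlu(transcript)[0]["label"]
print(transcript, "->", intent)
```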

Pavel talked about an approach to end-to-end SLU modeling based on parameter transfer from speech recognition and NLU models, followed by fine-tuning with cross-modal Teacher-Student learning.

He said they used the Sentence-BERT semantic sentence embedding model as a Teacher and fine-tuned their SLU model to produce similar semantic representations of speech recordings.
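A rough sketch of such a cross-modal Teacher-Student objective is shown below, assuming a hypothetical SpeechEncoder student that maps audio features into the teacher's embedding space; the talk's actual architecture, initialization from pretrained ASR/NLU parameters, and loss function may differ.

```python
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

# Teacher: Sentence-BERT produces a semantic embedding of the reference transcript.
teacher = SentenceTransformer("all-MiniLM-L6-v2")  # any Sentence-BERT checkpoint

# Student: a stand-in speech encoder. In the approach described, the real model
# would be initialized from pretrained ASR/NLU parameters (not shown here).
class SpeechEncoder(nn.Module):
    def __init__(self, feat_dim=80, emb_dim=384):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 256, batch_first=True)
        self.proj = nn.Linear(256, emb_dim)

    def forward(self, feats):           # feats: (batch, time, feat_dim)
        _, h = self.rnn(feats)          # h: (1, batch, 256)
        return self.proj(h.squeeze(0))  # (batch, emb_dim)

student = SpeechEncoder()
optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(feats, transcripts):
    # Teacher embeddings are computed from text and kept fixed.
    with torch.no_grad():
        target = torch.tensor(teacher.encode(transcripts))
    pred = student(feats)               # embeddings computed from speech
    loss = loss_fn(pred, target)        # pull the two modalities together
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```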

Their experiments with Dialog Act classification and Intent recognition showed that the accuracy of the end-to-end model is comparable to that of the traditional two-step approach, without training on any speech samples labeled for the downstream task. They also showed that the results of the end-to-end model can be improved by fine-tuning on a few labeled speech samples per class.
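Continuing the sketch above (the student encoder and imports are defined there), the few-shot fine-tuning stage might look roughly as follows; the classification head and hyperparameters are assumptions, not the setup from the talk.

```python
num_intents = 10                      # task-specific; assumed here
head = nn.Linear(384, num_intents)    # small classification head on top of the student

clf_opt = torch.optim.Adam(
    list(student.parameters()) + list(head.parameters()), lr=1e-5
)
ce = nn.CrossEntropyLoss()

def finetune_step(feats, labels):
    # Train on a handful of labeled recordings per class.
    logits = head(student(feats))
    loss = ce(logits, labels)
    clf_opt.zero_grad()
    loss.backward()
    clf_opt.step()
    return loss.item()
```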

Materials available:

Webinar presentation.

Moderator and contact:

NTR CEO Nick Mikhailovsky: nickm@ntrlab.com.
