Raw source and filter modelling for dysarthric speech recognition

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review


Abstract

Acoustic modelling for automatic dysarthric speech recognition (ADSR) is a challenging task. Data deficiency is a major problem, and substantial differences between typical and dysarthric speech complicate transfer learning. In this paper, we build acoustic models using the raw magnitude spectra of the source and filter components. The proposed multi-stream model consists of convolutional and recurrent layers. It allows for fusing the vocal tract and excitation components at different levels of abstraction and after per-stream pre-processing. We show that such multi-stream processing leverages these two information streams and helps the model towards normalising the speaker attributes and speaking style. This potentially leads to better handling of dysarthric speech, with its large inter-speaker and intra-speaker variability. We compare the proposed system with various features, study the training dynamics, explore the usefulness of data augmentation and provide an interpretation of the learned convolutional filters. On the widely used TORGO dysarthric speech corpus, the proposed approach results in up to 1.7% absolute WER reduction for dysarthric speech compared with the MFCC baseline. Our best model reaches 40.6% and 11.8% WER for dysarthric and typical speech, respectively.
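The paper's own implementation is not reproduced here. As a rough illustration only, the PyTorch sketch below shows what a multi-stream model of this general kind could look like: the filter (vocal tract) and source (excitation) magnitude spectra each pass through their own convolutional front-end, the two streams are fused by concatenation, and a recurrent layer models the fused sequence. All layer sizes, names and the fusion point are assumptions, not the authors' configuration.

# Illustrative sketch (not the authors' code): two-stream acoustic model with
# per-stream convolutional front-ends, stream fusion, and a recurrent layer.
import torch
import torch.nn as nn

class TwoStreamAcousticModel(nn.Module):
    def __init__(self, n_bins=257, n_classes=40, hidden=256):
        super().__init__()
        # One 1-D convolutional front-end per stream (filter / source spectra)
        self.filter_conv = nn.Sequential(
            nn.Conv1d(n_bins, hidden, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.source_conv = nn.Sequential(
            nn.Conv1d(n_bins, hidden, kernel_size=5, padding=2), nn.ReLU(),
        )
        # Recurrent layer over the fused (concatenated) streams
        self.rnn = nn.GRU(2 * hidden, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_classes)

    def forward(self, filter_spec, source_spec):
        # Inputs: (batch, time, n_bins) magnitude spectra of each component
        f = self.filter_conv(filter_spec.transpose(1, 2)).transpose(1, 2)
        s = self.source_conv(source_spec.transpose(1, 2)).transpose(1, 2)
        fused = torch.cat([f, s], dim=-1)  # fuse after per-stream processing
        h, _ = self.rnn(fused)
        return self.out(h)                 # frame-level class scores

# Example: a batch of 2 utterances, 100 frames, 257-bin spectra per stream
model = TwoStreamAcousticModel()
logits = model(torch.randn(2, 100, 257), torch.randn(2, 100, 257))
print(logits.shape)  # torch.Size([2, 100, 40])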
Original language: English
Title of host publication: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing
Publisher: IEEE
Number of pages: 5
Publication status: Accepted/In press - 7 May 2022
