Dysarthric Speech Recognition From Raw Waveform with Parametric CNNs

Zhengjun Yue, Erfan Loweimi, Heidi Christensen, Jon Barker, Zoran Cvetkovic

Research output: Chapter in Book/Report/Conference proceeding > Conference paper > peer-review


Abstract

Raw waveform acoustic modelling has recently received increasing attention. Compared with task-blind hand-crafted features, which may discard useful information, representations learned directly from the raw waveform are task-specific and potentially include all task-relevant information. In the context of automatic dysarthric speech recognition (ADSR), raw waveform acoustic modelling is under-explored owing to data scarcity. Parametric CNNs can compensate for this problem because they have notably fewer parameters and require less training data than conventional non-parametric CNNs. In this paper, we explore the usefulness of raw waveform acoustic modelling using various parametric CNNs for ADSR. Additionally, we investigate the properties of the learned filters and monitor the training dynamics of various models. Furthermore, we study the effectiveness of data augmentation and multi-stream acoustic modelling through combining the non-parametric and parametric CNNs fed by hand-crafted and raw waveform features. Experimental results on the widely used TORGO dysarthric database show that the parametric CNNs significantly outperform the non-parametric CNNs on dysarthric speech (up to 2.7% and 1.8% absolute error reduction), reaching up to 35.9% and 11.9% WERs for dysarthric and typical speech, respectively. Multi-stream acoustic modelling further improves the performance, resulting in up to 33.2% and 10.3% WERs for dysarthric and typical speech, respectively.
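The claim that parametric CNNs need far fewer parameters can be illustrated with a minimal sketch. It assumes a sinc-based band-pass filter as the parametric form, which is one common choice; the abstract does not say which parametric filters the paper uses. The function names, the 40 filters, the 129-sample kernel and the 16 kHz sample rate are all illustrative assumptions, not values from the paper.

```python
import numpy as np


def sinc_bandpass(f_low_hz, f_high_hz, kernel_len=129, sample_rate=16000):
    """Build a band-pass FIR kernel from just two cutoff frequencies.

    Hypothetical helper: a parametric layer of this kind would learn only
    f_low_hz and f_high_hz instead of every kernel tap.
    """
    # Time axis in seconds, centred on zero so the kernel is symmetric.
    t = (np.arange(kernel_len) - (kernel_len - 1) / 2) / sample_rate
    # The difference of two ideal low-pass responses gives a band-pass.
    # np.sinc(x) is the normalised sinc, sin(pi * x) / (pi * x).
    high = 2 * f_high_hz * np.sinc(2 * f_high_hz * t)
    low = 2 * f_low_hz * np.sinc(2 * f_low_hz * t)
    # A Hamming window smooths the truncation of the ideal response.
    kernel = (high - low) * np.hamming(kernel_len)
    # Scale so the largest tap has magnitude 1.
    return kernel / np.abs(kernel).max()


def first_layer_param_counts(num_filters=40, kernel_len=129):
    """Learnable weights in the first layer, ignoring biases."""
    conventional = num_filters * kernel_len  # every tap is a free weight
    parametric = num_filters * 2             # two cutoffs per filter
    return conventional, parametric


kernel = sinc_bandpass(1000.0, 2000.0)
conventional, parametric = first_layer_param_counts()
print(kernel.shape, conventional, parametric)
```

Under these assumed sizes the conventional first layer has 5160 learnable weights and the sinc-based one has 80, which is the kind of reduction that matters when training data is scarce.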
Original language: English
Title of host publication: Proceedings of INTERSPEECH 2022
Publisher: ISCA-INST SPEECH COMMUNICATION ASSOC
Number of pages: 5
Publication status: Published - 2022
