Speech acoustic modelling from raw phase spectrum

Erfan Loweimi, Zoran Cvetkovic, Peter Bell, Steve Renals

Research output: Contribution to journal › Conference paper › peer-review

8 Citations (Scopus)
18 Downloads (Pure)

Abstract

Magnitude spectrum-based features are the most widely employed front-ends for acoustic modelling in automatic speech recognition (ASR) systems. In this paper, we investigate the possibility and efficacy of acoustic modelling using the raw short-time phase spectrum. In particular, we study the usefulness of the raw wrapped, unwrapped and minimum-phase phase spectra, as well as the phase of the source and filter components, for acoustic modelling. Furthermore, we explore the effectiveness of simultaneously deploying the vocal tract and excitation components of the raw phase spectrum using multi-head CNNs, and investigate multiple information fusion schemes. This paves the way for developing effective phase-based multi-stream information processing systems for speech recognition. The performance, even for the wrapped phase with its noise-like shape, is comparable to or better than that of the classic magnitude-based features, and a WER as low as 4.8% has been achieved on the WSJ (Eval-92) task.
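
For readers unfamiliar with the three phase representations named in the abstract, the following is a minimal NumPy sketch of how the wrapped, unwrapped and minimum-phase phase spectra of a short-time frame can be computed. It is not the authors' implementation: the frame length, hop size, FFT size and Hamming window are assumptions chosen for illustration (25 ms / 10 ms at 16 kHz), the function names are hypothetical, and the minimum-phase spectrum is obtained with the standard real-cepstrum folding method. The source-filter separation and the multi-head CNN described in the abstract are not shown.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (assumed 25 ms / 10 ms at 16 kHz)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def phase_spectra(x, frame_len=400, hop=160, n_fft=512):
    """Return the complex STFT plus its wrapped and unwrapped phase spectra."""
    frames = frame_signal(x, frame_len, hop) * np.hamming(frame_len)
    spec = np.fft.rfft(frames, n=n_fft, axis=-1)
    wrapped = np.angle(spec)                 # principal value in [-pi, pi]
    unwrapped = np.unwrap(wrapped, axis=-1)  # remove 2*pi jumps along frequency
    return spec, wrapped, unwrapped

def minimum_phase(spec, n_fft=512, eps=1e-10):
    """Minimum-phase phase spectrum derived from the magnitude spectrum alone."""
    log_mag = np.log(np.abs(spec) + eps)
    cep = np.fft.irfft(log_mag, n=n_fft, axis=-1)  # real cepstrum (even sequence)
    # Fold the cepstrum: keep c[0] and c[N/2], double the causal part,
    # and zero the anti-causal part.
    lifter = np.zeros(n_fft)
    lifter[0] = 1.0
    lifter[1:n_fft // 2] = 2.0
    lifter[n_fft // 2] = 1.0
    # The FFT of the folded cepstrum is log|X| + j * (minimum phase).
    return np.imag(np.fft.rfft(cep * lifter, n=n_fft, axis=-1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16000)  # one second of noise as a stand-in for speech
    spec, wrapped, unwrapped = phase_spectra(x)
    print(wrapped.shape, unwrapped.shape, minimum_phase(spec).shape)
```

Each function returns one row per frame and one column per frequency bin, so any of the three arrays could be fed to a network in place of a magnitude-based feature matrix.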

Original language: English
Pages (from-to): 6738-6742
Number of pages: 5
Journal: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2021-June
DOIs
Publication status: Published - 2021
Event: 2021 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2021 - Virtual, Toronto, Canada
Duration: 6 Jun 2021 → 11 Jun 2021

Keywords

  • Acoustic modelling
  • ASR
  • Multi-head CNNs
  • Phase-based source-filter separation
  • Raw phase spectrum
