Hierarchical Attention Transfer Networks for Depression Assessment from Speech

Ziping Zhao, Zhongtian Bao, Zixing Zhang, Nicholas Cummins, Haishuai Wang, Bjoern Schuller

Research output: Chapter in Book/Report/Conference proceeding; Conference paper; peer-review

48 Citations (Scopus)

Abstract

A growing area of mental health research is the search for speech-based objective markers for conditions such as depression. However, when combined with machine learning, this search can be challenging due to the limited amount of annotated training data. In this paper, we propose a novel cross-task approach which transfers attention mechanisms from speech recognition to aid depression severity measurement. This transfer is applied in a two-level hierarchical network which mirrors the natural hierarchical structure of speech. Experiments based on the Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) dataset, as used in the 2017 Audio/Visual Emotion Challenge, demonstrate the effectiveness of our Hierarchical Attention Transfer Network. On the development set, the proposed approach achieves a root mean square error (RMSE) of 3.85 and a mean absolute error (MAE) of 2.99 on the Patient Health Questionnaire (PHQ)-8 scale [0, 24], while on the test set it achieves an RMSE of 5.66 and an MAE of 4.28. To the best of our knowledge, these scores represent the best-known speech-only results to date on this corpus.
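For readers unfamiliar with the two error measures reported in the abstract, the following is a minimal sketch of how RMSE and MAE are computed for a regression target such as a PHQ-8 score. The function names and the example scores are hypothetical and chosen only for illustration; they are not taken from the paper or from the DAIC-WOZ corpus.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


# Hypothetical PHQ-8 scores (valid range 0 to 24); NOT data from the paper.
true_scores = [2, 10, 17, 5]
pred_scores = [4, 7, 15, 5]

print(f"RMSE = {rmse(true_scores, pred_scores):.3f}")
print(f"MAE  = {mae(true_scores, pred_scores):.3f}")
```

Because RMSE squares each error before averaging, it penalizes large individual errors more heavily than MAE and is never smaller than MAE on the same predictions, which is consistent with the ordering of the scores quoted above.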

Original language: English
Title of host publication: 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
Publisher: IEEE
Pages: 7159-7163
Number of pages: 5
ISBN (Electronic): 9781509066315
Publication status: Published - May 2020

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2020-May
ISSN (Print): 1520-6149

Keywords

  • Attention Transfer
  • Depression
  • Hierarchical Attention
  • Monotonic Attention
