Article

Analysis of acoustic space variability in speech affected by depression

Journal

SPEECH COMMUNICATION
Volume 75, Issue -, Pages 27-49

Publisher

ELSEVIER
DOI: 10.1016/j.specom.2015.09.003

Keywords

Depression; Objective diagnosis; Gaussian Mixture Models; Maximum a Posteriori adaption; Acoustic variability; Acoustic volume

Funding

  1. National ICT Australia - Australian Government as represented by the Department of Broadband, Communication and the Digital Economy
  2. Australian Research Council through the ICT Centre of Excellence program
  3. Australian Research Council through Discovery Project [DP110105240, DP120100641]
  4. German Research Foundation [KR3698/4-1]
  5. United States National Institute of Mental Health [R43MH068950]

Abstract

has resulted in spectral and energy based features being a key component in many speech-based classification and prediction systems. However, there has been no in-depth investigation into how acoustic models of spectral features are affected by depression. This paper investigates the hypothesis that the effects of depression in speech manifest as a reduction in the spread of phonetic events in acoustic space, as modelled by Gaussian Mixture Models (GMM) in combination with Mel Frequency Cepstral Coefficients (MFCC). Our investigation uses three measures of acoustic variability: Average Weighted Variance (AWV), Acoustic Movement (AM) and Acoustic Volume, which attempt to model depression-specific acoustic variations (AWV and Acoustic Volume) or the trajectory of speech in the acoustic space (AM). Within our analysis we present the Probabilistic Acoustic Volume (PAV), a novel method for robustly estimating Acoustic Volume using Monte Carlo sampling of the feature distribution being modelled. We show that an array of PAV points gives insight into how the concentration of the feature vectors in the feature space changes with depression. Key results on two commonly used depression corpora consistently indicate that as a speaker's level of depression increases there are statistically significant reductions in both AWV (-0.44 <= r_s <= 0.18 with p < .05) and AM (-0.26 <= r_s <= 0.19 with p < .05) values, indicating a decrease in localised acoustic variance and a smoothing of the acoustic trajectory respectively. Further, there are also statistically significant reductions (-0.32 <= r_s <= -0.20 with p < .05) in Acoustic Volume measures and strong statistical evidence (-0.48 <= r_s <= 0.23 with p < .05) that the MFCC feature space becomes more concentrated. Quantifying these effects is expected to be a key step towards building an objective classification or prediction system that is robust to the many sources of variability modulated into a speech signal that are unwanted for depression analysis. © 2015 Elsevier B.V. All rights reserved.
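
The abstract does not give formulas for its measures, so the following is only an illustrative sketch of the general idea, not the authors' definitions. It assumes a diagonal-covariance GMM, takes a hypothetical weighted variance as the mixture-weighted mean of the per-component variances, and takes a hypothetical "probabilistic volume" as the volume of the highest-density region holding a chosen fraction of the probability mass, estimated by Monte Carlo sampling from the model. All function names and parameters are assumptions made for illustration.

```python
import numpy as np

def gmm_log_density(x, weights, means, variances):
    """Log-density of a diagonal-covariance GMM at each row of x."""
    diff = x[:, None, :] - means[None, :, :]                   # (N, M, D)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)  # (M,)
    log_comp = (np.log(weights) + log_norm
                - 0.5 * (diff ** 2 / variances).sum(axis=2))   # (N, M)
    top = log_comp.max(axis=1, keepdims=True)                  # stable log-sum-exp
    return (top + np.log(np.exp(log_comp - top).sum(axis=1, keepdims=True))).ravel()

def sample_gmm(weights, means, variances, n, rng):
    """Draw n samples: pick a component per sample, then add scaled noise."""
    comp = rng.choice(len(weights), size=n, p=weights)
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comp] + np.sqrt(variances[comp]) * noise

def average_weighted_variance(weights, variances):
    """Assumed definition: mixture-weighted mean of per-component variances."""
    return float(np.sum(weights * variances.mean(axis=1)))

def probabilistic_volume(weights, means, variances, mass, n_samples=100_000, seed=0):
    """Monte Carlo volume of the highest-density region containing `mass`.

    Uses the identity  volume(R) = E_f[ 1[x in R] / f(x) ]  with x ~ f,
    where R = {x : f(x) >= t} and t is chosen so that R holds `mass`.
    """
    rng = np.random.default_rng(seed)
    x = sample_gmm(weights, means, variances, n_samples, rng)
    log_f = gmm_log_density(x, weights, means, variances)
    threshold = np.quantile(log_f, 1.0 - mass)   # density level enclosing `mass`
    inside = log_f >= threshold
    return float(np.exp(-log_f[inside]).sum() / n_samples)

if __name__ == "__main__":
    # Toy two-component model in 2-D (values are made up for illustration).
    w = np.array([0.6, 0.4])
    mu = np.array([[0.0, 0.0], [3.0, 1.0]])
    var = np.array([[1.0, 0.5], [0.8, 1.2]])
    print("weighted variance:", average_weighted_variance(w, var))
    for p in (0.25, 0.5, 0.75, 0.9):             # an "array" of volume points
        print(f"mass={p:.2f}  volume={probabilistic_volume(w, mu, var, p):.3f}")
```

Under these assumptions, a more concentrated feature distribution yields smaller volumes at every mass level. As a sanity check, for a single 2-D unit-variance Gaussian the region holding half the mass is a disc of area 2π ln 2 (about 4.36), and the estimate should land close to that value.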
