Vocal differences between healthy and depressed people were present across all 12 speech scenarios tested, and acoustic features such as loudness may help indicate depression through voice analysis, following both situation-specific and cross-situational patterns, according to a study published in BMC Psychiatry.

Although abnormalities in vocal expression have frequently been reported during depressive episodes, little is known about how the situation influences these abnormalities. This study was part of a clinical research project investigating the behavioral and biological indicators of major depressive disorder (MDD). Participants with MDD (n=47) were recruited from Beijing Anding Hospital of Capital Medical University in China, along with 57 healthy controls, to compare vocal differences across a variety of situations and determine whether vocal abnormalities in depression are present only in specific circumstances. Four tasks compared the negative, positive, and neutral vocal expressions of depressed and healthy people. Multivariate analysis of covariance (MANCOVA) was used to evaluate the main effects of depression on acoustic features relative to the healthy controls. Each feature was assessed on both statistical significance and effect size.

In this study, task and emotion were regarded as the 2 situational conditions that together formed the speech scenarios. Four tasks were designed: answering questions, watching videos, describing pictures, and reading text. Each task involved 3 emotional states: negative (sadness), positive (happiness), and neutral, resulting in a controlled experiment with 12 speech scenarios (4 tasks × 3 emotions). OpenSMILE software was used to extract 25 acoustic features: loudness, fundamental frequency (F0), zero-crossing rate, F0 envelope, voicing probability, 12 Mel-frequency cepstral coefficients (MFCCs), and 8 line spectral pairs (LSPs).
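The design arithmetic above can be checked with a short sketch. This is only an illustration of how the 12 scenarios and 25 features add up; the variable names and feature labels (for example `MFCC1` to `MFCC12` and `LSP1` to `LSP8`) are assumptions made here for readability, not the identifiers used by the study or by OpenSMILE.

```python
from itertools import product

# The 4 tasks and 3 emotional states described in the study design.
TASKS = ["answering questions", "watching videos", "describing pictures", "reading text"]
EMOTIONS = ["negative", "positive", "neutral"]

# Every task is paired with every emotion: 4 x 3 = 12 speech scenarios.
scenarios = list(product(TASKS, EMOTIONS))

# Illustrative labels only (assumed naming, not OpenSMILE's own):
# 5 single features + 12 MFCCs + 8 LSPs = 25 acoustic features.
FEATURES = (
    ["loudness", "F0", "zero_crossing_rate", "F0_envelope", "voicing_probability"]
    + [f"MFCC{i}" for i in range(1, 13)]
    + [f"LSP{i}" for i in range(1, 9)]
)

if __name__ == "__main__":
    print(len(scenarios), "scenarios,", len(FEATURES), "features")
```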

MANCOVA showed significant between-group differences in all 12 scenarios. Three acoustic features in particular (loudness, MFCC5, and MFCC7) differed consistently between the depressed and nondepressed participants, with a large effect magnitude, where a partial eta squared (ηp2) of 0.14 was considered large and values of 0.01 and 0.06 were considered small and moderate, respectively. No matter which task or emotion was involved in the scenario, loudness, MFCC5, and MFCC7 were consistently higher in healthy participants than in depressed participants.
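For readers unfamiliar with the effect-size benchmarks, the following is a minimal sketch of how partial eta squared is conventionally computed from sums of squares and mapped onto the small, moderate, and large labels quoted above. The function names are hypothetical, treating each benchmark as a lower bound is an assumption, and the "negligible" label for values below 0.01 is added here for completeness; none of this is taken from the study's own analysis code.

```python
def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Standard definition: SS_effect / (SS_effect + SS_error)."""
    return ss_effect / (ss_effect + ss_error)


def effect_magnitude(eta_p2: float) -> str:
    """Label an effect size using the 0.01 / 0.06 / 0.14 benchmarks.

    Assumption: each benchmark is treated as an inclusive lower bound.
    """
    if eta_p2 >= 0.14:
        return "large"
    if eta_p2 >= 0.06:
        return "moderate"
    if eta_p2 >= 0.01:
        return "small"
    return "negligible"  # label assumed here; not used in the article


if __name__ == "__main__":
    # Made-up sums of squares, purely for illustration.
    eta = partial_eta_squared(ss_effect=20.0, ss_error=80.0)
    print(round(eta, 2), effect_magnitude(eta))
```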

Study limitations included the small sample size, a high level of education in the study participants, and a potential lack of generalizability owing to the language used. Despite these limitations, the study investigators concluded, “Our results pointed out that the vocal differences between depressed and healthy people follow both cross-situational and situation-specific patterns, and loudness, MFCC5 and MFCC7 are effective indicators that could be utilized for identifying depression. These findings supported that there are no special requirements on testing environment while identifying depression via voice analysis, but it is better to utilize loudness, MFCC5 and MFCC7 for modelling.”


Wang J, Zhang L, Liu T, Pan W, Hu B, Zhu T. Acoustic differences between healthy and depressed people: a cross-situation study. BMC Psychiatry. 2019;19(1):300.