Background: Major depressive disorder (MDD) affects millions of people worldwide, but timely treatment is often not received, owing in part to inaccurate subjective recall and variability in the symptom course. Objective and frequent MDD monitoring can improve subjective recall and help to guide treatment selection. Attempts have been made, with varying degrees of success, to explore the relationship between measures of depression and passive digital phenotypes (features) extracted from smartphones and wearable devices to remotely and continuously monitor changes in symptomatology. However, a number of challenges exist for the analysis of these data. These include maintaining participant engagement over extended time periods and therefore understanding what constitutes an acceptable threshold of missing data; distinguishing between the cross-sectional and longitudinal relationships for different features to determine their utility in tracking within-individual longitudinal variation or screening individuals at high risk; and understanding the heterogeneity with which depression manifests itself in behavioral patterns quantified by the passive features.

Objective: We aimed to address these 3 challenges to inform future work in stratified analyses.

Methods: Using smartphone and wearable data collected from 479 participants with MDD, we extracted 21 features capturing mobility, sleep, and smartphone use. We investigated the impact of the number of days of available data on feature quality using the intraclass correlation coefficient and Bland-Altman analysis. We then examined the nature of the correlation between the 8-item Patient Health Questionnaire (PHQ-8) depression scale (measured every 14 days) and the features using the individual-mean correlation, repeated measures correlation, and linear mixed effects model. Furthermore, we used the Gaussian mixture model to stratify the participants based on their behavioral difference, quantified by the features, between periods of high (depression) and low (no depression) PHQ-8 scores.

Results: We demonstrated that at least 8 (range 2-12) days were needed for reliable calculation of most of the features in the 14-day time window. We observed that features such as sleep onset time correlated better with PHQ-8 scores cross-sectionally than longitudinally, whereas features such as wakefulness after sleep onset correlated well with PHQ-8 scores longitudinally but worse cross-sectionally. Finally, we found that participants could be separated into 3 distinct clusters according to their behavioral difference between periods of depression and periods of no depression.

Conclusions: This work contributes to our understanding of how these mobile health–derived features are associated with depression symptom severity to inform future work in stratified analyses.
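
The distinction between cross-sectional and longitudinal association drawn above can be illustrated with a minimal sketch. It is not the authors' code: the function names and the toy values are invented for illustration. The between-person estimate correlates each participant's mean feature value with their mean PHQ-8 score (in the spirit of the individual-mean correlation), and the within-person estimate correlates participant-centered values (in the spirit of the repeated measures correlation). The toy data are constructed so that the two estimates point in opposite directions.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def between_person_r(data):
    """Cross-sectional view: correlate per-participant means.

    data maps a participant id to a list of (feature, phq8) pairs,
    one pair per 14-day window (hypothetical layout).
    """
    feature_means = [sum(f for f, _ in obs) / len(obs) for obs in data.values()]
    phq8_means = [sum(p for _, p in obs) / len(obs) for obs in data.values()]
    return pearson(feature_means, phq8_means)


def within_person_r(data):
    """Longitudinal view: subtract each participant's own means,
    then correlate the pooled deviations."""
    centered_feature, centered_phq8 = [], []
    for obs in data.values():
        mean_f = sum(f for f, _ in obs) / len(obs)
        mean_p = sum(p for _, p in obs) / len(obs)
        centered_feature.extend(f - mean_f for f, _ in obs)
        centered_phq8.extend(p - mean_p for _, p in obs)
    return pearson(centered_feature, centered_phq8)


# Invented toy data: participants with higher average feature values have
# higher average PHQ-8 scores, but within each participant the feature
# falls as PHQ-8 rises.
toy = {
    "A": [(1.0, 12.0), (2.0, 11.0), (3.0, 10.0)],
    "B": [(4.0, 16.0), (5.0, 15.0), (6.0, 14.0)],
    "C": [(7.0, 20.0), (8.0, 19.0), (9.0, 18.0)],
}

print(round(between_person_r(toy), 3))  # 1.0
print(round(within_person_r(toy), 3))   # -1.0
```

In this constructed case a feature that separates participants perfectly would be misleading for tracking change within a participant, which is why the two kinds of relationship need to be assessed separately.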

Original language: English
Article number: e45233
Journal: Journal of Medical Internet Research
Publication status: Published - 2023


  • behavioral patterns
  • depression
  • digital phenotypes
  • missing data
  • mobile health
  • mobile phone
  • smartphones
  • wearable devices


Title: Challenges in Using mHealth Data From Smartphones and Wearable Devices to Predict Depression Symptom Severity: Retrospective Analysis
