Journaling experiments enjoy a venerable place in the history of psychiatric research. The insight they provide into a patient’s life outside of therapy is rare and critical, especially considering that the crux of mental health diagnosis and treatment is the patient’s integration with society. Digitally mediated communication, social networks, and smartphones widen the aperture of information available from the patient’s life between therapy sessions. Such data can now be collected at a consumer price point: wearables and smartphone apps can gather data relevant to mental health.1
Perhaps the most interesting data, though, is the abundance of language generated by the patient and available in a form suitable for automated analysis. The ubiquity of social media amounts to the largest journaling experiment yet conducted. The nexus of computational linguistics and clinical psychology, in particular, has been the focus of a nascent but energized community that has generated interesting findings, a fraction of which are detailed here. For more on Computational Linguistics and Clinical Psychology,2,3 see http://clpsych.org.
Data have not yet had a significant impact on psychiatry, in part because machine learning and statistics require many examples of a phenomenon in order to find correlations. Here, the type of data to which we refer is a person’s social media data paired with their diagnosis, which facilitates finding correlations between social media usage and mental health. Generally, hundreds of these “labeled pairs” are required for algorithms to detect features that correlate only weakly with a diagnosis. Each of these correlates is weak on its own, but in aggregate they produce a powerful signal, much as thin strands braided together make a strong rope.
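The braided-rope intuition can be illustrated with a small simulation. The sketch below is not the method used in the studies described here; it is a toy example on synthetic data, and every name and number in it (`simulate`, 2,000 simulated people, 200 features, an effect size of 0.2 standard deviations) is an assumption chosen for illustration. Each simulated person receives a binary label and a set of noisy features that shift only slightly when the label is positive. Classifying on a single feature should perform only slightly better than chance, while classifying on the average of all features should perform far better.

```python
import random


def simulate(n_people=2000, n_features=200, effect=0.2, seed=0):
    """Compare one weak feature against the average of many weak features.

    All parameters are illustrative assumptions, not values from any study.
    """
    rng = random.Random(seed)
    # Midpoint between the two group means; assumed known here for
    # simplicity, whereas a real model would have to learn it from data.
    threshold = effect / 2
    single_hits = 0
    combined_hits = 0
    for _ in range(n_people):
        label = rng.random() < 0.5  # synthetic "diagnosis"
        shift = effect if label else 0.0
        # Each feature is unit-variance noise plus a small label-dependent shift.
        features = [shift + rng.gauss(0.0, 1.0) for _ in range(n_features)]
        # Prediction from a single weak correlate.
        single_hits += (features[0] > threshold) == label
        # Prediction from all correlates averaged together.
        mean = sum(features) / n_features
        combined_hits += (mean > threshold) == label
    return single_hits / n_people, combined_hits / n_people


single_acc, combined_acc = simulate()
print(f"accuracy with one feature:  {single_acc:.2f}")
print(f"accuracy with all features: {combined_acc:.2f}")
```

Averaging works in this toy setting because the noise in each feature is independent, so it shrinks as more features are combined while the small shift remains. Real social media features are correlated with one another, so the gain from aggregation would be smaller than this idealized case suggests.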
In order to find sufficient data, we turned to an unconventional source: publicly available self-report data. We then conducted validation experiments to demonstrate the many ways in which these data lined up with more traditional data-gathering methods, such as administering a depression inventory paired with a writing prompt or social media data.4,5