Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access

Nazli Goharian, Philip Resnik, Andrew Yates, Molly Ireland, Kate Niederhoffer, Rebecca Resnik (Editors)

Anthology ID:
Association for Computational Linguistics
Bib Export formats:

pdf bib
Proceedings of the Seventh Workshop on Computational Linguistics and Clinical Psychology: Improving Access
Nazli Goharian | Philip Resnik | Andrew Yates | Molly Ireland | Kate Niederhoffer | Rebecca Resnik

pdf bib
Understanding who uses Reddit : Profiling individuals with a self-reported bipolar disorder diagnosisReddit: Profiling individuals with a self-reported bipolar disorder diagnosis
Glorianna Jagfeld | Fiona Lobban | Paul Rayson | Steven Jones

Recently, research on mental health conditions using public online data, including Reddit, has surged in NLP and health research but has not reported user characteristics, which are important to judge generalisability of findings. This paper shows how existing NLP methods can yield information on clinical, demographic, and identity characteristics of almost 20 K Reddit users who self-report a bipolar disorder diagnosis. This population consists of slightly more feminine- than masculine-gendered mainly young or middle-aged US-based adults who often report additional mental health diagnoses, which is compared with general Reddit statistics and epidemiological studies. Additionally, this paper carefully evaluates all methods and discusses ethical issues.

pdf bib
Individual Differences in the Movement-Mood Relationship in Digital Life Data
Glen Coppersmith | Alex Fine | Patrick Crutchley | Joshua Carroll

Our increasingly digitized lives generate troves of data that reflect our behavior, beliefs, mood, and wellbeing. Such digital life data provides crucial insight into the lives of patients outside the healthcare setting that has long been lacking, from a better understanding of mundane patterns of exercise and sleep routines to harbingers of emotional crisis. Moreover, information about individual differences and personalities is encoded in digital life data. In this paper we examine the relationship between mood and movement using linguistic and biometric data, respectively. Does increased physical activity (movement) have an effect on a person’s mood (or vice-versa)? We find that weak group-level relationships between movement and mood mask interesting and often strong relationships between the two for individuals within the group. We describe these individual differences, and argue that individual variability in the relationship between movement and mood is one of many such factors that ought be taken into account in wellbeing-focused apps and AI systems.

pdf bib
Suicide Risk Prediction by Tracking Self-Harm Aspects in Tweets : NUS-IDS at the CLPsych 2021 Shared TaskNUS-IDS at the CLPsych 2021 Shared Task
Sujatha Das Gollapalli | Guilherme Augusto Zagatti | See-Kiong Ng

We describe our system for identifying users at-risk for suicide based on their tweets developed for the CLPsych 2021 Shared Task. Based on research in mental health studies linking self-harm tendencies with suicide, in our system, we attempt to characterize self-harm aspects expressed in user tweets over a period of time. To this end, we design SHTM, a Self-Harm Topic Model that combines Latent Dirichlet Allocation with a self-harm dictionary for modeling daily tweets of users. Next, differences in moods and topics over time are captured as features to train a deep learning model for suicide prediction.

pdf bib
Using Psychologically-Informed Priors for Suicide Prediction in the CLPsych 2021 Shared TaskCLPsych 2021 Shared Task
Avi Gamoran | Yonatan Kaplan | Almog Simchon | Michael Gilead

This paper describes our approach to the CLPsych 2021 Shared Task, in which we aimed to predict suicide attempts based on Twitter feed data. We addressed this challenge by emphasizing reliance on prior domain knowledge. We engineered novel theory-driven features, and integrated prior knowledge with empirical evidence in a principled manner using Bayesian modeling. While this theory-guided approach increases bias and lowers accuracy on the training set, it was successful in preventing over-fitting. The models provided reasonable classification accuracy on unseen test data (0.68 = AUC= 0.84). Our approach may be particularly useful in prediction tasks trained on a relatively small data set.

pdf bib
Analysis of Behavior Classification in Motivational Interviewing
Leili Tavabi | Trang Tran | Kalin Stefanov | Brian Borsari | Joshua Woolley | Stefan Scherer | Mohammad Soleymani

Analysis of client and therapist behavior in counseling sessions can provide helpful insights for assessing the quality of the session and consequently, the client’s behavioral outcome. In this paper, we study the automatic classification of standardized behavior codes (annotations) used for assessment of psychotherapy sessions in Motivational Interviewing (MI). We develop models and examine the classification of client behaviors throughout MI sessions, comparing the performance by models trained on large pretrained embeddings (RoBERTa) versus interpretable and expert-selected features (LIWC). Our best performing model using the pretrained RoBERTa embeddings beats the baseline model, achieving an F1 score of 0.66 in the subject-independent 3-class classification. Through statistical analysis on the classification results, we identify prominent LIWC features that may not have been captured by the model using pretrained embeddings. Although classification using LIWC features underperforms RoBERTa, our findings motivate the future direction of incorporating auxiliary tasks in the classification of MI codes.

pdf bib
Automatic Detection and Prediction of Psychiatric Hospitalizations From Social Media Posts
Zhengping Jiang | Jonathan Zomick | Sarah Ita Levitan | Mark Serper | Julia Hirschberg

We address the problem of predicting psychiatric hospitalizations using linguistic features drawn from social media posts. We formulate this novel task and develop an approach to automatically extract time spans of self-reported psychiatric hospitalizations. Using this dataset, we build predictive models of psychiatric hospitalization, comparing feature sets, user vs. post classification, and comparing model performance using a varying time window of posts. Our best model achieves an F1 of.718 using 7 days of posts. Our results suggest that this is a useful framework for collecting hospitalization data, and that social media data can be leveraged to predict acute psychiatric crises before they occur, potentially saving lives and improving outcomes for individuals with mental illness.

pdf bib
Automated coherence measures fail to index thought disorder in individuals at risk for psychosis
Kasia Hitczenko | Henry Cowan | Vijay Mittal | Matthew Goldrick

Thought disorder linguistic disturbances including incoherence and derailment of topic is seen in individuals both with and at risk for psychosis. Methods from computational linguistics have increasingly sought to quantify thought disorder to detect group differences between clinical populations and healthy controls. While previous work has been quite successful at these classification tasks, the lack of interpretability of the computational metrics has made it unclear whether they are in fact measuring thought disorder. In this paper, we dive into these measures to try to better understand what they reflect. While we find group differences between at-risk and healthy control populations, we also find that the measures mostly do not correlate with existing measures of thought disorder symptoms (what they are intended to measure), but rather correlate with surface properties of the speech (e.g., sentence length) and sociodemographic properties of the speaker (e.g., race). These results highlight the importance of considering interpretability and front and center as the field continues to grow. Ethical use of computational measures like those studied here especially in the high-stakes context of clinical care requires us to devote substantial attention to potential biases in our measures.

pdf bib
Detecting Cognitive Distortions from Patient-Therapist Interactions
Sagarika Shreevastava | Peter Foltz

An important part of Cognitive Behavioral Therapy (CBT) is to recognize and restructure certain negative thinking patterns that are also known as cognitive distortions. The aim of this project is to detect these distortions using natural language processing. We compare and contrast different types of linguistic features as well as different classification algorithms and explore the limitations of applying these techniques on a small dataset. We find that pre-trained Sentence-BERT embeddings to train an SVM classifier yields the best results with an F1-score of 0.79. Lastly, we discuss how this work provides insights into the types of linguistic features that are inherent in cognitive distortions.

pdf bib
Evaluating Automatic Speech Recognition Quality and Its Impact on Counselor Utterance Coding
Do June Min | Verónica Pérez-Rosas | Rada Mihalcea

Automatic speech recognition (ASR) is a crucial step in many natural language processing (NLP) applications, as often available data consists mainly of raw speech. Since the result of the ASR step is considered as a meaningful, informative input to later steps in the NLP pipeline, it is important to understand the behavior and failure mode of this step. In this work, we analyze the quality of ASR in the psychotherapy domain, using motivational interviewing conversations between therapists and clients. We conduct domain agnostic and domain-relevant evaluations using standard evaluation metrics and also identify domain-relevant keywords in the ASR output. Moreover, we empirically study the effect of mixing ASR and manual data during the training of a downstream NLP model, and also demonstrate how additional local context can help alleviate the error introduced by noisy ASR transcripts.

pdf bib
Safeguarding against spurious AI-based predictions : The case of automated verbal memory assessmentAI-based predictions: The case of automated verbal memory assessment
Chelsea Chandler | Peter Foltz | Alex Cohen | Terje Holmlund | Brita Elvevåg

A growing amount of psychiatric research incorporates machine learning and natural language processing methods, however findings have yet to be translated into actual clinical decision support systems. Many of these studies are based on relatively small datasets in homogeneous populations, which has the associated risk that the models may not perform adequately on new data in real clinical practice. The nature of serious mental illness is that it is hard to define, hard to capture, and requires frequent monitoring, which leads to imperfect data where attribute and class noise are common. With the goal of an effective AI-mediated clinical decision support system, there must be computational safeguards placed on the models used in order to avoid spurious predictions and thus allow humans to review data in the settings where models are unstable or bound not to generalize. This paper describes two approaches to implementing safeguards : (1) the determination of cases in which models are unstable by means of attribute and class based outlier detection and (2) finding the extent to which models show inductive bias. These safeguards are illustrated in the automated scoring of a story recall task via natural language processing methods. With the integration of human-in-the-loop machine learning in the clinical implementation process, incorporating safeguards such as these into the models will offer patients increased protection from spurious predictions.

pdf bib
Towards Understanding the Role of Gender in Deploying Social Media-Based Mental Health Surveillance Models
Eli Sherman | Keith Harrigian | Carlos Aguirre | Mark Dredze

Spurred by advances in machine learning and natural language processing, developing social media-based mental health surveillance models has received substantial recent attention. For these models to be maximally useful, it is necessary to understand how they perform on various subgroups, especially those defined in terms of protected characteristics. In this paper we study the relationship between user demographics focusing on gender and depression. Considering a population of Reddit users with known genders and depression statuses, we analyze the degree to which depression predictions are subject to biases along gender lines using domain-informed classifiers. We then study our models’ parameters to gain qualitative insight into the differences in posting behavior across genders.