It Is the Journey, Not the Destination: Moving From End Points to Trajectories When Assessing Chatbot Mental Health Safety

  • Department of Psychosis Studies
  • King's College London
  • Department of Psychological Medicine
  • South London and Maudsley NHS Foundation Trust
  • Natural England
  • King's College Hospital
  • Nuraxi AI
  • Department of Biostatistics and Health Informatics
  • The Collective Intelligence Project
  • Aarhus University

Research output: Contribution to journal › Article › peer-review

Abstract

Large language models are rapidly becoming embedded in everyday life through artificial intelligence (AI) chatbots that people use for practical assistance and companionship, as well as for support with mental health and emotional well-being. Alongside clear benefits, clinicians and public reports increasingly describe a minority of users whose interactions seem to drift over days or weeks toward strongly questionable convictions, delusions, or suicidal crises. Importantly, clinically meaningful deterioration can occur even without overtly unsafe text outputs, via more insidious processes, such as compulsive use, sleep disruption, withdrawal from human contact, and progressive narrowing of attention around the chatbot relationship. In this Viewpoint, we argue that risk often arises not at a single tipping point but through trajectory effects that accumulate across extended dialogue and that prevailing safety evaluation approaches are misaligned with this reality because they primarily score risk at discrete conversational end points often reached through scripted dialogues lasting just a single turn or several turns. Mental health benchmarks and safety suites (including clinician-informed efforts) have advanced the field by testing refusal behavior, toxicity, and adversarial prompting. However, they often treat the last message as the unit of analysis and, therefore, miss when risk-relevant relational cues, signs of validation, contradiction handling, and shifts in certainty first emerge and how they compound. 
We propose that mental health safety assessment should shift from end points to trajectories by (1) treating the whole dialogue, not just the end result, as the focus of evaluation; (2) reporting turn-by-turn dynamics, such as delusion confirmation and harm enablement, and timing and persistence of safety interventions; and (3) calibrating short multiturn tests against longer, clinically realistic interaction sequences that can reveal context-length effects and drift. We further argue that transcript-only evaluation is insufficient in mental health contexts. Similar language can reflect very different internal states, and the relationship between expressed psychopathology and real-world harm is nonlinear. Therefore, safety research should incorporate proximal human outcomes following interactions (eg, shifts in certainty, openness to counterevidence, arousal, urge to continue, and subsequent sleep or behavior) and build a prospective clinical surveillance infrastructure that supports transcript donation with consent and linkage to health outcomes. Together, these steps would enable benchmarks that are clinically relevant and better aligned with the types of harms now being observed in real-world chatbot use.
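
To illustrate the contrast between end-point and trajectory-level scoring described in points (1) and (2), here is a minimal Python sketch. It is not the authors' method. The `Turn` fields, the 0 to 1 rating scale, the 0.5 threshold, and the summary statistics are all hypothetical choices made for illustration, and the per-turn ratings are assumed to come from some external rater (human or automated).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One chatbot reply with hypothetical per-turn ratings (assumed, not from the article)."""
    delusion_confirmation: float  # assumed scale: 0.0 (none) to 1.0 (strong)
    harm_enablement: float        # assumed scale: 0.0 (none) to 1.0 (strong)
    safety_intervention: bool     # True if the reply contains a safety intervention


def end_point_score(turns: List[Turn]) -> float:
    """End-point view: rate only the final reply of the dialogue."""
    last = turns[-1]
    return max(last.delusion_confirmation, last.harm_enablement)


def trajectory_summary(turns: List[Turn], threshold: float = 0.5) -> dict:
    """Trajectory view: report turn-by-turn risk plus timing and persistence of interventions."""
    risk = [max(t.delusion_confirmation, t.harm_enablement) for t in turns]
    # Index of the first turn whose risk reaches the (assumed) threshold, if any.
    first_risky = next((i for i, r in enumerate(risk) if r >= threshold), None)
    # Index of the first turn that contains a safety intervention, if any.
    first_intervention = next(
        (i for i, t in enumerate(turns) if t.safety_intervention), None
    )
    if first_intervention is None:
        persistence = 0.0
    else:
        # Share of turns, from the first intervention onward, that keep intervening.
        tail = turns[first_intervention:]
        persistence = sum(t.safety_intervention for t in tail) / len(tail)
    return {
        "per_turn_risk": risk,
        "first_risky_turn": first_risky,
        "risky_turn_count": sum(r >= threshold for r in risk),
        "first_intervention_turn": first_intervention,
        "intervention_persistence": persistence,
    }


# Invented example: risk builds over the middle turns, then the final reply is safe.
dialogue = [
    Turn(0.1, 0.0, False),
    Turn(0.6, 0.2, False),
    Turn(0.8, 0.7, False),
    Turn(0.1, 0.0, True),
]
end_score = end_point_score(dialogue)
summary = trajectory_summary(dialogue)
```

In this invented dialogue, the end-point score sees only the low-risk final reply, while the trajectory summary records two earlier turns above the threshold and shows that the safety intervention arrived only at the last turn.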

Original language: English
Article number: e91454
Number of pages: 9
Journal: JMIR Mental Health
Volume: 13
Early online date: 6 Apr 2026
Publication status: E-pub ahead of print - 6 Apr 2026

Keywords

  • Humans
  • Artificial Intelligence
  • Mental Health
  • Patient Safety
  • Generative Artificial Intelligence
