Abstract
Large language models are rapidly becoming embedded in everyday life through artificial intelligence (AI) chatbots that people use for practical assistance and companionship, as well as for support with mental health and emotional well-being. Alongside clear benefits, clinicians and public reports increasingly describe a minority of users whose interactions seem to drift over days or weeks toward strongly questionable convictions, delusions, or suicidal crises. Importantly, clinically meaningful deterioration can occur even without overtly unsafe text outputs, via more insidious processes such as compulsive use, sleep disruption, withdrawal from human contact, and progressive narrowing of attention around the chatbot relationship. In this Viewpoint, we argue that risk often arises not at a single tipping point but through trajectory effects that accumulate across extended dialogue. We also argue that prevailing safety evaluation approaches are misaligned with this reality because they primarily score risk at discrete conversational end points, often reached through scripted dialogues lasting only one or a few turns. Mental health benchmarks and safety suites (including clinician-informed efforts) have advanced the field by testing refusal behavior, toxicity, and adversarial prompting. However, they often treat the last message as the unit of analysis and therefore miss when risk-relevant relational cues, signs of validation, contradiction handling, and shifts in certainty first emerge and how they compound.
We propose that mental health safety assessment should shift from end points to trajectories by (1) treating the whole dialogue, not just the end result, as the focus of evaluation; (2) reporting turn-by-turn dynamics, such as delusion confirmation and harm enablement, and timing and persistence of safety interventions; and (3) calibrating short multiturn tests against longer, clinically realistic interaction sequences that can reveal context-length effects and drift. We further argue that transcript-only evaluation is insufficient in mental health contexts. Similar language can reflect very different internal states, and the relationship between expressed psychopathology and real-world harm is nonlinear. Therefore, safety research should incorporate proximal human outcomes following interactions (eg, shifts in certainty, openness to counterevidence, arousal, urge to continue, and subsequent sleep or behavior) and build a prospective clinical surveillance infrastructure that supports transcript donation with consent and linkage to health outcomes. Together, these steps would enable benchmarks that are clinically relevant and better aligned with the types of harms now being observed in real-world chatbot use.
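To illustrate the contrast between end-point and trajectory scoring, the following is a minimal sketch in Python and not a method taken from the article. It assumes that each chatbot turn has already been assigned a risk rating between 0 and 1 (for example, by clinician annotation); the rating scale, the 0.5 threshold, and all names (`TrajectorySummary`, `summarize_trajectory`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrajectorySummary:
    end_point: float            # rating of the final turn only
    peak: float                 # highest rating anywhere in the dialogue
    onset_turn: Optional[int]   # first turn (1-based) at or above threshold
    longest_run: int            # longest streak of consecutive risky turns
    drift_per_turn: float       # least-squares slope of rating over turns


def summarize_trajectory(ratings: List[float], threshold: float = 0.5) -> TrajectorySummary:
    """Summarize hypothetical per-turn risk ratings across a whole dialogue."""
    if not ratings:
        raise ValueError("ratings must contain at least one turn")

    # Timing: when does a risk-relevant rating first appear?
    onset = next((t for t, r in enumerate(ratings, start=1) if r >= threshold), None)

    # Persistence: how long does elevated risk last without interruption?
    longest = run = 0
    for r in ratings:
        run = run + 1 if r >= threshold else 0
        longest = max(longest, run)

    # Drift: ordinary least-squares slope of rating against turn index.
    n = len(ratings)
    slope = 0.0
    if n > 1:
        mean_t = (n + 1) / 2
        mean_r = sum(ratings) / n
        num = sum((t - mean_t) * (r - mean_r) for t, r in enumerate(ratings, start=1))
        den = sum((t - mean_t) ** 2 for t in range(1, n + 1))
        slope = num / den

    return TrajectorySummary(ratings[-1], max(ratings), onset, longest, slope)


# Two made-up dialogues that end on the same rating.
calm = summarize_trajectory([0.1, 0.1, 0.1, 0.1, 0.2])
escalated = summarize_trajectory([0.2, 0.6, 0.7, 0.8, 0.2])
```

In this made-up example both dialogues share the same end-point rating, so an evaluation that scores only the last message would not separate them. The trajectory summary does: only the second dialogue has an onset turn, a sustained run of elevated ratings, and a high peak.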
| Original language | English |
|---|---|
| Article number | e91454 |
| Number of pages | 9 |
| Journal | JMIR Mental Health |
| Volume | 13 |
| Early online date | 6 Apr 2026 |
| Publication status | E-pub ahead of print - 6 Apr 2026 |
Keywords
- Humans
- Artificial Intelligence
- Mental Health
- Patient Safety
- Generative Artificial Intelligence
Fingerprint
Dive into the research topics of 'It Is the Journey, Not the Destination: Moving From End Points to Trajectories When Assessing Chatbot Mental Health Safety'. Together they form a unique fingerprint.