Deakin University

Charting the evolution of artificial intelligence mental health chatbots from rule‐based systems to large language models: a systematic review

journal contribution
posted on 2025-10-01, 05:59 authored by Yining Hua, Steve Siddals, Zilin Ma, Isaac Galatzer‐Levy, Winna Xia, Christine Hau, Hongbin Na, Matthew Flathers, Jake LinardonJake Linardon, Cyrus Ayubcha, John Torous
The rapid evolution of artificial intelligence (AI) chatbots in mental health care presents a fragmented landscape with variable clinical evidence and evaluation rigor. This systematic review of 160 studies (2020‐2024) classifies chatbot architectures – rule‐based, machine learning‐based, and large language model (LLM)‐based – and proposes a three‐tier evaluation framework: foundational bench testing (technical validation), pilot feasibility testing (user engagement), and clinical efficacy testing (symptom reduction). While rule‐based systems dominated until 2023, LLM‐based chatbots surged to 45% of new studies in 2024. However, only 16% of LLM studies underwent clinical efficacy testing, with most (77%) still in early validation. Overall, only 47% of studies focused on clinical efficacy testing, exposing a critical gap in robust validation of therapeutic benefit. Discrepancies emerged between marketed claims (“AI‐powered”) and actual AI architectures, with many interventions relying on simple rule‐based scripts. LLM‐based chatbots are increasingly studied for emotional support and psychoeducation, yet they pose unique ethical concerns, including incorrect responses, privacy risks, and unverified therapeutic effects. Despite their generative capabilities, LLMs remain largely untested in high‐stakes mental health contexts. This paper emphasizes the need for standardized evaluation and benchmarking aligned with medical AI certification to ensure safe, transparent and ethical deployment. The proposed framework enables clearer distinctions between technical novelty and clinical efficacy, offering clinicians, researchers and regulators ordered steps to guide future standards and benchmarks. To ensure that AI chatbots enhance mental health care, future research must prioritize rigorous clinical efficacy trials, transparent architecture reporting, and evaluations that reflect real‐world impact rather than well‐known potential alone.

Funding

Funder: Argosy Foundation

Location

London, Eng.

Open access

  • No

Language

eng

Journal

World Psychiatry

Volume

24

Pagination

383-394

ISSN

1723-8617

eISSN

2051-5545

Issue

3

Publisher

Wiley