Deakin University

Dual‐stream algorithms for dementia detection: Harnessing structured and unstructured electronic health record data, a novel approach to prevalence estimation

Version 2 2025-06-02, 05:04
Version 1 2025-05-28, 02:21
journal contribution
posted on 2025-06-02, 05:04 authored by Taya A Collyer, Ming Liu, Richard Beare, Nadine E Andrew, David Ung, Alison Carver, Jenni Ilomaki, J Simon Bell, Amanda G Thrift, Walter A Rocca, Jennifer L St Sauver, Alicia Lu, Kristy Siostrom, Chris Moran, Helene Roberts, Trevor T‐J Chong, Anne Murray, Tanya Ravipati, Bridget O'Bree, Velandai K Srikanth
Abstract

INTRODUCTION
Identifying individuals with dementia is crucial for prevalence estimation and service planning, but reliable, scalable methods are lacking. We developed a novel set of algorithms using both structured and unstructured electronic health record (EHR) data, applying Diagnostic and Statistical Manual of Mental Disorders criteria for dementia case identification.

METHODS
Our cohort (n = 1082) included individuals aged ≥ 60 with dementia identified through specialist clinics and a comparison group without dementia. Clinicians from Australia and the United States informed predictor selection. We developed algorithms through a biostatistics stream for structured data and a natural language processing (NLP) stream for text, synthesizing results via logistic regression.

RESULTS
The final structured model retained 16 variables (area under the receiver operating characteristic curve [AUC] 0.853, specificity 72.2%, sensitivity 80.6%). NLP classifiers (logistic regression, support vector machine, and random forest models) performed comparably. The final, combined model outperformed all others (AUC = 0.951, P < 0.001 for comparison to structured model).

DISCUSSION
Embedding text‐derived insights within algorithms trained on structured medical data significantly enhances dementia identification capacity.

Highlights

  • Algorithmic tools for detection of individuals with dementia are available; however, previous work has used heterogeneous case definitions which are not clinically meaningful, and has relied on proxies such as diagnostic codes or medications for case ascertainment.
  • We used a novel, dual‐stream algorithmic development approach, simultaneously and separately modeling a clinically meaningful outcome (diagnosis of dementia according to specialized clinical impression) using structured and unstructured electronic health record datasets.
  • Our clinically grounded case definition supported the inclusion of key structured variables (such as dementia International Classification of Disease codes and medications) as modeling predictors rather than outcomes.
  • Our algorithms, published in detail to support validation and replication, represent a major step forward in the use of routinely collected data for detection of diagnosed dementia.
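
The abstract describes two streams whose outputs are synthesized via logistic regression and evaluated by AUC, sensitivity, and specificity. The sketch below illustrates that general idea only and is not the authors' published algorithm. The toy labels, the per-person `structured` and `nlp` scores, and all function names are hypothetical, and the logistic regression is a plain gradient-descent fit written for illustration rather than the software used in the study.

```python
import math

def auc(labels, scores):
    """Rank-based AUC: chance that a random case outscores a random non-case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are classified as dementia."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns [bias, w1, w2, ...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w

def predict_proba(w, X):
    """Predicted probability of dementia for each row of X."""
    return [1.0 / (1.0 + math.exp(-(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))))
            for xi in X]

# Hypothetical toy data: 1 = dementia, 0 = comparison group.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
structured = [0.9, 0.8, 0.3, 0.7, 0.2, 0.6, 0.1, 0.4]  # assumed structured-stream scores
nlp = [0.7, 0.4, 0.9, 0.8, 0.3, 0.2, 0.5, 0.1]         # assumed NLP-stream scores

# Combine the two stream scores with a second-stage logistic regression.
X = [list(pair) for pair in zip(structured, nlp)]
weights = fit_logistic(X, labels)
combined = predict_proba(weights, X)

# Note: scoring on the training data is optimistic; a real analysis would
# evaluate on held-out data or use cross-validation.
print("AUC structured:", auc(labels, structured))
print("AUC NLP:", auc(labels, nlp))
print("AUC combined:", auc(labels, combined))
print("Sensitivity, specificity (combined):", sens_spec(labels, combined))
```

In this toy example each stream misranks some individuals on its own, so the second-stage model has room to improve by weighting both scores. Whether a combined model helps on real data depends on how much independent signal the text stream adds.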

History

Journal

Alzheimer's & Dementia

Volume

21

Article number

e70132

Pagination

1-13

Location

London, Eng.

Open access

  • Yes

ISSN

1552-5260

eISSN

1552-5279

Language

eng

Publication classification

C1.1 Refereed article in a scholarly journal

Issue

5

Publisher

Wiley