Web search activity data accurately predict population chronic disease risk in the USA

Nguyen, Thin, Tran, Truyen, Luo, Wei, Gupta, Sunil, Rana, Santu, Phung, Dinh, Nichols, Melanie, Millar, Lynne, Venkatesh, Svetha and Allender, Steve 2015, Web search activity data accurately predict population chronic disease risk in the USA, Journal of epidemiology and community health, vol. 69, no. 7, pp. 693-699, doi: 10.1136/jech-2014-204523.

Title Web search activity data accurately predict population chronic disease risk in the USA
Author(s) Nguyen, Thin (ORCID: orcid.org/0000-0003-3467-8963)
Tran, Truyen (ORCID: orcid.org/0000-0001-6531-8907)
Luo, Wei (ORCID: orcid.org/0000-0002-4711-7543)
Gupta, Sunil (ORCID: orcid.org/0000-0002-3308-1930)
Rana, Santu (ORCID: orcid.org/0000-0003-2247-850X)
Phung, Dinh (ORCID: orcid.org/0000-0002-9977-8247)
Nichols, Melanie (ORCID: orcid.org/0000-0002-7834-5899)
Millar, Lynne
Venkatesh, Svetha (ORCID: orcid.org/0000-0001-8675-6631)
Allender, Steve (ORCID: orcid.org/0000-0002-4842-3294)
Journal name Journal of epidemiology and community health
Volume number 69
Issue number 7
Start page 693
End page 699
Total pages 7
Publisher BMJ Publishing Group
Place of publication London, Eng.
Publication date 2015
ISSN 1470-2738
Science & Technology
Life Sciences & Biomedicine
Public, Environmental & Occupational Health
Summary BACKGROUND: The WHO framework for non-communicable disease (NCD) describes risks and outcomes that comprise the majority of the global burden of disease. These factors are complex and interact at biological, behavioural, environmental and policy levels, presenting challenges for population monitoring and intervention evaluation. This paper explores the utility of machine learning methods applied to population-level web search activity as a proxy for chronic disease risk factors. METHODS: Web activity output for each element of the WHO's Causes of NCD framework was used as a basis for identifying relevant web search activity from 2004 to 2013 for the USA. Multiple linear regression models with regularisation were used to generate predictive algorithms, mapping web search activity to risk factor/disease prevalence measured by the Centers for Disease Control and Prevention (CDC). Predictions for subsequent target years not included in the model derivation were tested against CDC data from population surveys using Pearson correlation and Spearman's r. RESULTS: For 2011 and 2012, predicted prevalence was very strongly correlated with measured risk data, ranging from fruits and vegetables consumed (r=0.81; 95% CI 0.68 to 0.89) to alcohol consumption (r=0.96; 95% CI 0.93 to 0.98). Mean differences between predicted and measured prevalence by state ranged from 0.03 to 2.16. Spearman's r for state-wise predicted versus measured prevalence varied from 0.82 to 0.93. CONCLUSIONS: The high predictive validity of web search activity for NCD risk has the potential to provide real-time information on population risk during policy implementation and other population-level NCD prevention efforts.
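The evaluation design described in the METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say which regulariser was used, so ridge (L2) regression is assumed here, and all data are synthetic stand-ins for state-level search activity and CDC-measured prevalence. The names fit_ridge, simulate_year, n_terms and alpha are hypothetical. The sketch fits a regularised linear model on several simulated training years, predicts a held-out year, and compares predictions with the held-out values using Pearson and Spearman correlation.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept.

    Ridge (L2) is an assumption; the summary only says 'regularisation'.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def pearson_r(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def ranks(a):
    """Ordinal ranks; adequate here because the synthetic data have no ties."""
    return np.argsort(np.argsort(a)).astype(float)

def spearman_r(a, b):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson_r(ranks(a), ranks(b))

# Synthetic data only: 50 "states", 8 hypothetical search-term features.
rng = np.random.default_rng(0)
n_states, n_terms = 50, 8
true_w = rng.normal(size=n_terms)

def simulate_year():
    """One simulated year of search features and 'measured' prevalence."""
    X = rng.normal(size=(n_states, n_terms))
    y = 25.0 + X @ true_w + rng.normal(scale=0.5, size=n_states)
    return X, y

# Derive the model on five training years, then test on a held-out year.
train = [simulate_year() for _ in range(5)]
X_train = np.vstack([X for X, _ in train])
y_train = np.concatenate([y for _, y in train])
X_test, y_test = simulate_year()

coef, intercept = fit_ridge(X_train, y_train, alpha=1.0)
y_pred = X_test @ coef + intercept

r = pearson_r(y_pred, y_test)
rho = spearman_r(y_pred, y_test)
mean_abs_diff = float(np.mean(np.abs(y_pred - y_test)))
print(f"Pearson r={r:.2f}, Spearman r={rho:.2f}, mean |diff|={mean_abs_diff:.2f}")
```

The numbers printed by this sketch reflect only the synthetic data and say nothing about the results reported in the paper. Holding out the target year mirrors the summary's statement that target years were not included in model derivation.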
Language eng
DOI 10.1136/jech-2014-204523
Field of Research 080109 Pattern Recognition and Data Mining
111706 Epidemiology
111711 Health Information Systems (incl Surveillance)
110201 Cardiology (incl Cardiovascular Diseases)
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2015, BMJ Publishing Group
Persistent URL http://hdl.handle.net/10536/DRO/DU:30074199

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: TR Web of Science: cited 10 times
Scopus: cited 11 times
Access Statistics: 644 Abstract Views, 3 File Downloads
Created: Wed, 23 Sep 2015, 12:31:26 EST
