Openly accessible

Fusing data mining, machine learning and traditional statistics to detect biomarkers associated with depression

Dipnall, Joanna F., Pasco, Julie A., Berk, Michael, Williams, Lana J., Dodd, Seetal, Jacka, Felice N. and Meyer, Denny 2016, Fusing data mining, machine learning and traditional statistics to detect biomarkers associated with depression, PLoS one, vol. 11, no. 2, article number: e0148195, pp. 1-23, doi: 10.1371/journal.pone.0148195.

Attached Files
jacka-fusingdata-2016.pdf (published version, application/pdf, 474.29 KB, 16 downloads)

Title Fusing data mining, machine learning and traditional statistics to detect biomarkers associated with depression
Author(s) Dipnall, Joanna F.
Pasco, Julie A. (ORCID iD: orcid.org/0000-0002-8968-4714)
Berk, Michael (ORCID iD: orcid.org/0000-0002-5554-6946)
Williams, Lana J.
Dodd, Seetal (ORCID iD: orcid.org/0000-0002-7918-4636)
Jacka, Felice N. (ORCID iD: orcid.org/0000-0002-9825-0328)
Meyer, Denny
Journal name PLoS one
Volume number 11
Issue number 2
Article number e0148195
Start page 1
End page 23
Total pages 23
Publisher Public Library of Science (PLOS)
Place of publication San Francisco, Calif.
Publication date 2016
ISSN 1932-6203
Keyword(s) Science & Technology
Multidisciplinary Sciences
Science & Technology - Other Topics
CELL DISTRIBUTION WIDTH
MULTIPLE IMPUTATION
REGRESSION SHRINKAGE
MAJOR DEPRESSION
DISEASE
CLASSIFICATION
BILIRUBIN
RISK
ANTIOXIDANT
PERSPECTIVES
Summary BACKGROUND: Atheoretical large-scale data mining techniques using machine learning algorithms show promise for the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection, one that takes account of missing data and complex survey design, to identify key biomarkers associated with depression in a large epidemiological study.

METHODS: The study used a three-step methodology combining multiple imputation, a machine learning boosted regression algorithm and logistic regression to identify key biomarkers associated with depression in the National Health and Nutrition Examination Survey (2009-2010). Depression was measured using the Patient Health Questionnaire-9, and 67 biomarkers were analysed. Covariates included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed, weighted multiple logistic regression model included possible confounders and moderators.
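The abstract does not say how estimates from the imputed datasets were combined into the final model. A common approach, and the default in most multiple-imputation software, is Rubin's rules; the sketch below assumes that approach and pools one coefficient (for example a log-odds coefficient) across the imputations. The function name `pool_rubin` and the example inputs are hypothetical and are not values from the study.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets with Rubin's rules.

    estimates  -- per-imputation point estimates (e.g. log-odds coefficients)
    std_errors -- matching per-imputation standard errors
    Returns (pooled estimate, pooled standard error).
    Hypothetical helper: an illustration, not the study's code.
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching SEs")
    q_bar = mean(estimates)                  # pooled point estimate
    w = mean(se ** 2 for se in std_errors)   # mean within-imputation variance
    b = variance(estimates)                  # between-imputation variance (n - 1 denominator)
    t = w + (1 + 1 / m) * b                  # total variance
    return q_bar, math.sqrt(t)


# Made-up inputs for three imputations (the study used 20)
est, se = pool_rubin([0.10, 0.12, 0.14], [0.05, 0.05, 0.05])
print(f"pooled estimate {est:.3f}, pooled SE {se:.4f}")
```

The between-imputation term inflates the pooled standard error to reflect uncertainty caused by the missing data. Confidence intervals from pooled estimates normally use a t reference distribution with Rubin's degrees of freedom, which this sketch omits.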

RESULTS: After 20 imputed data sets were created from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The three biomarkers identified by the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016) and between total bilirubin and current smoking (p < 0.001).
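An odds ratio from logistic regression is an exponentiated coefficient, and a Wald 95% confidence interval is symmetric on the log-odds scale, exp(beta +/- 1.96 x SE). Assuming the reported intervals are Wald intervals (the abstract does not say), the sketch below recovers the log-odds coefficient and standard error implied by two of the reported intervals. The helper name `implied_log_scale` is hypothetical, and because the published figures are rounded to two decimals the recovered values are approximate.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def implied_log_scale(lower, upper, z=Z_95):
    """Recover (beta, SE) on the log-odds scale from an odds-ratio CI.

    Assumes the interval was computed as exp(beta +/- z * SE).
    Hypothetical helper for illustration only.
    """
    if not 0 < lower < upper:
        raise ValueError("need 0 < lower < upper")
    log_lo, log_hi = math.log(lower), math.log(upper)
    beta = (log_lo + log_hi) / 2       # interval midpoint on the log scale
    se = (log_hi - log_lo) / (2 * z)   # half-width divided by z
    return beta, se


# (OR, lower, upper) as reported in the abstract, rounded to two decimals
reported = {
    "red cell distribution width": (1.15, 1.01, 1.30),
    "total bilirubin": (0.12, 0.05, 0.28),
}
for name, (odds_ratio, lo, hi) in reported.items():
    beta, se = implied_log_scale(lo, hi)
    print(f"{name}: reported OR {odds_ratio}, implied OR {math.exp(beta):.2f}, "
          f"beta {beta:.3f}, SE {se:.3f}")
```

For red cell distribution width the midpoint gives an implied OR of about 1.15 and a log-scale SE of about 0.064, consistent with the reported estimate. The serum glucose interval (1.00, 1.01) is rounded too coarsely for this back-calculation to be informative, so it is left out.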

CONCLUSION: The systematic use of a hybrid variable selection methodology, fusing data mining by a machine learning algorithm with traditional statistical modelling, accounted for missing data and the complex survey sampling design. It proved a useful tool for detecting three biomarkers associated with depression for future hypothesis generation: red cell distribution width, serum glucose and total bilirubin.
Language eng
DOI 10.1371/journal.pone.0148195
Field of Research 110319 Psychiatry (incl Psychotherapy)
Socio Economic Objective 920410 Mental Health
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2016, The Authors
Free to Read? Yes
Use Rights Creative Commons Attribution licence
Persistent URL http://hdl.handle.net/10536/DRO/DU:30082049

Document type: Journal Article
Collections: Faculty of Health
School of Medicine
Open Access Collection
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.

Citation counts: cited 5 times in TR Web of Science; cited 6 times in Scopus
Access statistics: 153 abstract views, 16 file downloads
Created: Wed, 09 Mar 2016, 13:21:23 EST
