Deakin University

A comparison of penalised regression methods for informing the selection of predictive markers

Version 3 2024-06-19, 00:18
Version 2 2024-06-05, 11:10
Version 1 2020-11-23, 10:00
journal contribution
posted on 2024-06-19, 00:18 authored by Christopher Greenwood, George Youssef, Primrose Letcher, Jacqui Macdonald, LJ Hagg, A Sanson, J McIntosh, Delyse Hutchinson, John Toumbourou, Matthew Fuller-Tyszkiewicz, Craig Olsson
Background: Penalised regression methods are a useful atheoretical approach for both developing predictive models and selecting key indicators from an often substantially larger pool of available indicators. Compared with traditional methods, penalised regression models improve prediction in new data by shrinking the size of coefficients and retaining only those indicators whose coefficients remain non-zero. However, the performance and selection of indicators depend on the specific algorithm implemented. The purpose of this study was to examine the predictive performance and feature (i.e., indicator) selection capability of common penalised logistic regression methods (LASSO, adaptive LASSO, and elastic-net), compared with traditional logistic regression and forward selection methods.

Design: Data were drawn from the Australian Temperament Project, a multigenerational longitudinal study established in 1983. The analytic sample consisted of 1,292 participants (707 women). A total of 102 adolescent psychosocial and contextual indicators were available to predict young adult daily smoking.

Findings: Penalised logistic regression methods showed small improvements in predictive performance over logistic regression and forward selection. However, no single penalised logistic regression model outperformed the others. Elastic-net models selected more indicators than either LASSO or adaptive LASSO. Additionally, more regularised models included fewer indicators, yet had comparable predictive performance. Forward selection methods dismissed many indicators identified as important in the penalised logistic regression models.

Conclusions: Although overall predictive accuracy was only marginally better with penalised logistic regression methods, the benefits were clearest in their capacity to select a manageable subset of indicators. Preference among competing penalised logistic regression methods may therefore be guided by feature selection capability, and thus interpretative considerations, rather than predictive performance alone.
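
The shrinkage-and-selection behaviour described in the abstract can be illustrated with a small sketch. This is not the authors' code or data: it is a minimal pure-Python proximal-gradient fit on synthetic data, and every name in it (`fit_penalised_logistic`, `lam`, `alpha`, the simulated indicators) is a hypothetical choice made for illustration. The penalty is assumed to take the common elastic-net form `lam * (alpha * |b| + (1 - alpha) / 2 * b^2)`, so `alpha = 1` corresponds to LASSO; adaptive LASSO is not shown.

```python
import math
import random


def sigmoid(eta):
    # Clamp the linear predictor so math.exp cannot overflow.
    eta = max(-30.0, min(30.0, eta))
    return 1.0 / (1.0 + math.exp(-eta))


def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_penalised_logistic(X, y, lam, alpha, step=0.1, n_iter=300):
    """Proximal gradient descent for elastic-net penalised logistic regression.

    Runs a fixed number of iterations with no convergence check (a sketch only).
    The intercept is left unpenalised.
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus outcome) at the current fit.
        resid = [
            sigmoid(intercept + sum(b * x for b, x in zip(coefs, row))) - yi
            for row, yi in zip(X, y)
        ]
        intercept -= step * sum(resid) / n
        for j in range(p):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            grad += lam * (1.0 - alpha) * coefs[j]  # ridge part of the penalty
            # The L1 part is handled by soft-thresholding, which sets small
            # coefficients to exactly zero (this is the "selection" step).
            coefs[j] = soft_threshold(coefs[j] - step * grad, step * lam * alpha)
    return intercept, coefs


def n_selected(coefs):
    """Number of indicators retained, i.e. with a non-zero coefficient."""
    return sum(1 for c in coefs if c != 0.0)


# Synthetic data (an assumption): 10 indicators, only the first two informative.
random.seed(0)
N, P = 150, 10
X = [[random.gauss(0.0, 1.0) for _ in range(P)] for _ in range(N)]
y = [1 if random.random() < sigmoid(1.5 * row[0] - 1.5 * row[1]) else 0 for row in X]

# (lam, alpha) pairs; the values are arbitrary illustrations, not tuned.
settings = {
    "unpenalised": (0.0, 1.0),
    "lasso": (0.05, 1.0),
    "elastic-net": (0.05, 0.5),
    "heavy lasso": (10.0, 1.0),
}
fits = {
    name: fit_penalised_logistic(X, y, lam, alpha)[1]
    for name, (lam, alpha) in settings.items()
}
for name, coefs in fits.items():
    print(name, "selected", n_selected(coefs), "of", P, "indicators")
```

In practice the penalty strength would be chosen by cross-validation rather than fixed by hand, and an established library would normally be used instead of a hand-written loop. The sketch is only meant to show why a stronger penalty retains fewer indicators.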

History

Journal

PLoS ONE

Volume

15

Article number

ARTN e0242730

Pagination

1 - 14

Location

United States

ISSN

1932-6203

eISSN

1932-6203

Language

English

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2020, Greenwood et al.

Issue

11 November

Publisher

Public Library of Science
