Stabilizing l1-norm prediction models by supervised feature grouping

Kamkar, Iman, Gupta, Sunil Kumar, Phung, Dinh and Venkatesh, Svetha 2016, Stabilizing l1-norm prediction models by supervised feature grouping, Journal of biomedical informatics, vol. 59, pp. 149-168, doi: 10.1016/j.jbi.2015.11.012.

Title Stabilizing l1-norm prediction models by supervised feature grouping
Author(s) Kamkar, Iman
Gupta, Sunil Kumar
Phung, Dinh
Venkatesh, Svetha
Journal name Journal of biomedical informatics
Volume number 59
Start page 149
End page 168
Total pages 20
Publisher Elsevier
Place of publication Amsterdam, The Netherlands
Publication date 2016-02
ISSN 1532-0464
Keyword(s) feature selection
supervised feature grouping
Summary Emerging Electronic Medical Records (EMRs) have reformed modern healthcare. These records have great potential for building clinical prediction models, but their high dimensionality is a problem. Because much of the recorded information may not be relevant for prediction, the underlying complexity of the prediction models may not be high. A popular way to deal with this problem is feature selection. Lasso and other l1-norm based feature selection methods have shown promising results. In the presence of correlated features, however, the features these methods select change considerably with small changes in the data. This prevents clinicians from obtaining a stable feature set, which is crucial for clinical decision making. Grouping correlated variables together can improve the stability of feature selection; however, such grouping is usually not known and needs to be estimated for optimal performance. To address this problem, we propose a new model that simultaneously learns the grouping of correlated features and performs stable feature selection. We formulate the model as a constrained optimization problem and provide an efficient solution with guaranteed convergence. Our experiments with both synthetic and real-world datasets show that the proposed model is significantly more stable than Lasso and many existing state-of-the-art shrinkage and classification methods. We further show that, in terms of prediction performance, the proposed method consistently outperforms Lasso and other baselines. Our model can be used to select stable risk factors for a variety of healthcare problems, and so can assist clinicians in making accurate decisions.
Language eng
DOI 10.1016/j.jbi.2015.11.012
Field of Research 080109 Pattern Recognition and Data Mining
06 Biological Sciences
08 Information And Computing Sciences
11 Medical And Health Sciences
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2016, Elsevier

Citation counts: Cited 2 times in TR Web of Science
Cited 6 times in Scopus
Access Statistics: 424 Abstract Views, 2 File Downloads
Created: Mon, 07 Mar 2016, 18:23:04 EST
