Sample subset optimization for classifying imbalanced biological data

Yang, Pengyi, Zhang, Zili, Zhou, Bing B. and Zomaya, Albert Y. 2011, Sample subset optimization for classifying imbalanced biological data, in Advances in knowledge discovery and data mining : 15th Pacific-Asia Conference, PAKDD 2011, Shenzhen, China, May 24-27, 2011, proceedings, part II, Springer-Verlag, Berlin, Germany, pp.333-344.

Title Sample subset optimization for classifying imbalanced biological data
Author(s) Yang, Pengyi
Zhang, Zili
Zhou, Bing B.
Zomaya, Albert Y.
Title of book Advances in knowledge discovery and data mining : 15th Pacific-Asia Conference, PAKDD 2011, Shenzhen, China, May 24-27, 2011, proceedings, part II
Editor(s) Huang, Joshua Zhexue
Cao, Longbing
Srivastava, Jaideep
Publication date 2011
Series Lecture notes in artificial intelligence : 6635
Chapter number 27
Total chapters 45
Start page 333
End page 344
Total pages 12
Publisher Springer-Verlag
Place of Publication Berlin, Germany
Keyword(s) data
biology
optimization
sampling
Summary Data in many biological problems are often compounded by an imbalanced class distribution; that is, the positive examples may be greatly outnumbered by the negative examples. Many classification algorithms, such as the support vector machine (SVM), are sensitive to data with an imbalanced class distribution and, as a result, produce suboptimal classifications. It is desirable to compensate for the imbalance effect during model training to obtain more accurate classification. In this study, we propose a sample subset optimization technique for classifying biological data with moderately and extremely imbalanced class distributions. By using this optimization technique with an ensemble of SVMs, we build multiple roughly balanced SVM base classifiers, each trained on an optimized sample subset. The experimental results demonstrate that the ensemble of SVMs created by our sample subset optimization technique can achieve a higher area under the ROC curve (AUC) than popular sampling approaches such as random over-/under-sampling and SMOTE sampling, and than the sampling used in widely used ensemble approaches such as bagging and boosting.
ISBN 9783642208478
9783642208461
ISSN 0302-9743
Language eng
Field of Research 089999 Information and Computing Sciences not elsewhere classified
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category B1 Book chapter
HERDC collection year 2011
Copyright notice ©2011, Springer-Verlag Berlin Heidelberg
Persistent URL http://hdl.handle.net/10536/DRO/DU:30043179
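
Illustrative note on the Summary: the abstract mentions roughly balanced sample subsets and evaluation by AUC. The sketch below is not the authors' method. It assumes plain random under-sampling of the larger class in place of the chapter's subset optimization, it omits the SVM base classifiers entirely, and the function names `balanced_subsets` and `auc` are hypothetical. It only shows what "roughly balanced subsets" and "area under the ROC curve" can mean in code.

```python
import random


def balanced_subsets(labels, n_subsets, seed=0):
    """Build index subsets that keep every minority example and add an
    equal-sized random draw from the majority class.

    Assumption: random under-sampling stands in for the chapter's
    optimized subset selection, which is not reproduced here.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    k = len(minority)
    return [minority + rng.sample(majority, k) for _ in range(n_subsets)]


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In an ensemble of the kind the Summary describes, one base classifier would be trained per subset and their scores combined before computing the AUC on held-out data; those steps are left out of this sketch.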

Document type: Book Chapter
Collection: School of Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: Cited 1 time in Scopus
Access Statistics: 51 Abstract Views, 4 File Downloads
Created: Tue, 13 Mar 2012, 09:51:48 EST
