In named entity recognition (NER) for biomedical literature, approaches based on combined classifiers have demonstrated substantial performance improvements over a single (best) classifier. This is mainly due to a sufficient level of diversity among the classifiers, which depends on how the classifier set is selected. Given a large number of classifiers, how to select the classifiers that go into an ensemble is therefore a crucial issue in multiple classifier-ensemble design. With this observation in mind, we propose a generic genetic classifier-ensemble method for classifier selection in biomedical NER. Various diversity measures and majority voting are considered, and disjoint feature subsets are selected to construct the individual classifiers. Support Vector Machine (SVM) classifiers are adopted as the individual classifiers, forming an SVM-classifier committee. A multi-objective genetic algorithm (GA) is employed as the classifier selector to help the ensemble improve the overall sample classification accuracy. The proposed approach is tested on a benchmark dataset, the GENIA version 3.02 corpus, and compared with the individual best SVM classifier, an SVM-classifier ensemble algorithm, and other machine learning methods such as CRF, HMM and MEMM. The results show that the proposed approach outperforms the other classification algorithms compared and can be a useful method for the biomedical NER problem.
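The two quantities a classifier selector of this kind has to score, ensemble accuracy under majority voting and diversity among the selected classifiers, can be illustrated with a minimal sketch. The abstract does not say which diversity measures are used or how the GA encodes a candidate ensemble, so the choices here are assumptions: diversity is taken to be the mean pairwise disagreement rate, a candidate ensemble is a 0/1 mask over the classifier pool, and all function names and the toy label sequences are hypothetical. The GA search itself is not shown; `evaluate_subset` only returns the objective pair that such a search could optimise.

```python
from collections import Counter
from itertools import combinations


def majority_vote(predictions):
    """Combine per-classifier label sequences by plurality vote per token."""
    combined = []
    for token_labels in zip(*predictions):
        # On a tie, Counter.most_common keeps the label seen first.
        combined.append(Counter(token_labels).most_common(1)[0][0])
    return combined


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted label matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def disagreement(preds_a, preds_b):
    """Fraction of tokens on which two classifiers output different labels."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def mean_pairwise_disagreement(predictions):
    """Assumed diversity measure: average disagreement over classifier pairs."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        return 0.0  # a single classifier has no diversity
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)


def evaluate_subset(mask, all_predictions, gold):
    """Return (accuracy, diversity) for the classifiers selected by a 0/1 mask."""
    chosen = [p for bit, p in zip(mask, all_predictions) if bit]
    if not chosen:
        return 0.0, 0.0  # an empty ensemble gets the worst score
    voted = majority_vote(chosen)
    return accuracy(voted, gold), mean_pairwise_disagreement(chosen)


# Hypothetical toy data: token labels from three classifiers, each wrong once.
gold = ["B", "I", "O", "O", "B", "O"]
pool = [
    ["B", "I", "O", "O", "O", "O"],
    ["B", "O", "O", "O", "B", "O"],
    ["O", "I", "O", "O", "B", "O"],
]

full_ensemble = evaluate_subset((1, 1, 1), pool, gold)
single_classifier = evaluate_subset((1, 0, 0), pool, gold)
```

In this toy pool each classifier errs on a different token, so the vote corrects every individual error. That is the effect the abstract attributes to diversity among the selected classifiers.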
History
Title of book
Advances in knowledge discovery and data mining
Series
Lecture notes in artificial intelligence; vol. 7301
Chapter number
8
Pagination
86 - 97
Publisher
Springer-Verlag
Place of publication
Berlin, Germany
ISSN
0302-9743
eISSN
1611-3349
ISBN-13
9783642302176
ISBN-10
3642302173
Language
eng
Notes
Presented at the 16th Pacific-Asia Conference, PAKDD 2012, Kuala Lumpur, Malaysia, May 29 – June 1, 2012, Proceedings, Part I