Openly accessible

A Two-Phase Approach for Semi-Supervised Feature Selection

Saxena, Amit, Pare, Shreya, Meena, Mahendra Singh, Gupta, Deepak, Gupta, Akshansh, Razzak, Imran, Lin, Chin-Teng and Prasad, Mukesh 2020, A Two-Phase Approach for Semi-Supervised Feature Selection, Algorithms, vol. 13, no. 9, pp. 1-20, doi: 10.3390/a13090215.

Title A Two-Phase Approach for Semi-Supervised Feature Selection
Author(s) Saxena, Amit
Pare, Shreya
Meena, Mahendra Singh
Gupta, Deepak
Gupta, Akshansh
Razzak, Imran (ORCID iD: orcid.org/0000-0002-3930-6600)
Lin, Chin-Teng
Prasad, Mukesh
Journal name Algorithms
Volume number 13
Issue number 9
Article ID 215
Start page 1
End page 20
Total pages 20
Publisher MDPI AG
Place of publication Basel, Switzerland
Publication date 2020
ISSN 1999-4893
Keyword(s) feature selection
semi-supervised datasets
classification
clustering
correlation
Summary This paper proposes a novel approach for selecting a subset of features in semi-supervised datasets, where only some of the patterns are labeled. The process runs in two phases. In Phase-I, the dataset is divided into two parts: the first contains the labeled patterns and the second contains the unlabeled patterns. A small number of features is identified using the well-known maximum relevance (computed on the first part) and minimum redundancy (computed on the whole dataset) feature selection approaches, both based on the correlation coefficient. From this identified set, the subset of features that produces a high classification accuracy on the labeled patterns, using any supervised classifier, is selected for later processing. In Phase-II, the patterns of the first and second parts are clustered separately into as many clusters as the dataset has classes. Each cluster of the first part is assigned the class to which the majority of its patterns belong; these classes are already known. The cluster centroids of the two parts are then paired: each centroid of the first part is paired with the nearest centroid of the second part. Because the class of the first-part centroid is known, the same class can be assigned to the paired second-part cluster, whose class is unknown. If the actual classes of the patterns in the second part are known, they can be used to test the classification accuracy on that part. The proposed two-phase approach performs well in terms of classification accuracy and number of features selected on the given benchmark datasets.
Language eng
DOI 10.3390/a13090215
Indigenous content off
Field of Research 01 Mathematical Sciences
08 Information and Computing Sciences
09 Engineering
HERDC Research category C1 Refereed article in a scholarly journal
Free to Read? Yes
Persistent URL http://hdl.handle.net/10536/DRO/DU:30141567
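
The two phases described in the Summary can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring rule (relevance minus mean redundancy), the k-means seeding with the first k rows, the greedy one-to-one centroid pairing, and all function names are assumptions made for illustration. Class labels are assumed to be numeric codes, and the classifier-based subset check at the end of Phase-I is omitted.

```python
import math
from collections import Counter

def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0

def column(rows, j):
    return [r[j] for r in rows]

def select_features(labeled, labels, all_rows, k):
    """Phase-I sketch: greedy max-relevance / min-redundancy (assumed scoring rule)."""
    n_feat = len(all_rows[0])
    # Relevance: |corr(feature, class)| computed on the labeled part only.
    relevance = [abs(pearson(column(labeled, j), labels)) for j in range(n_feat)]
    selected = []
    while len(selected) < min(k, n_feat):
        best, best_score = None, -math.inf
        for j in range(n_feat):
            if j in selected:
                continue
            # Redundancy: mean |corr| with already selected features, whole dataset.
            red = 0.0
            if selected:
                red = sum(abs(pearson(column(all_rows, j), column(all_rows, s)))
                          for s in selected) / len(selected)
            if relevance[j] - red > best_score:
                best, best_score = j, relevance[j] - red
        selected.append(best)
    return selected

def kmeans(rows, k, iters=20):
    """Plain Lloyd's k-means, seeded with the first k rows (assumption, for determinism)."""
    centers = [list(r) for r in rows[:k]]
    nearest = lambda r: min(range(k), key=lambda c: math.dist(r, centers[c]))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for r in rows:
            groups[nearest(r)].append(r)
        # Recompute each centroid; keep the old one if a cluster is empty.
        centers = [[sum(col) / len(g) for col in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, [nearest(r) for r in rows]

def label_unlabeled(labeled, labels, unlabeled, n_classes):
    """Phase-II sketch: cluster both parts, pair centroids, propagate classes."""
    c1, a1 = kmeans(labeled, n_classes)
    c2, a2 = kmeans(unlabeled, n_classes)
    # Majority class of each labeled cluster (None for an empty cluster).
    cluster_class = []
    for i in range(n_classes):
        members = [labels[n] for n, a in enumerate(a1) if a == i]
        cluster_class.append(Counter(members).most_common(1)[0][0] if members else None)
    # Greedy one-to-one pairing of centroids, closest pairs first (assumption).
    pairs = sorted((math.dist(c1[i], c2[j]), i, j)
                   for i in range(n_classes) for j in range(n_classes))
    c2_class, used = {}, set()
    for _, i, j in pairs:
        if i not in used and j not in c2_class:
            c2_class[j] = cluster_class[i]
            used.add(i)
    return [c2_class[a] for a in a2]
```

In this sketch, `select_features(labeled, labels, labeled + unlabeled, k)` returns the indices of the chosen features, and `label_unlabeled` returns one predicted class per unlabeled pattern. When the true classes of the second part are available, the predictions can be compared against them to estimate accuracy, as the Summary suggests.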

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.

Citation counts: TR Web of Science: cited 0 times
Scopus: cited 0 times
Access Statistics: 12 Abstract Views, 1 File Downloads
Created: Mon, 07 Sep 2020, 17:50:10 EST