Fuzzy-based information decomposition for incomplete and imbalanced data learning
Journal contribution posted on 2017-12-01, 00:00, authored by Shigang Liu, Jun Zhang, Yang Xiang, Wanlei Zhou
Class imbalance and missing values are two critical problems in pattern classification. Researchers have proposed a number of techniques to address each problem separately. However, no single technique solves both, and simply combining existing approaches cannot accurately classify imbalanced data with missing values. This paper develops a fuzzy-based information decomposition (FID) method that addresses the two problems simultaneously. In FID, both problems are treated as the same missing-data estimation problem. In particular, FID rebalances the training data by creating synthetic samples for the minority class. The proposed scheme has two steps: weighting and recovery. In the weighting step, weights produced by fuzzy membership functions quantify the contribution of each observed value to the estimate of a missing value. In the recovery step, missing values are estimated by taking into account the different contributions of the observed data. To evaluate the performance of FID, a large number of classification experiments were carried out on 27 well-known datasets. The results show that FID significantly outperforms 10 other state-of-the-art individual methods and 8 combination methods when missing values and imbalanced data are present at the same time.
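The two steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the membership function or how estimation targets are chosen, so the triangular membership, the equal-width intervals over the observed range, and the names `fuzzy_estimates` and `synthesize_minority` are all assumptions made for illustration.

```python
import numpy as np


def fuzzy_estimates(observed, n_missing):
    """Estimate n_missing values for one feature from its observed values.

    Assumed scheme (not taken from the paper): split the observed range into
    n_missing equal-width intervals and estimate one value per interval.
    """
    observed = np.asarray(observed, dtype=float)
    observed = observed[~np.isnan(observed)]  # keep only observed entries
    if observed.size == 0:
        raise ValueError("need at least one observed value")
    if n_missing <= 0:
        return np.empty(0)

    lo, hi = observed.min(), observed.max()
    if hi == lo:  # constant feature: nothing to decompose
        return np.full(n_missing, lo)

    h = (hi - lo) / n_missing  # interval width
    centers = lo + h * (np.arange(n_missing) + 0.5)

    # Weighting step: triangular fuzzy membership of each observed value
    # in each interval (assumed form), zero beyond distance h from the center.
    dist = np.abs(observed[None, :] - centers[:, None])
    weights = np.clip(1.0 - dist / h, 0.0, None)

    # Recovery step: membership-weighted mean of the observed values,
    # falling back to the plain mean if no value contributes to an interval.
    totals = weights.sum(axis=1)
    weighted = weights @ observed
    safe_totals = np.where(totals > 0, totals, 1.0)
    return np.where(totals > 0, weighted / safe_totals, observed.mean())


def synthesize_minority(X_min, n_new):
    """Build n_new synthetic minority rows, one feature at a time.

    Treating each synthetic sample as a row of missing values is the idea
    stated in the abstract; estimating each column independently is an
    assumption of this sketch.
    """
    X_min = np.asarray(X_min, dtype=float)
    cols = [fuzzy_estimates(X_min[:, j], n_new) for j in range(X_min.shape[1])]
    return np.column_stack(cols)


if __name__ == "__main__":
    X_min = np.array([[0.0, 10.0], [1.0, np.nan], [2.0, 30.0],
                      [3.0, 20.0], [4.0, 50.0]])
    print(synthesize_minority(X_min, 2))
```

In this sketch the same routine serves both purposes: called on a column with gaps it fills in missing entries, and called on every column of the minority class it produces synthetic rows that rebalance the training set.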