Missing value estimation for mixed-attribute data sets

Zhu, X; Zhang, Shichao; Jin, Z; Zhang, Zili; Xu, Z

journal contribution

posted on 2011-01-01, 00:00 authored by X Zhu, Shichao Zhang, Z Jin, Zili Zhang, Z Xu

Missing data imputation is a key issue in learning from incomplete data. Various techniques have been developed with great success for dealing with missing values in data sets with homogeneous attributes (their independent attributes are all either continuous or discrete). This paper studies a new setting of missing data imputation, i.e., imputing missing data in data sets with heterogeneous attributes (their independent attributes are of different types), referred to as imputing mixed-attribute data sets. Although many real applications fall into this setting, no estimator has been designed for imputing mixed-attribute data sets. This paper first proposes two consistent estimators, one for discrete and one for continuous missing target values. A mixture-kernel-based iterative estimator is then advocated to impute mixed-attribute data sets. The proposed method is evaluated in extensive experiments against some typical algorithms, and the results demonstrate that the proposed approach outperforms these existing imputation methods in terms of classification accuracy and root mean square error (RMSE) at different missing ratios.
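The abstract does not give the estimator's formulas, so the following Python sketch only illustrates the general idea of kernel-based iterative imputation over mixed attributes; it is not the authors' method. The product kernel (Gaussian on continuous attributes, a match/mismatch weight on discrete ones), the function names, and the parameters `h`, `lam`, and `n_iter` are all assumptions made for illustration.

```python
import numpy as np

def mixed_kernel_weights(x_cont, x_disc, Xc, Xd, h=1.0, lam=0.2):
    """Assumed product kernel: Gaussian on continuous columns,
    match/mismatch weighting on discrete columns."""
    # Gaussian kernel on the continuous attributes with bandwidth h
    k_cont = np.exp(-0.5 * np.sum(((Xc - x_cont) / h) ** 2, axis=1))
    # Discrete kernel: 1 when a category matches, lam (0 <= lam <= 1) otherwise
    k_disc = np.prod(np.where(Xd == x_disc, 1.0, lam), axis=1)
    return k_cont * k_disc

def impute_continuous_target(Xc, Xd, y, h=1.0, lam=0.2, n_iter=5):
    """Iteratively fill NaNs in y with a kernel-weighted average of the other rows."""
    y = y.astype(float).copy()
    missing = np.isnan(y)
    y[missing] = np.nanmean(y)  # initial guess: mean of the observed targets
    for _ in range(n_iter):
        for i in np.flatnonzero(missing):
            w = mixed_kernel_weights(Xc[i], Xd[i], Xc, Xd, h, lam)
            w[i] = 0.0  # exclude the row being imputed from its own estimate
            if w.sum() > 0:
                y[i] = np.dot(w, y) / w.sum()
    return y

def rmse(y_true, y_pred):
    """Root mean square error between true and imputed values."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

In this sketch, a row with a missing target borrows most heavily from rows that are close on the continuous attributes and share its discrete categories; `rmse` is the error measure named in the abstract, applied to values that were hidden on purpose.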

History

Journal

IEEE transactions on knowledge and data engineering

Volume

23

Issue

1

Pagination

110 - 121

Publisher

IEEE

Location

Piscataway, NJ

Publisher DOI

https://doi.org/10.1109/TKDE.2010.99

ISSN

1041-4347

eISSN

1558-2191

Language

eng

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2011, IEEE

Keywords

classification; data mining methodologies; machine learning; Science & Technology; Technology; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Engineering, Electrical & Electronic; Computer Science; Engineering; BANDWIDTH SELECTION; IMPUTATION; LIKELIHOOD; MIXTURES; SVM
