File(s) under permanent embargo
Succinct contrast sets via false positive controlling with an application in clinical process redesign
journal contribution
posted on 2020-12-01, 00:00
authored by Dang Nguyen, Wei Luo, B Vo, W Pedrycz

Many applications of intelligent systems involve understanding a group with a contrastively different outcome (e.g., all survivors of a deadly cancer, or a top-performing team in a large corporation). The intelligent system needs to identify the attributes (features) that best describe or explain the group versus its alternatives. In data mining, this problem is studied under the framework of contrast set mining (CSM). Although CSM is not new, the era of big data has produced new computational and statistical challenges. In particular, existing algorithms fail (1) to run efficiently on large-scale datasets and (2) to accommodate simultaneous inference on an overwhelming array of features, which are often repetitive and collinear. In this paper, we develop a CSM algorithm that addresses both challenges. The computational challenge is addressed with a tree structure and two theorems, while the statistical challenge is addressed by applying the false discovery rate to multiple testing. The computational and statistical advantages of the proposed algorithm over three state-of-the-art algorithms are demonstrated with comprehensive experiments. We also show the effectiveness of the proposed method in an intelligence-system application involving hospital process redesign. The proposed method not only improves the performance of machine learning systems, but also generates succinct and insightful patterns directly relevant to clinical decision-making.
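To make the statistical idea in the abstract concrete, here is a minimal sketch of false discovery rate control over candidate contrast sets. It is not the authors' algorithm: the abstract does not say which test or which FDR procedure the paper uses, so a 2x2 chi-square test and the Benjamini-Hochberg step-up procedure are assumed here, and the function names `chi2_pvalue` and `benjamini_hochberg` are hypothetical.

```python
import math


def chi2_pvalue(a, b, n_a, n_b):
    """p-value of a 2x2 chi-square test (1 degree of freedom).

    Assumed setup: a candidate pattern matches `a` of the `n_a` records
    in group A and `b` of the `n_b` records in group B.
    """
    n = n_a + n_b
    hits = a + b          # records matching the pattern
    misses = n - hits     # records not matching the pattern
    if hits == 0 or misses == 0:
        return 1.0        # pattern cannot separate the groups
    chi2 = n * (a * (n_b - b) - b * (n_a - a)) ** 2 / (n_a * n_b * hits * misses)
    # Survival function of chi-square with 1 d.o.f.: erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the indices of hypotheses rejected at FDR level `alpha`.

    Step-up rule: find the largest rank k with p_(k) <= alpha * k / m,
    then reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            cutoff = rank
    return sorted(order[:cutoff])
```

In use, each candidate contrast set would contribute one p-value from `chi2_pvalue`, and only the candidates whose indices `benjamini_hochberg` returns would be reported, which is one way to keep the reported pattern set small when many correlated features are tested at once.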
Journal
Expert Systems with Applications
Volume
161
Article number
113670
Pagination
1-17
Publisher
Elsevier, The Netherlands
ISSN
0957-4174
Language
eng
Publication classification
C1 Refereed article in a scholarly journal