Finding coverage using incremental attribute combinations
Journal contribution posted on 2009-05-01, 00:00, authored by Jiyuan An, Yi-Ping Phoebe Chen
Coverage is a range in attribute (or feature) space that covers only positive samples. Finding coverage is the core problem in induction algorithms, because a coverage can be used as a rule that describes positive samples. To reflect the characteristics of the training samples, a large coverage, one that covers more positive samples, is desirable. However, large coverage is difficult to find because the attribute space usually has very high dimensionality. Many heuristic methods, such as ID3, AQ and CN2, have been proposed to find large coverage. A robust algorithm has also been proposed to find the largest coverage, but its time and space complexity become costly as the dimensionality grows. To overcome this drawback, this paper proposes an algorithm that adopts incremental feature combinations to find the largest coverage effectively. Because potentially large coverage is found early in this algorithm, irrelevant coverage can be pruned away at early stages. Experiments show that the space and time needed to find the largest coverage are significantly reduced.
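The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes categorical attributes and treats a coverage as a conjunction of attribute=value conditions that matches no negative sample. Rules grow one attribute at a time, and a candidate is pruned as soon as it covers no more positives than the best coverage already found, since adding conditions can only shrink it. The names `matches` and `largest_coverage` and the toy data are hypothetical.

```python
def matches(rule, sample):
    """True if the sample satisfies every attribute=value condition in rule."""
    return all(sample[i] == v for i, v in rule.items())


def largest_coverage(positives, negatives):
    """Level-wise search for the rule covering the most positives and no negatives.

    Illustrative assumption: a rule is a dict {attribute_index: value}.
    """
    n_attrs = len(positives[0])
    best_rule, best_size = None, 0
    frontier = [{}]  # rules that still match at least one negative sample
    while frontier:
        next_frontier = []
        for rule in frontier:
            # Add attributes in index order so each combination is built once.
            start = max(rule, default=-1) + 1
            pos_in = [p for p in positives if matches(rule, p)]
            for attr in range(start, n_attrs):
                for value in sorted({p[attr] for p in pos_in}):
                    cand = {**rule, attr: value}
                    size = sum(matches(cand, p) for p in positives)
                    if size <= best_size:
                        continue  # prune: specialising can never beat the best
                    if any(matches(cand, n) for n in negatives):
                        next_frontier.append(cand)  # impure, refine further
                    else:
                        best_rule, best_size = cand, size  # larger pure coverage
        frontier = next_frontier
    return best_rule, best_size


# Toy samples with attributes (colour, shape, size); purely illustrative.
POSITIVES = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "square", "small"),
    ("blue", "round", "large"),
]
NEGATIVES = [
    ("blue", "square", "small"),
    ("red", "square", "large"),
    ("blue", "round", "small"),
]

rule, size = largest_coverage(POSITIVES, NEGATIVES)
print(rule, size)
```

In this sketch the earlier a large pure rule is found, the more candidates the `size <= best_size` test discards, which is the effect the abstract attributes to finding potentially large coverage early.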