Discovering cluster characteristics and interesting rules that describe smokers' clusters and the behavioural patterns behind smokers' quitting intentions is an important task in the development of effective tobacco control systems. In this paper, we attempt to determine the characteristics of smokers' clusters and simplified rules for predicting smokers' quitting behaviour, which can provide feedback for building scientific, evidence-based adaptive tobacco control systems. Standard clustering algorithms group data based on their inherent patterns. However, they seldom provide an easy, human-understandable description of the clusters. Moreover, standard decision tree (SDT) based rule discovery depends on decision boundaries in the feature space. This may limit the ability of SDT to learn intermediate concepts for large, high-dimensional datasets such as those found in tobacco control. We propose a cluster-based rule discovery model (CRDM) that builds conceptual groups from which a set of decision trees (a decision forest) is constructed to find smokers' quitting rules. We also employ a re-labelling of unsupervised cluster (RLUC) approach, which combines re-labelling with a decision tree to determine the characteristics of the smokers' clusters. Experimental results on the tobacco control data set show that decision rules from the decision forest constructed by CRDM are simpler and predict smokers' quitting intention more accurately than those from a single decision tree. The RLUC approach finds text-based characteristics of the smokers' clusters that are easily understandable for policy makers in tobacco control systems.
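As an illustration only, the general idea of re-labelling clusters and then describing them with a tree can be sketched in a few lines of Python. This is not the authors' implementation. It assumes plain k-means as the clustering step, a depth-one decision tree (a decision stump) as the tree learner, and a handful of toy records; the feature names `cigarettes_per_day` and `years_smoking`, the data values, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy records (not from the paper's data set).
feature_names = ["cigarettes_per_day", "years_smoking"]
records = [(5, 2), (6, 3), (4, 1), (25, 20), (30, 18), (28, 22)]


def kmeans(points, initial_centroids, iterations=10):
    """Plain k-means with fixed starting centroids; returns one cluster id per point."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def best_stump(points, labels):
    """Depth-one tree: the single feature/threshold split with the fewest errors."""
    best = None
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [lab for p, lab in zip(points, labels) if p[f] <= threshold]
            right = [lab for p, lab in zip(points, labels) if p[f] > threshold]
            left_major, left_hits = Counter(left).most_common(1)[0]
            right_major, right_hits = Counter(right).most_common(1)[0]
            errors = len(points) - left_hits - right_hits
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_major, right_major)
    return best


def describe(stump, names):
    """Turn the chosen split into text rules, one per side of the threshold."""
    _, f, threshold, left_cluster, right_cluster = stump
    return [
        f"IF {names[f]} <= {threshold} THEN cluster {left_cluster}",
        f"IF {names[f]} > {threshold} THEN cluster {right_cluster}",
    ]


# Step 1: cluster without labels. Step 2: treat cluster ids as class labels.
cluster_ids = kmeans(records, [records[0], records[3]])
# Step 3: fit the stump on the re-labelled data and read off text rules.
stump = best_stump(records, cluster_ids)
rules = describe(stump, feature_names)
for rule in rules:
    print(rule)
```

A full decision tree would produce multi-condition rules per cluster; the stump is used here only to keep the sketch short and dependency-free.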
History
Location: Taipei, Taiwan
Publication classification: EN.1 Other conference paper
Pagination: 672-683
Start date: 2010-07-09
End date: 2010-07-12
Title of proceedings: PACIS 2010 - 14th Pacific Asia Conference on Information Systems