A maximal frequent itemset approach for web document clustering

Zhuang, Ling and Dai, Honghua 2004, A maximal frequent itemset approach for web document clustering, in Fourth International Conference on Computer and Information Technology : proceedings : September 14-16, 2004, Wuhan, China, IEEE Computer Society, Los Alamitos, Calif., pp. 970-977.


Title A maximal frequent itemset approach for web document clustering
Author(s) Zhuang, Ling
Dai, Honghua
Conference name Computer and Information Technology. Conference (4th : 2004 : Wuhan, China)
Conference location Wuhan, China
Conference dates 14-16 Sep. 2004
Title of proceedings Fourth International Conference on Computer and Information Technology : proceedings : September 14-16, 2004, Wuhan, China
Editor(s) Wei, Daming
Wang, Hui
Peng, Zhiyong
Kara, Atsushi
He, Yanxiang
Publication date 2004
Conference series Computer and Information Technology Conference
Start page 970
End page 977
Publisher IEEE Computer Society
Place of publication Los Alamitos, Calif.
Summary Clustering Web documents both efficiently and accurately is of great interest to Web users and is key to the search accuracy of a Web search engine. To this end, this paper introduces a new approach to Web document clustering, called the maximal frequent itemset (MFI) approach. Iterative clustering algorithms, such as K-means and expectation-maximization (EM), are sensitive to their initial conditions. The MFI approach first precisely locates the center points of high-density clusters; these center points are then used as the initial points for the K-means algorithm. Our experimental results on three Web document sets show that the MFI approach outperforms the other methods we compared in most cases, particularly when the Web document sets contain a large number of categories.
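The summary describes a two-stage procedure: find the centers of dense document groups via maximal frequent itemsets, then hand those centers to K-means as its initial points. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes documents are sets of terms, finds maximal frequent itemsets with a naive level-wise search, and uses a hypothetical seeding rule (take the k highest-support MFIs and average the binary term vectors of the documents containing each one). K-means uses squared Euclidean distance. All function names are hypothetical; the paper's actual MFI miner, seed selection and document representation may differ.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search for every itemset contained in at least
    min_support transactions (documents). Returns {itemset: support}."""
    items = sorted({term for tx in transactions for term in tx})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    while candidates:
        # Support = number of documents containing every term of the itemset.
        level = {}
        for cand in candidates:
            support = sum(1 for tx in transactions if cand <= tx)
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join pairs of frequent k-itemsets into (k+1)-itemset candidates.
        size = len(next(iter(level))) + 1 if level else 0
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
    return frequent


def maximal_itemsets(frequent):
    """Keep only frequent itemsets that have no frequent proper superset."""
    return {s: n for s, n in frequent.items() if not any(s < t for t in frequent)}


def vectorize(doc, vocab):
    """Binary term vector (assumed representation; TF or TF-IDF would also work)."""
    return [1.0 if term in doc else 0.0 for term in vocab]


def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mfi_seeds(docs, vocab, k, min_support):
    """Hypothetical seeding rule: centroids of the documents supporting the
    k maximal frequent itemsets with the highest support (ties: larger, then
    alphabetically first itemset). May return fewer than k seeds."""
    mfis = maximal_itemsets(frequent_itemsets(docs, min_support))
    ranked = sorted(mfis.items(), key=lambda kv: (-kv[1], -len(kv[0]), sorted(kv[0])))
    seeds = []
    for itemset, _ in ranked[:k]:
        members = [vectorize(d, vocab) for d in docs if itemset <= d]
        seeds.append([sum(col) / len(members) for col in zip(*members)])
    return seeds


def kmeans(points, centroids, iters=20):
    """Standard K-means (Lloyd iterations) from the given initial centroids."""
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))
                  for p in points]
        new = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep an empty cluster's centroid where it was.
            new.append([sum(c) / len(members) for c in zip(*members)] if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


# Toy corpus: two topics, each sharing a core of frequent terms.
docs = [
    {"python", "code", "bug"},
    {"python", "code", "test"},
    {"python", "code", "bug", "test"},
    {"football", "goal", "match"},
    {"football", "goal", "team"},
    {"football", "goal", "match", "team"},
]
vocab = sorted(set().union(*docs))
points = [vectorize(d, vocab) for d in docs]
seeds = mfi_seeds(docs, vocab, k=2, min_support=3)
labels, centroids = kmeans(points, seeds)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

With a minimum support of 3, the toy corpus yields exactly two maximal frequent itemsets, {python, code} and {football, goal}. Each seed therefore starts inside a dense group of documents that share terms, so K-means does not depend on a random initialization. This is the sensitivity to initial conditions that the summary says the MFI approach addresses.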
ISBN 0769522165
9780769522166
Language eng
Field of Research 080699 Information Systems not elsewhere classified
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category E1 Full written paper - refereed
Copyright notice ©2004, IEEE
Persistent URL http://hdl.handle.net/10536/DRO/DU:30005533

Document type: Conference Paper
Collection: School of Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Mon, 07 Jul 2008, 09:50:54 EST

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.