This paper proposes a matrix-based approach to hierarchical web page clustering, with two algorithms that use the hyperlink information among pages. The first algorithm clusters web pages without considering cluster overlap; the second takes cluster overlap into account. Both algorithms exploit the intrinsic relationships among the pages and are independent of the order in which the pages are presented. Furthermore, they do not require a predefined similarity threshold for clustering, and they are easy to implement for web applications. Preliminary evaluations show the effectiveness of the proposed algorithms and indicate a promising application.
History
Pagination
207-216
Location
Singapore
Start date
2002-12-11
End date
2002-12-11
ISBN-13
9780769518138
ISBN-10
0769518133
Language
eng
Publication classification
E1.1 Full written paper - refereed
Copyright notice
2002, IEEE
Editor/Contributor(s)
Huang B, Ling TW, Mohania M, Ng WK, Wen J-R, Gupta SK
Title of proceedings
WISE 2002 : Proceedings of the 3rd International Conference on Web Information Systems Engineering Workshops 2002