The rapid growth in the size and complexity of the web often leaves search results unsatisfactory, because search engines return a huge amount of information. Finding intrinsic relationships among web pages at a higher level, so that search results can be managed and retrieved efficiently, is becoming a challenging problem. In this paper, we propose an approach to measuring web page similarity that takes hyperlink transitivity and page importance into consideration. Based on this new similarity measure, we propose an effective hierarchical web page clustering algorithm. Preliminary evaluations show the effectiveness of the new similarity measure and an improvement in web page clustering. The proposed page similarity, as well as the matrix-based hyperlink analysis methods, could be applied to other web-based research areas.
Pagination
49-57
Location
Adelaide, S. Aust.
Start date
2003-02-04
End date
2003-02-07
ISBN-13
9780909925956
ISBN-10
090992595X
Language
eng
Publication classification
E1 Full written paper - refereed
Copyright notice
2003, Australian Computer Society
Editor/Contributor(s)
Schewe K-D
Title of proceedings
Database technologies 2003 : proceedings of the fourteenth Australasian Database Conference
Event
Australasian Database Conference (14th : 2003 : Adelaide, S. Aust.)
Publisher
Australian Computer Society
Place of publication
Sydney, N.S.W.
Series
Australian computer science communications ; v. 25, no. 2