Text clustering with important words using normalization

Wu, Shunyao, Wang, Jinlong, Vu, Huy Quan and Li, Gang 2010, Text clustering with important words using normalization, in JCDL 2010 : Proceedings of the 2010 ACM/IEEE International Joint Conference on Digital Libraries : Vision 2020 - Beyond Digital Libraries, Association for Computing Machinery, New York, N.Y., pp. 393-393.

Title Text clustering with important words using normalization
Author(s) Wu, Shunyao; Wang, Jinlong; Vu, Huy Quan; Li, Gang
Conference name ACM/IEEE Joint Conference on Digital Libraries (10th : 2010 : Surfers Paradise, Qld.)
Conference location Surfers Paradise, Qld.
Conference dates 21-25 Jun. 2010
Title of proceedings JCDL 2010 : Proceedings of the 2010 ACM/IEEE International Joint Conference on Digital Libraries : Vision 2020 - Beyond Digital Libraries
Publication date 2010
Start page 393
End page 393
Total pages 1
Publisher Association for Computing Machinery
Place of publication New York, N.Y.
Keyword(s) document clustering; important words; normalization
Summary Important words, which usually appear in parts of the Title, Subject and Keywords, concisely reflect the main topic of a document. In recent years, it has become common practice to exploit the semantic topic of documents and to use important words for document clustering, especially for short texts such as news articles. This paper proposes a novel method that extracts important words from the Subject and Keywords of articles and then partitions documents using only those important words. Because the frequencies of important words are usually low and the scale matrix dataset for important words is small, a normalization method is proposed to normalize the scale dataset so that more accurate results can be achieved by fully exploiting the limited information. Experiments validate the effectiveness of the method.
ISBN 9781450300858
Language eng
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 810107 National Security
HERDC Research category E2 Full written paper - non-refereed / Abstract reviewed
HERDC collection year 2010
Copyright notice ©2010, by the Association for Computing Machinery, Inc.
Persistent URL http://hdl.handle.net/10536/DRO/DU:30034432
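
As a rough illustration of the pipeline outlined in the Summary (keep only the important words, build a small frequency matrix, normalize it, then cluster), here is a minimal Python sketch. It is not the authors' implementation. This record does not specify the normalization, so row-wise L2 normalization stands in for it, and a plain k-means seeded with the first k rows stands in for the clustering step. The sample documents and all function names are hypothetical.

```python
import math
from collections import Counter


def build_matrix(docs):
    """Build a document-by-word count matrix from lists of important words."""
    vocab = sorted({word for doc in docs for word in doc})
    index = {word: i for i, word in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for word, count in Counter(doc).items():
            row[index[word]] = float(count)
        rows.append(row)
    return vocab, rows


def l2_normalize(rows):
    """Scale each row to unit length (assumed stand-in for the paper's normalization)."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(x * x for x in row))
        normalized.append([x / norm for x in row] if norm > 0 else list(row))
    return normalized


def kmeans(rows, k, max_iters=20):
    """Plain k-means with squared Euclidean distance; the first k rows seed the centroids."""
    centroids = [list(row) for row in rows[:k]]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each row goes to its nearest centroid.
        new_labels = []
        for row in rows:
            dists = [sum((a - b) ** 2 for a, b in zip(row, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        # Stop once assignments no longer change.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical important words, as if taken from Subject and Keywords fields.
sample_docs = [
    ["election", "parliament", "vote"],
    ["football", "league", "goal"],
    ["vote", "election", "senate"],
    ["goal", "football", "cup"],
]

vocab, counts = build_matrix(sample_docs)
labels = kmeans(l2_normalize(counts), k=2)
print(labels)
```

Unit-length rows keep documents with very few important words comparable to those with more, which is one plausible reading of why normalization helps when frequencies are low. Other schemes, such as column scaling or TF-IDF weighting, would slot into the same place.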

Document type: Conference Paper
Collection: School of Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Fri, 29 Apr 2011, 16:56:41 EST by Sandra Dunoon
