Deakin University
Depth first rule generation for text categorization

journal contribution
posted on 2006-01-01, 00:00 authored by Jiyuan An, Yi-Ping Phoebe Chen
Classification methods such as the Rocchio method, Naïve Bayes, and SVM-based classifiers are commonly used to categorize text documents. These methods learn from labeled text documents and construct classifiers, which can then predict the category of a new text document. The keywords in a document are often used to form categorization rules; for example, “kw = computer” can be a rule for the IT documents category. However, the number of keywords is very large, and selecting useful ones from among them is challenging. Recently, a rule generation method based on enumerating all possible keyword combinations has been proposed [2]. A crucial problem remains in this method: how to prune irrelevant combinations at the early stages of the rule generation procedure. In this paper, we propose a method that can effectively prune irrelevant keywords at an early stage.
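
The abstract does not state the pruning criterion, so the following is only a minimal illustrative sketch of depth-first enumeration of keyword combinations with early pruning, not the authors' algorithm. It assumes a minimum-support threshold counted over documents of the target category (a combination that matches too few target documents cannot gain matches when more keywords are added, so its whole branch is skipped), and it assumes that a combination is not extended further once it already forms a confident rule. The function name `generate_rules`, its parameters, and the toy documents are all hypothetical.

```python
from typing import List, Set, Tuple

Doc = Tuple[Set[str], str]        # (keywords in the document, category label)
Rule = Tuple[Tuple[str, ...], float]  # (keyword combination, confidence)


def generate_rules(docs: List[Doc], target: str, min_support: int = 2,
                   min_confidence: float = 0.8, max_len: int = 3) -> List[Rule]:
    """Depth-first search over keyword combinations (hypothetical sketch)."""
    vocab = sorted({kw for kws, _ in docs for kw in kws})
    rules: List[Rule] = []

    def dfs(start: int, combo: Tuple[str, ...], covered: List[Doc]) -> None:
        for i in range(start, len(vocab)):
            kw = vocab[i]
            # Documents that contain every keyword in combo + (kw,).
            matched = [d for d in covered if kw in d[0]]
            hits = sum(1 for _, label in matched if label == target)
            if hits < min_support:
                # Assumed pruning rule: adding keywords can only shrink the
                # matched set, so no extension of this branch can recover.
                continue
            new_combo = combo + (kw,)
            confidence = hits / len(matched)
            if confidence >= min_confidence:
                rules.append((new_combo, confidence))
            elif len(new_combo) < max_len:
                dfs(i + 1, new_combo, matched)

    dfs(0, (), docs)
    return rules


# Toy data, invented for illustration only.
docs: List[Doc] = [
    ({"computer", "software"}, "IT"),
    ({"computer", "network"}, "IT"),
    ({"football", "match"}, "Sport"),
    ({"football", "network"}, "Sport"),
]
rules = generate_rules(docs, target="IT", min_support=2,
                       min_confidence=0.8, max_len=2)
print(rules)
```

With these toy documents, "football", "match", "network", and "software" each match fewer than two IT documents, so their branches are pruned at depth one and only the single-keyword rule on "computer" is kept.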

Journal

Frontiers in artificial intelligence and applications: advances in intelligent IT: active media technology

Volume

138

Pagination

302 - 306

Publisher

IOS Press

Location

Amsterdam, Netherlands

ISSN

0922-6389

eISSN

1879-8314

Language

eng

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2006, The authors
