Deakin University

Using corpus analysis to inform research into opinion detection in blogs

journal contribution
posted on 2007-12-01, 00:00 authored by D Osman, John Yearwood, P Vamplew
Opinion detection research relies on labeled documents for training data, either by assumptions based on the document's origin or by using human assessors to categorise the documents. In recent years, blogs have become a source for opinion identification research (TREC Blog06). This study analyses the part-of-speech proportion and the words used within various corpora, determining key differences and similarities useful when preparing for opinion identification research. The resulting comparisons between the characteristics of the various corpora are detailed and discussed. In particular, opinion-bearing and non-opinion Blog06 documents were found to display a high level of similarity, indicating that blog documents assessed at the document level cannot be used as training data in opinion identification research. © 2007, Australian Computer Society, Inc.

History

Journal

Conferences in Research and Practice in Information Technology Series

Volume

70

Pagination

65-75

ISSN

1445-1336

Language

eng

Publication classification

CN.1 Other journal article

Publisher

Australian Computer Society Inc.
