Using corpus analysis to inform research into opinion detection in blogs
journal contribution
posted on 2007-12-01, 00:00 authored by D Osman, John Yearwood, P Vamplew
Opinion detection research relies on labeled documents for training data, with labels either assumed from the document's origin or assigned by human assessors who categorise the documents. In recent years, blogs have become a source for opinion identification research (TREC Blog06). This study analyses the part-of-speech proportions and the words used within various corpora, determining key differences and similarities that are useful when preparing for opinion identification research. The resulting comparisons between the characteristics of the various corpora are detailed and discussed. In particular, opinion-bearing and non-opinion Blog06 documents were found to display a high level of similarity, indicating that blog documents assessed at the document level cannot be used as training data in opinion identification research. © 2007, Australian Computer Society, Inc.
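As a rough illustration of the kind of comparison the abstract describes, the sketch below computes part-of-speech proportions for two small corpora and scores how alike the two distributions are. It is a minimal sketch under stated assumptions, not the authors' method: the tokens are assumed to be already tagged, the tag names (PRON, VERB, DET, NOUN) and the toy sentences are invented for the example, and cosine similarity is a stand-in measure because the abstract does not name the one used in the study.

```python
from collections import Counter
from math import sqrt


def pos_proportions(tagged_tokens):
    """Return {tag: share of tokens carrying that tag} for one corpus.

    `tagged_tokens` is an iterable of (word, tag) pairs; the tagger and
    tag set are assumptions, the abstract does not specify them.
    """
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity of two tag-proportion dicts (assumed measure)."""
    tags = set(p) | set(q)
    dot = sum(p.get(t, 0.0) * q.get(t, 0.0) for t in tags)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical toy corpora, invented for illustration only.
opinion_docs = [("I", "PRON"), ("love", "VERB"), ("this", "DET"), ("phone", "NOUN")]
factual_docs = [("The", "DET"), ("phone", "NOUN"), ("weighs", "VERB"), ("grams", "NOUN")]

opinion_props = pos_proportions(opinion_docs)
factual_props = pos_proportions(factual_docs)
similarity = cosine_similarity(opinion_props, factual_props)
print(f"POS-distribution similarity: {similarity:.3f}")
```

A score close to 1 means the two corpora use parts of speech in nearly the same proportions, which is the sort of result that would make document-level labels hard to separate on this feature alone.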
Journal: Conferences in Research and Practice in Information Technology Series
Volume: 70
Pagination: 65-75
ISSN: 1445-1336
Language: eng
Publication classification: CN.1 Other journal article
Publisher: Australian Computer Society Inc.