Deakin University

File(s) under permanent embargo

Applications of machine learning for linguistic analysis of texts

chapter
posted on 2012-12-01, authored by R Torney, John Yearwood, P Vamplew, A Kelarev
This chapter describes a novel multistage method for the linguistic clustering of large collections of Internet texts, as a precursor to linguistic analysis of those texts. The method addresses the practical difficulty of clustering a very large set of text documents by combining unsupervised clustering with supervised classification. It begins by creating many independent clusterings of a random sample drawn from the International Corpus of Learner English. Several consensus functions and sophisticated algorithms are then applied in two substages to combine these independent clusterings into one final consensus clustering. This consensus clustering is used to train fast classifiers, which can then profile very large collections of text and web data. The approach therefore makes it possible to apply advanced, highly accurate clustering techniques to the sample and to extend their results to the full data with fast supervised classification algorithms. The effectiveness of the multistage method depends crucially on how well the supervised classifiers perform at the final stage, when they process large data sets from the Internet; this performance may also indicate the quality of the consensus clustering obtained in the preceding stages. The authors' experiments compare the performance of several classification algorithms incorporated in this multistage scheme and show that several of them achieve very high precision and recall and can be used in practical implementations of the method.
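The abstract describes the pipeline only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It assumes that texts have already been turned into numeric feature vectors (here, hypothetical 2-D toy points), uses plain k-means with different seeds and values of k as the independent base clusterings, replaces the chapter's two consensus substages with a single co-association ("evidence accumulation") step, and uses a nearest-centroid classifier as the stand-in fast classifier. The names `kmeans`, `consensus`, `train_nearest_centroid` and `classify` are illustrative only.

```python
import random
from math import dist


def kmeans(points, k, seed, iters=20):
    """One base clustering: Lloyd's k-means from random initial centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return labels


def consensus(clusterings, n, threshold=0.5):
    """Co-association consensus (a stand-in, not the chapter's consensus functions):
    link items that share a cluster in more than `threshold` of the base
    clusterings, then return the connected components as consensus clusters."""
    share = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    share[i][j] += 1 / len(clusterings)
    parent = list(range(n))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if share[i][j] > threshold:
                parent[find(i)] = find(j)
    roots = {}
    # Relabel components 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def train_nearest_centroid(points, labels):
    """Fast supervised stage (hypothetical choice): one centroid per consensus cluster."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    return {lab: tuple(sum(xs) / len(ps) for xs in zip(*ps)) for lab, ps in groups.items()}


def classify(model, p):
    """Assign a new item to the consensus cluster with the nearest centroid."""
    return min(model, key=lambda lab: dist(p, model[lab]))


# Hypothetical 2-D feature vectors standing in for a random sample of texts.
data_rng = random.Random(0)
sample = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(20)]
sample += [(data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)) for _ in range(20)]

# Stage 1: many independent base clusterings (different seeds, k = 2 or 3).
base = [kmeans(sample, 3 if s % 3 == 0 else 2, seed=s) for s in range(9)]
# Stage 2: combine them into one consensus clustering.
consensus_labels = consensus(base, len(sample))
# Stage 3: train a fast classifier on the consensus labels, then apply it to new data.
model = train_nearest_centroid(sample, consensus_labels)
new_items = [(0.1, -0.2), (4.8, 5.1)]
print([classify(model, p) for p in new_items])
```

The design choice this sketch illustrates is where the cost falls: the quadratic co-association step runs only on the sample, while classifying each new document costs one distance computation per consensus cluster, which is what lets the final stage scale to large web collections.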

History

Title of book

Machine Learning Algorithms for Problem Solving in Computational Applications: Intelligent Techniques

Chapter number

8

Pagination

133 - 148

Publisher

IGI Global

Place of publication

Hershey, Pa.

ISBN-13

9781466618336

Language

eng

Publication classification

B Book chapter; B1.1 Book chapter

Copyright notice

2012, IGI Global

Extent

21

Editor/Contributor(s)

S Kulkarni
