File(s) under permanent embargo

Semantic entity identification in large scale data via statistical features and DT-SVM

conference contribution
posted on 2013-11-18, 00:00 authored by D Wang, Xiao Liu, H Luo, J Fan
Semantic entities carry the most important semantics of text data. However, traditional approaches such as named entity recognition and new word identification may detect only certain specific types of entities. In addition, they generally adopt sequence annotation algorithms such as the Hidden Markov Model (HMM) and Conditional Random Field (CRF), which can utilize only limited context information. As a result, they are ineffective at extracting semantic entities that never appeared in the training data. In this paper, we propose a strategy to extract unknown semantic entities from text by integrating statistical features with Decision Tree (DT) and Support Vector Machine (SVM) algorithms. With the proposed statistical features and novel classification approach, our strategy can detect more semantic entities than traditional approaches such as CRF and Bootstrapping-SVM methods. It is highly sensitive to new entities that have only just appeared in fresh data. Our experimental results show that the precision, recall and F1 score of our strategy are on average about 23.6%, 21.5% and 25.8% higher, respectively, than those of the representative approaches.
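
The abstract reports its results as precision, recall and F1. As a minimal sketch of how these three measures are conventionally computed for an entity extraction task (this is not the paper's evaluation code; the function name `precision_recall_f1` and the example entity sets are hypothetical and chosen only for illustration):

```python
def precision_recall_f1(predicted, gold):
    """Compute precision, recall and F1 for a set of extracted entities.

    `predicted` holds the entities an extractor returned and `gold` holds
    the reference entities. Both are treated as sets of strings, so
    duplicates are ignored (an assumption of this sketch).
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: four extracted candidates, five reference entities.
predicted = {"deep learning", "Nanjing", "web service", "the of"}
gold = {"deep learning", "Nanjing", "web service", "semantic web", "big data"}

p, r, f = precision_recall_f1(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Because F1 is a harmonic mean, it falls between precision and recall and is pulled toward the lower of the two, which is why it is commonly reported alongside them.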

Volume

8180

Pagination

354-367

Location

Nanjing, China

Start date

2013-10-13

End date

2013-10-15

ISSN

0302-9743

eISSN

1611-3349

ISBN-13

9783642412295

Language

eng

Publication classification

E Conference publication, E1.1 Full written paper - refereed

Copyright notice

2013, Springer Verlag

Editor/Contributor(s)

Lin X, Manolopoulos Y, Srivastava D, Huang G

Title of proceedings

WISE 2013 : Proceedings of the 14th International Conference on Web Information Systems Engineering, Part 1

Event

Web Information Systems Engineering. International Conference (14th : 2013 : Nanjing, China)

Issue

PART 1

Publisher

Springer Verlag

Place of publication

Berlin, Germany

Series

Lecture Notes in Computer Science
