Deakin University

File(s) under permanent embargo

A novel framework for semantic entity identification and relationship integration in large scale text data

journal contribution
posted on 2016-11-01, 00:00 authored by D Wang, Xiao Liu, H Luo, J Fan
Semantic entities carry the most important semantics of text data. The identification and relationship integration of semantic entities are therefore very important for applications that require the semantics of text data. However, current strategies still face many problems in semantic entity identification, new word identification, and relationship integration among semantic entities. To address these problems, this paper proposes a two-phase framework for semantic entity identification with relationship integration in large scale text data. In the first phase, semantic entity identification, we propose a novel strategy that extracts unknown text semantic entities by integrating statistical features with Decision Tree (DT) and Support Vector Machine (SVM) algorithms. Compared with traditional approaches, our strategy is more effective in detecting semantic entities and more sensitive to new entities that have only just appeared in fresh data. After the semantic entities are extracted, the second phase of our framework integrates Semantic Entities Relationships (SER), which helps to cluster the semantic entities. A novel classification method using features such as similarity measures and co-occurrence probabilities is applied to tackle the clustering problem and discover the relationships among semantic entities. Comprehensive experimental results show that our framework outperforms state-of-the-art strategies in semantic entity identification and discovers over 80% of the relationship pairs among related semantic entities in large scale text data.

History

Journal

Future Generation Computer Systems

Volume

64

Pagination

198-210

Location

Amsterdam, The Netherlands

ISSN

0167-739X

eISSN

1872-7115

Language

English

Publication classification

C1.1 Refereed article in a scholarly journal, C Journal article

Copyright notice

2015, Elsevier B.V.

Publisher

ELSEVIER