Deakin University

File(s) under permanent embargo

Linked data triples enhance document relevance classification

journal contribution
posted on 2023-10-23, 23:39 authored by Dinesh Nagumothu, Peter Eklund, Bahadorreza Ofoghi, Mohamed Reda Bouadjenek
Standardized approaches to relevance classification in information retrieval use generative statistical models to identify the presence or absence of certain topics that might make a document relevant to the searcher. These approaches have been used to better predict relevance on the basis of what the document is “about”, rather than a simple-minded analysis of the bag of words contained within the document. More recently, this idea has been extended by using pre-trained deep learning models and text representations, such as GloVe or BERT. These use an external corpus as a knowledge base that conditions the model to help predict what a document is about. This paper adopts a hybrid approach that leverages the structure of knowledge embedded in a corpus. In particular, the paper reports on experiments in which linked data triples (subject-predicate-object), constructed from natural language elements, are derived using deep learning. These are evaluated as additional latent semantic features for a document relevance classifier in a customized news-feed website. The research is a synthesis of current thinking in deep learning models in NLP and information retrieval and the predicate structure used in semantic web research. Our experiments indicate that linked data triples increased the F-score of the baseline GloVe representations by 6% and show significant improvement over state-of-the-art models, like BERT. The findings are tested and empirically validated on an experimental dataset and on two standardized pre-classified news sources, namely the Reuters and 20 Newsgroups datasets.
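
The idea of using triples as additional features alongside an embedding-based text representation can be illustrated with a minimal sketch. It is not the authors' implementation: the two-dimensional vectors stand in for pre-trained embeddings such as GloVe and are made up, and the names `TOY_VECTORS`, `TRIPLE_VOCAB`, `mean_embedding`, `triple_indicator`, and `document_features` are hypothetical. The sketch assumes that triples have already been extracted from the text, and it simply appends one binary feature per known triple to the averaged word vector.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Made-up 2-dimensional vectors standing in for pre-trained embeddings.
TOY_VECTORS: Dict[str, List[float]] = {
    "bank": [1.0, 0.0],
    "raises": [0.0, 1.0],
    "rates": [1.0, 1.0],
}

# Hypothetical fixed vocabulary of triples seen in a training corpus.
TRIPLE_VOCAB: List[Triple] = [
    ("bank", "raises", "rates"),
    ("bank", "cuts", "rates"),
]


def mean_embedding(tokens: List[str], vectors: Dict[str, List[float]], dim: int = 2) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def triple_indicator(triples: List[Triple], triple_vocab: List[Triple]) -> List[float]:
    """Emit one binary feature per vocabulary triple: 1.0 if present in the document."""
    present = set(triples)
    return [1.0 if t in present else 0.0 for t in triple_vocab]


def document_features(tokens: List[str], triples: List[Triple]) -> List[float]:
    """Concatenate the text representation with the triple-based features."""
    return mean_embedding(tokens, TOY_VECTORS) + triple_indicator(triples, TRIPLE_VOCAB)


features = document_features(
    ["bank", "raises", "rates"],
    [("bank", "raises", "rates")],
)
print(features)
```

The resulting vector could then be passed to any standard classifier. Whether triples are encoded as binary indicators, counts, or embeddings of their own is a design choice that this sketch does not settle.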

History

Journal

Applied sciences

Volume

11

Season

Special Issue: Advances in Artificial Intelligence: Machine Learning, Data Mining and Data Sciences

Article number

6636

Pagination

1-21

Location

Basel, Switzerland

eISSN

2076-3417

Language

eng

Publication classification

C1 Refereed article in a scholarly journal