
Linked data triples enhance document relevance classification

Nagumothu, Dinesh, Eklund, Peter, Ofoghi, Bahadorreza and Bouadjenek, Mohamed Reda 2021, Linked data triples enhance document relevance classification, Applied sciences, vol. 11, no. 14, Special Issue: Advances in Artificial Intelligence: Machine Learning, Data Mining and Data Sciences, pp. 1-21, doi: 10.3390/app11146636.


Title Linked data triples enhance document relevance classification
Author(s) Nagumothu, Dinesh
Eklund, Peter (ORCID iD: orcid.org/0000-0003-2313-8603)
Ofoghi, Bahadorreza (ORCID iD: orcid.org/0000-0003-0579-8018)
Bouadjenek, Mohamed Reda (ORCID iD: orcid.org/0000-0003-1807-430X)
Journal name Applied sciences
Volume number 11
Issue number 14
Season Special Issue: Advances in Artificial Intelligence: Machine Learning, Data Mining and Data Sciences
Article ID 6636
Start page 1
End page 21
Total pages 21
Publisher MDPI AG
Place of publication Basel, Switzerland
Publication date 2021
ISSN 2076-3417
Summary Standardized approaches to relevance classification in information retrieval use generative statistical models to identify the presence or absence of certain topics that might make a document relevant to the searcher. These approaches have been used to better predict relevance on the basis of what the document is “about”, rather than a simple-minded analysis of the bag of words contained within the document. More recently, this idea has been extended by using pre-trained deep learning models and text representations, such as GloVe or BERT. These use an external corpus as a knowledge base that conditions the model to help predict what a document is about. This paper adopts a hybrid approach that leverages the structure of knowledge embedded in a corpus. In particular, the paper reports on experiments in which linked data triples (subject-predicate-object), constructed from natural language elements, are derived using deep learning. These triples are evaluated as additional latent semantic features for a document relevance classifier in a customized news-feed website. The research is a synthesis of current thinking on deep learning models in NLP and information retrieval and the predicate structure used in semantic web research. Our experiments indicate that linked data triples increased the F-score of the baseline GloVe representations by 6% and show significant improvement over state-of-the-art models, like BERT. The findings are tested and empirically validated on an experimental dataset and on two standardized pre-classified news sources, namely the Reuters and 20 Newsgroups datasets.
Language eng
DOI 10.3390/app11146636
Indigenous content off
HERDC Research category C1 Refereed article in a scholarly journal
Free to Read? Yes
Persistent URL http://hdl.handle.net/10536/DRO/DU:30153700

Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.

Created: Tue, 20 Jul 2021, 22:21:44 EST
