Linked data triples enhance document relevance classification
Nagumothu, Dinesh, Eklund, Peter, Ofoghi, Bahadorreza and Bouadjenek, Mohamed Reda 2021, Linked data triples enhance document relevance classification, Applied sciences, vol. 11, no. 14, Special Issue: Advances in Artificial Intelligence: Machine Learning, Data Mining and Data Sciences, pp. 1-21, doi: 10.3390/app11146636.
Title
Linked data triples enhance document relevance classification
Special Issue: Advances in Artificial Intelligence: Machine Learning, Data Mining and Data Sciences
Article ID
6636
Start page
1
End page
21
Total pages
21
Publisher
MDPI AG
Place of publication
Basel, Switzerland
Publication date
2021
ISSN
2076-3417
Summary
Standardized approaches to relevance classification in information retrieval use generative statistical models to identify the presence or absence of certain topics that might make a document relevant to the searcher. These approaches have been used to better predict relevance on the basis of what the document is “about”, rather than a simple-minded analysis of the bag of words contained within the document. In more recent times, this idea has been extended by using pre-trained deep learning models and text representations, such as GloVe or BERT. These use an external corpus as a knowledge base that conditions the model to help predict what a document is about. This paper adopts a hybrid approach that leverages the structure of knowledge embedded in a corpus. In particular, the paper reports on experiments where linked data triples (subject-predicate-object), constructed from natural language elements, are derived from deep learning. These are evaluated as additional latent semantic features for a relevant-document classifier in a customized news-feed website. The research is a synthesis of current thinking in deep learning models in NLP and information retrieval and the predicate structure used in semantic web research. Our experiments indicate that linked data triples increased the F-score of the baseline GloVe representations by 6% and show significant improvement over state-of-the-art models, like BERT. The findings are tested and empirically validated on an experimental dataset and on two standardized pre-classified news sources, namely the Reuters and 20 Newsgroups datasets.
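The idea of using triples as additional features alongside a word-embedding representation can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy three-dimensional vectors stand in for pre-trained GloVe embeddings, the triple is written by hand rather than extracted by a deep learning model, and the names `TOY_VECTORS`, `mean_vector`, `triple_features` and `document_features` are hypothetical. It only shows one plausible way to concatenate a document vector with a vector built from subject-predicate-object terms before passing the result to a classifier.

```python
from typing import Dict, List, Sequence, Tuple

# A linked data triple: (subject, predicate, object). Single-token terms are an assumption.
Triple = Tuple[str, str, str]

# Hypothetical toy vectors standing in for pre-trained GloVe embeddings.
TOY_VECTORS: Dict[str, List[float]] = {
    "bank": [1.0, 0.0, 0.0],
    "raises": [0.0, 1.0, 0.0],
    "rates": [0.0, 0.0, 1.0],
    "today": [1.0, 1.0, 0.0],
}
DIM = 3


def mean_vector(tokens: Sequence[str]) -> List[float]:
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not known:
        return [0.0] * DIM
    return [sum(column) / len(known) for column in zip(*known)]


def triple_features(triples: Sequence[Triple]) -> List[float]:
    """Average the vectors of every subject, predicate and object term."""
    terms = [term for triple in triples for term in triple]
    return mean_vector(terms)


def document_features(tokens: Sequence[str], triples: Sequence[Triple]) -> List[float]:
    """Concatenate the bag-of-embeddings vector with the triple-derived vector."""
    return mean_vector(tokens) + triple_features(triples)


if __name__ == "__main__":
    tokens = ["bank", "raises", "rates", "today"]
    triples = [("bank", "raises", "rates")]
    # The resulting 6-dimensional vector would be the input to a relevance classifier.
    print(document_features(tokens, triples))
```

Concatenation keeps the baseline representation intact, so any change in classifier performance can be attributed to the added triple features; other ways of combining the two vectors are equally possible.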
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.
Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO.
If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.