Deakin University
File(s) under permanent embargo

Distributed pregel-based provenance-aware regular path query processing on RDF knowledge graphs

journal contribution
posted on 2020-01-01, 00:00 authored by X Wang, S Wang, Y Xin, Y Yang, Jianxin Li
With the proliferation of knowledge graphs, massive RDF graphs have been published on the Web. As an essential type of query over RDF graphs, Regular Path Queries (RPQs) have attracted increasing research effort. However, existing query processing approaches mainly focus on RPQs under the standard semantics, which cannot provide the provenance of the answer sets. We propose DP2RPQ, a distributed Pregel-based approach for evaluating provenance-aware RPQs over big RDF graphs. Our method employs Glushkov automata to keep track of the matching processes of RPQs in parallel. In addition, three optimization strategies are devised according to the cost model: vertex-computation optimization, message-communication reduction, and counting-paths alleviation. These strategies dramatically reduce the intermediate results of the basic DP2RPQ algorithm and overcome the counting-paths problem to some extent. The proposed algorithms are verified by extensive experiments on both synthetic and real-world datasets, which show that our approach can efficiently answer provenance-aware RPQs over large RDF graphs. Furthermore, although the RPQ semantics of DP2RPQ is richer than that of RDFPath, DP2RPQ still performs far better than RDFPath.
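The general idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the DP2RPQ algorithm and includes none of its three optimizations. It only simulates vertex-centric supersteps in which each message carries an automaton state and the path matched so far, so every answer pair keeps its provenance. The function name `evaluate_rpq`, the `max_supersteps` bound, the example triples, and the hand-written position automaton for the query `knows+ worksAt` are all assumptions made for illustration.

```python
from collections import defaultdict

def evaluate_rpq(triples, delta, start, accepting, max_supersteps=10):
    """Return {(source, target): [provenance paths]} for automaton-accepted paths.

    Hypothetical helper for illustration only; max_supersteps is an assumed
    bound that keeps the loop finite on cyclic graphs.
    """
    out_edges = defaultdict(list)
    vertices = set()
    for s, p, o in triples:
        out_edges[s].append((p, o))
        vertices.update((s, o))

    # Initial inbox: every vertex starts a match in the initial state.
    inbox = {v: [(start, (v,))] for v in vertices}
    answers = defaultdict(list)

    for _ in range(max_supersteps):
        if not inbox:
            break  # no pending messages, so the computation halts
        outbox = defaultdict(list)
        for v, messages in inbox.items():  # conceptually parallel per vertex
            for state, path in messages:
                for label, target in out_edges[v]:
                    for nxt in delta.get((state, label), ()):
                        new_path = path + (label, target)
                        if nxt in accepting:
                            # Record the answer pair together with its path.
                            answers[(path[0], target)].append(new_path)
                        outbox[target].append((nxt, new_path))
        inbox = outbox
    return dict(answers)

# Assumed example data: a tiny acyclic RDF-like graph.
triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
    ("bob", "worksAt", "initech"),
]

# Position automaton for `knows+ worksAt`: state 0 is initial,
# state 1 is the `knows` position, state 2 is the `worksAt` position.
delta = {(0, "knows"): {1}, (1, "knows"): {1}, (1, "worksAt"): {2}}

result = evaluate_rpq(triples, delta, start=0, accepting={2})
for pair, paths in sorted(result.items()):
    print(pair, paths)
```

Because each message carries the full path, the number of messages grows with the number of matching paths, which is one way to see why the abstract singles out intermediate results and the counting-paths problem as optimization targets.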

History

Journal

World Wide Web

Volume

23

Pagination

1465 - 1496

Publisher

Springer

Location

Cham, Switzerland

ISSN

1386-145X

eISSN

1573-1413

Language

eng

Publication classification

C1 Refereed article in a scholarly journal