Deakin University
Browse

File(s) under permanent embargo

Fast ELCA computation for keyword queries on XML data

conference contribution
posted on 2010-01-01, 00:00 authored by R Zhou, C Liu, Jianxin LiJianxin Li
Keyword search is integrated in many applications on account of the convenience to convey users' query intention. Recently, answering keyword queries on XML data has drawn the attention of web and database communities, because the success of this research will relieve users from learning complex XML query languages, such as XPath/XQuery, and/or knowing the underlying schema of the queried XML data. As a result, information in XML data can be discovered much easier. To model the result of answering keyword queries on XML data, many LCA (lowest common ancestor) based notions have been proposed. In this paper, we focus on ELCA (Exclusive LCA) semantics, which is first proposed by Guo et al. and afterwards named by Xu and Papakonstantinou. We propose an algorithm named Hash Count to find ELCAs efficiently. Our analysis shows the complexity of Hash Count algorithm is O(kd|S1|), where k is the number of keywords, d is the depth of the queried XML document and |S1| is the frequency of the rarest keyword. This complexity is the best result known so far. We also evaluate the algorithm on a real DBLP dataset, and compare it with the state-of-the-art algorithms. The experimental results demonstrate the advantage of Hash Count algorithm in practice.

History

Pagination

549-560

Location

Lausanne, Switzerland

Start date

2010-03-22

End date

2010-03-26

ISBN-13

9781605589459

Language

eng

Publication classification

E1.1 Full written paper - refereed

Copyright notice

2010, ACM

Editor/Contributor(s)

Manolescu I, Spaccapietra S, Teubner J, Kitsuregawa M, Leger A, Naumann F, Ailamaki A, Ozcan F

Title of proceedings

EDBT 2010 : Proceedings of the 13th International Conference on Extending Database Technology

Event

Association for Computing Machinery. Conference (13th : 2010 : Lausanne, Switzerland)

Publisher

Association for Computing Machinery

Place of publication

New York, N.Y.

Series

Association for Computing Machinery Conference

Usage metrics

    Research Publications

    Categories

    No categories selected

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC