Computing structural similarity of source XML schemas against domain XML schema

Li, Jianxin; Liu, Chengfei; Yu, Jeffrey Xu; Liu, Jixue; Wang, Guoren; Yang, Chi

File(s) under permanent embargo

conference contribution

posted on 2008-01-01, 00:00 authored by Jianxin Li, Chengfei Liu, Jeffrey Xu Yu, Jixue Liu, Guoren Wang, Chi Yang

In this paper, we study the problem of measuring the structural similarity of a large number of source schemas against a single domain schema, which is useful for improving the quality of searching and ranking large volumes of source documents on the Web with the help of structural information. After analyzing why existing edit-distance-based methods are unsuitable, we propose a new similarity measure model that caters for the requirements of the problem. Given the asymmetric nature of comparing source schemas with a domain schema, similarity-preserving rules and an algorithm are designed to filter out uninteresting elements in source schemas in order to optimize the similarity computation. Based on the model, a basic algorithm and an improved algorithm are developed for structural similarity computation. The improved algorithm makes full use of a new coding scheme devised to reduce the number of comparisons. The complexities of both algorithms are analyzed, and extensive experiments show the significant performance gain achieved by the improved algorithm.

History

Event

Australian Computer Society. Conference (19th : 2008 : Wollongong, N.S.W.)

Series

Australian Computer Society Conference

Pagination

155 - 164

Publisher

Australian Computer Society

Location

Wollongong, N.S.W.

Place of publication

Sydney, N.S.W.

Start date

2008-01-22

End date

2008-01-25

Language

eng

Publication classification

E1.1 Full written paper - refereed

Copyright notice

2008, Australian Computer Society, Inc.

Editor/Contributor(s)

A Fekete, X Lin

Title of proceedings

ADC 2008 : Proceedings of the Nineteenth Australasian Database Conference 2008

Keywords

Structural Similarity, XML Schema
