OWDEAH: Online Web Data Extraction based on Access History

Li, Zhao, Ng, Wee-Keong and Ong, Kok-Leong 2004, OWDEAH: Online Web Data Extraction based on Access History, Lecture notes in computer science, vol. 3181, pp. 269-278.

Title OWDEAH: Online Web Data Extraction based on Access History
Author(s) Li, Zhao; Ng, Wee-Keong; Ong, Kok-Leong
Journal name Lecture notes in computer science
Volume number 3181
Start page 269
End page 278
Publisher Springer-Verlag
Place of publication Heidelberg, Germany
Publication date 2004
ISSN 0302-9743; 1611-3349
Summary Web data extraction systems are the kernel of information mediators between users and heterogeneous Web data resources. How to extract structured data from semi-structured documents has been a problem of active research. Supervised and unsupervised methods have been devised to learn extraction rules from training sets. However, preparing training sets (especially annotating them for supervised methods) is very time-consuming. We propose a framework for Web data extraction that logs users' access history and exploits it to assist automatic training set generation. We cluster accessed Web documents according to their structural details, define criteria to measure the importance of sub-structures, and then generate extraction rules. We also propose a method to adjust the rules according to historical data. Our experiments confirm the viability of our proposal.
Language eng
Field of Research 080604 Database Management
HERDC Research category C1 Refereed article in a scholarly journal
Copyright notice ©2004, Springer-Verlag
Persistent URL http://hdl.handle.net/10536/DRO/DU:30008664

Document type: Journal Article
Collection: School of Information Technology
 
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Created: Mon, 13 Oct 2008, 15:38:28 EST
