Leveraging visual features and hierarchical dependencies for conference information extraction

You, Yue, Xu, Guandong, Cao, Jian, Zhang, Yanchun and Huang, Guangyan 2013, Leveraging visual features and hierarchical dependencies for conference information extraction, in Web Technologies and Applications: 15th Asia-Pacific Web Conference, APWeb 2013, Sydney, Australia, April 4-6, 2013. Proceedings, Springer, Berlin, Germany, pp. 404-416, doi: 10.1007/978-3-642-37401-2_41.

Title Leveraging visual features and hierarchical dependencies for conference information extraction
Author(s) You, Yue; Xu, Guandong; Cao, Jian; Zhang, Yanchun; Huang, Guangyan
Conference name Asia-Pacific Web Conference on Web Technologies and Applications (15th : 2013 : Sydney, N.S.W.)
Conference location Sydney, N.S.W.
Conference dates 4-6 Apr. 2013
Title of proceedings Web Technologies and Applications: 15th Asia-Pacific Web Conference, APWeb 2013, Sydney, Australia, April 4-6, 2013. Proceedings
Publication date 2013
Series Lecture Notes in Computer Science v.7808
Start page 404
End page 416
Total pages 13
Publisher Springer
Place of publication Berlin, Germany
Summary Traditional information extraction methods rely mainly on visual features; because they do not consider the hierarchical dependencies within the paragraph structure, some important information is missed. This paper proposes an integrated approach for extracting academic information from conference Web pages. First, Web pages are segmented into text blocks by a new hybrid page segmentation algorithm that combines visual features with DOM structure. Then, these text blocks are labeled by a Tree-structured Random Fields model, and the block functions are differentiated using features such as visual features, semantic features and hierarchical dependencies. Finally, a post-processing step tunes the initial annotation results. Experimental results on real-world data sets demonstrate that the proposed method can effectively and accurately extract the needed academic information from conference Web pages. © 2013 Springer-Verlag.
ISBN 9783642374005
ISSN 0302-9743; 1611-3349
Language eng
DOI 10.1007/978-3-642-37401-2_41
Field of Research 080109 Pattern Recognition and Data Mining
Socio Economic Objective 970108 Expanding Knowledge in the Information and Computing Sciences
HERDC Research category E1.1 Full written paper - refereed
ERA Research output type E Conference publication
Copyright notice ©2013, Springer
Persistent URL http://hdl.handle.net/10536/DRO/DU:30083693
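
Illustrative sketch: the three stages named in the Summary (hybrid segmentation, block labeling with hierarchical dependencies, post-processing) can be pictured with the toy Python pipeline below. It is not the authors' implementation. In particular, a simple keyword-plus-inheritance rule stands in for the Tree-structured Random Fields model, font size is the only visual feature, and every name here (`Node`, `Block`, `segment`, `label`, `post_process`, `extract`, `demo_page`) and the sample page content are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy DOM node; font_size is a stand-in for rendered visual features."""
    tag: str
    text: str = ""
    font_size: int = 12
    children: List["Node"] = field(default_factory=list)


@dataclass
class Block:
    tag: str
    text: str
    font_size: int
    parent: Optional[int] = None  # index of the enclosing block, if any
    label: str = "other"


BLOCK_TAGS = {"h1", "h2", "h3", "p", "li"}
# Assumed keyword lists, purely for illustration.
KEYWORDS = {"date": ("important dates", "deadline"), "topic": ("topics",)}


def segment(root: Node) -> List[Block]:
    """Stage 1: DOM tags propose text blocks; font size builds the hierarchy."""
    blocks: List[Block] = []

    def walk(node: Node) -> None:
        if node.tag in BLOCK_TAGS and node.text.strip():
            blocks.append(Block(node.tag, node.text.strip(), node.font_size))
        for child in node.children:
            walk(child)

    walk(root)
    # Visual cue: the parent is the nearest preceding block in a larger font.
    for i, block in enumerate(blocks):
        for j in range(i - 1, -1, -1):
            if blocks[j].font_size > block.font_size:
                block.parent = j
                break
    return blocks


def label(blocks: List[Block]) -> None:
    """Stage 2 (rule-based stand-in, NOT a Tree-structured Random Field)."""
    for block in blocks:
        low = block.text.lower()
        own = next((name for name, words in KEYWORDS.items()
                    if any(word in low for word in words)), None)
        if own is not None:
            block.label = own
        elif block.parent is not None:
            # Hierarchical dependency: inherit the parent's label.
            # Parents always precede children, so their label is already set.
            block.label = blocks[block.parent].label


def post_process(blocks: List[Block]) -> None:
    """Stage 3: a non-heading 'date' block without any digit is reset."""
    for block in blocks:
        is_heading = block.tag.startswith("h")
        has_digit = any(ch.isdigit() for ch in block.text)
        if block.label == "date" and not is_heading and not has_digit:
            block.label = "other"


def extract(root: Node) -> List[Block]:
    blocks = segment(root)
    label(blocks)
    post_process(blocks)
    return blocks


def demo_page() -> Node:
    """Made-up page for a fictional conference."""
    return Node("body", children=[
        Node("h1", "Example Conference 2030", 24),
        Node("h2", "Important Dates", 18),
        Node("p", "Paper submission: 1 March 2030", 12),
        Node("p", "All times are anywhere on Earth", 12),
        Node("h2", "Topics of Interest", 18),
        Node("li", "Web mining", 12),
    ])


if __name__ == "__main__":
    for b in extract(demo_page()):
        print(f"{b.label:6} {b.text}")
```

In this sketch the paragraph under "Important Dates" receives its `date` label only through its parent block, which is the kind of hierarchical dependency the Summary says purely visual methods miss; the post-processing pass then resets the inherited label on the paragraph that contains no date.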

Document type: Conference Paper
Collection: School of Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: Web of Science: cited 0 times; Scopus: cited 3 times
Access statistics: 63 abstract views, 3 file downloads
Created: Mon, 30 May 2016, 15:57:41 EST
