Extracting the semantic content of web pages via repeated structures

He, Zheng, Luo, Hangzai, Fan, Jianping and Liu, Xiao 2013, Extracting the semantic content of web pages via repeated structures, in ICME 2013 : Proceedings of the IEEE International Conference on Multimedia and Expo, IEEE, Piscataway, N.J., pp. 1-6, doi: 10.1109/ICMEW.2013.6618450.

Title Extracting the semantic content of web pages via repeated structures
Author(s) He, Zheng
Luo, Hangzai
Fan, Jianping
Liu, Xiao (ORCID iD: orcid.org/0000-0001-8400-5754)
Conference name Multimedia and Expo. IEEE International Conference (2013 : San Jose, California)
Conference location San Jose, California
Conference dates 2013/07/15 - 2013/07/19
Title of proceedings ICME 2013 : Proceedings of the IEEE International Conference on Multimedia and Expo
Publication date 2013
Conference series IEEE International Conference on Multimedia and Expo
Start page 1
End page 6
Total pages 6
Publisher IEEE
Place of publication Piscataway, N.J.
Keyword(s) semantic modeling
web page
repeated structure
Summary Web pages may carry semantics that are very important to both authors and readers. For various reasons, authors may insert content that is irrelevant to the underlying semantics of the page at different positions, such as advertisements, guide bars, and links. As a result, using all the data of a web page to model its semantics may not give good results. In this paper, we propose a framework that extracts the real semantic content from web pages via repeated structures in the HTML data. Our algorithm first detects the real semantic blocks in web pages via repeated structure segmentation, then extracts the real semantic content of the pages from those blocks.
ISBN 9781479916047
Language eng
DOI 10.1109/ICMEW.2013.6618450
HERDC Research category E1.1 Full written paper - refereed
ERA Research output type E Conference publication
Copyright notice ©2013, Institute of Electrical and Electronics Engineers
Persistent URL http://hdl.handle.net/10536/DRO/DU:30097655
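
Illustrative sketch (not part of the repository record, and not the authors' algorithm): the Summary describes a two-step idea, first locating semantic blocks through repeated HTML structures and then extracting content from them. The Python below is one minimal way to picture that idea. Everything in it is an assumption made for illustration: the names `Node`, `TreeBuilder`, and `extract_content`, the choice of "tag plus child tags" as the structural signature, the `min_repeats` threshold, and the heuristic that the repeated block holding the most text is the semantic content.

```python
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag (partial list, enough for a sketch).
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class Node:
    """Minimal DOM node holding text and child nodes in document order."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.items = []  # mix of str (text) and Node (child elements)

    @property
    def children(self):
        return [item for item in self.items if isinstance(item, Node)]

    def signature(self):
        # Assumed structural signature: own tag plus the sequence of child tags.
        return (self.tag, tuple(child.tag for child in self.children))

    def full_text(self):
        parts = [i if isinstance(i, str) else i.full_text() for i in self.items]
        return " ".join(p for p in parts if p)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.items.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text and self.current.tag not in {"script", "style"}:
            self.current.items.append(text)


def iter_nodes(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from iter_nodes(child)


def extract_content(html, min_repeats=3):
    """Return the text of the repeated sibling block that carries the most text.

    Step 1 (segmentation): for every element, find the most common structural
    signature among its children and keep it if it repeats often enough.
    Step 2 (extraction): among those repeated blocks, keep the longest text.
    """
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()

    best_text = ""
    for node in iter_nodes(builder.root):
        counts = Counter(child.signature() for child in node.children)
        if not counts:
            continue
        sig, repeats = counts.most_common(1)[0]
        if repeats < min_repeats:
            continue
        text = " ".join(
            child.full_text() for child in node.children if child.signature() == sig
        )
        if len(text) > len(best_text):
            best_text = text
    return best_text
```

Under these assumptions, a page with a short navigation list and a run of article paragraphs would yield the paragraphs, because both are repeated structures but the paragraphs carry more text. Real pages would likely need a more robust signature and a better relevance measure than raw text length.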

Document type: Conference Paper
Collection: School of Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: TR Web of Science: cited 0 times
Scopus: cited 0 times
Access Statistics: 70 Abstract Views, 4 File Downloads
Created: Fri, 27 Oct 2017, 16:46:52 EST

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.