Leveraging visual features and hierarchical dependencies for conference information extraction
conference contribution
posted on 2013-04-10, 00:00 authored by Y You, G Xu, J Cao, Y Zhang, Guangyan Huang

Traditional information extraction methods mainly rely on visual feature assisted techniques; but without considering the hierarchical dependencies within the paragraph structure, some important information is missing. This paper proposes an integrated approach for extracting academic information from conference Web pages. Firstly, Web pages are segmented into text blocks by applying a new hybrid page segmentation algorithm which combines visual feature and DOM structure together. Then, these text blocks are labeled by a Tree-structured Random Fields model, and the block functions are differentiated using various features such as visual features, semantic features and hierarchical dependencies. Finally, an additional post-processing is introduced to tune the initial annotation results. Our experimental results on real-world data sets demonstrated that the proposed method is able to effectively and accurately extract the needed academic information from conference Web pages. © 2013 Springer-Verlag.
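
As a rough illustration of the first step described in the abstract (splitting a page into text blocks along its DOM structure), the following is a minimal sketch and not the authors' hybrid algorithm. It uses only Python's standard `html.parser`; the names `BLOCK_TAGS`, `BlockSegmenter` and `segment_blocks` are hypothetical, the choice of block-level tags is an assumption, and the visual-feature side of the method is omitted because it would require a rendered page. The recorded nesting depth is one simple stand-in for the kind of hierarchical cue a tree-structured labeling model could consume.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries. This set is an assumption made for the
# sketch, not something specified by the paper.
BLOCK_TAGS = {"div", "p", "ul", "ol", "li", "table", "tr", "td", "th",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class BlockSegmenter(HTMLParser):
    """Collect (depth, tag, text) triples, one per run of text in a block."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open block-level tags
        self.blocks = []  # (nesting depth, innermost block tag, text)
        self._buf = []    # text collected since the last block boundary

    def _flush(self):
        # Normalise whitespace and emit the buffered text as one block.
        text = " ".join("".join(self._buf).split())
        self._buf = []
        if text:
            tag = self.stack[-1] if self.stack else "root"
            self.blocks.append((len(self.stack), tag, text))

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and tag in self.stack:
            self._flush()
            # Pop back to the matching tag, tolerating unclosed children.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Inline tags such as <b> are ignored, so their text joins the block.
        self._buf.append(data)


def segment_blocks(html):
    """Return the text blocks of an HTML string with their DOM depth."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser._flush()  # emit any trailing text outside a block tag
    return parser.blocks
```

Each returned triple could then be passed to a labeling step together with whatever visual and semantic features are available; how those features are computed and combined is left to the paper itself.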
Volume: 7808
Pagination: 404-416
Location: Sydney, N.S.W.
Start date: 2013-04-04
End date: 2013-04-06
ISSN: 0302-9743
eISSN: 1611-3349
ISBN-13: 9783642374012
Language: eng
Publication classification: E Conference publication, E1.1 Full written paper - refereed
Copyright notice: 2013, Springer
Editor/Contributor(s): Ishikawa Y, Li J, Wang W, Zhang R
Title of proceedings: Web Technologies and Applications
Event: Asia-Pacific Web Conference on Web Technologies and Applications (15th : 2013 : Sydney, N.S.W.)
Publisher: Springer
Place of publication: Berlin, Germany
Series: Lecture Notes in Computer Science