Query-driven frequent co-occurring term computation over relational data using MapReduce

Li, Jianxin, Liu, Chengfei, Zhou, Rui and Yu, Jeffrey Xu 2015, Query-driven frequent co-occurring term computation over relational data using MapReduce, Computer journal, vol. 58, no. 3, pp. 497-513, doi: 10.1093/comjnl/bxu090.

Title Query-driven frequent co-occurring term computation over relational data using MapReduce
Author(s) Li, Jianxin (ORCID iD: orcid.org/0000-0002-9059-330X)
Liu, Chengfei
Zhou, Rui
Yu, Jeffrey Xu
Journal name Computer journal
Volume number 58
Issue number 3
Start page 497
End page 513
Total pages 17
Publisher Oxford University Press
Place of publication Oxford, Eng.
Publication date 2015
Keyword(s) data analytics
term co-occurrence
structured data
MapReduce
Science & Technology
Technology
Computer Science, Hardware & Architecture
Computer Science, Information Systems
Computer Science, Software Engineering
Computer Science, Theory & Methods
Computer Science
Summary Given a keyword query q and a large structured database, traditional keyword search may return a large number of relevant results, leaving users with the frustrating task of picking out the results that interest them. To help users understand the data being searched, this work investigates the problem of finding frequent co-occurring terms (FCTs) in large relational data. Returning the terms that most frequently co-occur with the given keywords gives users a big-picture view of the relevant data. The FCT problem is also a fundamental building block of data mining, because the discovered FCTs can be used to analyze the topics or contexts of user interest. Although FCT computation was proposed and investigated in Tao and Yu [(2009) Finding Frequent Co-Occurring Terms in Relational Keyword Search. 12th Int. Conf. Extending Database Technology EDBT, Saint-Petersburg, Russia, March 23–26, pp. 839–850. ACM, New York, USA], further work is needed to improve its performance because FCT computation is very expensive. In particular, as data volumes grow, the centralized approach of Tao and Yu (2009) faces a serious efficiency challenge. To address this, we investigate how to perform parallel FCT computation using MapReduce, a well-accepted framework for data-intensive applications over clusters of computers. We design an effective mapping mechanism that exploits the approximately maximal workload of FCT computation to balance the computational cost across processors, while reducing the shuffling cost and avoiding data skew. Analytical and experimental evaluations on TPC-H benchmark datasets of different sizes demonstrate the efficiency and scalability of the proposed approach.
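Illustrative note: the idea of counting the terms that co-occur with a query in map/shuffle/reduce style can be sketched as follows. This is a simplified, single-machine sketch and not the authors' algorithm: it assumes each tuple is already flattened into one whitespace-separated string, it ignores joins across relations, and it does not model the workload-balancing mapping mechanism described in the summary. The names `map_row`, `shuffle`, `reduce_counts` and `top_fcts` are hypothetical.

```python
from collections import defaultdict


def map_row(row, query):
    """Map step: for a row containing every query term, emit (term, 1)
    for each distinct non-query term in that row."""
    terms = set(row.lower().split())
    if query <= terms:
        for term in terms - query:
            yield term, 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by term."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the counts collected for each term."""
    return {term: sum(values) for term, values in groups.items()}


def top_fcts(rows, query_terms, k):
    """Return the k terms that co-occur most often with the query.
    Ties are broken alphabetically so the result is deterministic."""
    query = {t.lower() for t in query_terms}
    pairs = (pair for row in rows for pair in map_row(row, query))
    counts = reduce_counts(shuffle(pairs))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Toy rows standing in for flattened tuples (assumed data, for illustration only)
rows = [
    "mapreduce parallel join",
    "mapreduce parallel skew",
    "mapreduce shuffle",
    "hadoop parallel",
]
result = top_fcts(rows, ["mapreduce"], k=2)
```

In a real MapReduce deployment the map and reduce functions would run on separate workers, and how keys are assigned to reducers determines the shuffling cost and the degree of skew, which is the part the article addresses.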
Language eng
DOI 10.1093/comjnl/bxu090
Field of Research 08 Information and Computing Sciences
HERDC Research category C1.1 Refereed article in a scholarly journal
Copyright notice ©2014, The British Computer Society
Persistent URL http://hdl.handle.net/10536/DRO/DU:30116208

Document type: Journal Article
Collection: School of Information Technology
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: TR Web of Science: cited 0 times
Scopus: cited 1 time
Access Statistics: 7 Abstract Views, 2 File Downloads
Created: Thu, 20 Dec 2018, 18:11:06 EST

Every reasonable effort has been made to ensure that permission has been obtained for items included in DRO. If you believe that your rights have been infringed by this repository, please contact drosupport@deakin.edu.au.