A general communication cost optimization framework for big data stream processing in geo-distributed data centers

Gu, Lin, Zeng, Deze, Guo, Song, Xiang, Yong and Hu, Jiankun 2016, A general communication cost optimization framework for big data stream processing in geo-distributed data centers, IEEE transactions on computers, vol. 65, no. 1, pp. 19-29, doi: 10.1109/TC.2015.2417566.

Title A general communication cost optimization framework for big data stream processing in geo-distributed data centers
Author(s) Gu, Lin
Zeng, Deze
Guo, Song
Xiang, Yong (ORCID iD: orcid.org/0000-0003-3545-7863)
Hu, Jiankun
Journal name IEEE transactions on computers
Volume number 65
Issue number 1
Start page 19
End page 29
Total pages 11
Publisher IEEE
Place of publication Piscataway, N.J.
Publication date 2016-01-01
ISSN 0018-9340
Keyword(s) Science & Technology
Technology
Computer Science, Hardware & Architecture
Engineering, Electrical & Electronic
Computer Science
Engineering
Big data
stream processing
network cost minimization
VM placement
geo-distributed data centers
VIRTUAL MACHINE PLACEMENT
EFFICIENCY
Summary With the explosion of big data, processing large numbers of continuous data streams, i.e., big data stream processing (BDSP), has become a crucial requirement for many scientific and industrial applications in recent years. By offering a pool of computation, communication and storage resources, public clouds, like Amazon's EC2, are undoubtedly the most efficient platforms to meet the ever-growing needs of BDSP. Public cloud service providers usually operate a number of geo-distributed datacenters across the globe. Different datacenter pairs incur different inter-datacenter network costs charged by Internet Service Providers (ISPs). Meanwhile, inter-datacenter traffic in BDSP constitutes a large portion of a cloud provider's traffic demand over the Internet and incurs substantial communication cost, which may even become the dominant operational expenditure factor. As the datacenter resources are provided in a virtualized way, the virtual machines (VMs) for stream processing tasks can be freely deployed onto any datacenter, provided that the Service Level Agreement (SLA, e.g., quality-of-information) is obeyed. This raises the opportunity, but also a challenge, to explore the inter-datacenter network cost diversities to optimize both VM placement and load balancing towards network cost minimization with guaranteed SLA. In this paper, we first propose a general modeling framework that describes all representative inter-task relationship semantics in BDSP. Based on our novel framework, we then formulate the communication cost minimization problem for BDSP as a mixed-integer linear programming (MILP) problem and prove it to be NP-hard. We then propose a computation-efficient solution based on MILP. The high efficiency of our proposal is validated by extensive simulation-based studies.
Language eng
DOI 10.1109/TC.2015.2417566
Field of Research 080199 Artificial Intelligence and Image Processing not elsewhere classified
0803 Computer Software
0805 Distributed Computing
1006 Computer Hardware
Socio Economic Objective 890201 Application Software Packages (excl. Computer Games)
HERDC Research category C1 Refereed article in a scholarly journal
ERA Research output type C Journal article
Copyright notice ©2016, IEEE
Persistent URL http://hdl.handle.net/10536/DRO/DU:30080752
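
The placement problem described in the Summary can be illustrated with a small sketch. This is not the paper's MILP formulation or its solution method: it is an exhaustive search over a toy instance, and it ignores load balancing and SLA constraints. The task names, flow rates, link costs and capacities below are all hypothetical values chosen for illustration.

```python
from itertools import product


def min_cost_placement(tasks, flows, link_cost, capacity):
    """Exhaustively search task-to-datacenter assignments (toy sizes only).

    tasks:     list of task names, one VM per task (assumption)
    flows:     {(src_task, dst_task): traffic rate}
    link_cost: link_cost[a][b] = cost per unit of traffic from datacenter a to b
    capacity:  {datacenter: maximum number of VMs it can host}
    """
    datacenters = list(capacity)
    best_cost, best_placement = None, None
    for choice in product(datacenters, repeat=len(tasks)):
        # Skip assignments that exceed any datacenter's VM capacity.
        if any(choice.count(dc) > capacity[dc] for dc in datacenters):
            continue
        placement = dict(zip(tasks, choice))
        # Communication cost: each flow pays the cost of the link it crosses.
        cost = sum(
            rate * link_cost[placement[src]][placement[dst]]
            for (src, dst), rate in flows.items()
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_placement = cost, placement
    return best_cost, best_placement


# Hypothetical three-task pipeline spread over two datacenters.
tasks = ["source", "filter", "sink"]
flows = {("source", "filter"): 10, ("filter", "sink"): 2}
link_cost = {"A": {"A": 0, "B": 3}, "B": {"A": 3, "B": 0}}
capacity = {"A": 2, "B": 2}

cost, placement = min_cost_placement(tasks, flows, link_cost, capacity)
print(cost, placement)
```

In this toy instance the capacity limit forces one task onto the other datacenter, and the search co-locates the two tasks joined by the heavy flow so that only the light flow crosses the costly link. The search space grows exponentially with the number of tasks, which is consistent with the NP-hardness result mentioned in the Summary and is why exhaustive search is only usable at toy scale.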

Document type: Journal Article
Collections: School of Information Technology
2018 ERA Submission
Unless expressly stated otherwise, the copyright for items in DRO is owned by the author, with all rights reserved.

Citation counts: Cited 32 times in TR Web of Science
Cited 32 times in Scopus
Access Statistics: 335 Abstract Views, 6 File Downloads
Created: Wed, 20 Jan 2016, 10:13:52 EST
