Deakin University

File(s) under permanent embargo

Aggregation on the fly: Reducing traffic for big data in the cloud

Version 2 2024-06-13, 09:38
Version 1 2016-01-18, 14:56
journal contribution
posted on 2024-06-13, 09:38 authored by H Ke, P Li, S Guo, I Stojmenovic
As a leading framework for processing and analyzing big data, MapReduce is leveraged by many enterprises to parallelize their data processing on distributed computing systems. Unfortunately, the all-to-all forwarding of data from map tasks to reduce tasks in the traditional MapReduce framework generates a large amount of network traffic. In many applications, the intermediate data generated by map tasks can be combined, which significantly reduces traffic; this motivates us to propose a data aggregation scheme for MapReduce jobs in the cloud. Specifically, we design an aggregation architecture under the existing MapReduce framework, in which aggregators can reside anywhere in the cloud, with the objective of minimizing data traffic during the shuffle phase. Experimental results show that our proposal outperforms existing work by significantly reducing network traffic.
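The abstract does not spell out the aggregation algorithm. The minimal sketch below only illustrates why combining intermediate key/value pairs before the shuffle reduces traffic, using a word-count job. Several choices here are assumptions made for illustration and are not the authors' scheme: the `groups` assignment of map tasks to aggregators, the crc32-based partitioner, and measuring traffic as the number of pairs that leave a mapper or aggregator. The sketch also assumes each aggregator runs on the same host as the map tasks it serves, so the mapper-to-aggregator hop is not counted.

```python
import zlib
from collections import Counter


def partition(key, num_reducers):
    # Deterministic hash partitioner (crc32), standing in for MapReduce's key partitioning.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def map_words(split):
    # Word-count map task: emit (word, 1) for every word in the input split.
    return [(word, 1) for word in split.split()]


def aggregate(map_outputs, groups):
    # Hypothetical aggregators: each group of map tasks feeds one aggregator,
    # which sums values per key so only one pair per distinct key is forwarded.
    aggregated = []
    for group in groups:
        combined = Counter()
        for task in group:
            for key, value in map_outputs[task]:
                combined[key] += value
        aggregated.append(list(combined.items()))
    return aggregated


def shuffle_pairs(outputs):
    # Shuffle traffic, measured (by assumption) as key/value pairs sent to reducers.
    return sum(len(out) for out in outputs)


def reduce_by_partition(outputs, num_reducers):
    # Route each pair to its reducer, sum per key, then merge the disjoint reducer outputs.
    reducers = [Counter() for _ in range(num_reducers)]
    for out in outputs:
        for key, value in out:
            reducers[partition(key, num_reducers)][key] += value
    merged = {}
    for reducer in reducers:
        merged.update(reducer)
    return merged


splits = ["big data big cloud", "data cloud data big", "cloud big data data"]
map_outputs = [map_words(s) for s in splits]
aggregated = aggregate(map_outputs, [[0, 1], [2]])

print(shuffle_pairs(map_outputs), shuffle_pairs(aggregated))  # prints "12 6"
```

Aggregation leaves the final counts unchanged (`reduce_by_partition` returns `{"big": 4, "data": 5, "cloud": 3}` either way) while halving the pairs shuffled in this toy case; putting all three map tasks behind a single aggregator cuts the 12 pairs to 3. Deciding where aggregators sit and which map tasks feed each one is the kind of placement decision the abstract describes, and this sketch does not attempt to optimize it.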


Journal

IEEE Network

Volume

29

Pagination

17-23

Location

New York, N.Y.

ISSN

0890-8044

Language

eng

Publication classification

C Journal article, C1.1 Refereed article in a scholarly journal

Copyright notice

2015, IEEE

Issue

5

Publisher

IEEE