A data placement strategy in scientific cloud workflows

Yuan, D; Yang, Y; Liu, Xiao; Chen, J

journal contribution

posted on 2010-10-01, 00:00 authored by D Yuan, Y Yang, Xiao Liu, J Chen

In scientific cloud workflows, large amounts of application data need to be stored in distributed data centres. To store these data effectively, a data manager must intelligently select the data centres in which they will reside. This does not apply, however, to data that must have a fixed location. When one task needs several datasets located in different data centres, the movement of large volumes of data becomes a challenge. In this paper, we propose a matrix-based k-means clustering strategy for data placement in scientific cloud workflows. The strategy contains two algorithms: one groups the existing datasets into k data centres during the workflow build-time stage, and the other dynamically clusters newly generated datasets to the most appropriate data centres, based on dependencies, during the runtime stage. Simulations show that our algorithm can effectively reduce data movement during the workflow's execution.
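
The dependency-driven placement described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not define how dependency is measured, so the sketch assumes that the dependency between two datasets is the number of tasks that use both. The function names (dependency_matrix, place_new_dataset), the task and data centre names, and the tie-breaking rule are all hypothetical.

```python
from itertools import combinations


def dependency_matrix(tasks, datasets):
    """Build a symmetric matrix of pairwise dataset dependencies.

    Assumption: entry [i][j] counts the tasks that use both dataset i
    and dataset j; the diagonal counts the tasks that use dataset i.
    """
    index = {name: i for i, name in enumerate(datasets)}
    size = len(datasets)
    matrix = [[0] * size for _ in range(size)]
    for used in tasks.values():
        for name in used:
            matrix[index[name]][index[name]] += 1
        for a, b in combinations(sorted(used), 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix


def place_new_dataset(new, placement, tasks):
    """Choose a data centre for a dataset generated at runtime.

    Each centre is scored by how many of its stored datasets appear
    together with `new` in some task. Ties are broken by centre name
    (an arbitrary choice made for this sketch).
    """
    scores = {centre: 0 for centre in placement}
    for used in tasks.values():
        if new not in used:
            continue
        for centre, stored in placement.items():
            scores[centre] += len(used & stored)
    return max(sorted(scores), key=scores.get)


# Hypothetical workflow: three tasks over four datasets.
tasks = {
    "t1": {"d1", "d2"},
    "t2": {"d2", "d3"},
    "t3": {"d1", "d2", "d4"},
}
datasets = ["d1", "d2", "d3", "d4"]
matrix = dependency_matrix(tasks, datasets)

# d4 is produced at runtime; d1, d2 and d3 were placed at build time.
placement = {"dc_a": {"d1", "d2"}, "dc_b": {"d3"}}
chosen = place_new_dataset("d4", placement, tasks)
```

In this example d4 is only used alongside d1 and d2, so placing it in the centre that already holds them avoids moving data when t3 runs. A build-time grouping step would cluster the rows of such a matrix into k groups; that step is left out here because the abstract gives no detail on it.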

History

Journal

Future generation computer systems

Volume

26

Pagination

1200-1214

Location

Amsterdam, The Netherlands

Publisher DOI

https://doi.org/10.1016/j.future.2010.02.004

ISSN

0167-739X

Language

eng

Publication classification

C1.1 Refereed article in a scholarly journal

Copyright notice

2010, Elsevier

Issue

8

Publisher

Elsevier

Keywords

data management scientific workflow cloud computing
