Deakin University

File(s) under permanent embargo

A data dependency based strategy for intermediate data storage in scientific cloud workflow systems

journal contribution
posted on 2012-06-01, 00:00 authored by D Yuan, Y Yang, Xiao Liu, G Zhang, J Chen
Many scientific workflows are data intensive: large volumes of intermediate data are generated during their execution. Some valuable intermediate data need to be stored for sharing or reuse. Traditionally, they are stored selectively according to the system's storage capacity, with the choice made manually. As doing science in the cloud has become popular, more intermediate data can be stored in scientific cloud workflows under a pay-for-use model. In this paper, we build an intermediate data dependency graph (IDG) from the data provenance in scientific workflows. With the IDG, deleted intermediate data can be regenerated. On this basis, we develop a novel intermediate data storage strategy that can reduce the cost of scientific cloud workflow systems by automatically storing appropriate intermediate data sets with one cloud service provider. The strategy has significant research merits: it achieves a cost-effective trade-off between computation cost and storage cost, and it is not strongly affected by inaccurate forecasts of data set usage. The strategy also takes users' tolerance of data access delay into account. We use Amazon's cost model and apply the strategy to general random workflows as well as to specific astrophysics pulsar searching workflows for evaluation. The results show that our strategy can significantly reduce the overall cost of scientific cloud workflow execution.
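The core trade-off in the abstract compares, for each intermediate data set, what it costs to keep it in storage with what it costs to regenerate it from the IDG when it has been deleted. The sketch below illustrates that idea under stated assumptions and is not the authors' algorithm. Each data set carries a hypothetical generation cost, storage cost per day and usage rate. Regenerating a deleted set also means recomputing any deleted ancestors. A simple greedy rule, processed in topological order, stores a set when its expected regeneration cost per day exceeds its storage cost per day. This rule ignores the effect of a decision on successors, which the paper's strategy may handle differently. The names `DataSet`, `regeneration_cost`, `decide_storage`, `total_cost_rate` and `delay_penalty`, and all prices, are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSet:
    """One node of a hypothetical intermediate data dependency graph (IDG)."""
    name: str
    gen_cost: float      # assumed: computation cost to regenerate from direct predecessors ($)
    storage_rate: float  # assumed: storage cost per time unit ($/day)
    usage_rate: float    # assumed: expected accesses (regenerations) per time unit (1/day)
    preds: List["DataSet"] = field(default_factory=list)  # direct predecessors in the IDG
    stored: bool = False


def regeneration_cost(d: DataSet) -> float:
    """Cost of regenerating d: its own generation cost plus the generation cost
    of every deleted ancestor that must be recomputed first. The walk stops at
    stored data sets, and shared ancestors are counted once."""
    total = 0.0
    seen = set()
    stack = [d]
    while stack:
        cur = stack.pop()
        if id(cur) in seen:
            continue
        seen.add(id(cur))
        total += cur.gen_cost
        # Only deleted predecessors have to be regenerated as well.
        stack.extend(p for p in cur.preds if not p.stored)
    return total


def decide_storage(topo_order: List[DataSet], delay_penalty: float = 1.0) -> None:
    """Greedy rule (an assumption, not the paper's exact strategy): store a data
    set when its weighted regeneration cost rate exceeds its storage cost rate.
    delay_penalty >= 1 is a hypothetical knob that favours storage for users who
    tolerate little access delay. Predecessors must appear before successors."""
    for d in topo_order:
        deleted_rate = regeneration_cost(d) * d.usage_rate * delay_penalty
        d.stored = deleted_rate > d.storage_rate


def total_cost_rate(datasets: List[DataSet]) -> float:
    """Monetary cost per time unit: storage cost for stored sets and expected
    regeneration cost for deleted ones (delay_penalty is not a real charge)."""
    return sum(
        d.storage_rate if d.stored else regeneration_cost(d) * d.usage_rate
        for d in datasets
    )


# Hypothetical three-step chain a -> b -> c with illustrative prices.
a = DataSet("a", gen_cost=10.0, storage_rate=0.5, usage_rate=0.01)
b = DataSet("b", gen_cost=50.0, storage_rate=0.2, usage_rate=0.01, preds=[a])
c = DataSet("c", gen_cost=5.0, storage_rate=1.0, usage_rate=0.1, preds=[b])
decide_storage([a, b, c])
print([d.name for d in (a, b, c) if d.stored])  # ['b']
print(round(total_cost_rate([a, b, c]), 4))     # 0.8
```

In this toy chain, `b` is worth keeping because regenerating it would also require recomputing the deleted `a`. Once `b` is stored, `c` becomes cheap to regenerate and can be deleted. This is the kind of dependency effect the IDG makes visible.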

History

Journal

Concurrency and Computation: Practice and Experience

Volume

24

Season

Special issue

Pagination

956-976

Location

Chichester, England

ISSN

1532-0626

eISSN

1532-0634

Language

English

Publication classification

C1.1 Refereed article in a scholarly journal, C Journal article

Copyright notice

2010, John Wiley & Sons Ltd

Issue

9

Publisher

Wiley-Blackwell