Cloud data management for scientific workflows: research issues, methodologies, and state-of-the-art
conference contribution
posted on 2014-01-01, 00:00, authored by D Yuan, L Cui, Xiao Liu
Data-intensive scientific applications pose many challenges for distributed computing systems. In the scientific field, application data are expected to double every year over the next decade and beyond. With this continuing data explosion, high performance computing systems are needed to store and process data efficiently, and workflow technologies are used to automate these scientific applications. Scientific workflows are typically very complex: they usually have a large number of tasks and take a long time to execute. Running scientific workflow applications usually requires not only high performance computing resources but also massive storage. The emergence of cloud computing technologies offers a new way to develop scientific workflow systems. Scientists can upload their data and launch their applications on scientific cloud workflow systems from anywhere in the world via the Internet, and they only need to pay for the resources that their applications use. As all the data are managed in the cloud, it is easy to share data among scientists. This model is very convenient for users, but it remains a big challenge for the system. This paper proposes several research topics in data management for scientific cloud workflow systems, and discusses their research methodologies and state-of-the-art solutions.