Transparent and autonomic rollback-recovery in cluster systems
Maloney, Andrew and Goscinski, Andrzej 2008, Transparent and autonomic rollback-recovery in cluster systems, in ICPADS 2008 : Proceedings of the 14th International Conference on Parallel and Distributed Systems, IEEE Computer Society, Piscataway, N.J., pp. 541-548.
Cluster systems provide an excellent environment for running computation-hungry applications. However, because they are built from commodity components, they are prone to failures. To overcome these failures we propose to use rollback-recovery, which consists of checkpointing and recovery facilities. Checkpointing facilities have been the focus of many previous studies; the recovery facilities, however, have been overlooked. This paper focuses on the requirements, concept and architecture of recovery facilities. The synthesized fault-tolerant system was implemented in the GENESIS system and evaluated. The results show that the synthesized system is efficient and scalable.