Deakin University
File(s) under permanent embargo

Transparent and autonomic rollback-recovery in cluster systems

conference contribution
posted on 2008-01-01, 00:00, authored by Andrew Scott Maloney, Andrzej Goscinski
Cluster systems provide an excellent environment for running computation-hungry applications. However, because they are built from commodity components, they are prone to failures. To overcome these failures, we propose to use rollback-recovery, which consists of checkpointing and recovery facilities. Checkpointing facilities have been the focus of many previous studies; recovery facilities, however, have been overlooked. This paper focuses on the requirements, concept, and architecture of recovery facilities. The synthesized fault-tolerant system was implemented in the GENESIS system and evaluated. The results show that the synthesized system is efficient and scalable.

Event

IEEE International Conference on Parallel and Distributed Systems (14th : 2008 : Melbourne, Vic.)

Pagination

541 - 548

Publisher

IEEE Computer Society

Location

Melbourne, Vic.

Place of publication

Piscataway, N.J.

Start date

2008-12-08

End date

2008-12-10

ISBN-13

9780769534343

Language

eng

Publication classification

E1 Full written paper - refereed

Copyright notice

2008, IEEE

Editor/Contributor(s)

M Hobbs, Y Xiang, W Zhou

Title of proceedings

ICPADS 2008 : Proceedings of the 14th International Conference on Parallel and Distributed Systems