Deakin University

File(s) under permanent embargo

The development of an efficient checkpointing facility exploiting operating systems services of the GENESIS cluster operating system

journal contribution
posted on 2004-05-03, authored by Justin Rough and Andrzej Goscinski
Recent research on parallel processing on non-dedicated clusters has focused on high execution performance, parallelism management, transparent access to resources, and ease of use. However, because a cluster is a collection of independent computers shared by multiple users, it is susceptible to failure. This paper describes the development of a coordinated checkpointing facility for the GENESIS cluster operating system, built by exploiting existing operating system services. The facility achieves high performance and low overheads by allowing the processes of a parallel application to continue executing while checkpoints are created, and it keeps demands on cluster resources low by using coordinated checkpointing.
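The abstract gives no implementation details. The following is a minimal Python sketch of the general idea it names: a coordinated checkpoint in which the processes keep executing while their checkpoints are being written. It is an illustration under stated assumptions, not the GENESIS mechanism. `Process`, `write_to_storage` and `coordinated_checkpoint` are hypothetical names, in-memory dicts stand in for process state and stable storage, and the sketch assumes no messages are in transit when the snapshots are taken. A full coordinated protocol must also record or drain in-transit messages.

```python
import copy
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical stand-in for one process of a parallel application."""
    pid: int
    state: dict = field(default_factory=lambda: {"counter": 0})

    def step(self) -> None:
        # Simulated computation: mutate the live state.
        self.state["counter"] += 1

    def snapshot(self) -> dict:
        # Copy the state in memory. Once this returns, the process may keep
        # running, and later mutations do not affect the copy.
        return copy.deepcopy(self.state)


def write_to_storage(pending: dict, pid: int, state: dict) -> int:
    # Hypothetical stand-in for a slow write to stable storage.
    pending[pid] = state
    return pid


def coordinated_checkpoint(processes, checkpoints: dict, checkpoint_id: int) -> bool:
    """Snapshot every process, then write the snapshots while processes run.

    Assumption: no messages are in transit at snapshot time, so the set of
    local snapshots is trivially a consistent global state.
    """
    # Phase 1: the coordinator collects a snapshot from every process.
    tentative = {p.pid: p.snapshot() for p in processes}

    pending = {}
    with ThreadPoolExecutor() as pool:
        # Phase 2: write the snapshots in the background...
        futures = [pool.submit(write_to_storage, pending, pid, s)
                   for pid, s in tentative.items()]
        # ...while the application processes continue computing.
        for p in processes:
            p.step()
        written = {f.result() for f in futures}

    # Commit only if every snapshot was written (all-or-nothing).
    if written == {p.pid for p in processes}:
        checkpoints[checkpoint_id] = pending
        return True
    return False


if __name__ == "__main__":
    procs = [Process(pid=i) for i in range(3)]
    for p in procs:
        p.step()  # every counter is now 1
    checkpoints = {}
    ok = coordinated_checkpoint(procs, checkpoints, checkpoint_id=1)
    # The committed checkpoint holds counter == 1; the live state has moved on to 2.
    print(ok, checkpoints[1], [p.state for p in procs])
```

Copying the state before writing it is what lets each process resume before the slow write completes, and committing only when every process's snapshot has been written keeps each checkpoint round all-or-nothing, so one consistent global checkpoint per round is enough for recovery.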


Journal

Future Generation Computer Systems

Volume

20

Issue

4

Pagination

523 - 538

Publisher

Elsevier BV

Location

Amsterdam, Netherlands

ISSN

0167-739X

eISSN

1872-7115

Language

English

Notes

Available online 19 September 2003.

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2003, Elsevier B.V.