Cluster systems are becoming more prevalent in today’s computer society and users are beginning to request that these systems be reliable. Currently, most clusters have been designed to provide high performance at the cost of providing little to no reliability. To combat this, this report looks at how a recovery facility, based on either a centralised or distributed approach could be implemented into a cluster that is supported by a check pointing facility. This recovery facility can then recover failed user processes by using checkpoints of the processes that have been taken during failure free execution.