A survey of checkpoint overhead reduction methods

Document Type : Review Article

Authors

Department of Electrical and Computer Engineering, Faculty of Engineering, Kharazmi University, Tehran, Iran

Abstract

Fault tolerance is an essential requirement in many kinds of systems. Checkpointing, which saves safe recovery points that a system can return to after a fault, increases reliability and dependability. The main drawback of checkpointing is its overhead: executing checkpoints consumes resources and degrades system performance. Numerous approaches have therefore been proposed to reduce this overhead and improve performance. This paper reviews a wide range of checkpointing methods and organizes them into distinct groups according to the type of checkpointing execution and the system level at which checkpointing is performed. These groups include coordinated checkpointing, system-level checkpointing, application-level checkpointing, and distributed system checkpointing. Finally, the paper provides a detailed summary in a comprehensive graph and a conclusion for each of these groups.

Articles in Press, Accepted Manuscript
Available Online from 29 October 2024
  • Receive Date: 27 April 2024
  • Revise Date: 23 June 2024
  • Accept Date: 27 August 2024