A Concurrent Partial Snapshot Algorithm for Large-Scale and Dynamic Distributed Systems

抄録

Checkpoint-rollback recovery, which is a universal method for restoring distributed systems after faults, requires a sophisticated snapshot algorithm especially if the systems are large-scale, since repeatedly taking global snapshots of the whole system requires unacceptable communication cost. As a sophisticated snapshot algorithm, a partial snapshot algorithm has been introduced that takes a snapshot of a subsystem consisting only of the nodes that are communication-related to the initiator instead of a global snapshot of the whole system. In this paper, we modify the previous partial snapshot algorithm to create a new one that can take a partial snapshot more efficiently, especially when multiple nodes concurrently initiate the algorithm. Experiments show that the proposed algorithm greatly reduces the amount of communication needed for taking partial snapshots.

収録刊行物

被引用文献 (3)*注記

もっと見る

参考文献 (11)*注記

もっと見る

関連プロジェクト

もっと見る

詳細情報 詳細情報について

問題の指摘

ページトップへ