A Concurrent Partial Snapshot Algorithm for Large-Scale and Dynamic Distributed Systems
- KIM Yonghwan, Graduate School of Information Science and Technology, Osaka University
- ARARAGI Tadashi, NTT Communication Science Laboratories, NTT Corporation
- NAKAMURA Junya, Graduate School of Information Science and Technology, Osaka University
- MASUZAWA Toshimitsu, Graduate School of Information Science and Technology, Osaka University
Abstract
Checkpoint-rollback recovery, a universal method for restoring distributed systems after faults, requires a sophisticated snapshot algorithm, especially in large-scale systems, because repeatedly taking global snapshots of the whole system incurs an unacceptable communication cost. One such algorithm is the partial snapshot algorithm, which takes a snapshot only of the subsystem consisting of the nodes that are communication-related to the initiator, rather than a global snapshot of the whole system. In this paper, we modify the previous partial snapshot algorithm into a new one that takes partial snapshots more efficiently, especially when multiple nodes initiate the algorithm concurrently. Experiments show that the proposed algorithm greatly reduces the amount of communication needed to take partial snapshots.
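The abstract defines a partial snapshot by its participant set: only the nodes that are communication-related to the initiator take part. As a minimal sketch, assuming "communication-related" means transitively connected through messages exchanged since the last checkpoint (treated here as undirected), that participant set could be computed by a breadth-first search over a communication log. The function name `snapshot_group` and the `(sender, receiver)` log format are hypothetical; this is not the paper's algorithm, and it models neither the distributed marker exchange nor the concurrent-initiation handling that the paper addresses.

```python
from collections import deque


def snapshot_group(initiator, comm_log):
    """Return the nodes transitively communication-related to `initiator`.

    Assumption (hypothetical model): `comm_log` is an iterable of
    (sender, receiver) pairs recorded since the last checkpoint, and any
    message exchange relates both endpoints (undirected).
    """
    # Build an undirected adjacency map from the recorded messages.
    neighbors = {}
    for sender, receiver in comm_log:
        neighbors.setdefault(sender, set()).add(receiver)
        neighbors.setdefault(receiver, set()).add(sender)

    # Breadth-first search from the initiator collects its snapshot group.
    group = {initiator}
    frontier = deque([initiator])
    while frontier:
        node = frontier.popleft()
        for peer in neighbors.get(node, ()):
            if peer not in group:
                group.add(peer)
                frontier.append(peer)
    return group


if __name__ == "__main__":
    log = [("A", "B"), ("B", "C"), ("D", "E")]
    print(sorted(snapshot_group("A", log)))  # ['A', 'B', 'C']
    print(sorted(snapshot_group("D", log)))  # ['D', 'E']
```

In this toy log, nodes D and E never exchanged messages with A's component, so under the stated assumption they would not participate in A's snapshot, whereas a global snapshot would involve all five nodes.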
Published in
- IEICE Transactions on Information and Systems E97.D (1), 65-76, 2014
- Publisher: The Institute of Electronics, Information and Communication Engineers (IEICE)
Details
- CRID: 1390282679355811456
- NII article ID: 130003385485
- ISSN: 1745-1361, 0916-8532
- Text language code: en
- Data sources: JaLC, Crossref, CiNii Articles, KAKEN
- Abstract license flag: Not permitted for reuse