Detecting System Failures with GPUs and LLVM

Abstract

Since system failures can cause huge financial losses, they should be detected as early and accurately as possible and then recovered from rapidly. There are two main methods for detecting system failures: black-box and white-box monitoring. However, external black-box monitoring cannot obtain detailed information on system failures, while internal white-box monitoring is itself strongly affected by those failures. This paper proposes GPUSentinel, which enables more reliable white-box monitoring using general-purpose GPUs. In GPUSentinel, system monitors running on a GPU analyze main memory and thereby indirectly obtain the state of the target system. Because GPUs are isolated from the target system, these monitors are not easily affected by its failures. To ease the development of system monitors, GPUSentinel provides a development environment that includes program transformation with LLVM. It also provides reliable mechanisms for notifying remote hosts. We have implemented GPUSentinel using CUDA and the Linux kernel and confirmed that it can detect three types of system failures.
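The abstract says that monitors on the GPU read main memory directly and that an LLVM-based transformation eases their development, but it does not describe either mechanism. The following Python sketch is purely illustrative and simulates two such ideas on the CPU under explicit assumptions. Main memory is modeled as a byte buffer indexed by physical address. A hypothetical `translate()` helper converts kernel virtual addresses to physical offsets, assuming the x86-64 Linux direct map begins at `0xffff888000000000` with KASLR disabled; this is the kind of rewrite such a transformation might insert before each load. A hypothetical `HangDetector` treats a counter that the kernel is assumed to increment periodically as a heartbeat. None of these names or constants come from GPUSentinel itself.

```python
import struct
from typing import Optional

# ASSUMPTION: base of the x86-64 Linux direct map (4-level paging, no KASLR).
# The real value depends on kernel version and configuration.
PAGE_OFFSET = 0xFFFF888000000000


def translate(kvaddr: int) -> int:
    """Hypothetical helper: map a direct-map kernel address to a physical offset."""
    if kvaddr < PAGE_OFFSET:
        raise ValueError(f"not a direct-map address: {kvaddr:#x}")
    return kvaddr - PAGE_OFFSET


def read_u64(mem: bytearray, kvaddr: int) -> int:
    """Load a little-endian 64-bit kernel value from the simulated memory image."""
    return struct.unpack_from("<Q", mem, translate(kvaddr))[0]


class HangDetector:
    """Hypothetical monitor: flags a failure when a heartbeat counter stops advancing."""

    def __init__(self, counter_addr: int, max_stalls: int = 3) -> None:
        self.counter_addr = counter_addr  # kernel virtual address of the counter
        self.max_stalls = max_stalls      # consecutive unchanged polls tolerated
        self.last: Optional[int] = None
        self.stalls = 0

    def poll(self, mem: bytearray) -> bool:
        """Read the counter once; return True once it has stalled max_stalls times."""
        value = read_u64(mem, self.counter_addr)
        if value == self.last:
            self.stalls += 1
        else:
            self.stalls = 0
        self.last = value
        return self.stalls >= self.max_stalls


if __name__ == "__main__":
    mem = bytearray(4096)                 # simulated physical memory
    addr = PAGE_OFFSET + 0x100            # hypothetical counter location
    det = HangDetector(addr, max_stalls=2)
    for tick in (1, 2, 3, 3, 3):          # counter advances, then freezes
        struct.pack_into("<Q", mem, 0x100, tick)
        print(tick, det.poll(mem))
```

A frozen counter only suggests a hang, so `max_stalls` trades detection latency against false alarms when the monitor polls faster than the counter is updated.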

10th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys 2019), August 19-20, 2019, Hangzhou, China
