An Acceleration Method for GPU-Based Volume Rendering by Localizing Texture Memory Reference
Abstract
This paper presents a cache-aware method for accelerating texture-based volume rendering on the graphics processing unit (GPU). Because GPUs have a hierarchical architecture in terms of processing and memory units, cache optimization is important to maximize their effective performance for this kind of memory-intensive application. To accomplish this, our method localizes texture memory references according to the location of the viewpoint. The key idea behind this localization is to dynamically select the width and height of thread blocks (TBs) such that each warp, a series of 32 threads processed simultaneously on the GPU, minimizes the stride of its memory accesses. We also incorporate transposed indexing of threads to perform TB-level cache optimization for specific viewpoints. Furthermore, we maximize the TB size so that spatial locality can be exploited with fewer active TBs. For relatively large strides, we synchronize the threads of the same TB at regular intervals to realize synchronous ray propagation. In experiments using a GeForce GTX 580 card, we find that our cache-aware method doubles the worst-case rendering performance compared with the original implementation provided by the CUDA and OpenCL software development kits (SDKs).
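The stride argument behind the TB-shape and transposed-indexing ideas can be illustrated with a small, hypothetical Python model (this is not the paper's implementation; the block shapes, the row-major slice layout, and the function names are assumptions made for illustration). It maps the 32 threads of a warp to pixel coordinates, optionally with transposed indexing, and measures the largest address stride between consecutive threads:

```python
# Hypothetical model of warp-to-texture-address mapping (illustration only,
# not the authors' code). One texture fetch per pixel; a slice of the volume
# is assumed row-major, so address = y * row_len + x.

WARP_SIZE = 32

def warp_addresses(block_w, block_h, row_len, transposed=False):
    """Addresses touched by the first warp of a block_w x block_h TB."""
    addrs = []
    for t in range(min(WARP_SIZE, block_w * block_h)):
        x, y = t % block_w, t // block_w   # standard CUDA-style indexing
        if transposed:                     # TB-level transposed indexing
            x, y = y, x
        addrs.append(y * row_len + x)
    return addrs

def max_stride(addrs):
    """Largest address jump between consecutive threads of the warp."""
    return max(abs(b - a) for a, b in zip(addrs, addrs[1:]))

# A 32-wide TB walks the fast (x) axis: consecutive addresses, stride 1.
print(max_stride(warp_addresses(32, 8, row_len=512)))                    # 1
# Transposing the same TB walks the slow (y) axis: stride = row_len.
print(max_stride(warp_addresses(32, 8, row_len=512, transposed=True)))   # 512
```

Which mapping is better depends on which texture axis the warp's screen direction falls along for the current viewpoint; selecting the TB shape (and toggling transposition) per viewpoint is what keeps each warp's footprint compact and cache-friendly.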
Journal
- 研究報告ハイパフォーマンスコンピューティング(HPC) 2013 (8), 1-7, 2013-02-14
Details
- CRID: 1570291227981029888
- NII Article ID: 110009536433
- NII Bibliographic ID: AN10463942
- Text language code: en
- Data source type: CiNii Articles