Accelerating the Singular Value Decomposition of Square Matrices with CUDA

深谷 猛 (Takeshi Fukaya), 山本 有作 (Yusaku Yamamoto), 畝山 多加志 (Takashi Uneyama), 中村 佳正 (Yoshimasa Nakamura)

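This paper reports on accelerating the singular value decomposition (SVD) of square matrices using CUDA, an integrated development environment for GPGPU. The SVD of a square matrix is obtained by first transforming the input matrix into a bidiagonal matrix, computing the SVD of that bidiagonal matrix, and then applying the inverse transformation to recover the SVD of the original matrix. We aim for a large speedup at relatively low cost by making efficient use of the high-performance SGEMM (matrix multiplication routine) in CUBLAS, CUDA's BLAS library. We therefore use the GPU to accelerate the bidiagonalization and inverse transformation steps, in which most of the operations are performed by BLAS. For the implementation, we confirmed through simple performance prediction that Bischof's method, which allows bidiagonalization to be carried out mainly with matrix multiplications, is well suited to GPUs, and we adopted it. We also considered an appropriate division of work between the CPU and the GPU in each computational step, as well as overlapping their computations. In a performance evaluation using an NVIDIA GeForce 8800 GTX as the GPU, we confirmed that the SVD of a 5,120×5,120 square matrix is computed about four times faster than when using only the CPU (Intel Core2 Duo, 1.86 GHz, 2 cores).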

In this paper, we report on accelerating the computation of the singular value decomposition (SVD) of a square matrix using CUDA, an integrated development environment for GPGPU. Computing the SVD of a square matrix consists of three steps: bidiagonalization of the input matrix, the SVD of the bidiagonal matrix, and inverse transformation. Of these, we accelerate the first and third steps on the GPU, because CUBLAS, the BLAS library provided with CUDA, is easy to use in these two steps. Through simple performance prediction, we found that Bischof's method, in which bidiagonalization is computed mainly with matrix multiplications, is effective for GPU computation, and we therefore implemented the SVD algorithm based on this method. When computing the SVD of a 5,120×5,120 matrix, we obtained a speedup of about four times with a GPU over using only a CPU (Intel Core2 Duo, 1.86 GHz, 2 cores).
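
The three steps named in the abstract can be written as a chain of factorizations. The following is only a sketch in standard textbook notation; the symbols A, B, U_1, V_1, U_2, V_2 and Σ are labels assumed here for illustration and do not appear in the abstract.

```latex
\begin{align*}
% Step 1: bidiagonalization (U_1, V_1 orthogonal, B bidiagonal)
A &= U_1 B V_1^{\mathsf{T}} \\
% Step 2: SVD of the bidiagonal matrix (U_2, V_2 orthogonal, \Sigma diagonal)
B &= U_2 \Sigma V_2^{\mathsf{T}} \\
% Step 3: inverse transformation, substituting step 2 into step 1
A &= U_1 \left( U_2 \Sigma V_2^{\mathsf{T}} \right) V_1^{\mathsf{T}}
   = (U_1 U_2)\, \Sigma\, (V_1 V_2)^{\mathsf{T}}
\end{align*}
```

Under this reading, the singular vectors of the original matrix are U = U_1 U_2 and V = V_1 V_2. The inverse transformation therefore amounts to applying the orthogonal factors from step 1 to the singular vectors from step 2, which is consistent with the abstract's observation that steps 1 and 3 are dominated by BLAS operations.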
