An execution time prediction analytical model for GPU with instruction-level and thread-level parallelism awareness

Abstract

Even with powerful parallel hardware, it is still difficult to improve application performance without understanding the performance bottlenecks of parallel programs on GPU architectures. To give programmers better insight into these bottlenecks, we propose an analytical model that estimates the execution time of massively parallel programs while taking instruction-level and thread-level parallelism into consideration. Our model contains two components: a memory sub-model and a computation sub-model. The memory sub-model estimates the cost of memory instructions by considering the number of active threads and the GPU memory bandwidth. Correspondingly, the computation sub-model estimates the cost of computation instructions by considering the number of active threads and the application's arithmetic intensity. We use Ocelot 1) to analyze PTX code and obtain several input parameters for the two sub-models, such as the number of memory transactions and the data size. Based on the two sub-models, the analytical model estimates the cost of each instruction while considering instruction-level and thread-level parallelism, and thereby estimates the overall execution time of an application. We compare the model's output against actual execution on a GTX 260; the results show that the model reaches 90% accuracy on average for the benchmarks we used.
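
The abstract does not reproduce the model's equations, so the following is a minimal sketch of the two-sub-model structure it describes, assuming a simple latency-overlap rule for combining the memory and computation costs; every function name, parameter, and formula below is a hypothetical illustration, not the paper's actual formulation.

    # Hypothetical sketch of the memory/computation sub-model structure
    # described in the abstract; the cost formulas, the overlap rule, and
    # all names are assumptions for illustration only.

    def memory_cost(num_transactions, bytes_per_transaction,
                    active_threads, bandwidth_bytes_per_cycle):
        """Memory sub-model: cost (in cycles) of the memory instructions,
        scaled by the number of active threads and bounded by bandwidth."""
        total_bytes = num_transactions * bytes_per_transaction * active_threads
        return total_bytes / bandwidth_bytes_per_cycle

    def computation_cost(num_compute_insts, cycles_per_inst,
                         active_threads, issue_width):
        """Computation sub-model: cost (in cycles) of the computation
        instructions, assuming active threads share the issue width."""
        total_insts = num_compute_insts * active_threads
        return total_insts * cycles_per_inst / issue_width

    def predicted_time(mem_cost, comp_cost, overlap=0.5, clock_hz=1.24e9):
        """Combine the sub-models into an execution-time estimate.
        'overlap' is a hypothetical knob for how much memory latency is
        hidden by computation through ILP/TLP."""
        cycles = max(mem_cost, comp_cost) \
                 + (1.0 - overlap) * min(mem_cost, comp_cost)
        return cycles / clock_hz

In this sketch, the inputs on the memory side (transaction counts and data sizes) correspond to the parameters the paper extracts from PTX code with Ocelot, while the hardware constants would come from the target GPU (a GTX 260 in the paper's evaluation).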

Details

  • CRID
    1570572702056320256
  • NII Article ID
    110008583375
  • NII Bibliographic ID
    AN10463942
  • Text Language Code
    en
  • Data Source Type
    • CiNii Articles
