On-chip Parallel Programming Techniques for Dense LU Decomposition (Japanese title: マルチコアアーキテクチャのための密行列LU分解のプログラミング技術)

Abstract

In recent years, single-core processors have reached their performance limits because of constraints on power consumption and heat dissipation, and multicore and many-core processors, which raise performance by using many processor cores, have become mainstream. Extracting the performance of a multicore processor requires satisfying two conditions at once: enough parallelism to keep every core busy, and removal of the memory access bottleneck that arises when many cores access memory simultaneously. LU decomposition, which solves dense linear systems efficiently, is a well-known benchmark in high performance computing. The right-looking method has usually been considered the most suitable algorithm for fast LU decomposition because it exposes the most parallelism. On multicore processors, however, memory performance is low relative to arithmetic performance, so the right-looking method, which transfers a large amount of data, does not necessarily achieve the highest performance. Using LU decomposition as a case study, this paper reports the conditions on a multicore architecture under which the left-looking method, which has high locality of reference, outperforms the right-looking method, which achieves maximum parallelism, based on performance estimation models and evaluation experiments on Cell BE.
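The contrast the abstract draws between the two loop orderings can be illustrated with a minimal sketch. This is not the paper's implementation: it is an unblocked, sequential LU decomposition without pivoting in plain Python, and the function names, the `make_matrix` helper, and the write counter are assumptions introduced here for illustration. The counter tallies element writes to the matrix as a crude proxy for data transfer volume; it ignores reads and is not the paper's performance estimation model.

```python
def lu_right_looking(a):
    """In-place LU (no pivoting), right-looking order.

    After eliminating column k, the whole trailing submatrix is updated
    immediately. Returns the number of element writes to the matrix.
    """
    n = len(a)
    writes = 0
    for k in range(n):
        # Scale the entries below the pivot to form column k of L.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
            writes += 1
        # Eager rank-1 update of every remaining row and column.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                a[i][j] -= a[i][k] * a[k][j]
                writes += 1
    return writes


def lu_left_looking(a):
    """In-place LU (no pivoting), left-looking order.

    Column j is copied to a local buffer (a stand-in for on-chip memory),
    all earlier columns of L are applied to it, and it is written back once.
    Returns the number of element writes to the matrix.
    """
    n = len(a)
    writes = 0
    for j in range(n):
        col = [a[i][j] for i in range(n)]
        # Lazily apply the updates from columns 0..j-1 to this column only.
        for k in range(j):
            for i in range(k + 1, n):
                col[i] -= a[i][k] * col[k]
        # Scale the entries below the diagonal to form column j of L.
        for i in range(j + 1, n):
            col[i] /= col[j]
        # Single write-back of the finished column.
        for i in range(n):
            a[i][j] = col[i]
            writes += 1
    return writes


def make_matrix(n):
    """Hypothetical test matrix: diagonally dominant, so no pivoting is needed."""
    return [[1.0 / (1 + abs(i - j)) + (n if i == j else 0.0) for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    n = 8
    right, left = make_matrix(n), make_matrix(n)
    print("right-looking writes:", lu_right_looking(right))
    print("left-looking writes: ", lu_left_looking(left))
```

Both functions produce the same packed L and U factors. In this sketch the right-looking variant writes the trailing submatrix at every step, while the left-looking variant writes each column exactly once, which is the kind of difference in data movement that the abstract attributes to the two methods. The independent trailing updates of the right-looking variant are also where its larger available parallelism comes from.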

Journal

  • ACS

    ACS 3(3), 199-208, 2010-09-17

    Information Processing Society of Japan (情報処理学会)

Cited by:  1

Codes

  • NII Article ID (NAID)
    110007990320
  • NII NACSIS-CAT ID (NCID)
    AA11833852
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    1882-7829
  • NDL Article ID
    024301622
  • NDL Call No.
    YH247-812
  • Data Source
    CJPref  NDL  NII-ELS  IPSJ 