Proposal and Implementation of a MapReduce Optimization Method Using Aggregation Processing

小沢 健史, 鬼塚 真, 福本 佳史, 盛合 敏

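This paper presents a method for speeding up MapReduce jobs whose processing admits partial aggregation. For such jobs, prior work achieved speedups by building new processing systems specialized for aggregation. However, those approaches require substantial changes to the MapReduce mechanism, which made them difficult to integrate into Hadoop. We therefore propose Map Multi-Reduce, which achieves a speedup while keeping the cost of implementing it in Hadoop low and preserving fault tolerance. Map Multi-Reduce is an extension of MapReduce that adds per-machine aggregation. Implementing it required changing only about 800 lines of Hadoop. Despite this small change, experiments confirmed that, for WordCount over 300 GB of data, it reduces the data transferred between the Map and Reduce phases and makes processing 1.5 times faster.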

In this paper, we propose a MapReduce optimization that uses mapper-side aggregation and is designed for aggregation queries. Mapper-side aggregation has been applied on other platforms. However, it is difficult to embed that related work within an existing MapReduce framework such as Hadoop, because its task scheduling and monitoring mechanisms differ and the MapReduce framework does not provide an inter-process communication facility. To solve this problem, we prototype Map Multi-Reduce, which preserves MapReduce semantics while requiring only small modifications to Hadoop. Map Multi-Reduce is an extension of MapReduce that supports node-level aggregation with fault tolerance. It aggregates the outputs of multiple MapTasks running on the same machine and is implemented in only about 800 LOC. By cutting down the shuffle cost, Map Multi-Reduce makes WordCount on a 300 GB dataset 1.5 times faster.
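To make the idea of node-level aggregation concrete, here is a minimal single-process Python sketch for WordCount. It is an illustration under simplifying assumptions, not Hadoop code and not the authors' Map Multi-Reduce implementation. Scheduling, spilling and fault tolerance are not modeled, and the functions `map_task`, `combine`, `node_aggregate`, `reduce_all` and `run`, along with the toy `cluster` layout, are hypothetical names. The sketch counts how many intermediate records cross the shuffle when only a per-task combine is applied, and compares that with an additional merge of all MapTask outputs on the same machine.

```python
from collections import Counter


def map_task(lines):
    """WordCount map phase for one MapTask: emit a (word, 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def combine(pairs):
    """Per-task combine: sum counts within a single MapTask only."""
    acc = Counter()
    for word, n in pairs:
        acc[word] += n
    return list(acc.items())


def node_aggregate(task_outputs):
    """Hypothetical node-level step: merge the combined outputs of every
    MapTask that ran on the same machine before anything is shuffled."""
    acc = Counter()
    for pairs in task_outputs:
        for word, n in pairs:
            acc[word] += n
    return list(acc.items())


def reduce_all(shuffled):
    """Reduce phase: sum counts per word over all shuffled records."""
    totals = Counter()
    for word, n in shuffled:
        totals[word] += n
    return dict(totals)


def run(cluster, node_level):
    """cluster maps a node name to its input splits (one split per MapTask).
    Returns (final word counts, number of records sent through the shuffle)."""
    shuffled = []
    for splits in cluster.values():
        per_task = [combine(map_task(split)) for split in splits]
        if node_level:
            # One merged output per machine instead of one per MapTask.
            shuffled.extend(node_aggregate(per_task))
        else:
            for pairs in per_task:
                shuffled.extend(pairs)
    return reduce_all(shuffled), len(shuffled)


# Toy input: two machines, each running two MapTasks (hypothetical data).
cluster = {
    "node1": [["a b a", "c"], ["a c"]],
    "node2": [["b b"], ["a b"]],
}

for node_level in (False, True):
    counts, n_shuffled = run(cluster, node_level)
    print(f"node_level={node_level}: counts={counts}, shuffled records={n_shuffled}")
```

With this toy input, both runs produce the same word counts. The number of shuffled records drops from 8 to 5 because the "a", "b" and "c" partial counts from tasks on the same machine are merged before the shuffle. This is the kind of shuffle reduction the abstract attributes to Map Multi-Reduce, shown here only at toy scale.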
