Improving Cassandra Query Scheduling in Consideration of Query Range (original Japanese title: 検索範囲を考慮したクエリスケジューリングによるCassandraの応答性能向上)

Abstract

The spread of cloud services generates enormous amounts of data every day, and demand for managing large-scale data is growing. As a result, distributed Key Value Stores (KVSs), datastores that distribute and manage data across multiple machines, have come into wide use. Because a distributed KVS manages data in a simple key-value structure, it scales out easily, and it has attracted attention in services that handle large-scale data, such as Facebook and Twitter. A distributed KVS can retrieve the value corresponding to a specified key; in this paper, a query that obtains only the value for one key is called a single query. Some distributed KVS implementations also support range queries, which retrieve the values belonging to a specified key range. However, under workloads that mix single and range queries, single queries, which could otherwise finish quickly, are forced to wait until preceding range queries complete, and the average response time increases as a result. This study proposes a method that schedules multiple queries and executes the quickly executable ones first, thereby shortening the average response time of queries. We implemented the proposed method on Cassandra, a distributed KVS, and the evaluation confirmed that query scheduling reduces the average response time of single queries by up to 80% and that of range queries by up to 20%.

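The scheduling idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not describe the actual policy, so the example swaps in plain shortest-estimated-cost-first ordering, a hypothetical cost model (execution time proportional to the number of keys read), a single executor, and all queries arriving at time 0. The names `Query`, `estimated_cost`, `fifo_order`, `shortest_first_order`, and `average_response_time` are invented for illustration and are not Cassandra APIs.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    name: str
    keys: int  # keys touched: 1 for a single query, more than 1 for a range query


def estimated_cost(q: Query) -> int:
    # Hypothetical cost model (an assumption): time is proportional to keys read.
    return q.keys


def fifo_order(queries):
    # Baseline: run queries in arrival order.
    return list(queries)


def shortest_first_order(queries):
    # Heap entries are (cost, arrival index, query); the arrival index
    # breaks ties so equal-cost queries keep their arrival order.
    heap = [(estimated_cost(q), i, q) for i, q in enumerate(queries)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, q = heapq.heappop(heap)
        order.append(q)
    return order


def average_response_time(order):
    # Assumes every query arrives at time 0 and queries run one at a time,
    # so a query's response time equals its completion time.
    clock = 0
    total = 0
    for q in order:
        clock += estimated_cost(q)
        total += clock
    return total / len(order)


pending = [
    Query("range-A", 100),
    Query("single-1", 1),
    Query("single-2", 1),
    Query("range-B", 50),
]

print("FIFO average:", average_response_time(fifo_order(pending)))
print("Shortest-first average:", average_response_time(shortest_first_order(pending)))
```

In this toy workload the two single queries no longer wait behind `range-A`, which is the effect the abstract attributes to scheduling. Pure shortest-first ordering can delay long range queries indefinitely under a steady stream of short ones, so a practical scheduler would likely need some bound on waiting time; whether and how the paper handles this is not stated in the abstract.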

Published in

  • IPSJ Journal (情報処理学会論文誌)

    IPSJ Journal 56(2), 492-502, 2015-02-15

    Information Processing Society of Japan (IPSJ)

Codes

  • NII Article ID (NAID)
    110009877365
  • NII Bibliographic ID (NCID)
    AN00116647
  • Text language code
    JPN
  • Material type
    Journal Article
  • ISSN
    1882-7764
  • Data source
    NII-ELS  IPSJ 