Construction of a Test Collection for Spoken Document Retrieval from Lecture Audio Data


Akiba Tomoyosi:Toyohashi University of Technology
Aikawa Kiyoaki:Tokyo University of Technology
Itoh Yoshiaki:Iwate Prefectural University
Kawahara Tatsuya:Kyoto University
Nanjo Hiroaki:Ryukoku University
Nishizaki Hiromitsu:University of Yamanashi
Yasuda Norihito:Nippon Telegraph and Telephone Corporation
Yamashita Yoichi:Ritsumeikan University
Itou Katunobu:Hosei University

Lectures are among the most valuable genres of audiovisual data. Although spoken document processing is a promising technology for making use of lectures in various ways, it is difficult to evaluate because the evaluation requires subjective judgment and/or the verification of large quantities of evaluation data. This paper reports a test collection for evaluating spoken lecture retrieval. The test collection consists of target spoken documents comprising about 2,700 lectures (604 hours) taken from the Corpus of Spontaneous Japanese (CSJ), 39 retrieval queries, the relevant passages in the target documents for each query, and automatic transcriptions of the target speech data. This paper also reports the retrieval performance obtained by applying a standard spoken document retrieval (SDR) method to the constructed test collection, which serves as a baseline for forthcoming SDR studies that use the test collection.

Journal of Information Processing
Vol. 17, pp. 82-94, 2009
