Spoken Document Retrieval by Using a Hierarchical Language Model (階層化言語モデルによる音声ドキュメントの検索)

Abstract

We propose a new scheme for keyword search in spoken documents that uses a lattice structure, built with a hierarchical language model, as the index. The hierarchical language model is used in both the recognition and indexing phases and comprises two independently trained layers: an upper layer, a conventional class-based n-gram for recognizing in-vocabulary (IV) words, and a lower layer, a sub-word n-gram for recognizing two types of out-of-vocabulary (OOV) words, Japanese personal names and location names. The sub-word sequences recognized in a lattice are pruned with a finite-state automaton (FSA). Using the OOV type and the position information of the recognized sub-words within the OOV words, the sub-words are linked to form candidate OOV words. A confusion network (CN), one compact representation of such a lattice, is adopted for the indexing phase; it is built from the IV words and the pruned OOV words in the lattices, so IV and OOV words are handled together. Keyword search was evaluated on queries containing IV words and OOV words (both personal and location names). The experimental results show that the hierarchical language model effectively detects proper names that are not registered in the recognition lexicon, while retrieval performance for IV words is almost unchanged compared with the conventional class-based n-gram approach, with no adverse effect observed.
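The indexing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a confusion network is given as a list of slots, each mapping a candidate token (an IV word or an already linked OOV word) to a posterior probability, and the names `build_index` and `search`, the threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict


def build_index(networks):
    """Build an inverted index from confusion networks.

    networks: {doc_id: [slot, ...]} where each slot is a dict
    {token: posterior}. This layout is an assumption of this sketch.
    """
    index = defaultdict(list)
    for doc_id, slots in networks.items():
        for position, slot in enumerate(slots):
            for token, posterior in slot.items():
                index[token].append((doc_id, position, posterior))
    return index


def search(index, keyword, threshold=0.1):
    """Return (doc_id, slot position, posterior) hits for one keyword,
    keeping hits at or above the threshold, best posterior first."""
    hits = [hit for hit in index.get(keyword, []) if hit[2] >= threshold]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Toy data (hypothetical): "tanaka" and "tanabe" stand in for OOV names
# that were assembled from sub-words before indexing.
networks = {
    "doc1": [
        {"mr": 0.9, "miss": 0.1},
        {"tanaka": 0.6, "tanabe": 0.3, "<eps>": 0.1},
        {"visited": 1.0},
        {"kyoto": 0.7, "tokyo": 0.3},
    ],
    "doc2": [
        {"tokyo": 0.8, "kyoto": 0.2},
        {"station": 1.0},
    ],
}

index = build_index(networks)
print(search(index, "kyoto"))
```

Because IV words and linked OOV candidates share the same slots, a single lookup covers both kinds of query term, which matches the abstract's point that the two are handled together.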

Journal

  • IPSJ SIG Notes

    IPSJ SIG Notes 2008(68(2008-SLP-072)), 31-36, 2008-07-11

    Information Processing Society of Japan (IPSJ)

References:  12

Cited by:  3

  • Progress Report of SLP Spoken Document Processing Working Group [in Japanese]

    AKIBA Tomoyosi, AIKAWA Kiyoaki, ITOH Yoshiaki, KAWAHARA Tatsuya, NANJO Hiroaki, NISHIZAKI Hiromitsu, YASUDA Norihito, YAMASHITA Yoichi, MATSUI Tomoko, HU Xinhui, NAKAGAWA Seiichi, ITOU Katunobu

    IPSJ SIG Notes 2008(123(2008-SLP-074)), 115-120, 2008-12-02

  • Progress Report of SLP Spoken Document Processing Working Group [in Japanese]

    AKIBA Tomoyosi, AIKAWA Kiyoaki, ITOH Yoshiaki, KAWAHARA Tatsuya, NANJO Hiroaki, NISHIZAKI Hiromitsu, YASUDA Norihito, YAMASHITA Yoichi, MATSUI Tomoko, HU Xinhui, NAKAGAWA Seiichi, ITOU Katunobu

    IEICE technical report 108(337), 115-120, 2008-12-02

  • Progress Report of SLP Spoken Document Processing Working Group [in Japanese]

    AKIBA Tomoyosi, AIKAWA Kiyoaki, ITOH Yoshiaki, KAWAHARA Tatsuya, NANJO Hiroaki, NISHIZAKI Hiromitsu, YASUDA Norihito, YAMASHITA Yoichi, MATSUI Tomoko, HU Xinhui, NAKAGAWA Seiichi, ITOU Katunobu

    IEICE technical report 108(338), 115-120, 2008-12-02

Codes

  • NII Article ID (NAID)
    110006862657
  • NII NACSIS-CAT ID (NCID)
    AN10442647
  • Text Lang
    ENG
  • Article Type
    Journal Article
  • ISSN
    09196072
  • NDL Article ID
    9606295
  • NDL Source Classification
    ZM13 (Science and technology -- General science and technology -- Data processing and computers)
  • NDL Call No.
    Z14-1121
  • Data Source
    CJP  CJPref  NDL  NII-ELS  IPSJ 