日本語における略語自動生成法の検討とその音声インタフェースヘの応用  [in Japanese] Automatic Generation of Abbreviated Forms of Japanese Expressions and Its Applications to Speech Recognition

Author(s)

Abstract

This paper proposes a method that lets a speech recognizer accept abbreviations that are not registered in its vocabulary. Abbreviations of Japanese expressions are known to be formed according to several rules, and the method uses those rules to generate abbreviated forms automatically from an original expression. Applying the applicable rules to an original expression yields tens to hundreds of candidate abbreviations. Each candidate is then scored for how plausible it is as an abbreviation on three criteria: which generation rule produced it, how well it agrees with a language model of abbreviations, and how frequently it appears on the Web. The candidates ranked within the top N by score are added to the recognition dictionary. In the evaluation, candidates were generated from 40 original expressions and the 10 highest-scoring candidates were selected for each; these selections covered about 80% of the correct abbreviations. When the output of the method was supplied to a speech recognition system, the gain from recognizing abbreviations sufficiently outweighed the drop in recognition rate caused by the larger vocabulary.
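
The generate, score, and select pipeline described in the abstract can be sketched roughly as follows. The abstract does not state the actual generation rules, the language model, or how the three criteria are combined, so everything concrete here is an assumption made for illustration: the prefix-concatenation rule, the names `generate_candidates`, `score`, and `top_n`, the weighted-sum score, and the toy rule weights, language-model function, and Web counts.

```python
import math
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    surface: str  # candidate abbreviation
    rule: str     # name of the (hypothetical) rule that produced it


def generate_candidates(components, max_prefix=2):
    """Toy rule (assumption): join a 1..max_prefix character prefix of each
    component. Characters stand in for morae, which is only approximate."""
    candidates = set()
    for lengths in product(range(1, max_prefix + 1), repeat=len(components)):
        surface = "".join(c[:n] for c, n in zip(components, lengths))
        rule = "prefix-" + "-".join(str(n) for n in lengths)
        candidates.add(Candidate(surface, rule))
    return candidates


def score(cand, rule_weight, lm_logprob, web_count, weights=(1.0, 1.0, 1.0)):
    """Assumed combination: weighted sum of the three criteria."""
    w_rule, w_lm, w_web = weights
    return (w_rule * rule_weight.get(cand.rule, 0.0)
            + w_lm * lm_logprob(cand.surface)
            + w_web * math.log1p(web_count(cand.surface)))


def top_n(candidates, n, **scoring):
    """Return the n best-scoring distinct surfaces, best first."""
    best = {}
    for cand in candidates:
        s = score(cand, **scoring)
        if cand.surface not in best or s > best[cand.surface]:
            best[cand.surface] = s
    ranked = sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))
    return [surface for surface, _ in ranked[:n]]


# Toy stand-ins for the three knowledge sources (all values are made up).
toy_rule_weight = {"prefix-2-2": 1.0}
toy_web_counts = {"リモコン": 1000}


def toy_lm_logprob(surface):
    return -abs(len(surface) - 4)  # pretend 4-character forms are most likely


def toy_web_count(surface):
    return toy_web_counts.get(surface, 0)


candidates = generate_candidates(["リモート", "コントロール"])
selected = top_n(candidates, 2,
                 rule_weight=toy_rule_weight,
                 lm_logprob=toy_lm_logprob,
                 web_count=toy_web_count)
print(selected)
```

The surfaces in `selected` would then be the entries added to the recognizer's word list for that original expression. A weighted sum is just one simple way to merge the three criteria; the weights would need tuning against a set of known abbreviations.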

Journal

  • IPSJ SIG Notes

    IPSJ SIG Notes 2007(129(2007-SLP-069)), 313-318, 2007-12-21

    Information Processing Society of Japan (IPSJ)

References:  9

Cited by:  2

Codes

  • NII Article ID (NAID)
    110006549593
  • NII NACSIS-CAT ID (NCID)
    AN10442647
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    09196072
  • NDL Article ID
    9333788
  • NDL Source Classification
    ZM13 (Science and technology -- Science and technology in general -- Data processing, computers)
  • NDL Call No.
    Z14-1121
  • Data Source
    CJP  CJPref  NDL  NII-ELS  IPSJ 