Clause and Sentence Boundary Detection for Spontaneous Speech Using Local Syntactic Dependency Information (局所的な係り受けの情報を用いた話し言葉の節・文境界の推定)

Abstract

For robust detection of clause and sentence boundaries in spontaneous speech such as lectures, we propose a cascaded chunking method that incorporates local syntactic dependency information. Dependency information is expected to be an effective feature for clause and sentence boundary detection, but spontaneous utterances are not always grammatical and contain disfluencies such as fillers and hesitations. In addition, the output of an automatic speech recognition (ASR) system inevitably contains recognition errors, which makes automatic dependency parsing of such text difficult. We therefore focus on the local dependency between adjacent bunsetsu (phrases) and use it to narrow down the candidates for clause and sentence boundaries within cascaded text chunking based on Support Vector Machines (SVM). An experimental evaluation on lecture speech from the Corpus of Spontaneous Japanese (CSJ) shows that the local dependency information works robustly even on ASR output and improves detection accuracy.
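The two-stage idea in the abstract (first narrow down boundary candidates with local dependency, then classify the remaining candidates) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `Bunsetsu` type, the `depends_on_next` flag, the filtering rule (no boundary directly after a bunsetsu that depends on the next one), and the toy classifier standing in for the SVM are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Bunsetsu:
    """A phrase unit; both fields are illustrative assumptions."""
    text: str
    # Assumed output of a local dependency analyzer: True if this
    # bunsetsu depends on the immediately following bunsetsu.
    depends_on_next: bool


def candidate_positions(units: List[Bunsetsu]) -> List[int]:
    """Stage 1 (assumed rule): a boundary may follow units[i] only if
    units[i] does not depend on the adjacent next bunsetsu."""
    return [i for i, u in enumerate(units) if not u.depends_on_next]


def detect_boundaries(
    units: List[Bunsetsu],
    classify: Callable[[List[Bunsetsu], int], bool],
) -> List[int]:
    """Stage 2: run a boundary classifier (an SVM in the paper; any
    callable here) only on the candidates that survive stage 1."""
    return [i for i in candidate_positions(units) if classify(units, i)]


# Toy input with romanized placeholder phrases (not data from the CSJ).
units = [
    Bunsetsu("kyou-wa", False),
    Bunsetsu("onsei-ninshiki-no", True),
    Bunsetsu("hanashi-wo", True),
    Bunsetsu("shimasu", False),
]


def toy_classifier(us: List[Bunsetsu], i: int) -> bool:
    """Placeholder for the SVM: treats a polite verb ending as a boundary."""
    return us[i].text.endswith("masu")


candidates = candidate_positions(units)
boundaries = detect_boundaries(units, toy_classifier)
```

Restricting the classifier to the surviving candidates is one plausible reading of "limit candidates"; the actual features and cascade stages used in the paper are not given in this record.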

Journal

  • 情報処理学会論文誌 (IPSJ Journal) 50(2), 544-552, 2009-02-15

Codes

  • NII Article ID (NAID)
    110007970350
  • NII NACSIS-CAT ID (NCID)
    AN00116647
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    1882-7764
  • Data Source
    NII-ELS  IPSJ 