Comparison of Discriminative Models for Lexicon Optimization for ASR of Agglutinative Language

Abstract

For automatic speech recognition (ASR) of agglutinative languages, the choice of lexical unit is not obvious. A morpheme unit is usually adopted to ensure sufficient coverage, but many morphemes are short, resulting in weak linguistic constraints and possible confusions. We have proposed a discriminative approach that selects lexical entries which directly contribute to ASR error reduction, considering not only the linguistic constraint but also acoustic-phonetic confusability. It is based on an evaluation function for each word, defined by a set of features and their weights, which are optimized using the difference between the word error rates (WERs) of the morpheme-based model and those of the word-based model. In this paper, we investigate several discriminative models that realize this scheme. Specifically, we implement it with Support Vector Machines (SVM) and a Logistic Regression (LR) model as well as a simple perceptron. Experimental evaluations on Uyghur LVCSR show that SVM and LR are trained more robustly, and SVM achieves the best performance with a large feature dimension.
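The evaluation function described above can be sketched as a linear score over word features, with weights updated from the sign of the WER difference between the morpheme-based and word-based models. This is a minimal illustration only: the feature names (`logprob`, `num_phones`, `confusability`) and the perceptron training step are assumptions, not the paper's exact features or optimization procedure.

```python
# Sketch of the discriminative lexicon-selection scheme: each candidate word
# gets a score g(w) = dot(weights, features(w)); words with g(w) > 0 would be
# added as entries to the morpheme-based lexicon.

def score(weights, feats):
    """Evaluation function g(w): linear combination of word features."""
    return sum(weights.get(k, 0.0) * v for k, v in feats.items())

def perceptron_update(weights, feats, wer_diff, lr=0.1):
    """One perceptron step. The target label is the sign of the WER
    improvement (morpheme-based WER minus word-based WER): positive means
    adding this word as a lexical entry helped recognition."""
    target = 1 if wer_diff > 0 else -1
    if target * score(weights, feats) <= 0:   # misclassified -> update weights
        for k, v in feats.items():
            weights[k] = weights.get(k, 0.0) + lr * target * v
    return weights

# Toy example with hypothetical features: language-model log-probability,
# phone length, and an acoustic-confusability estimate.
w = {}
feats = {"logprob": 1.5, "num_phones": 0.8, "confusability": -0.6}
w = perceptron_update(w, feats, wer_diff=0.4)  # this word reduced WER
print(score(w, feats) > 0)                     # prints True
```

SVM and LR fit the same template: only the loss used to fit the weights changes (hinge loss or logistic loss instead of the perceptron update), which is what makes the comparison in the paper a controlled one.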

Journal

  • IPSJ SIG Technical Report: Spoken Language Processing (SLP)

    IPSJ SIG Technical Report: Spoken Language Processing (SLP) 2012-SLP-92(13), 1-4, 2012-07-12

Codes

  • NII Article ID (NAID)
    110009422510
  • NII NACSIS-CAT ID (NCID)
    AN10442647
  • Text Lang
    ENG
  • Article Type
    Technical Report
  • Data Source
    NII-ELS, IPSJ