Japanese Named Entity Extraction Using Support Vector Machine [in Japanese]

Author(s)

    • YAMADA Hiroyasu (山田 寛康)
      Graduate School of Information Science, Nara Institute of Science and Technology
    • KUDO Taku (工藤 拓)
      Graduate School of Information Science, Nara Institute of Science and Technology
    • MATSUMOTO Yuji (松本 裕治)
      Graduate School of Information Science, Nara Institute of Science and Technology

Abstract

In this paper, we propose a method for learning Japanese named entity (NE) extraction using Support Vector Machines (SVM), and we verify its effectiveness through extraction experiments. Learning NE extraction rules uses the word itself, its part-of-speech tag, and its character type as features, so the feature space is very high dimensional. Because the generalization error of SVM does not depend on the number of dimensions of the feature space, SVM can learn NE extraction rules with high generalization performance and without overfitting. Furthermore, applying a polynomial kernel function lets the learner take combinations of multiple features into account without changing the computational cost. We apply the method to the IREX NE extraction task using the CRL Named Entities data. When lexical entries, part-of-speech tags, character types, and any combination of two of them are considered, cross validation yields an F-value of about 83, which shows the effectiveness of the method.
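The claim that a polynomial kernel accounts for feature combinations at no extra cost can be illustrated with a small sketch. It is not the authors' implementation: the kernel form (1 + x·y)^2, the feature names, the Unicode ranges in `char_type`, and all function names are assumptions made for illustration. Each token is represented as a set of active binary features, and the degree-2 kernel value computed from the size of the intersection is compared with an explicit weighted count over shared single features and shared feature pairs.

```python
from itertools import combinations


def char_type(ch):
    """Coarse character class of one character (illustrative ranges, an assumption)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    if ch.isdigit():
        return "digit"
    if ch.isalpha():
        return "alpha"
    return "other"


def token_features(word, pos):
    """Active binary features of a token: word form, POS tag, character types."""
    ctypes = sorted({char_type(c) for c in word})
    return frozenset({f"word={word}", f"pos={pos}", "ctype=" + "+".join(ctypes)})


def poly2_kernel(a, b):
    """Degree-2 polynomial kernel (1 + x.y)^2 on binary feature sets.

    For binary vectors the dot product is the number of shared active features.
    """
    return (1 + len(a & b)) ** 2


def explicit_pairwise_score(a, b):
    """Same value computed by enumerating shared features and shared feature pairs.

    (1 + k)^2 = 1 + 3k + 2 * C(k, 2), where k is the number of shared features.
    """
    shared = sorted(a & b)
    n_pairs = sum(1 for _ in combinations(shared, 2))
    return 1 + 3 * len(shared) + 2 * n_pairs


x = token_features("奈良", "noun")
y = token_features("京都", "noun")
print(poly2_kernel(x, y), explicit_pairwise_score(x, y))
```

The two functions agree on every pair of feature sets, but `poly2_kernel` never enumerates the pairs, which is the sense in which pairwise combinations come without additional computation.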


Journal

  • Transactions of Information Processing Society of Japan

    Transactions of Information Processing Society of Japan 43(1), 44-53, 2002-01-15

    Information Processing Society of Japan (IPSJ)

References:  16

Cited by:  39

Codes

  • NII Article ID (NAID)
    110002726221
  • NII NACSIS-CAT ID (NCID)
    AN00116647
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    1882-7764
  • NDL Article ID
    6040980
  • NDL Source Classification
    ZM13 (Science and technology -- General science and technology -- Data processing and computers)
  • NDL Call No.
    Z14-741
  • Data Source
    CJP  CJPref  NDL  NII-ELS  IPSJ 