Identification of the Same Languages based on an Integrated Similarity Measure of Language Name and Language Classification

  • Wu Ren
    Department of Information and Media Technologies, Yamaguchi Junior College
  • Inui Hideyuki
    Faculty of Humanities, Yamaguchi University
  • Matsuno Hiroshi
    Graduate School of Science and Engineering, Yamaguchi University

Bibliographic Information

Other Title
  • 言語名と言語系統分類の総合的尺度に基づく言語同一性判定

Abstract

Identifying which languages correspond to each other across two sets of language data provided independently by different researchers is one of the main problems in matching data on the world's languages. Identification would be straightforward if every language were assigned a language code as a unique identifier, but in most cases no such code is available. A method proposed by Wu and Matsuno performs this identification using two measures, language name similarity and language classification similarity, and succeeded in finding 88% of the languages in one data set that correspond to languages in the other. The aim of this paper is to improve the accuracy of this identification by taking into account brother information, that is, information about sibling nodes in a language classification tree. After giving an overview of the method by Wu and Matsuno, we point out that it does not use language name similarity and language classification similarity effectively: it can give an inappropriate decision even when one of the two similarities is a complete match. To address this problem, we define two new measures: a similarity of languages based on brother information, and a general language similarity that integrates the similarities of language name and language classification. Our experimental results show that the new method is more effective than the previous one.
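
The abstract does not give the formulas for these measures, so the following Python sketch is only an illustration of how name, classification, and brother (sibling) similarities might be combined into one score. Every function name, the use of difflib for string similarity, the Jaccard overlap for siblings, and the weights are assumptions made for this example, not the definitions used by the authors.

```python
from difflib import SequenceMatcher


def name_similarity(name_a: str, name_b: str) -> float:
    """Assumed measure: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()


def classification_similarity(path_a: list, path_b: list) -> float:
    """Assumed measure: share of leading tree nodes (root first) in common."""
    if not path_a or not path_b:
        return 0.0
    shared = 0
    for node_a, node_b in zip(path_a, path_b):
        if node_a != node_b:
            break
        shared += 1
    return shared / max(len(path_a), len(path_b))


def brother_similarity(siblings_a: set, siblings_b: set) -> float:
    """Assumed measure: Jaccard overlap of the sibling sets of two languages."""
    union = siblings_a | siblings_b
    if not union:
        return 0.0
    return len(siblings_a & siblings_b) / len(union)


def general_similarity(name_sim: float, class_sim: float, brother_sim: float,
                       w_name: float = 0.4, w_class: float = 0.4,
                       w_brother: float = 0.2) -> float:
    """Hypothetical integration: a weighted average of the three scores."""
    return w_name * name_sim + w_class * class_sim + w_brother * brother_sim


# Illustrative input only; not data from the paper.
path_1 = ["Indo-European", "Germanic", "West"]
path_2 = ["Indo-European", "Germanic", "North"]
score = general_similarity(
    name_similarity("German", "german"),
    classification_similarity(path_1, path_2),
    brother_similarity({"Dutch", "English"}, {"English", "Frisian"}),
)
print(round(score, 3))
```

A weighted average is the simplest possible integration; a real system would have to choose the weights and a decision threshold from data, and could treat a complete match on one measure specially.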
