Construction of a database and search engine on the phonological similarity of two-kanji compound words in Japanese, Korean, Chinese and Vietnamese


Bibliographic Information

Other Title
  • 日韓中越4言語における2字漢字語の音韻類似性に関するデータベースおよび検索エンジンの構築
  • ニッカン チュウエツ 4 ゲンゴ ニ オケル 2ジ カンジゴ ノ オンイン ルイジセイ ニ カンスル データベース オヨビ ケンサク エンジン ノ コウチク


Abstract

Although Chinese, Japanese, Korean and Vietnamese (hereafter, CJKV) share a large number of cognates of Chinese origin, Han characters (kanji) are currently used only in Chinese and Japanese. As a result, it is difficult to quantify cognate similarity across these four Asian languages by measuring orthographic similarity, the measure commonly used when comparing cognates across European languages. Phonological similarity should instead offer a more universal way to quantify cognate similarity among the CJKV languages, because it is independent of writing systems. Accordingly, we extracted two-kanji compound words shared by the CJKV languages from a database of 2,058 kanji words. Two objective measures of phonological similarity were computed for each language pair: (1) Phonological Distance, which is based on generalized Levenshtein distance, and (2) Phoneme Similarity, which mitigates the bias of word length. All six possible language pairs followed a similar pattern: the distribution of phonological similarity of cognates was near-symmetric and centered on cognates of median similarity. Whereas some European language pairs, such as German-Dutch, share many phonologically highly similar cognates, no such pair was found among CJKV. To make our calculation results accessible to the public, we developed an online search engine that features an intuitive and interactive display of phonological similarity measures (http://kanjigodb.herokuapp.com/).
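
The two kinds of measure named in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so everything here is an assumption: the function names, the uniform edit costs, the length normalization (one minus distance divided by the longer sequence length), and the rough phoneme transcriptions of the Japanese and Korean readings of 家族. It shows only the general idea of an edit distance over phoneme sequences and a length-normalized similarity, not the authors' actual computation.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str],
                ins_cost: float = 1.0,
                del_cost: float = 1.0,
                sub_cost: float = 1.0) -> float:
    """Edit distance between two phoneme sequences.

    The costs are configurable (a simple "generalized" variant);
    the defaults of 1.0 are an assumption, not the paper's values.
    """
    # prev[j] = cost of turning the empty prefix of a into b[:j]
    prev = [j * ins_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * del_cost]  # cost of deleting a[:i] entirely
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + del_cost,                          # delete x
                curr[j - 1] + ins_cost,                      # insert y
                prev[j - 1] + (0.0 if x == y else sub_cost)  # match or substitute
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Hypothetical length-normalized similarity in [0, 1].

    Dividing by the longer sequence length is one common way to reduce
    the word-length bias of a raw edit distance.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


# Illustrative, simplified transcriptions (assumptions, not data from the paper).
ja = ["k", "a", "z", "o", "k", "u"]   # Japanese reading of 家族
ko = ["k", "a", "dʑ", "o", "k"]       # Korean reading of 家族

distance = levenshtein(ja, ko)             # one substitution plus one deletion
similarity = normalized_similarity(ja, ko)
```

Under these assumptions the raw distance grows with word length, while the normalized value stays comparable across short and long words, which is the kind of bias the abstract says the second measure mitigates.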

Journal

  • ことばの科学

    ことばの科学 33 75-94, 2019-12-25

    名古屋大学言語文化研究会

