辞書見出し語中の複合語を対象とした字種変化特性の分析 (Analysis to Character Type Sequence of Japanese Compound Terms Extracted from Lots of Entry Terms of Several Dictionaries) [in Japanese]

Abstract

In recent years, many of the technical terms used in scientific and engineering documents such as academic papers and patents have been compound terms written in more than one character type, and the number of such terms tends to grow as research and development activity increases. Continuing the work reported in NL-202, this study examines multi-character-type compound terms drawn from the entry terms of multiple dictionaries and aims to characterize their character type sequences. Specifically, about 11,000 compound terms beginning with one of three character types (hiragana, katakana, or kanji) were analyzed from a pattern-matching point of view. The analysis revealed the following characteristics of the term set. (a) The length of the character type sequence ranged from 2 to 13, and 599 distinct sequence patterns were observed. (b) More than 95% of all terms had a character type sequence of length 2 to 4. (c) More than 95% of the terms with a sequence of length 2 to 4 began with kanji or katakana; terms beginning with hiragana were very few. (d) More than 95% of the sequence patterns obtained from the whole term set began with kanji or katakana.

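The notion of a character type sequence used in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`char_type`, `type_sequence`, `summarize`), the one-letter type codes, the Unicode ranges used for classification, and the sample terms are all assumptions made for illustration. The sketch also assumes that the "length" of a sequence is the number of consecutive same-type runs in a term, which the abstract does not define explicitly.

```python
from collections import Counter
from itertools import groupby


def char_type(ch: str) -> str:
    """Classify one character into a one-letter type code (hypothetical codes)."""
    code = ord(ch)
    if 0x3041 <= code <= 0x309F:
        return "H"  # hiragana block
    if 0x30A0 <= code <= 0x30FF or 0x31F0 <= code <= 0x31FF:
        return "K"  # katakana block, including the long-vowel mark
    if 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF or ch == "々":
        return "C"  # kanji (CJK unified ideographs) and the iteration mark
    if ch.isdigit():
        return "N"  # digits
    if ch.isalpha():
        return "A"  # any remaining letter (Latin, Greek, full-width, ...)
    return "O"      # symbols and everything else


def type_sequence(term: str) -> str:
    """Collapse runs of the same type, so each code marks one run."""
    return "".join(kind for kind, _run in groupby(map(char_type, term)))


def summarize(terms):
    """Count patterns and the share of terms whose pattern length is 2 to 4."""
    patterns = Counter(type_sequence(t) for t in terms)
    total = sum(patterns.values())
    if total == 0:
        return patterns, 0.0
    short = sum(n for p, n in patterns.items() if 2 <= len(p) <= 4)
    return patterns, short / total


if __name__ == "__main__":
    # Illustrative terms only; they are not taken from the paper's dictionaries.
    sample = ["アミノ酸", "X線回折", "折り畳み", "ビタミンB12"]
    for term in sample:
        print(term, type_sequence(term))
    patterns, share = summarize(sample)
    print(len(patterns), "patterns; share with length 2 to 4:", share)
```

Under these assumptions, a term such as アミノ酸 (katakana followed by kanji) maps to the two-run pattern `KC`. Statistics like those reported in findings (a) to (d) would then come from counting such patterns over the full term set.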

Journal

  • IPSJ SIG Notes

    IPSJ SIG Notes 2013-NL-214(16), 1-6, 2013-11-07

    Information Processing Society of Japan (IPSJ)

Codes

  • NII Article ID (NAID)
    110009624076
  • NII NACSIS-CAT ID (NCID)
    AN10115061
  • Text Lang
    JPN
  • Article Type
    Technical Report
  • Data Source
    NII-ELS, IPSJ