Improving semantic similarity measures for word pair comparison (Japanese title: A study on the improvement of semantic similarity measures for word comparison)
Menendez Mora, Raul Ernesto
The semantic web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. To achieve the goals of the semantic web, it must be possible to define and describe the relations among data (i.e., resources) on the Web. Ontologies are one of the formal representations used to organize information in the semantic web. They are also used as a form of knowledge representation about the world, or some part of it, in artificial intelligence, systems engineering, software engineering, biomedical informatics, library science, enterprise bookmarking, and information architecture. In the semantic web context, since many actors provide their own ontologies, ontology matching (or ontology alignment) has taken on a critical role in helping heterogeneous resources interoperate.

Ontology matching tools find classes of data that are "semantically equivalent". This process determines correspondences between concepts, which are called alignments. Finding those correspondences implies a semantic similarity assessment between the concepts involved.

The semantic similarity of word pairs is often represented by the similarity between the concepts associated with the words. Several methods have been developed to compute word similarity, most of them operating on taxonomic dictionaries such as WordNet or on external corpora such as the Brown Corpus. However, most of them suffer from a serious limitation: they focus either on the semantic information shared by the two words or on their semantic differences, and the two aspects have rarely been combined in a broader perspective.

In this thesis we developed and applied a model of semantic similarity computation for word pair comparison. This model takes both semantic commonalities and semantic differences as the core of its approach. By applying the model, five new WordNet-based semantic similarity measures for word pair comparison were created.
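To make the idea of weighing commonalities against differences concrete, here is a minimal sketch using Tversky's ratio model, a classic feature-based formulation, over a toy taxonomy in which a concept's "features" are assumed to be the concept itself plus all of its hypernyms. The taxonomy, the word-to-sense map, the weights `alpha` and `beta`, and the choice to score a word pair by its best-matching pair of senses are all illustrative assumptions; this is not the model or the measures developed in the thesis, and the data is not WordNet.

```python
from __future__ import annotations

# Hypothetical toy taxonomy mapping each concept to its parent (None = root).
# Illustrative only; this is not WordNet data.
TAXONOMY = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "bicycle": "vehicle",
}

# Hypothetical word -> concepts (senses) map; real words usually have several senses.
SENSES = {
    "dog": ["dog"],
    "cat": ["cat"],
    "car": ["car"],
    "bike": ["bicycle"],
}


def ancestors(concept: str) -> set[str]:
    """Return the concept itself plus every hypernym up to the root."""
    result = set()
    node = concept
    while node is not None:
        result.add(node)
        node = TAXONOMY[node]
    return result


def tversky_similarity(c1: str, c2: str, alpha: float = 0.5, beta: float = 0.5) -> float:
    """Tversky ratio model: shared features weighed against distinct ones."""
    a, b = ancestors(c1), ancestors(c2)
    common = len(a & b)  # commonalities: features shared by both concepts
    only_a = len(a - b)  # differences: features of c1 missing from c2
    only_b = len(b - a)  # differences: features of c2 missing from c1
    denom = common + alpha * only_a + beta * only_b
    return common / denom if denom else 0.0


def word_similarity(w1: str, w2: str) -> float:
    """Score a word pair by its most similar pair of senses (an assumed aggregation)."""
    return max(
        tversky_similarity(c1, c2) for c1 in SENSES[w1] for c2 in SENSES[w2]
    )
```

With the default weights, dog/cat scores 2/3 (two shared ancestors and one distinct ancestor on each side), while dog/car scores 1/3 because they share only the root. Raising `alpha` and `beta` makes differences count more heavily against the shared information.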
Four of the five new measures obtained higher correlations with human judgment than their original formulations, while the fifth remained as competitive as its original version. We also studied WordNet's taxonomic properties in order to extend a corpus-independent information content (IC) metric. Applying this new metric in one of the previously developed node-based semantic similarity measures yielded the highest correlation with human judgment. This thesis provides a general and extensible approach to semantic similarity computation for word pair comparison.
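As an illustration of what a corpus-independent IC metric and a node-based measure built on it look like, the sketch below implements the intrinsic IC of Seco, Veale and Hayes, which derives IC from the number of hyponyms a concept has in the taxonomy, and plugs it into Lin's similarity. It also includes a plain Pearson correlation, the usual statistic for comparing a measure's scores with human ratings. The toy taxonomy and all function names are hypothetical, and this baseline IC is not the extended metric developed in the thesis.

```python
import math

# Hypothetical toy taxonomy (child -> parent, None = root); not WordNet data.
TAXONOMY = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
    "bicycle": "vehicle",
}

# Invert the parent links once so hyponyms can be counted top-down.
CHILDREN = {c: [] for c in TAXONOMY}
for child, parent in TAXONOMY.items():
    if parent is not None:
        CHILDREN[parent].append(child)


def hyponym_count(concept):
    """Number of concepts strictly below `concept` (all descendants)."""
    return sum(1 + hyponym_count(k) for k in CHILDREN[concept])


def intrinsic_ic(concept):
    """Seco et al. intrinsic IC: 1 - log(hypo(c) + 1) / log(number of concepts)."""
    return 1.0 - math.log(hyponym_count(concept) + 1) / math.log(len(TAXONOMY))


def ancestors(concept):
    """The concept itself plus all of its hypernyms."""
    result = set()
    while concept is not None:
        result.add(concept)
        concept = TAXONOMY[concept]
    return result


def lin_similarity(c1, c2):
    """Lin: 2 * IC(most informative common subsumer) / (IC(c1) + IC(c2))."""
    ic_lcs = max(intrinsic_ic(c) for c in ancestors(c1) & ancestors(c2))
    denom = intrinsic_ic(c1) + intrinsic_ic(c2)
    if denom == 0.0:
        # Only the root has IC 0 in this single-rooted taxonomy.
        return 1.0 if c1 == c2 else 0.0
    return 2.0 * ic_lcs / denom


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In this toy taxonomy, dog/cat share the subsumer `animal` and score about 0.44, while dog/car share only the root, whose IC is 0, so their Lin similarity is 0. On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient.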