Automatic extraction of logically consistent ontologies from text corpora

Author

- McCrae, John Philip マックレイ, ジョンフィリップ

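Bibliographic details

Title: Automatic extraction of logically consistent ontologies from text corpora

Alternative title: テキストコーパスからの論理的に無矛盾なオントロジーの自動抽出

Author name: McCrae, John Philip

Alternative author name: マックレイ, ジョンフィリップ

Degree-granting university: The Graduate University for Advanced Studies (総合研究大学院大学)

Degree: Doctorate (Informatics)

Degree number: 甲第1288号

Date of degree conferral: 2009-09-30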

Notes and abstract

Doctoral thesis

Ontologies provide a structured description of the concepts and terminology used in a particular domain and provide valuable knowledge for a range of natural language processing applications. However, for many domains and languages ontologies do not exist, and manual creation is a difficult and resource-intensive process. As such, automatic methods to extract, expand or aid the construction of these resources are of significant interest.

There are a number of methods for extracting semantic information about how terms are related from raw text, most notably the approach of Hearst [1992], who used patterns to extract hypernym information. This method was manual, and it is not clear how to automatically generate patterns that are specific to a given relationship and domain. I present a novel method for developing patterns based on the use of alignments between patterns. Alignment works well as it is closely related to the concept of a join-set of patterns, which minimally generalise over-fitting patterns. I show that join-sets can be viewed as a reduction of the search space of patterns that results in no loss of accuracy. I then show that the results can be combined by a support vector machine to obtain a classifier that can decide whether a pair of terms is related. I applied this to several data sets and conclude that this method produces a precise result with reasonable recall.

The system I developed, like many semantic relation systems, produces only a binary decision of whether a term pair is related. Ontologies have a structure that limits the forms of networks they represent. As relation extraction is generally noisy and incomplete, it is unlikely that the extracted relations will match the structure of the ontology. As such, I represent the structure of the ontology as a set of logical statements, and form a consistent ontology by finding the network closest to the relation extraction system's output that is consistent with these restrictions. This gives a novel NP-hard optimisation problem, for which I develop several algorithms. I present simple greedy approaches and branch and bound approaches, which my results show are not sufficient for this problem. I then use resolution to show how this problem can be stated as an integer programming problem, which can be efficiently solved by relaxing it to a linear programming problem. I show that this approach can efficiently solve the problem and, furthermore, that when applied to the output of the relation extraction system it improves the quality of the extraction as well as converting it to an ontological structure.
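As a rough illustration of the kind of simple greedy approach the abstract mentions, the sketch below takes scored hypernym candidates and keeps the highest-scoring ones that do not violate a structural constraint. It is not the thesis's algorithm: the constraint used here (the hypernym graph must be acyclic), the 0.5 threshold, and the names `greedy_acyclic` and `reachable` are all assumptions made for the example.

```python
from collections import defaultdict


def reachable(graph, start, goal):
    """Return True if `goal` can be reached from `start` by following edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph[node])
    return False


def greedy_acyclic(scores, threshold=0.5):
    """Greedily accept (hyponym, hypernym) pairs in descending score order.

    `scores` maps (hyponym, hypernym) pairs to a classifier confidence.
    ASSUMPTION: the only structural restriction is acyclicity; the thesis
    expresses its restrictions as logical statements not given in the abstract.
    """
    graph = defaultdict(set)
    accepted = []
    for (hypo, hyper), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score <= threshold:
            break  # remaining candidates are judged "not related"
        # Adding hypo -> hyper closes a cycle iff hypo is reachable from hyper.
        if hypo == hyper or reachable(graph, hyper, hypo):
            continue
        graph[hypo].add(hyper)
        accepted.append((hypo, hyper))
    return accepted


# Hypothetical classifier output for four candidate pairs.
example_scores = {
    ("dog", "animal"): 0.9,
    ("animal", "dog"): 0.6,
    ("cat", "animal"): 0.8,
    ("cat", "dog"): 0.3,
}
result = greedy_acyclic(example_scores)
```

Here the pair ("animal", "dog") is rejected because it would close a cycle with the already accepted ("dog", "animal"). A greedy pass like this never revisits an earlier choice, which is one plausible reason such approaches can fall short of the network closest to the extractor's output.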

application/pdf

総研大甲第1288号
