Automatic extraction of logically consistent ontologies from text corpora
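Bibliographic details
- Title: Automatic extraction of logically consistent ontologies from text corpora
- Alternative title (Japanese): テキストコーパスからの論理的に無矛盾なオントロジーの自動抽出
- Author: McCrae, John Philip
- Author name (Japanese reading): マックレイ, ジョン フィリップ
- Degree-granting institution: The Graduate University for Advanced Studies (SOKENDAI)
- Degree: Doctorate (Informatics)
- Degree number: Kō No. 1288
- Date conferred: 2009-09-30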
Notes and abstract
Doctoral thesis
Ontologies provide a structured description of the concepts and terminology used in a particular domain, and they supply valuable knowledge for a range of natural language processing applications. However, for many domains and languages no ontology exists, and creating one manually is a difficult and resource-intensive process. Automatic methods to extract, expand, or aid the construction of these resources are therefore of significant interest.

There are a number of methods for extracting semantic information from raw text about how terms are related. The most notable is the approach of Hearst [1992], who used patterns to extract hypernym information. Her patterns were written manually, and it is not clear how to generate patterns automatically, since they are specific to a given relationship and domain. I present a novel method for developing patterns based on alignments between patterns. Alignment works well because it is closely related to the concept of a join-set: the set of patterns that minimally generalise over-fitting patterns. I show that join-sets can be viewed as a reduction of the search space of patterns that results in no loss of accuracy. I then show that the results can be combined by a support vector machine to obtain a classifier that decides whether a pair of terms is related. I applied this method to several data sets and conclude that it produces precise results with reasonable recall.

The system I developed, like many semantic relation extraction systems, produces only a binary decision of whether a term pair is related. Ontologies, however, have a structure that limits the forms of network they can represent. Because relation extraction is generally noisy and incomplete, the extracted relations are unlikely to match the structure of the ontology.
I therefore represent the structure of the ontology as a set of logical statements. I then form a consistent ontology by finding the network that is closest to the relation extraction system's output while satisfying these restrictions. This gives a novel NP-hard optimisation problem, for which I develop several algorithms. I present simple greedy approaches and branch-and-bound approaches, which my results show are not sufficient for this problem. I then use resolution to show how this problem can be stated as an integer programming problem, which can be solved efficiently by relaxing it to a linear programming problem. I show that this approach solves the problem efficiently. Furthermore, when it is applied to the output of the relation extraction system, it improves the quality of the extraction as well as converting it into an ontological structure.
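The pattern-generalisation idea in the second paragraph of the abstract can be illustrated with a small sketch. The abstract does not define alignment or join-sets precisely, so this sketch assumes one plausible reading. Two token patterns are aligned by a longest common subsequence, and their shared tokens are kept in order. Each unaligned stretch collapses into a wildcard `*`, meaning "zero or more tokens". The names `lcs_table`, `generalise` and `WILDCARD` are hypothetical and do not come from the thesis.

```python
from typing import List

WILDCARD = "*"  # hypothetical gap marker: matches zero or more tokens


def lcs_table(a: List[str], b: List[str]) -> List[List[int]]:
    """t[i][j] = length of the longest common subsequence of a[i:] and b[j:]."""
    n, m = len(a), len(b)
    t = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                t[i][j] = t[i + 1][j + 1] + 1
            else:
                t[i][j] = max(t[i + 1][j], t[i][j + 1])
    return t


def generalise(a: List[str], b: List[str]) -> List[str]:
    """Align two patterns, keep shared tokens in order, and replace each
    unaligned stretch (in either pattern) with a single wildcard."""
    t = lcs_table(a, b)
    out: List[str] = []
    i = j = 0
    gap = False
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            if gap:
                out.append(WILDCARD)  # close the pending unaligned stretch
                gap = False
            out.append(a[i])
            i += 1
            j += 1
        elif t[i + 1][j] >= t[i][j + 1]:
            gap = True  # skip a token of `a` that has no partner
            i += 1
        else:
            gap = True  # skip a token of `b` that has no partner
            j += 1
    if gap or i < len(a) or j < len(b):
        out.append(WILDCARD)  # trailing unaligned material
    return out


print(generalise("X such as Y".split(), "X , such as Y".split()))
# ['X', '*', 'such', 'as', 'Y']
```

Here `X` and `Y` stand for the term slots of a Hearst-style hypernym pattern. In this toy model the alignment keeps as many shared tokens as possible, which is one way to generalise two over-specific patterns as little as possible. The thesis's own join-set construction may differ.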
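The consistency step in the final paragraph of the abstract can be made concrete with a toy example. The sketch assumes three things that the abstract does not specify:

- The only relation is hypernymy, written as `(hyponym, hypernym)` pairs.
- The logical statements are irreflexivity, antisymmetry and transitivity.
- "Closest" means the smallest weighted disagreement with classifier scores in [0, 1]. Including a pair with score s costs 1 - s, and omitting it costs s.

The sketch does not use the thesis's integer and linear programming approach. It swaps in brute-force enumeration, which is exact but exponential, so it is only feasible for a handful of terms. All function names and scores are hypothetical.

```python
from itertools import permutations, product
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical representation: (a, b) means "a is-a b".
Pair = Tuple[str, str]


def is_consistent(edges: Set[Pair]) -> bool:
    """Assumed axioms: the is-a relation is irreflexive, antisymmetric, transitive."""
    for a, b in edges:
        if a == b or (b, a) in edges:
            return False  # self-loop or two-way link
        for c, d in edges:
            if b == c and (a, d) not in edges:
                return False  # a -> b -> d without the implied a -> d
    return True


def cost(edges: Set[Pair], pairs: List[Pair], scores: Dict[Pair, float]) -> float:
    """Weighted disagreement with the classifier scores (unscored pairs count as 0)."""
    total = 0.0
    for p in pairs:
        s = scores.get(p, 0.0)
        total += (1.0 - s) if p in edges else s
    return total


def closest_consistent(terms: Iterable[str],
                       scores: Dict[Pair, float]) -> Tuple[Set[Pair], float]:
    """Try every subset of ordered term pairs and keep the cheapest consistent one."""
    pairs = list(permutations(terms, 2))
    best: Set[Pair] = set()
    best_cost = float("inf")
    for mask in product((False, True), repeat=len(pairs)):
        edges = {p for p, keep in zip(pairs, mask) if keep}
        if not is_consistent(edges):
            continue
        c = cost(edges, pairs, scores)
        if c < best_cost:
            best, best_cost = edges, c
    return best, best_cost


scores = {
    ("dog", "animal"): 0.9,
    ("animal", "dog"): 0.6,
    ("animal", "organism"): 0.8,
    ("dog", "organism"): 0.3,
}
edges, total = closest_consistent(["dog", "animal", "organism"], scores)
print(sorted(edges), round(total, 2))
# [('animal', 'organism'), ('dog', 'animal'), ('dog', 'organism')] 1.6
```

Thresholding these scores at 0.5 would keep both dog→animal and animal→dog, which forms a cycle, and would omit dog→organism. The search instead returns the chain dog→animal→organism together with the implied link dog→organism.

In an integer-programming version of the same toy problem, each pair would become a 0/1 variable x_ab. The constraints would be x_ab + x_ba <= 1 (antisymmetry) and x_ab + x_bc - x_ac <= 1 (transitivity). Relaxing each x_ab to the interval [0, 1] gives a linear program of the kind the abstract mentions.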
application/pdf
SOKENDAI Kō No. 1288