Automatic extraction of logically consistent ontologies from text corpora

Author

    • McCrae, John Philip マックレイ, ジョン フィリップ

Bibliographic information

Title

Automatic extraction of logically consistent ontologies from text corpora

Alternative title

テキストコーパスからの論理的に無矛盾なオントロジーの自動抽出

Author name

McCrae, John Philip

Author alias

マックレイ, ジョン フィリップ

Degree-granting university

The Graduate University for Advanced Studies (総合研究大学院大学)

Degree

Doctorate (Informatics)

Degree number

Kō No. 1288 (甲第1288号)

Date of degree conferral

2009-09-30

Notes and abstract

Doctoral thesis

Ontologies provide a structured description of the concepts and terminology used in a particular domain and provide valuable knowledge for a range of natural language processing applications. However, for many domains and languages ontologies do not exist, and manual creation is a difficult and resource-intensive process. As such, automatic methods to extract, expand or aid the construction of these resources are of significant interest.

There are a number of methods for extracting semantic information about how terms are related from raw text, most notably the approach of Hearst [1992], who used patterns to extract hypernym information. This method was manual, and it is not clear how to automatically generate patterns, which are specific to a given relationship and domain. I present a novel method for developing patterns based on the use of alignments between patterns. Alignment works well as it is closely related to the concept of a join-set of patterns, which minimally generalise over-fitting patterns. I show that join-sets can be viewed as a reduction of the search space of patterns, while resulting in no loss of accuracy. I then show that the results can be combined by a support vector machine to obtain a classifier, which can decide if a pair of terms is related. I applied this to several data sets and conclude that this method produces a precise result, with reasonable recall.

The system I developed, like many semantic relation systems, produces only a binary decision of whether a term pair is related. Ontologies have a structure that limits the forms of networks they represent. As the relation extraction is generally noisy and incomplete, it is unlikely that the extracted relations will match the structure of the ontology. As such, I represent the structure of the ontology as a set of logical statements, and form a consistent ontology by finding the network closest to the relation extraction system's output that is consistent with these restrictions. This gives a novel NP-hard optimisation problem, for which I develop several algorithms. I present simple greedy approaches and branch and bound approaches, which my results show are not sufficient for this problem. I then use resolution to show how this problem can be stated as an integer programming problem, which can be efficiently solved by relaxing it to a linear programming problem. I show that this approach can efficiently solve the problem and, furthermore, that when applied to the output of the relation extraction system, it improves the quality of the extraction as well as converting it to an ontological structure.
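As a point of orientation for the pattern-based extraction the abstract starts from, the following is a minimal sketch of a single hand-written lexical pattern in the style of Hearst [1992]. It is not the alignment-based method of the thesis. The regular expression, the function name extract_hypernym_pairs, and the restriction to single-word, lowercased terms are all assumptions made for illustration.

```python
import re

# Separators allowed between listed hyponyms: ", ", " and ", " or ",
# plus the Oxford-comma variants ", and " / ", or ".
SEP = r"(?:,? and |,? or |, )"

# Illustrative "X such as A, B and C" pattern (an assumption, not taken
# from the thesis). Terms are simplified to single words.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+" + SEP + r"?)+)")

def extract_hypernym_pairs(text):
    """Return (hyponym, hypernym) pairs matched by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(text.lower()):
        hypernym = match.group(1)
        # Split the enumerated list back into individual terms.
        hyponyms = re.split(SEP, match.group(2))
        pairs.extend((hyponym, hypernym) for hyponym in hyponyms if hyponym)
    return pairs
```

Called on the sentence "Composers such as Bach, Handel and Vivaldi wrote concertos.", this sketch returns the pairs (bach, composers), (handel, composers) and (vivaldi, composers). Each such pattern is tied to one relation and one phrasing, which is the limitation that motivates generating patterns automatically.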

application/pdf

SOKENDAI Kō No. 1288 (総研大甲第1288号)

Identifiers

  • NII Article ID (NAID)
    500000501868
  • NII Author ID (NRID)
    • 8000000503509
  • Text language code
    • eng
  • NDL Bibliographic ID
    • 000010879264
  • Data source
    • Institutional repository
    • NDL ONLINE