A Hybrid Method for Open Information Extraction Based on Shallow and Deep Linguistic Analysis

  • RESHADAT Vahideh
    Faculty of Information and Communication Technology, Malek-Ashtar University of Technology
  • HOORALI Maryam
    Faculty of Information and Communication Technology, Malek-Ashtar University of Technology
  • FAILI Heshaam
    School of Electrical and Computer Engineering, College of Engineering, University of Tehran

Abstract

Open Information Extraction (Open IE) is a relation-independent extraction paradigm that extracts assertions from massive, heterogeneous corpora such as the Web. Lightweight relation extractors achieve efficiency by restricting analysis to shallow linguistic tools such as part-of-speech tagging. Although these methods are fast and scalable, they struggle with complex sentences (for example, those containing complicated or long-distance relations) because they rely only on shallow syntactic features. This paper presents two novel hybrid methods, TextRunner-DepOE (TR-DOE) and ReVerb-DepOE (RV-DOE), which combine a high-performance subset of the output of shallow Open IE systems with the strengths of a deep Open IE system. We find the best trade-off between precision and recall by tuning two combination parameters: sentence length and confidence measure. Because the focus is on time efficiency, we use a fast and robust deep extractor. Experiments indicate that the proposed hybrid methods achieve significantly higher performance than their constituent systems. The best result was obtained by TR-DOE, whose F-measure is almost twice that of TextRunner.
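
The abstract names the two combination parameters (sentence length and confidence measure) but not the exact selection rule. The following is a minimal sketch under a stated assumption: on sentences up to a length threshold, keep the shallow extractor's triples whose confidence clears a threshold, and otherwise fall back to the deep extractor; both thresholds are then chosen by grid search for the best F-measure on labeled development data. The names `Extraction`, `Sentence`, `combine`, `f_measure`, and `tune`, the exact-match scoring, and the fallback rule are hypothetical illustrations, not the authors' implementation, and no real TextRunner, ReVerb, or DepOE code is called.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (arg1, relation, arg2)


@dataclass(frozen=True)
class Extraction:
    triple: Triple
    confidence: float  # assumed extractor score in [0, 1]


@dataclass
class Sentence:
    tokens: List[str]
    shallow: List[Extraction]  # stand-in for TextRunner/ReVerb-style output
    deep: List[Extraction]     # stand-in for DepOE-style output


def combine(sent: Sentence, max_len: int, min_conf: float) -> Set[Triple]:
    """Hypothetical rule: on short sentences keep confident shallow
    extractions; otherwise (or if none qualify) use the deep extractor."""
    if len(sent.tokens) <= max_len:
        confident = {e.triple for e in sent.shallow if e.confidence >= min_conf}
        if confident:
            return confident
    return {e.triple for e in sent.deep}


def f_measure(predicted: Set, gold: Set) -> float:
    """Balanced F-measure (F1) with exact-match comparison."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def tune(dev: Sequence[Sentence], gold: Sequence[Set[Triple]],
         lengths: Sequence[int], confs: Sequence[float]) -> Tuple[int, float, float]:
    """Grid-search (max_len, min_conf) for the best corpus-level F1."""
    gold_pairs = {(i, t) for i, g in enumerate(gold) for t in g}
    best = (lengths[0], confs[0], -1.0)
    for max_len, min_conf in product(lengths, confs):
        # Tag each triple with its sentence index so identical triples
        # from different sentences are counted separately.
        pred = {(i, t) for i, s in enumerate(dev)
                for t in combine(s, max_len, min_conf)}
        score = f_measure(pred, gold_pairs)
        if score > best[2]:
            best = (max_len, min_conf, score)
    return best


if __name__ == "__main__":
    short = Sentence("Edison invented the phonograph".split(),
                     shallow=[Extraction(("Edison", "invented", "the phonograph"), 0.9)],
                     deep=[Extraction(("Edison", "invented", "the phonograph"), 0.7)])
    long_ = Sentence(("The company that Jobs founded in 1976 , after leaving "
                      "college , later released the iPhone").split(),
                     shallow=[Extraction(("college", "later released", "the iPhone"), 0.4)],
                     deep=[Extraction(("The company", "released", "the iPhone"), 0.8)])
    gold = [{("Edison", "invented", "the phonograph")},
            {("The company", "released", "the iPhone")}]
    print(tune([short, long_], gold, lengths=[5, 10, 20], confs=[0.3, 0.5]))
    # prints (5, 0.3, 1.0)
```

On this toy data, a 5-token length threshold routes the short sentence to the shallow output and the 16-token sentence to the deep output. A threshold of 20 with confidence 0.3 instead lets the shallow system's low-confidence error through and halves the F-measure, which illustrates why the two parameters are tuned jointly.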
