Improved Method for Organizing Information Contained in Multiple Documents into a Table

References: 12 · Open access

Abstract

Okazaki et al. (2018) proposed a method for organizing the information contained in multiple documents into a table without restricting in advance which information is extracted. In this study, we propose a method for improving the accuracy of the resulting tables. In the proposed method, the extracted information is first clustered hierarchically. Next, for each level of the hierarchical clustering (with the number of clusters ranging from 1 to n), we calculate two indicators of the resulting table: its degree of filling and its information density. The number of clusters at which these two indicators are best balanced is chosen as the optimal number of clusters, and the clustering result at that number is organized into a table. In the conventional method, the number of clusters estimated by the X-means method tends to be too small. Experiments on 15 sets of multiple documents show that the proposed method alleviates this problem, estimating numbers of clusters closer to the optimum. The average F-measure of the resulting tables improves from 0.43 with the conventional method to 0.65 with the proposed method, confirming the effectiveness of the proposed method.
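
The selection loop described above (cluster hierarchically, score the table at every number of clusters, keep the best-balanced one) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not define the two indicators or the balance criterion, so everything here is hypothetical: each information item is assumed to be a (document id, feature vector) pair; the table is assumed to have one row per document and one column per cluster; "degree of filling" is taken as the share of non-empty cells; "information density" is taken as the share of non-empty cells holding exactly one item; and the balance is taken as their harmonic mean. Average-linkage agglomerative clustering with Euclidean distance stands in for whatever hierarchical clustering the paper uses. The function names `agglomerative_partitions`, `table_scores`, and `choose_num_clusters` are illustrative only.

```python
import math
from itertools import combinations


def average_linkage(vectors, c1, c2):
    """Mean Euclidean distance over all cross-cluster pairs of items."""
    total = sum(math.dist(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative_partitions(vectors):
    """Bottom-up clustering; returns {k: clusters} for every k from n down to 1.

    Each cluster is a list of item indices.
    """
    clusters = [[i] for i in range(len(vectors))]
    partitions = {len(clusters): [c[:] for c in clusters]}
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(vectors, clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)] + [merged]
        partitions[len(clusters)] = [c[:] for c in clusters]
    return partitions


def table_scores(doc_ids, partition):
    """Hypothetical indicators for a documents x clusters table.

    filling: share of cells holding at least one item.
    density: share of non-empty cells holding exactly one item.
    """
    counts = {}
    for col, cluster in enumerate(partition):
        for i in cluster:
            cell = (doc_ids[i], col)
            counts[cell] = counts.get(cell, 0) + 1
    total_cells = len(set(doc_ids)) * len(partition)
    filling = len(counts) / total_cells
    density = sum(1 for n in counts.values() if n == 1) / len(counts)
    return filling, density


def choose_num_clusters(items):
    """Pick k maximizing the harmonic mean of filling and density (assumed criterion)."""
    if not items:
        raise ValueError("need at least one information item")
    doc_ids = [doc for doc, _ in items]
    vectors = [vec for _, vec in items]
    partitions = agglomerative_partitions(vectors)
    best_k, best_score = None, -1.0
    for k in sorted(partitions):  # k = 1 .. n; ties keep the smallest k
        filling, density = table_scores(doc_ids, partitions[k])
        score = 2 * filling * density / (filling + density)  # filling > 0 always
        if score > best_score:
            best_k, best_score = k, score
    return best_k, partitions[best_k]


# Toy usage: three documents, each with one item near (0, 0) and one near (10, 10).
items = [
    ("doc1", (0.0, 0.0)), ("doc1", (10.0, 10.0)),
    ("doc2", (0.1, 0.0)), ("doc2", (10.0, 10.1)),
    ("doc3", (0.0, 0.2)), ("doc3", (10.2, 10.0)),
]
k, table_columns = choose_num_clusters(items)
print(k)  # 2
```

In this toy example, k = 1 puts both items of every document into the same cell (density 0), and k = 6 leaves two thirds of the cells empty; k = 2 fills every cell with exactly one item, so the assumed criterion selects it. Sweeping every level of a single hierarchy, rather than fixing k in advance as X-means effectively does, is what allows the trade-off between the two indicators to be inspected directly.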

Published in

  • Journal of Natural Language Processing (自然言語処理), 28 (3), 802-823, 2021

    The Association for Natural Language Processing (一般社団法人 言語処理学会)