Document Separation between Native English and Nonnative English Using Long POS Strings

Yukino Kensei, Aoki Sayaka, Tanigawa Ryuji, Tomiura Yoichi

doi:10.15017/1516865

Yukino Kensei
Department of Intelligent Systems, Graduate School of Information Science and Electrical Engineering, Kyushu University : Doctoral Program

Aoki Sayaka
Department of Intelligent Systems, Graduate School of Information Science and Electrical Engineering, Kyushu University : Master's Program

Tanigawa Ryuji
Department of Intelligent Systems, Graduate School of Information Science and Electrical Engineering, Kyushu University : Master's Program

Tomiura Yoichi
Department of Intelligent Systems, Faculty of Information Science and Electrical Engineering, Kyushu University

Bibliographic Information

Other Title

長い品詞列を文書特徴とした母語話者英文書・非母語話者英文書の判別
ナガイヒンシレツオブンショトクチョウトシタボゴワシャエイブンショヒボゴワシャエイブンショノハンベツ

Abstract

We propose using long, low-frequency part-of-speech (POS) strings to separate native English documents from non-native English documents. Previous work ignored long POS strings because their frequencies in training data are too low to estimate their probabilities reliably. Meanwhile, a study on language identification showed that long, low-frequency byte strings are useful for distinguishing between similar languages. Language identification and native/non-native document separation are similar in some respects: for example, long POS strings are more peculiar to one class than short ones, although POS tags and bytes are different units. We can therefore expect higher accuracy from using long, low-frequency POS strings. The experiments described in this paper show that the proposed method achieves higher accuracy than previous methods.
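
As a rough illustration of the idea in the abstract, the sketch below collects long POS n-grams that occur in only one class of the training data and classifies a new document by counting which class's n-grams it contains. This is not the authors' method, which this record does not describe in detail. The function names, the n-gram length range (3 to 6), and the simple vote count are all assumptions made for illustration, and the input is assumed to be already POS-tagged.

```python
from collections import Counter


def pos_ngrams(tags, min_n, max_n):
    """Yield every contiguous POS n-gram (as a tuple) with min_n <= n <= max_n."""
    for n in range(min_n, max_n + 1):
        for i in range(len(tags) - n + 1):
            yield tuple(tags[i:i + n])


def class_specific_ngrams(native_docs, nonnative_docs, min_n=3, max_n=6):
    """Return (native_only, nonnative_only): n-grams seen in just one class.

    Each document is a list of POS tags. The length range is an assumed
    setting, not a value taken from the paper.
    """
    native = {g for doc in native_docs for g in pos_ngrams(doc, min_n, max_n)}
    nonnative = {g for doc in nonnative_docs for g in pos_ngrams(doc, min_n, max_n)}
    return native - nonnative, nonnative - native


def classify(tags, native_only, nonnative_only, min_n=3, max_n=6):
    """Label a tagged document by counting class-specific n-grams (assumed rule)."""
    votes = Counter()
    for g in pos_ngrams(tags, min_n, max_n):
        if g in native_only:
            votes["native"] += 1
        elif g in nonnative_only:
            votes["nonnative"] += 1
    if votes["native"] == votes["nonnative"]:
        return "undecided"
    return "native" if votes["native"] > votes["nonnative"] else "nonnative"
```

Even an n-gram seen once in training can cast a vote here, which is the sense in which low-frequency strings remain usable without estimating their probabilities.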

Journal

九州大学大学院システム情報科学紀要

九州大学大学院システム情報科学紀要 11 (2), 115-119, 2006-09-26

Faculty of Information Science and Electrical Engineering, Kyushu University

Details

CRID: 1390853649773708672
NII Article ID: 110005207704
NII Book ID: AN10569524
DOI: 10.15017/1516865
ISSN: 21880891, 13423819
HANDLE: 2324/1516865
NDL BIB ID: 8536634
Web Site:
- http://id.ndl.go.jp/bib/8536634
- https://ndlsearch.ndl.go.jp/books/R000000004-I8536634
Text Lang: ja
Data Source:
- JaLC
- IRDB
- NDL
- CiNii Articles
Abstract License Flag: Allowed
