Estimating an author’s gender using a random forest for offender profiling

Bibliographic Information

Other Title
  • ランダムフォレストによる著者の性別推定 -犯罪者プロファイリング実現に向けた検討-
  • ランダムフォレスト ニ ヨル チョシャ ノ セイベツ スイテイ : ハンザイシャ プロファイリング ジツゲン ニ ムケタ ケントウ

Abstract

<p> Offender profiling is a method that assists criminal investigation teams by estimating an offender’s gender, age, or occupation through statistical and psychological analysis of the crime scene. When only printed documents or e-mails are available, however, analysts have so far been unable to estimate the offender’s characteristics, because there is no crime scene to analyze. This study aims to estimate an author’s gender by applying a random forest to blog texts. The results indicated that the following stylometric features were effective in estimating gender: the usage rates of Kanji, Hiragana, Katakana, and nouns. The frequencies of certain parts of speech (verbs, adjectives, postpositional particles, and interjections), the conjunctive particle 「し」, the auxiliary verb 「なかっ」, commas, and certain characters (「私」「僕」「っ」「ゃ」) were also effective. Leave-one-out cross-validation (LOOCV) showed a highest accuracy of 86.0%, with a precision of 84.6% for male and 87.5% for female authors. By comparison, a support vector machine achieved a lower accuracy of 75.0%, with a precision of 69.2% for male and 85.7% for female authors.</p>
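
The character-level features named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `char_class` and `stylometric_features`, the Unicode block boundaries, the choice of 「、」 as the comma, and the normalization by non-whitespace character count are all assumptions made for illustration. Part-of-speech features (nouns, verbs, particles, and so on) would need a morphological analyzer and are left out here.

```python
from collections import Counter

# Characters the abstract lists as effective, plus the Japanese comma.
# Treating "、" as the comma is an assumption.
MARKERS = ["私", "僕", "っ", "ゃ", "、"]


def char_class(ch):
    """Classify one character by Unicode block (assumed boundaries)."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:
        return "kanji"
    return "other"


def stylometric_features(text):
    """Return script usage rates and marker frequencies for one text.

    Every value is normalized by the number of non-whitespace characters
    (a hypothetical choice; the paper's normalization is not stated here).
    """
    chars = [c for c in text if not c.isspace()]
    total = len(chars)
    if total == 0:
        raise ValueError("text contains no non-whitespace characters")

    classes = Counter(char_class(c) for c in chars)
    features = {
        f"{name}_rate": classes[name] / total
        for name in ("kanji", "hiragana", "katakana")
    }
    for marker in MARKERS:
        features[f"freq_{marker}"] = text.count(marker) / total
    return features


if __name__ == "__main__":
    for key, value in stylometric_features("私はカメラを買った、").items():
        print(key, round(value, 3))
```

Feature dictionaries of this kind, one per blog text, could then be passed to a random forest or support vector machine classifier and evaluated by leaving one text out at a time.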
