認識情報を利用した英数字混在文書からの文字切出しと認識

Bibliographic Information

Alternative titles
  • Character Segmentation and Recognition of Alphanumeric-mixed Documents Based on Pattern Recognition Information
  • ニンシキ ジョウホウ オ リヨウ シタ エイスウジ コンザイ ブンショ カラ ノ モジ キリダシ ト ニンシキ

Abstract

Generally speaking, Japanese OCR cannot easily read Japanese documents that also contain alphanumeric data, because the alphanumeric characters are set at proportional pitch within the fixed-pitch layout of the Japanese text.

This paper describes how to extract character candidates from combinations of small patterns, which may be components of separable Japanese characters or narrow patterns such as alphanumeric characters, and how to select the true character patterns from those candidates. We propose a new segmentation and recognition method for alphanumeric-mixed documents based on pattern recognition information such as similarities, pattern sizes, and character types.

The method was tested on alphanumeric-mixed documents: 51 pages of technical journals and transactions containing 68,867 characters. The resulting segmentation rate was 99.75% and the recognition rate was 99.05%, so we conclude that this method may be applied to Japanese OCR.
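
The candidate-extraction and selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract gives no implementation details, so the dynamic-programming search, the names `segment_line` and `toy_score`, the `max_merge` limit, and the pitch values 20 and 8 are all assumptions made for illustration. The toy scorer uses pattern size only, whereas the paper also draws on recognition similarities and character types.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int]  # (left, right) horizontal extent of one small pattern


def segment_line(
    boxes: List[Box],
    score: Callable[[Box, int], float],
    max_merge: int = 3,
) -> List[Box]:
    """Group adjacent small patterns into characters with the best total score.

    boxes     -- small patterns of one text line, in left-to-right order
    score     -- hypothetical scorer: (merged box, patterns merged) -> score
    max_merge -- assumed upper bound on the number of patterns per character
    """
    n = len(boxes)
    best = [float("-inf")] * (n + 1)  # best[i]: best total score for boxes[:i]
    best[0] = 0.0
    back = [0] * (n + 1)  # back[i]: index where the last character starts

    for end in range(1, n + 1):
        for start in range(max(0, end - max_merge), end):
            # Character candidate: patterns start..end-1 merged into one box.
            candidate = (boxes[start][0], boxes[end - 1][1])
            total = best[start] + score(candidate, end - start)
            if total > best[end]:
                best[end], back[end] = total, start

    # Walk the back-pointers to recover the selected character patterns.
    chars: List[Box] = []
    i = n
    while i > 0:
        j = back[i]
        chars.append((boxes[j][0], boxes[i - 1][1]))
        i = j
    chars.reverse()
    return chars


def toy_score(box: Box, count: int) -> float:
    """Size-only stand-in for recognition information (assumed pitches)."""
    width = box[1] - box[0]
    # Prefer widths near a full-width pitch (20) or a narrow letter (8).
    return -min(abs(width - 20), abs(width - 8))


# Two halves of a separable character followed by two narrow letters.
boxes = [(0, 8), (11, 20), (24, 32), (35, 43)]
print(segment_line(boxes, toy_score))  # [(0, 20), (24, 32), (35, 43)]
```

In this toy input the first two patterns are merged into one full-width character, while the two narrow patterns stay separate, which mirrors the mixed fixed-pitch and proportional-pitch situation the abstract describes.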
