Character Segmentation and Recognition of Alphanumeric-mixed Documents Based on Pattern Recognition Information

Bibliographic Information

Other Title
  • 認識情報を利用した英数字混在文書からの文字切出しと認識
  • ニンシキ ジョウホウ オ リヨウ シタ エイスウジ コンザイ ブンショ カラ ノ モジ キリダシ ト ニンシキ

Abstract

Japanese OCR generally has difficulty reading Japanese documents that also contain alphanumeric data, because the alphanumeric characters are set at proportional pitch within the fixed-pitch layout of the Japanese text.

This paper describes how to extract character candidates from combinations of small patterns, which may be components of separable Japanese characters or slim patterns such as alphanumeric characters, and how to select the true character patterns from those candidates. We propose a new segmentation and recognition method for alphanumeric-mixed documents based on pattern recognition information such as similarities, pattern sizes and character kinds.

The method was tested on alphanumeric-mixed documents comprising 51 pages of technical journals and transactions containing 68,867 characters. The resulting segmentation rate was 99.75% and the recognition rate was 99.05%, so we conclude that this method may be applied to Japanese OCR.
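
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of forming character candidates from adjacent small patterns and then choosing among them with a recognition score. It is not the authors' method. The names `Box`, `segment_line`, `max_merge`, the dynamic-programming search, and the toy similarity table are all assumptions made for illustration; a real system would obtain the score from a recognizer and would also weigh pattern size and character kind.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Horizontal extent of a pattern on a text line (hypothetical type)."""
    left: int
    right: int


def merge(boxes: List[Box]) -> Box:
    """Combine consecutive small patterns into one character candidate."""
    return Box(boxes[0].left, boxes[-1].right)


def segment_line(
    components: List[Box],
    score: Callable[[Box], float],
    max_merge: int = 3,
) -> List[Box]:
    """Pick the partition of components with the highest total score.

    Each candidate is a run of up to max_merge consecutive components.
    Its score is weighted by the number of components it covers so that
    partitions with many or few candidates are compared fairly.
    """
    n = len(components)
    best = [float("-inf")] * (n + 1)  # best[i]: best total for first i components
    back = [0] * (n + 1)              # back[i]: start index of the last candidate
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_merge), end):
            candidate = merge(components[start:end])
            total = best[start] + (end - start) * score(candidate)
            if total > best[end]:
                best[end] = total
                back[end] = start
    # Walk the back-pointers to recover the chosen candidates.
    result: List[Box] = []
    end = n
    while end > 0:
        start = back[end]
        result.append(merge(components[start:end]))
        end = start
    result.reverse()
    return result


# Toy data (assumed): two halves of a separable Japanese character,
# followed by two narrow alphanumeric patterns.
components = [Box(0, 9), Box(11, 20), Box(24, 30), Box(32, 38)]

# Stand-in for recognizer similarities; the values are invented.
toy_similarity: Dict[Tuple[int, int], float] = {
    (0, 20): 0.95,   # both halves together look like one character
    (0, 9): 0.4,
    (11, 20): 0.4,
    (24, 30): 0.9,   # each narrow pattern is a good character alone
    (32, 38): 0.9,
    (24, 38): 0.3,
}


def toy_score(box: Box) -> float:
    return toy_similarity.get((box.left, box.right), 0.1)


segments = segment_line(components, toy_score)
```

With this toy table, the two halves spanning 0 to 20 are merged into one candidate, while the two narrow patterns remain separate characters.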
