An Efficient Method for Solving Broken Characters Problem in Recognition of Vietnamese Degraded Text

  • Nguyen Thi Thanh Tan
  • Ngo Quoc Tao
  • Luong Chi Mai

Abstract

This paper presents an efficient method for solving the broken characters problem in the recognition of Vietnamese degraded text. The broken character restoration process consists of three main steps: 1) analyzing connected components and grouping them into connected areas; 2) building a directed graph from the connected areas; 3) applying A* best-first search to all of its possible sub-graphs in order to find an optimal strategy for rejoining the appropriate connected areas. Experiments were carried out on a test dataset consisting of 21,690 low-quality word images extracted from 925 document pages of varying quality. The method produces correct results for 94.37% of the dataset.
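The three-step pipeline above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Connected areas are reduced to hypothetical horizontal bounding boxes (`Area`). Step 1 is approximated by `group_into_areas`, which merges components whose horizontal extents overlap, such as a diacritic sitting above its base letter. The directed graph uses the boundaries between areas as nodes, so an edge i→j means "rejoin areas i..j-1 into one character". The edge cost `merge_cost` is a hypothetical deviation from an assumed `expected_width`. A* then searches for the cheapest left-to-right path using an admissible lower-bound heuristic.

```python
import heapq
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    # Hypothetical representation: horizontal extent [left, right) of a connected area.
    left: int
    right: int


def group_into_areas(components):
    """Hypothetical step 1: merge components whose horizontal extents overlap."""
    areas = []
    for c in sorted(components, key=lambda a: a.left):
        if areas and c.left < areas[-1].right:
            last = areas[-1]
            areas[-1] = Area(last.left, max(last.right, c.right))
        else:
            areas.append(c)
    return areas


def merge_cost(areas, i, j, expected_width):
    """Hypothetical cost of rejoining areas[i:j] into a single character."""
    left = min(a.left for a in areas[i:j])
    right = max(a.right for a in areas[i:j])
    return abs((right - left) - expected_width)


def astar_rejoin(areas, expected_width, max_merge=3):
    """Steps 2-3: build a DAG over area boundaries, then run A* from 0 to n."""
    n = len(areas)
    # Edge (i, j) means areas i..j-1 are rejoined into one character.
    costs = {
        (i, j): merge_cost(areas, i, j, expected_width)
        for i in range(n)
        for j in range(i + 1, min(i + max_merge, n) + 1)
    }
    min_edge = min(costs.values()) if costs else 0

    def h(i):
        # Lower bound: at least ceil((n - i) / max_merge) more edges, each >= min_edge.
        return min_edge * math.ceil((n - i) / max_merge)

    open_heap = [(h(0), 0, 0)]  # entries are (f = g + h, g, node)
    best_g = {0: 0}
    parent = {}
    while open_heap:
        _, g, i = heapq.heappop(open_heap)
        if i == n:
            break  # goal popped: path is optimal because h is consistent
        if g > best_g.get(i, math.inf):
            continue  # stale heap entry
        for j in range(i + 1, min(i + max_merge, n) + 1):
            ng = g + costs[(i, j)]
            if ng < best_g.get(j, math.inf):
                best_g[j] = ng
                parent[j] = i
                heapq.heappush(open_heap, (ng + h(j), ng, j))

    # Walk back from the goal to recover which areas form each character.
    groups, node = [], n
    while node != 0:
        prev = parent[node]
        groups.append(list(range(prev, node)))
        node = prev
    groups.reverse()
    return groups, best_g[n]
```

For example, with `expected_width=10`, the areas `Area(0, 4)`, `Area(5, 10)` and `Area(12, 22)` are grouped as `[[0, 1], [2]]` at cost 0: the first two fragments are rejoined into one character. The heuristic multiplies the cheapest edge cost by the minimum number of edges still needed. It never overestimates and it is consistent, so the first time the goal node is popped its path is optimal.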
Published
2009-10-28
Section
Regular Articles