An Integrated Approach for Table Detection and Structure Recognition
Abstract
Detecting and identifying the table structure is
an important issue in document digitization. Although there
have been many great strides based on current deep learning
techniques, table structure identification is still a difficult
and arduous problem, especially when solving the problem
of digitizing text in practice. The paper proposes a solution
to digitize table documents based on the Cascade R-CNN
HRNet network to detect, classify tables and integrate image
processing algorithms to improve table data identification
results. The proposed algorithm proved effective on real data
- the hydrometeorological station record book contains tables
including simple and complex structures tables with over 98%
accuracy.