Abstract:
Conventional computer vision applications depend heavily on handcrafted features for classification and object detection problems. Recent
developments in deep learning have solved many complex problems through automatic feature selection. However, deep learning algorithms
are data-hungry and require a large amount of labeled data for training.
Due to the huge variation in the types and layouts of tables, table detection
and structure extraction from document images has remained an interesting topic for researchers for decades, yet to date
there is no reliable solution that fits all layouts and types. In this
paper, we present a heuristics-based approach that utilizes positional features
to extract structure and information from invoice document images. Because
the problem spans table detection, table structure
recognition, and information extraction, there is currently no single deep
learning model that detects, recognizes, and extracts information from tables
end to end. Also, there is no large labeled data set available for invoice
images that can be used to train deep learning models. Given
this complexity, we propose that a heuristics- and rule-based
approach, built on thorough data analysis and the positional features of the
word bounding boxes generated by OCR, can provide a foundation for reliable table structure recognition and information extraction. We propose
a single pipeline, rather than three separate ones, comprising image preprocessing, word
bounding box extraction using OCR, table structure recognition, and information extraction. Table structure recognition and information extraction
are coupled to extract reliable information from invoices.
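
To illustrate what a positional heuristic over OCR word bounding boxes can look like, the following minimal sketch groups words into table rows by the proximity of their vertical centers and then orders each row from left to right. It is an assumption-laden illustration rather than the method of this paper: the `Word` structure, the `group_into_rows` function, and the tolerance `tol` are hypothetical names and values, and the boxes are hand-written stand-ins for real OCR output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    """Hypothetical container for one OCR word and its bounding box."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def y_center(self) -> float:
        return self.y + self.h / 2.0


def group_into_rows(words: List[Word], tol: float = 0.5) -> List[List[Word]]:
    """Group words into rows using vertical-center proximity.

    A word joins the current row when its vertical center lies within
    `tol` times the row's mean word height of the row's mean center.
    `tol` is an illustrative value, not a tuned parameter.
    """
    rows: List[List[Word]] = []
    for word in sorted(words, key=lambda wd: wd.y_center):
        if rows:
            last = rows[-1]
            row_center = sum(wd.y_center for wd in last) / len(last)
            row_height = sum(wd.h for wd in last) / len(last)
            if abs(word.y_center - row_center) <= tol * row_height:
                last.append(word)
                continue
        rows.append([word])
    # Order the words of each row from left to right.
    return [sorted(row, key=lambda wd: wd.x) for row in rows]


# Hand-written example boxes (not real OCR output).
words = [
    Word("Qty", 200, 100, 30, 12),
    Word("Item", 20, 101, 40, 12),
    Word("2", 205, 130, 10, 12),
    Word("Widget", 20, 131, 60, 12),
]
rows = [[wd.text for wd in row] for row in group_into_rows(words)]
print(rows)
```

In a sketch of this kind, column boundaries could be inferred in a similar way from the horizontal positions of the words, which is one way that structure recognition and information extraction might share the same positional features.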