Information Extraction from Tables in Document Images using Deep Learning

Shahzad, Muhammad Ali

URI: http://10.250.8.41:8080/xmlui/handle/123456789/37602

Date: 2019

Abstract:

features for tasks such as visual object detection. The recent success of deep learning in automatically extracting representative and powerful features from images has brought a paradigm shift in this area. As a side effect, decades of research into hand-crafted features have largely been shelved. We present a two-step approach for table detection and structure extraction. For table detection, we combine a deep learning based table detection model with hand-crafted features from a classical table detection method. We demonstrate that, with a suitable encoding of the hand-crafted features, the deep learning model outperforms existing techniques. Experiments on the publicly available UNLV dataset demonstrate the effectiveness of our system in comparison with state-of-the-art methods. Our approach prepares the images for the deep learning pipeline by extracting the hand-crafted features and encoding them in the image as linear visual cues. The encoded image is then fed to the deep learning pipeline, which proposes table regions along with confidence scores. The proposed table regions are then fed into the table recognition pipeline. In this pipeline, the images are first pre-processed to remove noise and other artifacts: we convert the three-channel image into a binary image, which enables us to remove the background and noise artifacts. We then apply morphological processing, which connects the individual characters into blob structures. These images are provided as input to a bi-directional Recurrent Neural Network with Gated Recurrent Units (GRU), followed by a fully-connected layer with softmax activation that classifies the input as either a row or a column. We trained the system on our own dataset and benchmarked it on the publicly available UNLV dataset, on which it outperforms the TRECs state-of-the-art table structure extraction system by a significant margin.
Note that no part of the UNLV test dataset was used to train the table detection or structure extraction pipelines.
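
The pre-processing steps described in the abstract (binarization followed by morphological processing that joins characters into blobs) can be illustrated with a minimal sketch. This is not the thesis implementation: the fixed threshold of 128, the horizontal 1 x (2*radius+1) structuring element, and the function names `binarize`, `dilate_horizontal` and `count_blobs` are all assumptions chosen for illustration, and the abstract does not state which thresholding method or kernel shape was used.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # one RGB pixel, each channel in 0..255


def binarize(rgb: List[List[Pixel]], threshold: int = 128) -> List[List[int]]:
    """Convert a three-channel image to binary: 1 = ink, 0 = background.

    Assumption: a fixed global threshold on the mean channel intensity.
    """
    return [[1 if sum(p) / 3 < threshold else 0 for p in row] for row in rgb]


def dilate_horizontal(binary: List[List[int]], radius: int = 1) -> List[List[int]]:
    """Dilate each row with a 1 x (2*radius+1) structuring element.

    A pixel becomes ink if any pixel within `radius` columns of it is ink,
    which closes small gaps between neighbouring characters.
    """
    out = []
    for row in binary:
        width = len(row)
        out.append([
            1 if any(row[max(0, x - radius):min(width, x + radius + 1)]) else 0
            for x in range(width)
        ])
    return out


def count_blobs(row: List[int]) -> int:
    """Count runs of consecutive ink pixels in a single row."""
    return sum(1 for x, v in enumerate(row) if v and (x == 0 or not row[x - 1]))


# Toy one-row "image": two characters one pixel apart, then a wide gap,
# then a third character that belongs to a different cell.
B, W = (0, 0, 0), (255, 255, 255)
image = [[W, B, W, B, W, W, W, W, B, W]]

binary = binarize(image)              # [[0, 1, 0, 1, 0, 0, 0, 0, 1, 0]]
blobs = dilate_horizontal(binary, 1)  # [[1, 1, 1, 1, 1, 0, 0, 1, 1, 1]]
print(count_blobs(binary[0]), count_blobs(blobs[0]))  # 3 2
```

In this toy example the two nearby characters merge into one blob while the distant one stays separate, which is the kind of coarse structure the abstract describes passing on to the recurrent network. In practice an image library would be used for both steps.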