Abstract:
A large number of documents of diverse kinds consist of scanned images, in which tables store summarized facts and valuable information in a structured form. Table detection is the first step in extracting valuable information from document images. Recently, many researchers have applied vision-based deep learning techniques to table localization, but these require many labelled training examples. Because labelling is expensive and time-consuming, it is necessary to develop a semi-supervised learning (SSL) approach. SSL trains a model on large amounts of unlabeled data, alongside a small labelled set, to improve predictive performance.
In this thesis, I use the SSL method "consistency-based self-training" to generate artificial labels for semantics-preserving augmentations of unlabeled data and train the model to predict these artificial labels. Generic SSL models are prone to producing biased predictions because of the foreground-background imbalance inherent in the table detection task. Background overfitting is handled by a parent-child shared-learning framework, in which differently augmented versions of an image are forwarded to both the parent and the child. The parent model predicts pseudo-labels, and the child updates the parent model's weights via an Exponential Moving Average (EMA). A Faster R-CNN ResNet-50 FPN model is first trained on a comparatively small labelled table dataset until convergence. The trained model is then duplicated into separate parent and child copies, so that high-confidence pseudo-label prediction (serving as ground truth) and loss calculation (backward propagation) are performed on different models. Only those parent predictions whose confidence exceeds 0.7 after the non-maximum suppression (NMS) stage of Faster R-CNN are considered accurate. Performance analysis is carried out on the public benchmark dataset TableBank: I obtain a 0.897 F1-score on the TableBank test set, which shows that this approach produces results comparable to state-of-the-art techniques while using only 10% of the labeled data.
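The two mechanisms described above, the EMA update of the parent by the child and the confidence filtering of post-NMS pseudo-labels, can be sketched as follows. This is a minimal illustration, not the thesis implementation: the decay value 0.999 and the dictionary detection format are assumptions.

```python
def ema_update(parent_weights, child_weights, decay=0.999):
    """Exponential Moving Average: parent <- decay * parent + (1 - decay) * child.
    The parent (teacher) slowly tracks the child (student), smoothing out
    noisy per-step updates. decay=0.999 is an assumed hyperparameter."""
    return [decay * p + (1.0 - decay) * c
            for p, c in zip(parent_weights, child_weights)]


def filter_pseudo_labels(detections, threshold=0.7):
    """Keep only post-NMS parent detections whose confidence exceeds the
    threshold; these serve as pseudo ground truth for training the child.
    Each detection is assumed to be a dict with a 'score' entry."""
    return [d for d in detections if d["score"] > threshold]
```

In practice, the EMA update would iterate over the model's parameter tensors rather than flat lists, but the arithmetic is the same per weight.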