Abstract:
Separating overlapping text and non-text from images is one of the challenging tasks in
document analysis and segmentation. It was previously difficult to separate text and
non-text because existing signature datasets consist of RGB images, which carry limited
information per pixel. A hyper-spectral image (HSI) records each pixel across many more
spectral channels than an RGB image, which can overcome this limitation. Because no public
dataset of hyper-spectral document images is available, experiments were performed on HSI
document images collected with specialized hyper-spectral cameras. Several end-member
extraction techniques, namely FIPPI (Fast Iterative PPI), PPI (Pixel Purity Index), NFINDR,
and ATGP (Automatic Target Generation Process), are used to extract end members such as
signatures and printed text.
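To make end-member extraction concrete, here is a minimal NumPy sketch of one of the listed methods, ATGP. It is an illustration under stated assumptions, not the implementation used in this work: the function name `atgp` and the synthetic cube are hypothetical. ATGP takes the brightest pixel as the first end member, then repeatedly projects all spectra onto the orthogonal complement of the end members found so far and picks the pixel with the largest residual norm.

```python
import numpy as np


def atgp(cube: np.ndarray, n_endmembers: int) -> tuple[np.ndarray, list[int]]:
    """Sketch of the Automatic Target Generation Process (ATGP).

    cube: hyperspectral image of shape (rows, cols, bands).
    Returns the end-member spectra (n_endmembers, bands) and their flat pixel indices.
    """
    bands = cube.shape[-1]
    pixels = cube.reshape(-1, bands).astype(float)  # one spectrum per row

    # The first target is the pixel with the largest squared spectral norm.
    indices = [int(np.argmax(np.einsum("ij,ij->i", pixels, pixels)))]
    for _ in range(1, n_endmembers):
        u = pixels[indices].T  # (bands, k): end members found so far
        # Projector onto the orthogonal complement of span(u): P = I - U (U^T U)^+ U^T
        proj = np.eye(bands) - u @ np.linalg.pinv(u.T @ u) @ u.T
        residual = pixels @ proj.T  # each spectrum with known end members projected out
        # The next target is the pixel with the largest remaining energy.
        indices.append(int(np.argmax(np.einsum("ij,ij->i", residual, residual))))
    return pixels[indices], indices


# Usage on a synthetic 4 x 4 x 5 cube: every pixel mixes three hypothetical pure
# spectra, and pixels 0, 5 and 10 are made pure on purpose.
rng = np.random.default_rng(0)
endmembers = np.array([[3.0, 0, 0, 0, 0],
                       [0, 2.0, 0, 0, 0],
                       [0, 0, 1.0, 0, 0]])
abundances = rng.dirichlet(np.ones(3), size=16)  # rows sum to 1
abundances[[0, 5, 10]] = np.eye(3)
cube = (abundances @ endmembers).reshape(4, 4, 5)
found, idx = atgp(cube, 3)
print(idx)  # [0, 5, 10]
```

Because each selection maximizes a convex function of the spectrum, ATGP lands on pure pixels at the vertices of the data simplex when such pixels exist, which is what makes it usable for recovering pure signature or printed-text spectra.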
The HSI image is then regenerated from the extracted end members using the Spectral Angle
Mapper (SAM) and Spectral Information Divergence (SID). The process proposed in this research
combines these end-member extraction methods in five steps: the HSI image is preprocessed in
the first step; its spectral signatures are generated with the end-member extraction
techniques in the second step; these are converted into an image using SAM and SID
classifiers in the third step; the result is post-processed with a connected component
analysis (CCA)-based technique in the fourth step; and the post-processed image is evaluated
using precision, recall, and F1 score in the last step.
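Steps three to five can be sketched in NumPy as below. This is a hedged illustration, not the authors' code: SAM and SID follow their standard definitions, but the post-processing rule (dropping 4-connected components smaller than a hypothetical `min_size`), all function names, and the toy 6 × 6 cube are assumptions. Precision, recall, and F1 are computed pixel-wise for one target class.

```python
from collections import deque

import numpy as np

EPS = 1e-12  # guards against division by zero and log(0)


def sam(pixels, endmembers):
    """Spectral angle (radians) between pixels (N, B) and end members (K, B) -> (N, K)."""
    dots = pixels @ endmembers.T
    norms = np.outer(np.linalg.norm(pixels, axis=1), np.linalg.norm(endmembers, axis=1))
    return np.arccos(np.clip(dots / (norms + EPS), -1.0, 1.0))


def sid(pixels, endmembers):
    """Spectral information divergence between pixels (N, B) and end members (K, B) -> (N, K)."""
    p = pixels / (pixels.sum(axis=1, keepdims=True) + EPS) + EPS  # spectra as distributions
    q = endmembers / (endmembers.sum(axis=1, keepdims=True) + EPS) + EPS
    p, q = p[:, None, :], q[None, :, :]
    # Symmetric KL divergence: D(p||q) + D(q||p) = sum (p - q) * log(p / q)
    return np.sum((p - q) * np.log(p / q), axis=2)


def classify(cube, endmembers, measure=sam):
    """Label each pixel with the index of its closest end member under `measure`."""
    rows, cols, bands = cube.shape
    scores = measure(cube.reshape(-1, bands).astype(float), endmembers)
    return scores.argmin(axis=1).reshape(rows, cols)


def remove_small_components(mask, min_size):
    """Drop 4-connected True regions with fewer than min_size pixels (hypothetical rule)."""
    mask, seen = mask.copy(), np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            component, queue = [], deque([(r, c)])
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(component) < min_size:
                for y, x in component:
                    mask[y, x] = False
    return mask


def precision_recall_f1(pred, truth):
    """Pixel-wise precision, recall and F1 for boolean masks."""
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy example: class 0 is hypothetical paper, class 1 hypothetical signature ink.
endmembers = np.array([[1.0, 1.0, 1.0], [1.0, 3.0, 0.5]])
truth = np.zeros((6, 6), dtype=bool)
truth[1:3, 1:4] = True                       # a 6-pixel "stroke"
cube = np.tile(endmembers[0], (6, 6, 1))
cube[truth] = endmembers[1]
cube[5, 5] = endmembers[1]                   # isolated noise pixel
raw = classify(cube, endmembers, sam) == 1
clean = remove_small_components(raw, min_size=3)
print(precision_recall_f1(raw, truth))       # noise pixel lowers precision to 6/7
print(precision_recall_f1(clean, truth))     # (1.0, 1.0, 1.0)
```

Passing `sid` instead of `sam` to `classify` gives the SID-based map; on this toy cube both measures produce the same labels, and the CCA step removes only the isolated false positive.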
The proposed methodology achieves a precision above 50%. Deep learning offers a better
approach, yielding higher precision than these traditional techniques, and can also separate
both overlapping and non-overlapping signatures on printed text.
A deep neural network-based end-member extraction technique named HybridSN [1] combines 3D
and 2D CNNs to learn end members and then extracts them from the HSI image. Applied to the
same private HSI dataset to separate overlapping signatures from printed text, it achieves a
precision of 78%. These two proposed approaches address the separation of signatures from
printed text, and both can be improved in future research.
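The 3D-plus-2D design of HybridSN [1] can be illustrated with plain shape arithmetic. The sketch below builds no network; it only traces how a spectral-spatial patch shrinks through unpadded 3D convolutions and is then reshaped for a 2D convolution. The 25 × 25 window, 30 reduced spectral bands, kernel sizes, and the function names are assumptions modeled on the configuration commonly described for HybridSN, not the settings used in these experiments.

```python
def valid_conv(size: int, kernel: int) -> int:
    """Output length along one axis of a stride-1 convolution without padding."""
    return size - kernel + 1


def hybridsn_shapes(window=25, bands=30,
                    kernels3d=((3, 3, 7, 8), (3, 3, 5, 16), (3, 3, 3, 32)),
                    kernel2d=(3, 3, 64)):
    """Trace tensor shapes through a HybridSN-style 3D-then-2D convolution stack.

    kernels3d entries are (height, width, spectral depth, filters); kernel2d is
    (height, width, filters). All sizes are assumed, not taken from this work.
    """
    h = w = window
    d, ch = bands, 1
    shapes = [("input", (h, w, d, ch))]
    for kh, kw, kd, filters in kernels3d:
        # 3D convolutions slide over both space and spectrum.
        h, w, d, ch = valid_conv(h, kh), valid_conv(w, kw), valid_conv(d, kd), filters
        shapes.append(("conv3d", (h, w, d, ch)))
    # Fold the remaining spectral depth into channels so a 2D convolution can follow.
    ch = d * ch
    shapes.append(("reshape", (h, w, ch)))
    kh, kw, filters = kernel2d
    h, w, ch = valid_conv(h, kh), valid_conv(w, kw), filters
    shapes.append(("conv2d", (h, w, ch)))
    shapes.append(("flatten", (h * w * ch,)))
    return shapes


for name, shape in hybridsn_shapes():
    print(f"{name:8s} {shape}")
```

The reshape step is where the two halves meet: the 3D layers learn joint spectral-spatial features, and merging the remaining spectral depth into channels (18 × 32 = 576 under these assumptions) lets a 2D layer then refine spatial features.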