Checking performance of GANs on difficult Captchas

Jamali, Abdul Fareed

URI: http://10.250.8.41:8080/xmlui/handle/123456789/37916

Date: 2021

Abstract:

The Generative Adversarial Network (GAN) is an active research topic, and it has shown a remarkable capacity to learn and reproduce data ranging from sketches and images to network packets. GANs are generative models: they create new samples based on their training data. For example, a GAN can learn to produce an image of a human face that appears to belong to a real person who does not exist. The generator and discriminator parts of a GAN keep improving the output through continuous learning and refinement. This approach has been widely used and praised in the literature, and it has performed very well in many of the settings in which it has been used or tested so far. GANs have been used to mimic the packets of various applications that were blocked in a network, and the generated packets have successfully penetrated the security layer. Related work so far has focused on the applications that GANs can be used for and has praised their versatility across many different applications. The approach has limitations that have not yet been exposed or reported in the literature. From a network security perspective, the presence of a GAN can be quite considerable once it starts to consume resources for training. Not only the resources but also the training time that a GAN needs to be properly trained to deceive security devices is considerable. That, however, is a topic for network security experts to investigate and report. This research aims to expose the shortcomings of GANs when used to learn CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) in conventional neural networks and deep neural networks. Most users have encountered captchas while accessing a site, placing an online purchase order, or signing up for a service.
CAPTCHA systems are generally deployed as a security mechanism in web applications. Captchas have different security features that make them harder for GANs to learn; we aim to exploit this weakness and then equip captchas with the set of features that further increases the difficulty for a GAN to learn them. Although the complete pipeline to be implemented involves more than just fooling a GAN, the output of a prematurely trained GAN will be of no use, since the GAN will not be properly trained. Feeding that garbage output to any other system will reduce the performance of the application that the GAN was primarily used for. Collecting a dataset of text-based captchas was a challenging task: the captchas were collected manually, and the dataset then required labeling, which was a rigorous process. We have shown through our collaborative research that a GAN produces different learning outcomes for different captcha security features, and that it is difficult for a GAN to learn captchas that incorporate the security features most problematic for it. GANs usually take two input images: a clean image with no security features and another that includes the security features. Preparing the clean images for the corresponding captchas was again a laborious and time-consuming task. Our research has also contributed to the training of CNN- and FRCNN-based classifiers; different commercial captcha schemes were collected and labeled for each classifier according to its required pre-processing. Our research has also contributed a novel approach for breaking audio-based captchas. The effect of different types of noise on the performance of this approach was also studied. A GAN was integrated into the pipeline to study whether it could learn all the noise features, and whether doing so would enhance or degrade the performance of our designed system.
Because the noise was too hard for the GAN to learn, including the GAN ultimately degraded the system's performance.