Abstract:
One of the most challenging problems in applying deep learning to any task is obtaining a large amount of training data. Generative adversarial networks (GANs), which synthesize new data, are a recent breakthrough widely regarded as a solution to this problem. Among their many applications, the most important is data augmentation for problems where collecting enough data is difficult, and the technique has been adopted across many areas. However, relying too heavily on GANs for data augmentation can be catastrophic, and that is what we demonstrate in this work. The problem we choose to make this point is the security of text CAPTCHAs against deep learning attacks. To attack a CAPTCHA scheme, an adversary must obtain enough labelled data; in recent state-of-the-art work, various CAPTCHA schemes have been broken by using GANs to produce large amounts of augmented data and then training CAPTCHA solvers on it. These works, however, overlook the limitations of what GANs can learn. We show that when the data contains enough random features, a GAN can fail to learn and instead produce garbage outputs that degrade, rather than improve, the performance of the downstream model. In this work, we develop new features for text CAPTCHAs that introduce substantial randomness into the CAPTCHA dataset and thus make it difficult for a GAN to learn. We use the state-of-the-art Pix2Pix GAN to augment the CAPTCHA dataset, test the accuracy of various CAPTCHA solvers on our CAPTCHAs with and without Pix2Pix augmentation, and show that solver accuracy drops significantly when the GAN is used. We also develop features that make CAPTCHAs difficult for deep learning solvers even without a GAN, and thereby propose a CAPTCHA scheme that is secure against deep learning attacks.