Abstract:
Deep neural networks (DNNs) are used extensively in fields ranging from object
classification and medical image analysis to self-driving cars. The vulnerability
of such networks to adversarial perturbations is a major concern given the
critical roles these networks play. Such perturbations may be imperceptible to
the human eye, yet they are strong enough to fool state-of-the-art DNN classifiers.
In contrast to previously proposed methods based on direct pixel manipulation,
we propose a steganography-based technique to generate adversarial perturbations
that fool deep models on any image. The proposed perturbations are computed in
a transform domain, where a single secret image
embedded in any target image makes any deep model misclassify the target
image with high probability. The attack resulting from our perturbation is
ideal for the black-box setting, as it requires no information about the
target model. Moreover, being a non-iterative technique, our perturbation
estimation remains computationally efficient. The computed perturbations
are imperceptible to humans while achieving high fooling ratios for
models trained on the large-scale ImageNet dataset. We demonstrate successful
fooling of ResNet-50, VGG-16, Inception-V3 and MobileNet-V2, achieving
up to 89% fooling of these popular classification models.
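To make the transform-domain embedding idea concrete, the following minimal sketch blends the frequency coefficients of a secret image into those of a target image. The choice of the 2-D DCT, the blending factor alpha, and the helper name embed_secret are illustrative assumptions for this sketch only; they are not the paper's actual perturbation-estimation procedure, which is detailed in the main text.

import numpy as np
from scipy.fft import dctn, idctn

def embed_secret(target, secret, alpha=0.05):
    # Blend the secret image's 2-D DCT coefficients into the target's and
    # invert the transform; alpha trades attack strength against visibility.
    # (Assumed transform and mixing rule, chosen only to illustrate
    # transform-domain embedding of a secret image.)
    T = dctn(target.astype(np.float64), norm="ortho")
    S = dctn(secret.astype(np.float64), norm="ortho")
    mixed = (1.0 - alpha) * T + alpha * S
    perturbed = idctn(mixed, norm="ortho")
    return np.clip(perturbed, 0.0, 255.0).astype(np.uint8)

# Toy usage with random grayscale images standing in for real data.
rng = np.random.default_rng(0)
target = rng.integers(0, 256, size=(224, 224), dtype=np.uint8)
secret = rng.integers(0, 256, size=(224, 224), dtype=np.uint8)
adversarial = embed_secret(target, secret, alpha=0.05)
print(adversarial.shape, adversarial.dtype)

Because the same secret image is embedded into every target, a sketch like this requires only a single forward pass per image and no queries to the attacked classifier, which is consistent with the non-iterative, black-box character of the attack described above.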