Abstract:
Although recent advances in pose and appearance control with denoising diffusion
models have democratized high-quality human image synthesis, yielding comprehensive mode coverage of the learned data distribution and increasing diversity of the
generated samples, they also introduce an exploitable pathway for easily accessible adversarial attacks. This thesis delves into a critical and previously unexplored aspect of
person image synthesis with denoising diffusion models – their potential vulnerability to
adversarial attacks via pose and appearance control. By studying pose-guided image
synthesis, we devise dedicated adversarial attacks tailored to the ways these models
handle their different input modalities and divide them into two distinct groups:
attacks applied to the source image (frequency perturbation, Gaussian aberration,
ghosting, and intensity transformation) and an incorrect-mapping attack applied to the
target pose. Our proposed pose- and appearance-control-based attacks are
precision-crafted, highly efficient, and present a low barrier to entry. Through a
thorough empirical study, we advocate the adoption of the frequency-based and
incorrect-mapping attacks for their perceptual deceptiveness, remarkable effectiveness,
and strategic finesse.
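
To make the source-image attack category concrete, the sketch below illustrates one plausible form a frequency-domain perturbation could take: the image is moved into the Fourier domain, only its high-frequency band is perturbed, and the result is transformed back. This is a minimal illustration under stated assumptions, not the thesis's exact procedure; the function name and parameters (frequency_perturbation, epsilon, cutoff) are hypothetical.

```python
import numpy as np

def frequency_perturbation(image, epsilon=0.05, cutoff=0.25):
    """Hypothetical sketch of a frequency-domain perturbation.

    image:   2-D float array in [0, 1] (grayscale for simplicity)
    epsilon: relative magnitude of the injected noise
    cutoff:  fraction of the spectrum radius kept untouched as low frequency
    """
    h, w = image.shape
    spectrum = np.fft.fftshift(np.fft.fft2(image))

    # Mask selecting frequencies outside the central low-frequency disk.
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    high_freq = radius > cutoff * min(h, w) / 2

    # Inject complex noise only into the high-frequency band, scaled by the
    # local spectral magnitude so the change stays visually subtle.
    noise = epsilon * (np.random.randn(h, w) + 1j * np.random.randn(h, w))
    spectrum[high_freq] += noise[high_freq] * np.abs(spectrum[high_freq])

    perturbed = np.fft.ifft2(np.fft.ifftshift(spectrum)).real
    return np.clip(perturbed, 0.0, 1.0)
```

The design intuition, under the same assumptions, is that high-frequency changes are hard for a human observer to notice while still shifting the statistics of the conditioning image that the diffusion model relies on, which is consistent with the perceptual deceptiveness the abstract attributes to the frequency-based attack.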