Abstract:
Generative Adversarial Networks (GANs) have gained prominence because of their
strong unsupervised learning capability and wide range of applications in data
generation, for example, text-to-image synthesis, synthetic medical data generation, video generation, and artwork generation. Hardware acceleration for GANs
is challenging due to their intrinsically complex computational phases, which require efficient data management during training and inference. In this work,
we propose a GAN accelerator architecture built around a distributed on-chip
memory, designed to efficiently handle the data movement required by the complex
computations in GANs, such as skipping zeros during strided convolution and inserting zeros in transposed convolution. We also
propose a controller that improves computational efficiency by pre-arranging
data from either the off-chip memory or the computational units before storing it in the on-chip memory. These architectural enhancements achieve a
3.65x performance improvement over the state of the art.
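As a minimal illustration (a sketch under assumptions, not the accelerator's actual implementation), the 1-D example below shows the two data-handling patterns the abstract mentions: a transposed convolution can be computed by inserting zeros between input samples and then applying a standard convolution, and an accelerator can skip the inserted zeros to avoid wasted multiply-accumulate operations. All function names here are hypothetical.

```python
def transposed_conv1d_direct(x, w, stride):
    """Reference 1-D transposed convolution: scatter each input
    element, scaled by the kernel, into the output at stride steps."""
    k = len(w)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, xv in enumerate(x):
        for j, wv in enumerate(w):
            out[i * stride + j] += xv * wv
    return out

def transposed_conv1d_zero_insert(x, w, stride):
    """Same result via zero insertion: upsample the input by placing
    stride-1 zeros between samples, then run a full convolution,
    skipping the inserted zeros (the zero-skipping optimization)."""
    # Insert stride-1 zeros between consecutive input elements.
    z = []
    for i, v in enumerate(x):
        z.append(v)
        if i < len(x) - 1:
            z.extend([0.0] * (stride - 1))
    k = len(w)
    out = [0.0] * (len(z) + k - 1)
    for m, zv in enumerate(z):
        if zv == 0.0:  # zero-skipping: no MACs for inserted zeros
            continue
        for j, wv in enumerate(w):
            out[m + j] += zv * wv
    return out
```

Both routines produce identical outputs; the second makes explicit which operations a zero-aware memory architecture can elide, since the inserted zeros contribute nothing to the result.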