Abstract:
Finding optimal tuning parameters for Graphics Processing Unit (GPU) compute kernels is typically treated as an optimization problem solved through search techniques. Because the shape of the optimization surface is unknown, an exhaustive search is required for the best results. Such a methodology consumes considerable compute resources and time, making it impractical for production software. This thesis describes a framework that uses deep learning sequence models to predict the optimal tuning parameters for GPU compute kernels solely on the basis of the input tensor parameter values. The models are first trained on the available dataset, which contains both the input parameters and the corresponding optimal output parameter values. From within the large search space, the model learns the underlying multi-dimensional optimization surface, or manifold, and is able to predict, with high accuracy, the optimal tuning configuration even for unseen sets of input tensor values. A modified beam search technique is also proposed and incorporated into the prediction stage of the framework, ensuring that the predicted output parameters satisfy the hardware constraints dictated by the GPU architecture. The framework has been tested on the half- and full-precision modes of four different kernels from the MIOpen dataset. By incorporating beam search and output parameter constraint satisfaction, the framework predicts, with more than 90% accuracy, the optimal parameters for kernels that would otherwise take hours to tune. As a result, it substantially reduces the development time and compute resources required to tune unseen input configurations in a production environment, which in turn translates to shorter development cycles, reduced development costs, and a better user experience.