dc.description.abstract |
This project aims to achieve computational acceleration of the MATLAB image processing toolbox using NVIDIA's advanced parallel computing architecture, CUDA (Compute Unified Device
Architecture), on standard PC hardware. Over the last few years, graphics
processors have experienced a tremendous performance increase, and their
instruction set has become remarkably general purpose. Several research projects have
been started to formulate adequate programming abstractions for using GPUs as coprocessors; one example is the Compute Unified Device Architecture (CUDA) by graphics
card vendor NVIDIA.
The advent of multicore CPUs and manycore GPUs means that mainstream processor
chips are now parallel systems. Furthermore, their parallelism continues to scale with
Moore’s law. The challenge is to develop application software that transparently scales its
parallelism to leverage the increasing number of processor cores, much as 3D graphics
applications transparently scale their parallelism to manycore GPUs with widely varying
numbers of cores.
NVIDIA is a world leader in visual computing technologies and the inventor of the GPU
(Graphics Processing Unit). CUDA, developed by NVIDIA, is a technology for general-purpose computing on the GPU that lets users develop GPU programs
easily. CUDA gives developers access to the native instruction set and memory of
the parallel computational elements in CUDA GPUs. Using CUDA, the latest NVIDIA
GPUs effectively become open architectures like CPUs. Unlike CPUs, however, GPUs
have a parallel "many-core" architecture whose cores together can run thousands of
threads simultaneously; if an application is suited to this kind of architecture, the GPU
can offer large performance benefits. This approach of solving general-purpose problems
on GPUs is known as GPGPU.
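The execution model described above can be sketched as a minimal CUDA program (an illustrative example, not code from the project): a kernel is launched over a grid of thread blocks, and each thread computes its own global index to process one element.

```cuda
#include <cuda_runtime.h>

// Minimal sketch of the CUDA many-thread model: each GPU thread handles
// one array element, identified by its block and thread indices.
__global__ void scale(const float *in, float *out, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overrun
        out[i] = factor * in[i];
}

int main(void)
{
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in,  n * sizeof(float));     // unified memory, for brevity
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    int threads = 256;                              // threads per block
    int blocks  = (n + threads - 1) / threads;      // enough blocks to cover n
    scale<<<blocks, threads>>>(in, out, 2.0f, n);   // launch thousands of threads
    cudaDeviceSynchronize();                        // wait for the GPU to finish

    cudaFree(in);
    cudaFree(out);
    return 0;
}
```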
We implemented in CUDA some of the important and widely used image processing algorithms, such as the Hough transform, RGB-to-grayscale conversion, convolution, repmat, the discrete cosine
transform, optical flow, and many others. Implementing these MATLAB functions on
CUDA yielded a significant increase in performance and efficiency. These functions find
application in different engineering and scientific domains, including medicine, space,
and multimedia, among others.
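As one concrete example, an RGB-to-grayscale kernel might look like the following hypothetical sketch (it is not the project's actual code), with one thread per pixel and the luma weights used by MATLAB's rgb2gray:

```cuda
// Hypothetical RGB-to-grayscale kernel: each thread converts one pixel
// of an interleaved 8-bit RGB image using ITU-R BT.601 luma weights.
__global__ void rgb2gray(const unsigned char *rgb, unsigned char *gray,
                         int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // pixel column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // pixel row
    if (x < width && y < height) {
        int i = y * width + x;                      // linear pixel index
        float r = rgb[3 * i + 0];
        float g = rgb[3 * i + 1];
        float b = rgb[3 * i + 2];
        gray[i] = (unsigned char)(0.2989f * r + 0.5870f * g + 0.1140f * b);
    }
}
```

A launch would use a 2D grid covering the image, e.g. `dim3 block(16, 16);` and `dim3 grid((width + 15) / 16, (height + 15) / 16);`, so that every pixel is assigned to exactly one thread.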
For parallel computing with CUDA, two points deserve attention. First, the allocation of data
to each thread is important: better schemes for partitioning the input data among threads would improve the
efficiency of the image algorithms. Second, the memory bandwidth
between host and device is the bottleneck of the whole computation, so fast transfer of the input data is also
very important and deserves emphasis. CUDA thus provides a
novel, massively data-parallel general computing method that is also cheap in hardware
terms. In future work, we will address more of MATLAB's image processing
algorithms and their optimization strategies, and study how to make the best use of the
increasing number of GPU cores for parallel computing. |
en_US |