dc.description.abstract |
MATLAB is a high-level language and computing environment widely used to write
technical code. It is used extensively in a number of scientific fields, such as digital
image processing and digital signal processing. The strength of MATLAB lies in
its rich library of built-in functions and its ease of code writing. Its simple
programming syntax makes the learning curve relatively shallow. However, MATLAB
uses an interpreter, which makes it slower, especially when executing loops. This slowness
becomes a performance bottleneck in programs that rely heavily on loops. In digital
image processing, for example, matrix operations usually involve many nested loop structures.
On the other hand, MATLAB provides, by means of MEX files, an interface through which other
programming languages such as C and FORTRAN can be used in MATLAB.
NVIDIA’s parallel computing architecture CUDA (Compute Unified Device Architecture) is a
revolutionary parallel computing paradigm that utilizes GPUs for general-purpose
programming. CUDA uses a massively parallel thread architecture and provides a programming
interface in which threads are executed regardless of which CUDA-enabled GPU they are
running on. This makes a CUDA program easily scalable to CUDA-enabled GPUs with a higher
number of processing cores. A CUDA program is written either in C for CUDA (an
extension of the C language with CUDA constructs) or against the CUDA Driver API, which is a
low-level programming interface.
NVIDIA provides a MATLAB CUDA plug-in which is used to interface CUDA with MATLAB.
By means of this plug-in, functions written in CUDA can be called from MATLAB like
any other MATLAB function.
The work carried out in this thesis investigates the benefits of the CUDA architecture in
MATLAB, in order to accelerate MATLAB’s slower processing. The main focus of our research
is on MATLAB’s loops, which are the main cause of MATLAB’s performance bottleneck. We
used CUDA’s parallel processing architecture to execute the iterations of a loop
simultaneously and hence accelerate the overall execution. We take JPEG
compression as an application example and implement it on CUDA. We present a brief analysis of the different parts of the
algorithm in order to demonstrate which parts cause the performance bottleneck and how they
can be executed in parallel on the GPU. As a proof of concept, we provide
benchmarking results of CPU execution vs. GPU execution, along with a brief discussion of the quality of
results. Our results show that CUDA can be successfully used to overcome MATLAB's slow
execution |
en_US |