Abstract:
Two-dimensional convolution/correlation is a ubiquitous tool for extracting features for
classification in a variety of machine learning and image processing applications such as
CPR (Correlation Pattern Recognition), CNN (Convolutional Neural Network), and filter
processing. However, using these tools in Internet-of-Things (IoT)-based applications
faces stringent constraints, such as limited memory capacity, computational resources,
and energy. The prime objective of this thesis is to propose a set of
algorithms and techniques to reduce the computation workload due to an excessive number
of correlation or convolution operations in CPRs and CNNs respectively. To achieve
this objective, both CPR filters and CNN models require approximated versions
without any accuracy degradation. The research focuses on obtaining these approximated
versions for future IoT implementation. This thesis makes the following
contributions: For CNN, (a) to overcome the high computation cost of existing convolution
algorithms, a hybrid algorithm is proposed that integrates the unique computational
advantages of Winograd and spatial convolution, (b) a Particle Swarm Convolution
Layer Optimization (PSCLO) scaling is proposed that minimizes accuracy loss and maximizes
the reduction in computational workload to combine both approximations, (c)
an analysis of experimental results of symmetry and tile quantization approximation in
conjunction with PSCLO is performed that finds the trade-off between the intensity of
approximation and accuracy degradation. For CPR, (d) a Weight Quantization Retraining
(WQR) approach is proposed to retrain the low-precision quantized weights of the
CPR filter for dynamic fixed point (DFP) and power-of-two (Po2) quantization schemes;
additionally, the Particle Swarm Optimization technique is employed to fine-tune
performance parameters, (e) pre-processing strategies of log-polar and inverse log-polar
transforms are used to support the low-precision CPR filter quantization, (f) an analysis
is performed to compare the advantages of spatially trained (ST) and frequency-trained
(FT) filters; this analysis is further extended within each domain, either spatially trained or
frequency-trained, to investigate the comparative benefits of Po2 and DFP quantization
schemes, (g) an overall analysis compares the advantages of the direct, log-polar, inverse
log-polar, and WQR approaches, which provides a better perspective. For CNNs, the proposed
techniques and algorithms achieved ∼5.28x fewer multiplication operations without significant
accuracy loss on ResNet-18. For LeNet, that reduction was ∼3.87x and ∼3.93x
on MNIST and Fashion-MNIST respectively, while the additive workload reductions
for these datasets were ∼2.5x and ∼2.56x respectively. For the CIFAR-10 quick network,
the techniques achieved ∼9.28x and ∼8.82x fewer multiplications on the CIFAR-10 and
SVHN datasets. The additive workload reductions for these datasets were ∼1.70x and
∼1.33x respectively. For CPR filters, the following results were obtained for a common
dataset. For the direct quantization approach, a compression ratio of 8 achieved a 4.37x
speedup without accuracy loss. With a log-polar implementation, a compression ratio of 4
achieved a 1.12x speedup with 16% accuracy loss. Inverse log-polar with a
compression ratio of 16 achieved an 8.90x speedup with 6% accuracy loss. These empirical
investigations demonstrate the effectiveness of the proposed approximation methods for
both CPR and CNN using standard datasets.
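
As a rough illustration of the two quantization schemes named above, the sketch below shows one common textbook form of power-of-two (Po2) and dynamic fixed point (DFP) weight quantization. It is not the thesis's WQR procedure: the function names, the exponent range, and the 8-bit word length are assumptions chosen only for this example.

```python
import math

def quantize_po2(w, min_exp=-7, max_exp=0):
    """Map w to a signed power of two, rounding the exponent in the log domain.

    min_exp and max_exp are assumed exponent limits, not values from the thesis.
    """
    if w == 0.0:
        return 0.0
    exp = round(math.log2(abs(w)))          # nearest integer exponent
    exp = max(min_exp, min(max_exp, exp))   # clip to the representable range
    return math.copysign(2.0 ** exp, w)     # restore the sign of w

def quantize_dfp(weights, bits=8):
    """Quantize a group of weights with one shared fractional length.

    The fractional length is derived from the largest magnitude in the group,
    which is the "dynamic" part of dynamic fixed point.
    """
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return [0.0 for _ in weights]
    int_bits = math.floor(math.log2(max_abs)) + 1   # bits for the integer part
    frac_bits = bits - 1 - int_bits                 # one bit reserved for sign
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Round to the integer grid, saturate, then map back to real values.
    return [max(lo, min(hi, round(w * scale))) / scale for w in weights]
```

With Po2 weights, each multiplication can in principle be replaced by a bit shift, which is one reason such schemes are attractive on constrained hardware; DFP instead keeps a uniform grid whose resolution adapts to the range of each weight group.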