Abstract:
This thesis investigates the effective implementation of the YOLO object recognition
model on NVIDIA Jetson through the application of model quantization techniques.
Specifically, the research focuses on Quantization-Aware Training (QAT) and Asymmetric
Quantization to optimize the model's performance on resource-constrained edge devices.
NVIDIA Jetson devices, designed to handle AI tasks in edge computing scenarios, often
face limitations in memory, power, and computational capacity. The
research evaluates the baseline performance of the YOLO model on a standard NVIDIA
Jetson device and details the methodology of applying QAT and Asymmetric
Quantization, followed by a comparative analysis of their effects. The results indicate that
while quantization techniques lead to a slight decrease in accuracy, they substantially
reduce inference time. This improvement in inference speed underscores the potential for
deploying the quantized YOLO model in real-time scenarios where inference time is
prioritized over accuracy. This thesis contributes to the fields of edge computing and
real-time image processing by providing a comprehensive framework for deploying
high-performance AI models in constrained environments. The findings demonstrate that model
quantization is a viable strategy for achieving efficient and robust real-time object
recognition on resource-limited devices.
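As an illustration of the asymmetric (affine) quantization scheme named above, the following is a minimal sketch rather than code from the thesis; the 8-bit unsigned integer range and the NumPy-based helper functions are assumptions chosen purely for clarity.

```python
import numpy as np

def asymmetric_quantize(x, num_bits=8):
    """Map float values to unsigned integers with an asymmetric (affine)
    scheme: q = round(x / scale) + zero_point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    # Scale maps the observed float range onto the integer range;
    # the small floor avoids division by zero for constant tensors.
    scale = max((x_max - x_min) / (qmax - qmin), 1e-8)
    # Zero point shifts the range so that x_min lands on qmin.
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover an approximation of the original float values."""
    return (q.astype(np.float32) - zero_point) * scale

# Example: quantize a random tensor and measure the reconstruction error.
x = np.random.randn(1000).astype(np.float32)
q, scale, zero_point = asymmetric_quantize(x)
x_hat = dequantize(q, scale, zero_point)
print("max abs error:", np.abs(x - x_hat).max())
```

Because the zero point lets the integer grid sit asymmetrically around zero, this scheme represents skewed activation ranges more faithfully than a symmetric one, which is the trade-off the thesis evaluates on the Jetson hardware.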