Abstract:
Fixed allocation of GPU resources to virtual machines leaves GPUs idle whenever workloads run on other machines, and it increases hardware cost because every virtual machine must be equipped with its own GPU. Recent solutions optimise scheduling algorithms in container orchestration environments to distribute workloads across machines that have GPUs directly attached to them. However, if workloads are distributed across different machines but need a GPU only for short periods, the GPUs on those machines remain idle for the rest of the time, which increases cost and under-utilizes the available resources. To address this under-utilization problem, we present a framework that arranges the available resources so that a GPU is allocated to a machine only when it is required for processing; after processing, the same GPU can be shared with other machines for their workloads. The key idea of our framework is to create a pool of all available GPUs, reserve a GPU for a workload when it requests processing, and return that GPU to the pool once the workload releases it. The framework therefore ensures maximum GPU utilization with a minimum of resources, which also results in a significant decrease in cost. Furthermore, the framework integrates container orchestration through Kubernetes, and provisions resources and manages Kubernetes clusters through Rancher. This provides end-to-end infrastructure for deploying workloads in a containerized environment and improving utilization of the available resources. Experimental results show that our approach adds very little time overhead while removing the need to attach a GPU directly to each virtual machine in order to execute workloads.
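To make the reserve/release cycle described above concrete, the following is a minimal sketch of a shared GPU pool. It is an illustrative model only, not the framework's implementation; the names GPUPool, Reserve, and Release are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// GPUPool is a hypothetical, minimal model of the shared pool of GPUs:
// a workload reserves a GPU only while it needs processing and returns
// it to the pool afterwards so other machines can use it.
type GPUPool struct {
	mu   sync.Mutex
	free []string // identifiers of GPUs currently available
}

func NewGPUPool(gpus []string) *GPUPool {
	return &GPUPool{free: append([]string(nil), gpus...)}
}

// Reserve removes one GPU from the pool for a workload, if any is available.
func (p *GPUPool) Reserve() (string, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.free) == 0 {
		return "", errors.New("no GPU available in the pool")
	}
	gpu := p.free[len(p.free)-1]
	p.free = p.free[:len(p.free)-1]
	return gpu, nil
}

// Release returns a GPU to the pool once the workload has finished.
func (p *GPUPool) Release(gpu string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.free = append(p.free, gpu)
}

func main() {
	pool := NewGPUPool([]string{"gpu-0", "gpu-1"})
	gpu, err := pool.Reserve() // attach a GPU only when processing is required
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("workload running on", gpu)
	pool.Release(gpu) // GPU becomes available to other machines again
}
```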