Abstract:
Parallel Sorting algorithms are widely used in a diverse range
of High-Performance Computing applications. We consider the
scalability and performance of these algorithms on distributed-
memory clusters. We rst study the scalability and performance
of popular existing parallel sorting algorithms using various data
distributions generated by our application. These include uni-
form distributions, distributions generated by a fractal process,
highly degenerate distributions, and distributions in which data
is strongly clustered around zero. The experimental results show
that while some algorithms perform well for uniform distribu-
tions, they fail to perform up to the mark for the non-uniform
cases. We present an optimized version of parallel histogram
sort, which is deterministic, stable and generally performs well
on a diverse range of data distributions. Our proposed algorithm
is
exible enough to allow the desired degree of load balancing
using a user-speci ed threshold value.