Abstract:
In the last decade, processor technology has advanced enormously with the
invention of the multicore processor, which delivers more processing cycles
at lower power consumption. As a result, most clusters and supercomputers
are built from nodes containing multicore CPUs and low-power specialized
coprocessors such as GPUs and the Xeon Phi. Traditionally, parallel
machines were divided into shared memory machines, where processors
communicate through memory shared between processes, and distributed
memory machines, where processors communicate through message passing
over a network. With clusters of multicore nodes, often equipped with
coprocessors, software developers need to build software that properly
utilizes the underlying resources. This requires combining the shared and
distributed memory programming techniques, an approach known as hybrid
parallelism. In this work we have added hybrid parallelism support to
MPJ Express.
MPJ Express is an implementation of the mpiJava bindings. The previous
release of MPJ Express (v0.38) supported either the pure shared memory
model (multicore mode) or the distributed memory model (cluster mode). We
have added a new communication device, named the hybrid device, which
takes advantage of both the multicore and cluster modes. This new device
allows MPJ Express to exploit hybrid parallelism seamlessly and
transparently to the user. The hybrid device enables both existing and new
applications built on MPJ Express to exploit hybrid parallelism, since it
requires no application rewriting effort. In addition, the cost of the
MPJ Express buffering layer is evaluated and compared with the performance
numbers of other Java MPI libraries.
The performance evaluation reveals that the hybrid communication device,
without any modifications at the application level, helps parallel
applications achieve better speedups and scalability by exploiting
multicore architectures. Moreover, the cost incurred by buffering, and its
impact on overall software performance, is quantified. Competitive
performance is observed: the hybrid device improves application performance
and achieves up to 90% of the theoretical bandwidth available in
point-to-point, collective communication, and application benchmarks.
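
To illustrate this transparency, the following is a minimal sketch of an
MPJ Express program written against the mpiJava bindings; the class name
HelloHybrid is illustrative, and the key point is that the communication
device is selected at launch time rather than in the source code:

    import mpi.MPI;

    public class HelloHybrid {
        public static void main(String[] args) throws Exception {
            // Initialize the MPJ Express runtime with the launch arguments.
            MPI.Init(args);
            int rank = MPI.COMM_WORLD.Rank(); // this process's identifier
            int size = MPI.COMM_WORLD.Size(); // total number of processes
            System.out.println("Hello from process " + rank + " of " + size);
            // Shut down the runtime cleanly.
            MPI.Finalize();
        }
    }

The same compiled class can then be launched under the multicore, cluster,
or hybrid device through the mpjrun script, for example
mpjrun.sh -np 4 -dev hybdev HelloHybrid; the device name shown here is an
assumption for illustration, and intra-node messages would use shared
memory while inter-node messages travel over the network.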