Design and Evaluation of Shared Memory Communication Benchmarks on Emerging Architectures using MVAPICH2. Xu, S., Hashmi, J. M., Chakraborty, S., Subramoni, H., & Panda, D. K. In Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM) held in conjunction with SC'19, 2019. IEEE/ACM.
Recent advances in processor technologies have led to highly multi-threaded and dense multi- and many-core HPC systems. The adoption of such dense multi-core processors is widespread among the Top500 systems. The Message Passing Interface (MPI) has been widely used to scale out scientific applications, and MPI intra-node communication is built mainly on shared memory. The increased core density of modern processors warrants efficient shared memory communication designs to achieve optimal performance. While various algorithms and data structures have been proposed in the literature for producer-consumer-like scenarios, they need to be revisited in the context of MPI communication to find the solutions that work best on modern architectures. In this paper, we first propose a set of low-level benchmarks to evaluate data structures such as Lamport queues, Fast-Forward queues, and Fastboxes (FB) for shared memory communication. Then, we bring these designs into the MVAPICH2 MPI library and measure their impact on MPI intra-node communication for a wide variety of communication patterns. The benchmark experiments are carried out on modern multi-/many-core architectures including Intel Xeon CascadeLake and Intel Knights Landing.
@inproceedings{shulei-ipdrm19,
  title={{Design and Evaluation of Shared Memory Communication Benchmarks on Emerging Architectures using MVAPICH2}},
  author={Xu, Shulei and Hashmi, Jahanzeb Maqbool and Chakraborty, Sourav and Subramoni, Hari and Panda, Dhabaleswar K.},
  booktitle={Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM) held in conjunction with SC'19},
  year={2019},
  abstract={Recent advances in processor technologies have led to highly multi-threaded and dense multi- and many-core HPC systems. The adoption of such dense multi-core processors is widespread among the Top500 systems. The Message Passing Interface (MPI) has been widely used to scale out scientific applications, and MPI intra-node communication is built mainly on shared memory. The increased core density of modern processors warrants efficient shared memory communication designs to achieve optimal performance. While various algorithms and data structures have been proposed in the literature for producer-consumer-like scenarios, they need to be revisited in the context of MPI communication to find the solutions that work best on modern architectures. In this paper, we first propose a set of low-level benchmarks to evaluate data structures such as Lamport queues, Fast-Forward queues, and Fastboxes (FB) for shared memory communication. Then, we bring these designs into the MVAPICH2 MPI library and measure their impact on MPI intra-node communication for a wide variety of communication patterns. The benchmark experiments are carried out on modern multi-/many-core architectures including Intel Xeon CascadeLake and Intel Knights Landing.},
  organization={IEEE/ACM}
}
