Performance model for parallel matrix multiplication with Dryad: Dataflow graph runtime. Li, H., Fox, G. C., & Qiu, J. In Cloud and Green Computing (CGC), 2012 Second International Conference on, pages 675-683, 2012. IEEE.
To meet the big data challenge of today's society, several parallel execution models on distributed memory architectures have been proposed: MapReduce, Iterative MapReduce, graph processing, and dataflow graph processing. Dryad is a distributed data-parallel execution engine that models programs as dataflow graphs. In this paper, we evaluated the runtime and communication overhead of Dryad in realistic settings. We proposed a performance model for the Dryad implementation of parallel matrix multiplication (PMM) and extended the model to MPI implementations. We conducted experimental analyses to verify the correctness of our analytic model on a Windows cluster with up to 400 cores, Azure with up to 100 instances, and a Linux cluster with up to 100 nodes. The final results show that our analytic model produces accurate predictions within 5% of the measured results. We proved that, in some cases, using average communication overhead to model the performance of parallel matrix multiplication jobs on common HPC clusters is a practical approach. © 2012 IEEE.
@inproceedings{Li2012,
 title = {Performance model for parallel matrix multiplication with {Dryad}: Dataflow graph runtime},
 type = {inproceedings},
 year = {2012},
 pages = {675--683},
 publisher = {IEEE},
 id = {50463d40-c5bb-3f3c-a332-59cba6808f14},
 created = {2017-12-18T21:44:04.661Z},
 file_attached = {false},
 profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
 last_modified = {2020-05-11T14:43:45.073Z},
 read = {false},
 starred = {false},
 authored = {true},
 confirmed = {true},
 hidden = {false},
 citation_key = {Li2012},
 source_type = {CONF},
 folder_uuids = {36d8ccf4-7085-47fa-8ab9-897283d082c5},
 private_publication = {false},
 abstract = {To meet the big data challenge of today's society, several parallel execution models on distributed memory architectures have been proposed: MapReduce, Iterative MapReduce, graph processing, and dataflow graph processing. Dryad is a distributed data-parallel execution engine that models programs as dataflow graphs. In this paper, we evaluated the runtime and communication overhead of Dryad in realistic settings. We proposed a performance model for the Dryad implementation of parallel matrix multiplication (PMM) and extended the model to MPI implementations. We conducted experimental analyses to verify the correctness of our analytic model on a Windows cluster with up to 400 cores, Azure with up to 100 instances, and a Linux cluster with up to 100 nodes. The final results show that our analytic model produces accurate predictions within 5% of the measured results. We proved that, in some cases, using average communication overhead to model the performance of parallel matrix multiplication jobs on common HPC clusters is a practical approach. © 2012 IEEE.},
 bibtype = {inproceedings},
 author = {Li, Hui and Fox, Geoffrey Charles and Qiu, Judy},
 doi = {10.1109/CGC.2012.23},
 booktitle = {Cloud and Green Computing (CGC), 2012 Second International Conference on}
}
