An MPI-CUDA implementation of an improved Roe method for two-layer shallow water systems

An MPI-CUDA implementation of an improved Roe method for two-layer shallow water systems. de la Asunción, M., Mantas, J. M., Castro-Díaz, M. J., & Fernández-Nieto, E. D. Journal of Parallel and Distributed Computing, 72(9):1065-1072, 2012.


The numerical solution of two-layer shallow water systems is required to simulate accurately stratified fluids, which are ubiquitous in nature: they appear in atmospheric flows, ocean currents, oil spills, etc. Moreover, the implementation of the numerical schemes to solve these models in realistic scenarios imposes huge demands of computing power. In this paper, we tackle the acceleration of these simulations in triangular meshes by exploiting the combined power of several CUDA-enabled GPUs in a GPU cluster. For that purpose, an improvement of a path conservative Roe-type finite volume scheme which is specially suitable for GPU implementation is presented, and a distributed implementation of this scheme which uses CUDA and MPI to exploit the potential of a GPU cluster is developed. This implementation overlaps MPI communication with CPU-GPU memory transfers and GPU computation to increase efficiency. Several numerical experiments, performed on a cluster of modern CUDA-enabled GPUs, show the efficiency of the distributed solver.

@article{jpdc2012,
author = {de la Asunci\'on, Marc and Mantas, Jos\'e M. and Castro-D\'iaz, M. J. and Fern\'andez-Nieto, E. D.},
abstract = {The numerical solution of two-layer shallow water systems is required to simulate accurately stratified fluids, which are ubiquitous in nature: they appear in atmospheric flows, ocean currents, oil spills, etc. Moreover, the implementation of the numerical schemes to solve these models in realistic scenarios imposes huge demands of computing power. In this paper, we tackle the acceleration of these simulations in triangular meshes by exploiting the combined power of several CUDA-enabled GPUs in a GPU cluster. For that purpose, an improvement of a path conservative Roe-type finite volume scheme which is specially suitable for GPU implementation is presented, and a distributed implementation of this scheme which uses CUDA and MPI to exploit the potential of a GPU cluster is developed. This implementation overlaps MPI communication with CPU-GPU memory transfers and GPU computation to increase efficiency. Several numerical experiments, performed on a cluster of modern CUDA-enabled GPUs, show the efficiency of the distributed solver.},
journal = {Journal of Parallel and Distributed Computing},
number = {9},
pages = {1065--1072},
title = {{A}n {MPI}-{CUDA} implementation of an improved {R}oe method for two-layer shallow water systems},
volume = {72},
year = {2012},
doi = {10.1016/j.jpdc.2011.07.012},
url_Paper = {http://hdl.handle.net/11441/32925},
}
