Solving shallow-water systems in 2D domains using finite volume methods and multimedia SSE instructions. Castro Díaz, M. J., García-Rodríguez, J.-A., González-Vida, J.-M., & Parés, C. J. Comput. Appl. Math., 221(1):16–32, 2008.
The goal of this paper is to construct efficient parallel solvers for 2D hyperbolic systems of conservation laws with source terms and nonconservative products. The method of lines is applied: at every intercell, a projected Riemann problem along the normal direction is considered, which is discretized by means of well-balanced Roe methods. The resulting 2D numerical scheme is explicit and first-order accurate. In [M.J. Castro, J.A. García, J.M. González, C. Parés, A parallel 2D Finite Volume scheme for solving systems of balance laws with nonconservative products: Application to shallow flows, Comput. Methods Appl. Mech. Engrg. 196 (2006) 2788–2815] a domain decomposition method was used to parallelize the resulting numerical scheme, which was implemented on a PC cluster by means of MPI techniques. In this paper, in order to optimize the computations, a new parallelization of SIMD type is performed at each MPI thread, by means of SSE (“Streaming SIMD Extensions”), which are present in common processors. More specifically, as the most costly part of the calculations performed at each processor consists of a huge number of small matrix and vector computations, we use the Intel© Integrated Performance Primitives small matrix library. To make this library, which is implemented using assembler and SSE instructions, easier to use, we have developed an efficient C++ wrapper for it. Some numerical tests were carried out to validate the performance of the C++ small matrix wrapper. The specific application of the scheme to one-layer shallow-water systems has been implemented on a PC cluster. The correct behavior of the one-layer model is assessed using laboratory data.
@Article{CastroDiaz2008d,
  author   = {Castro D{\'i}az, Manuel J. and Garc{\'i}a-Rodr{\'i}guez, J.-A. and Gonz{\'a}lez-Vida, J.-M. and Par{\'e}s, Carlos},
  journal  = {J. Comput. Appl. Math.},
  title    = {{S}olving shallow-water systems in 2{D} domains using finite volume methods and multimedia {SSE} instructions},
  year     = {2008},
  number   = {1},
  pages    = {16--32},
  volume   = {221},
  abstract = {The goal of this paper is to construct efficient parallel solvers for 2D hyperbolic systems of conservation laws with source terms and nonconservative products. The method of lines is applied: at every intercell, a projected Riemann problem along the normal direction is considered, which is discretized by means of well-balanced Roe methods. The resulting 2D numerical scheme is explicit and first-order accurate. In [M.J. Castro, J.A. Garc{\'i}a, J.M. Gonz{\'a}lez, C. Par{\'e}s, A parallel 2D Finite Volume scheme for solving systems of balance laws with nonconservative products: Application to shallow flows, Comput. Methods Appl. Mech. Engrg. 196 (2006) 2788--2815] a domain decomposition method was used to parallelize the resulting numerical scheme, which was implemented on a PC cluster by means of MPI techniques.

In this paper, in order to optimize the computations, a new parallelization of SIMD type is performed at each MPI thread, by means of SSE (``Streaming SIMD Extensions''), which are present in common processors. More specifically, as the most costly part of the calculations performed at each processor consists of a huge number of small matrix and vector computations, we use the Intel© Integrated Performance Primitives small matrix library. To make this library, which is implemented using assembler and SSE instructions, easier to use, we have developed an efficient C++ wrapper for it. Some numerical tests were carried out to validate the performance of the C++ small matrix wrapper. The specific application of the scheme to one-layer shallow-water systems has been implemented on a PC cluster. The correct behavior of the one-layer model is assessed using laboratory data.},
  url      = {http://www.sciencedirect.com/science/article/pii/S0377042707005201},
}
