A Fast and Generic GPU-Based Parallel Reduction Implementation

A Fast and Generic GPU-Based Parallel Reduction Implementation. Jradi, W. A. R., do Nascimento, H. A. D., & Martins, W. S. In XIX Simpósio em Sistemas Computacionais de Alto Desempenho (WSCAD 2018), pages 51–62, São Paulo, SP, Brazil, October 2018. Organization: Calebe de Paula Bianchini, Paulo Sérgio Lopes de Souza, Carla Osthoff Ferreira de Barros, Renato Antônio Celso Ferreira. Publisher: SBC. ISSN 2358-6613. Held October 1 to 3, 2018.


Reduction operations are used extensively in many computational problems: a finite set of numeric elements is combined into a single value by means of a combining function. A parallel reduction, in turn, is the same operation performed concurrently when multiple execution units are available. This work describes a GPU-based parallel reduction that employs techniques such as loop unrolling, persistent threads, and algebraic expressions to avoid thread divergence, and that outperformed the methods currently in use. Experiments conducted to evaluate the approach show that the strategy performs efficiently on both AMD and NVIDIA hardware, under both OpenCL and CUDA, making it portable.
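To make the definition in the abstract concrete, here is a minimal sketch of the generic two-phase pattern that GPU reductions commonly follow, simulated sequentially on the CPU in Python. It is not the paper's implementation: the function name `simulated_parallel_reduce`, the `num_threads` parameter, and the strided work assignment are assumptions made for illustration only. It also assumes the combining function is associative and commutative, since the strided pass reorders the elements.

```python
from operator import add
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def simulated_parallel_reduce(
    data: Sequence[T],
    combine: Callable[[T, T], T],
    identity: T,
    num_threads: int = 8,
) -> T:
    """Sequentially simulate a two-phase GPU-style reduction.

    Hypothetical illustration, not the method from the paper.
    `combine` must be associative and commutative, and `identity`
    must be its neutral element.
    """
    if num_threads < 1 or num_threads & (num_threads - 1):
        raise ValueError("num_threads must be a power of two")

    # Phase 1: a fixed pool of simulated threads strides over the input,
    # so the thread count does not grow with the input size.
    partials = [identity] * num_threads
    for tid in range(num_threads):
        for i in range(tid, len(data), num_threads):
            partials[tid] = combine(partials[tid], data[i])

    # Phase 2: tree reduction. Each step halves the number of active
    # threads; on a real GPU all iterations of the inner loop would run
    # concurrently, with a barrier between steps.
    stride = num_threads // 2
    while stride > 0:
        for tid in range(stride):
            partials[tid] = combine(partials[tid], partials[tid + stride])
        stride //= 2

    return partials[0]


if __name__ == "__main__":
    print(simulated_parallel_reduce(list(range(1, 101)), add, 0))  # 5050
```

Keeping the active threads in the contiguous range `0 .. stride - 1` is one common way to limit divergence within a group of threads in a real kernel; whether the paper uses this exact indexing is not stated in the abstract.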

@InProceedings{jnm-fggpri-2018,
  author       = {Walid A. R. Jradi and Hugo A. D. do Nascimento and Wellington S. Martins},
  title        = {A Fast and Generic {GPU}-Based Parallel Reduction Implementation},
  booktitle    = {XIX Simp\'{o}sio em Sistemas Computacionais de Alto Desempenho (WSCAD 2018)},
  year         = {2018},
  pages        = {51--62},
  address      = {S\~{a}o Paulo, SP, Brasil},
  month        = oct,
  organization = {Calebe de Paula Bianchini, Paulo Sérgio Lopes de Souza, Carla Osthoff Ferreira de Barros, Renato Antônio Celso Ferreira},
  publisher    = {SBC},
  note         = {ISSN 2358-6613, de 01 a 03 de outubro de 2018},
  abstract     = {Reduction operations are extensively employed in many computational problems, where a finite set of numeric elements are combined into a single value, using for this a combining function. A parallel reduction, in turn, is the operation concurrently performed when multiple execution units are available. The present work depicts a GPU-based parallel approach for it, which employs techniques like loop unrolling, persistent threads and algebraic expressions to avoid thread divergence, and was able to outperform the methods currently in use. Experiments conducted to evaluate the approach show that the strategy performs efficiently on both AMD and NVidia’s hardwares, as well as using OpenCL and CUDA, making it portable.},
  issn         = {2358-6613},
  keywords     = {GPU, Parallel reduction},
  owner        = {hugo},
  timestamp    = {2018.11.05},
  url          = {https://portaldeconteudo.sbc.org.br/index.php/wscad/issue/view/259/WSCAD2018},
}
