Machine Learning Models for GPU Error Prediction in a Large Scale HPC System. Nie, B., Xue, J., Gupta, S., Patel, T., Engelmann, C., Smirni, E., & Tiwari, D. In 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2018, Luxembourg City, Luxembourg, June 25-28, 2018, pages 95–106, 2018. IEEE Computer Society.
@inproceedings{DBLP:conf/dsn/NieXGPEST18,
  author       = {Bin Nie and
                  Ji Xue and
                  Saurabh Gupta and
                  Tirthak Patel and
                  Christian Engelmann and
                  Evgenia Smirni and
                  Devesh Tiwari},
  title        = {Machine Learning Models for {GPU} Error Prediction in a Large Scale
                  {HPC} System},
  booktitle    = {48th Annual {IEEE/IFIP} International Conference on Dependable Systems
                  and Networks, {DSN} 2018, Luxembourg City, Luxembourg, June 25-28,
                  2018},
  pages        = {95--106},
  publisher    = {{IEEE} Computer Society},
  year         = {2018},
  url          = {https://doi.org/10.1109/DSN.2018.00022},
  doi          = {10.1109/DSN.2018.00022},
  timestamp    = {Mon, 05 Feb 2024 00:00:00 +0100},
  biburl       = {https://dblp.org/rec/conf/dsn/NieXGPEST18.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
