Measuring and Understanding Extreme-Scale Application Resilience: A Field Study of 5,000,000 HPC Application Runs. Di Martino, C.; Kramer, W.; Kalbarczyk, Z.; and Iyer, R. K. In 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, DSN 2015, Rio de Janeiro, Brazil, June 22-25, 2015, pages 25–36, 2015.
@inproceedings{DBLP:conf/dsn/MartinoKKI15,
  author    = {Catello Di Martino and
               William Kramer and
               Zbigniew Kalbarczyk and
               Ravishankar K. Iyer},
  title     = {Measuring and Understanding Extreme-Scale Application Resilience:
               {A} Field Study of 5,000,000 {HPC} Application Runs},
  booktitle = {45th Annual {IEEE/IFIP} International Conference on Dependable Systems
               and Networks, {DSN} 2015, Rio de Janeiro, Brazil, June 22-25, 2015},
  pages     = {25--36},
  year      = {2015},
  crossref  = {DBLP:conf/dsn/2015},
  url       = {https://doi.org/10.1109/DSN.2015.50},
  doi       = {10.1109/DSN.2015.50},
  timestamp = {Sun, 21 May 2017 01:00:00 +0200},
  biburl    = {https://dblp.org/rec/bib/conf/dsn/MartinoKKI15},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}