An Energy-Efficient Abstraction for Simultaneous Breadth-First Searches. McLaughlin, A., Riedy, J., & Bader, D. A. In The IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, September, 2015.
abstract   bibtex   
Optimized GPU kernels are sufficiently complicated to write that they often are specialized to specific input data, target architectures, or applications. This paper presents a multi-search abstraction for computing multiple breadth-first searches in parallel and demonstrates a high-performance, general implementation. Our abstraction removes the burden of orchestrating graph traversal from the user while providing high performance and low energy usage, an often overlooked component of algorithm design. Energy consumption has become a first-class hardware design constraint for both massive and embedded computing platforms. Our abstraction can be applied to such problems as the all-pairs shortest-path problem, community detection, reachability querying, and others. To map graph traversal efficiently to NVIDIA GPUs, our hybrid implementation chooses between processing active vertices with a single thread or an entire warp based on vertex outdegree. For a set of twelve varied graphs, the implementation of our abstraction saves 42% time and 62% energy on average compared to representative implementations of specific applications from existing literature.
@inproceedings{hpec15,
  author = {Adam McLaughlin and Jason Riedy and David A. Bader},
  title = {An Energy-Efficient Abstraction for Simultaneous Breadth-First Searches},
  ejr-withauthor = {Adam McLaughlin and David A. Bader},
  opttags = {parallel; graph; energy},
  booktitle = {The IEEE High Performance Extreme Computing Conference (HPEC)},
  year = 2015,
  month = sep,
  address = {Waltham, MA},
  dom = 17,
  role = {proceedings},
  abstract = {Optimized GPU kernels are sufficiently complicated to write that they often are specialized to specific input data, target architectures, or applications.  This paper presents a multi-search abstraction for computing multiple breadth-first searches in parallel and demonstrates a high-performance, general implementation.  Our abstraction removes the burden of orchestrating graph traversal from the user while providing high performance and low energy usage, an often overlooked component of algorithm design.  Energy consumption has become a first-class hardware design constraint for both massive and embedded computing platforms.  Our abstraction can be applied to such problems as the all-pairs shortest-path problem, community detection, reachability querying, and others.  To map graph traversal efficiently to NVIDIA GPUs, our hybrid implementation chooses between processing active vertices with a single thread or an entire warp based on vertex outdegree.  For a set of twelve varied graphs, the implementation of our abstraction saves 42\% time and 62\% energy on average compared to representative implementations of specific applications from existing literature.},
  file = {material/multi_search_energy.pdf},
  projtag = {xscala, grateful, hpda},
  keywords = {hpda, graph analysis, parallel algorithm},
  ejr-proj = {high-performance-data-analysis, graph-analysis},
  ejr-grant = {grateful, xscala}
}

Downloads: 0