An Energy-Efficient Abstraction for Simultaneous Breadth-First Searches. In The IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, September, 2015.
abstract   bibtex   
Optimized GPU kernels are sufficiently complicated to write that they often are specialized to specific input data, target architectures, or applications. This paper presents a multi-search abstraction for computing multiple breadth-first searches in parallel and demonstrates a high-performance, general implementation. Our abstraction removes the burden of orchestrating graph traversal from the user while providing high performance and low energy usage, an often overlooked component of algorithm design. Energy consumption has become a first-class hardware design constraint for both massive and embedded computing platforms. Our abstraction can be applied to such problems as the all-pairs shortest-path problem, community detection, reachability querying, and others. To map graph traversal efficiently to NVIDIA GPUs, our hybrid implementation chooses between processing active vertices with a single thread or an entire warp based on vertex outdegree. For a set of twelve varied graphs, the implementation of our abstraction saves 42% time and 62% energy on average compared to representative implementations of specific applications from existing literature.
@inproceedings{hpec15,
  author = {Adam McLaughlin and Jason Riedy and David A. Bader},
  title = {An Energy-Efficient Abstraction for Simultaneous
                  Breadth-First Searches},
  ejr-withauthor = {Adam McLaughlin and David A. Bader},
  tags = {parallel; graph; energy},
  booktitle = {The IEEE High Performance Extreme Computing
                  Conference (HPEC)},
  year = 2015,
  month = sep,
  address = {Waltham, MA},
  dom = 17,
  role = {proceedings},
  abstract = {Optimized GPU kernels are sufficiently complicated
                  to write that they often are specialized to specific
                  input data, target architectures, or applications.
                  This paper presents a multi-search abstraction for
                  computing multiple breadth-first searches in
                  parallel and demonstrates a high-performance,
                  general implementation.  Our abstraction removes the
                  burden of orchestrating graph traversal from the
                  user while providing high performance and low energy
                  usage, an often overlooked component of algorithm
                  design.  Energy consumption has become a first-class
                  hardware design constraint for both massive and
                  embedded computing platforms.  Our abstraction can
                  be applied to such problems as the all-pairs
                  shortest-path problem, community detection,
                  reachability querying, and others.  To map graph
                  traversal efficiently to NVIDIA GPUs, our hybrid
                  implementation chooses between processing active
                  vertices with a single thread or an entire warp
                  based on vertex outdegree.  For a set of twelve
                  varied graphs, the implementation of our abstraction
                  saves 42\% time and 62\% energy on average compared
                  to representative implementations of specific
                  applications from existing literature.},
  file = {material/multi_search_energy.pdf},
  projtag = {xscala, grateful, hpda}
}
Downloads: 0