Accelerating Mobile Audio Sensing Algorithms Through On-Chip GPU Offloading. Georgiev, P., Lane, N. D., Mascolo, C., & Chu, D. In Proceedings of the International Conference on Mobile Systems, Applications, and Services (MobiSys), pages 306-318, June 2017. ACM.
GPUs have recently enjoyed increased popularity as general-purpose software accelerators in multiple application domains, including computer vision and natural language processing. However, there has been little exploration into the performance and energy trade-offs mobile GPUs can deliver for the increasingly popular workload of deep-inference audio sensing tasks, such as spoken keyword spotting, in energy-constrained smartphones and wearables. In this paper, we study these trade-offs and introduce an optimization engine that leverages a series of structural and memory access optimization techniques that allow audio algorithm performance to be automatically tuned as a function of GPU device specifications and model semantics. We find that parameter-optimized audio routines obtain inferences an order of magnitude faster than sequential CPU implementations, and up to 6.5x faster than cloud offloading with good connectivity, while, critically, consuming 3-4x less energy than the CPU. With our optimized GPU, conventional wisdom about when to use the cloud and low-power chips no longer holds. Unless the network has a throughput of at least 20 Mbps (and an RTT of 25 ms or less), buffering only about 10 to 20 seconds of audio data for batched execution is enough for the optimized GPU audio sensing apps to consume less energy than cloud offloading. Under such conditions, we find the optimized GPU can provide energy benefits comparable to preliminarily optimized low-power reference DSP implementations, while always winning on latency.
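
The break-even claim above (the GPU wins on energy below roughly 20 Mbps of throughput once 10-20 seconds of audio are buffered for batched execution) can be illustrated with a simple energy model. Below is a minimal sketch in Python; the 20 Mbps, 25 ms RTT, and 10-20 s buffer figures come from the abstract, while the radio power, GPU batch energy, and audio bitrate constants are illustrative assumptions, not measurements from the paper.

# Back-of-envelope model of the cloud-vs-GPU energy break-even.
# Only the 20 Mbps / 25 ms / 10-20 s figures are from the abstract;
# every constant below is an illustrative assumption.
AUDIO_BITRATE_BPS = 256_000   # assumed 16 kHz, 16-bit mono capture
RADIO_POWER_W = 1.2           # assumed radio power while uploading (W)
GPU_BATCH_ENERGY_J = 0.2      # assumed GPU energy per 10 s batch (J)

def cloud_energy_j(buffer_s, throughput_bps, rtt_s):
    """Energy to upload one audio buffer and wait one round trip."""
    upload_s = buffer_s * AUDIO_BITRATE_BPS / throughput_bps
    return RADIO_POWER_W * (upload_s + rtt_s)

def breakeven_throughput_bps(buffer_s, rtt_s):
    """Throughput above which cloud offload beats the local GPU on energy."""
    affordable_upload_s = GPU_BATCH_ENERGY_J / RADIO_POWER_W - rtt_s
    return buffer_s * AUDIO_BITRATE_BPS / affordable_upload_s

e_cloud = cloud_energy_j(buffer_s=10, throughput_bps=20e6, rtt_s=0.025)
print(f"cloud @ 20 Mbps: {e_cloud:.2f} J vs. local GPU: {GPU_BATCH_ENERGY_J:.2f} J")
print(f"break-even throughput: {breakeven_throughput_bps(10, 0.025) / 1e6:.1f} Mbps")

With these assumed constants the break-even lands near 18 Mbps, in the same ballpark as the 20 Mbps figure reported in the abstract; slower links stretch the upload time and tip the energy balance toward on-chip GPU execution.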
@inproceedings{georgiev:accelerating2017,
 title = {Accelerating Mobile Audio Sensing Algorithms Through On-Chip GPU Offloading},
 year = {2017},
 doi = {10.1145/3081333.3081358},
 keywords = {audio-sensing,gpu,mobile-platforms},
 pages = {306--318},
 url = {http://dx.doi.org/10.1145/3081333.3081358},
 month = {6},
 publisher = {ACM},
 abstract = {GPUs have recently enjoyed increased popularity as general-purpose software accelerators in multiple application domains, including computer vision and natural language processing. However, there has been little exploration into the performance and energy trade-offs mobile GPUs can deliver for the increasingly popular workload of deep-inference audio sensing tasks, such as spoken keyword spotting, in energy-constrained smartphones and wearables. In this paper, we study these trade-offs and introduce an optimization engine that leverages a series of structural and memory access optimization techniques that allow audio algorithm performance to be automatically tuned as a function of GPU device specifications and model semantics. We find that parameter-optimized audio routines obtain inferences an order of magnitude faster than sequential CPU implementations, and up to 6.5x faster than cloud offloading with good connectivity, while, critically, consuming 3-4x less energy than the CPU. With our optimized GPU, conventional wisdom about when to use the cloud and low-power chips no longer holds. Unless the network has a throughput of at least 20 Mbps (and an RTT of 25 ms or less), buffering only about 10 to 20 seconds of audio data for batched execution is enough for the optimized GPU audio sensing apps to consume less energy than cloud offloading. Under such conditions, we find the optimized GPU can provide energy benefits comparable to preliminarily optimized low-power reference DSP implementations, while always winning on latency.},
 author = {Georgiev, Petko and Lane, Nicholas D. and Mascolo, Cecilia and Chu, David},
 booktitle = {Proceedings of the International Conference on Mobile Systems, Applications, and Services (MobiSys)}
}
