Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture. Liu, X., Chen, L., Firoz, J. S., Qiu, J., & Jiang, L. 2017.


Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we empirically evaluate various computing platforms, including an Intel Xeon E5 CPU, an Nvidia GeForce GTX1070 GPU, and a Xeon Phi 7210 processor codenamed Knights Landing (KNL), in the domain of parallel graph processing. We show that the KNL achieves encouraging performance when processing graphs, making it a promising solution for accelerating multi-threaded graph applications. We further characterize the impact of KNL architectural enhancements on the performance of a state-of-the-art graph framework. We have four key observations: (1) Different graph applications require different numbers of threads to reach peak performance. Even for the same application, different datasets need different numbers of threads to achieve the best performance. (2) Only a few graph applications benefit from the high-bandwidth MCDRAM, while others favor the low-latency DDR4 DRAM. (3) Vector processing units executing AVX512 SIMD instructions on KNLs are underutilized when running the state-of-the-art graph framework. (4) The sub-NUMA cache clustering mode, which offers the lowest local memory access latency, hurts the performance of graph benchmarks that lack NUMA awareness. Finally, we suggest future work, including system auto-tuning tools and graph framework optimizations, to fully exploit the potential of KNL for parallel graph processing.

@article{
 title = {Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture},
 type = {article},
 year = {2017},
 websites = {http://arxiv.org/abs/1708.04701},
 id = {61e168f8-dae8-3e59-b1a6-46f14e9f406c},
 created = {2018-08-09T16:38:14.663Z},
 file_attached = {false},
 profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
 group_id = {f0704cbc-a3d0-3264-b41e-15f4ed0c92ee},
 last_modified = {2018-08-09T16:38:14.663Z},
 read = {false},
 starred = {false},
 authored = {false},
 confirmed = {true},
 hidden = {false},
 private_publication = {false},
 abstract = {Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we empirically evaluate various computing platforms, including an Intel Xeon E5 CPU, an Nvidia GeForce GTX1070 GPU, and a Xeon Phi 7210 processor codenamed Knights Landing (KNL), in the domain of parallel graph processing. We show that the KNL achieves encouraging performance when processing graphs, making it a promising solution for accelerating multi-threaded graph applications. We further characterize the impact of KNL architectural enhancements on the performance of a state-of-the-art graph framework. We have four key observations: (1) Different graph applications require different numbers of threads to reach peak performance. Even for the same application, different datasets need different numbers of threads to achieve the best performance. (2) Only a few graph applications benefit from the high-bandwidth MCDRAM, while others favor the low-latency DDR4 DRAM. (3) Vector processing units executing AVX512 SIMD instructions on KNLs are underutilized when running the state-of-the-art graph framework. (4) The sub-NUMA cache clustering mode, which offers the lowest local memory access latency, hurts the performance of graph benchmarks that lack NUMA awareness. Finally, we suggest future work, including system auto-tuning tools and graph framework optimizations, to fully exploit the potential of KNL for parallel graph processing.},
 bibtype = {article},
 author = {Liu, Xu and Chen, Langshi and Firoz, Jesun S. and Qiu, Judy and Jiang, Lei}
}
