A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units. Kreutzer, M., Hager, G., Wellein, G., Fehske, H., & Bishop, A. SIAM Journal on Scientific Computing, 36(5):C401--C423, January, 2014. 284
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-C-σ, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-C-σ compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-C-σ spMVM kernel. SELL-C-σ comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent ("catch-all") sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
@article{kreutzer_unified_2014,
title = {A {Unified} {Sparse} {Matrix} {Data} {Format} for {Efficient} {General} {Sparse} {Matrix}-{Vector} {Multiplication} on {Modern} {Processors} with {Wide} {SIMD} {Units}},
volume = {36},
issn = {1064-8275},
url = {http://epubs.siam.org/doi/abs/10.1137/130930352},
doi = {10.1137/130930352},
abstract = {Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-$C$-$\sigma$, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-$C$-$\sigma$ compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-$C$-$\sigma$ spMVM kernel. SELL-$C$-$\sigma$ comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent (``catch-all'') sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.},
number = {5},
urldate = {2015-02-05},
journal = {SIAM Journal on Scientific Computing},
author = {Kreutzer, M. and Hager, G. and Wellein, G. and Fehske, H. and Bishop, A.},
month = jan,
year = {2014},
note = {284},
pages = {C401--C423},
}