A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units. Kreutzer, M., Hager, G., Wellein, G., Fehske, H., & Bishop, A. SIAM Journal on Scientific Computing, 36(5):C401--C423, January, 2014. 284
A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector Multiplication on Modern Processors with Wide SIMD Units [link]Paper  doi  abstract   bibtex   
Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-\$C\$-\${\textbackslash}sigma\$, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-\$C\$-\${\textbackslash}sigma\$ compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-\$C\$-\${\textbackslash}sigma\$ spMVM kernel. SELL-\$C\$-\${\textbackslash}sigma\$ comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent (``catch-all'') sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms., Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures. However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems. Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-\$C\$-\${\textbackslash}sigma\$, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-\$C\$-\${\textbackslash}sigma\$ compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-\$C\$-\${\textbackslash}sigma\$ spMVM kernel. SELL-\$C\$-\${\textbackslash}sigma\$ comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent (``catch-all'') sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.
@article{kreutzer_unified_2014,
	title = {A {Unified} {Sparse} {Matrix} {Data} {Format} for {Efficient} {General} {Sparse} {Matrix}-{Vector} {Multiplication} on {Modern} {Processors} with {Wide} {SIMD} {Units}},
	volume = {36},
	issn = {1064-8275},
	url = {http://epubs.siam.org/doi/abs/10.1137/130930352},
	doi = {10.1137/130930352},
	abstract = {Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures.  However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems.  Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-\$C\$-\${\textbackslash}sigma\$, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-\$C\$-\${\textbackslash}sigma\$ compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-\$C\$-\${\textbackslash}sigma\$ spMVM kernel. SELL-\$C\$-\${\textbackslash}sigma\$ comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent (``catch-all'') sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.,  Sparse matrix-vector multiplication (spMVM) is the most time-consuming kernel in many numerical algorithms and has been studied extensively on all modern processor and accelerator architectures.  However, the optimal sparse matrix data storage format is highly hardware-specific, which could become an obstacle when using heterogeneous systems.  Also, it is as yet unclear how the wide single instruction multiple data (SIMD) units in current multi- and many-core processors should be used most efficiently if there is no structure in the sparsity pattern of the matrix. We suggest SELL-\$C\$-\${\textbackslash}sigma\$, a variant of Sliced ELLPACK, as a SIMD-friendly data format which combines long-standing ideas from general-purpose graphics processing units and vector computer programming. We discuss the advantages of SELL-\$C\$-\${\textbackslash}sigma\$ compared to established formats like Compressed Row Storage and ELLPACK and show its suitability on a variety of hardware platforms (Intel Sandy Bridge, Intel Xeon Phi, and Nvidia Tesla K20) for a wide range of test matrices from different application areas. Using appropriate performance models we develop deep insight into the data transfer properties of the SELL-\$C\$-\${\textbackslash}sigma\$ spMVM kernel. SELL-\$C\$-\${\textbackslash}sigma\$ comes with two tuning parameters whose performance impact across the range of test matrices is studied and for which reasonable choices are proposed. This leads to a hardware-independent (``catch-all'') sparse matrix format, which achieves very high efficiency for all test matrices across all hardware platforms.},
	number = {5},
	urldate = {2015-02-05},
	journal = {SIAM Journal on Scientific Computing},
	author = {Kreutzer, M. and Hager, G. and Wellein, G. and Fehske, H. and Bishop, A.},
	month = jan,
	year = {2014},
	note = {284},
	pages = {C401--C423},
	file = {Kreutzer et al_2014_A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector.pdf:/home/schlady/.zotero/zotero/za3jlr8i.default/zotero/storage/XUGN7EVT/Kreutzer et al_2014_A Unified Sparse Matrix Data Format for Efficient General Sparse Matrix-Vector.pdf:application/pdf;Snapshot:/home/schlady/.zotero/zotero/za3jlr8i.default/zotero/storage/6QDGFI5B/130930352.html:text/html}
}

Downloads: 0