REREC: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation

REREC: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation. Wang, Y., Zhu, Z., Chen, F., Ma, M., Dai, G., Wang, Y., Li, H. H., & Chen, Y. In International Conference on Computer-Aided Design (ICCAD), pages 1-9, 2021.

Personalized recommendation systems are widely used in many Internet services. The sparse embedding lookup in recommendation models dominates the computational cost of inference due to its intensive, irregular memory accesses. Applying a resistive random access memory (ReRAM) based processing-in-memory (PIM) architecture to accelerate recommendation processing can avoid the data movement caused by off-chip memory accesses. However, naive adoption of ReRAM-based DNN accelerators leads to low computation parallelism and severe under-utilization of computing resources, both caused by the fine-grained inner products in feature interaction. In this paper, we propose Rerec, an architecture-algorithm co-designed accelerator that combines fine-grained ReRAM-based inner-product engines with an access-aware mapping algorithm for recommendation inference. At the architecture level, we reduce the size and increase the number of crossbars. The crossbars in one inner-product engine are fully connected by Analog-to-Digital Converters (ADCs), which lets the engine adapt to the fine-grained and irregular computational patterns and improves processing parallelism. We further explore the trade-offs of (i) crossbar size vs. hardware utilization and (ii) ADC implementation vs. area/energy efficiency to optimize the design. At the algorithm level, we propose a novel access-aware mapping (AAM) algorithm to optimize resource allocation. Our AAM algorithm tackles the problems of (i) workload imbalance and (ii) long recommendation inference latency, both induced by the great variance in the access frequency of embedding vectors. Experimental results show that Rerec achieves a 7.69x speedup compared with a ReRAM-based baseline design. Compared to a CPU and the state-of-the-art recommendation accelerator, Rerec demonstrates 29.26x and 3.48x performance improvements, respectively.
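
The abstract does not spell out how AAM works, so the following is only an illustrative sketch of the workload-imbalance problem it names, not the paper's algorithm. It assumes per-vector access counts are known in advance and uses a generic greedy "hottest first, least-loaded engine" heuristic; the function name `balance_by_access_frequency` and the sample counts are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def balance_by_access_frequency(
    access_counts: Dict[int, int], num_engines: int
) -> Tuple[List[List[int]], List[int]]:
    """Assign embedding vectors to engines so total access load is roughly even.

    Generic greedy heuristic (assumption, not the paper's AAM): visit vectors
    from most to least frequently accessed and place each one on the engine
    with the smallest accumulated load so far.
    """
    if num_engines <= 0:
        raise ValueError("num_engines must be positive")

    # Min-heap of (accumulated_load, engine_index).
    heap = [(0, engine) for engine in range(num_engines)]
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_engines)]

    # Sort by descending access count; break ties by vector id for determinism.
    ordered = sorted(access_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for vector_id, count in ordered:
        load, engine = heapq.heappop(heap)
        assignment[engine].append(vector_id)
        heapq.heappush(heap, (load + count, engine))

    loads = [0] * num_engines
    for load, engine in heap:
        loads[engine] = load
    return assignment, loads


# Hypothetical, highly skewed access counts for six embedding vectors.
counts = {0: 100, 1: 60, 2: 30, 3: 10, 4: 5, 5: 5}
assignment, loads = balance_by_access_frequency(counts, num_engines=2)
print(assignment, loads)
```

With skewed counts like these, a naive round-robin placement would put the two hottest vectors' traffic on engines unevenly, whereas the greedy placement keeps the per-engine access totals close to each other.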

@inproceedings{ICCAD2021,
 author = {Wang, Yitu and Zhu, Zhenhua and Chen, Fan and Ma, Mingyuan and Dai, Guohao and Wang, Yu and Li, Hai Helen and Chen, Yiran},
 title = {{REREC: In-ReRAM Acceleration with Access-Aware Mapping for Personalized Recommendation}},
 booktitle = {International Conference on Computer-Aided Design (ICCAD)},
 year = {2021},
 pages={1-9},
 doi={10.1109/ICCAD51958.2021.9643573},
 abstract={Personalized recommendation systems are widely used in many Internet services. The sparse embedding lookup in recommendation models dominates the computational cost of inference due to its intensive, irregular memory accesses. Applying a resistive random access memory (ReRAM) based processing-in-memory (PIM) architecture to accelerate recommendation processing can avoid the data movement caused by off-chip memory accesses. However, naive adoption of ReRAM-based DNN accelerators leads to low computation parallelism and severe under-utilization of computing resources, both caused by the fine-grained inner products in feature interaction. In this paper, we propose Rerec, an architecture-algorithm co-designed accelerator that combines fine-grained ReRAM-based inner-product engines with an access-aware mapping algorithm for recommendation inference. At the architecture level, we reduce the size and increase the number of crossbars. The crossbars in one inner-product engine are fully connected by Analog-to-Digital Converters (ADCs), which lets the engine adapt to the fine-grained and irregular computational patterns and improves processing parallelism. We further explore the trade-offs of (i) crossbar size vs. hardware utilization and (ii) ADC implementation vs. area/energy efficiency to optimize the design. At the algorithm level, we propose a novel access-aware mapping (AAM) algorithm to optimize resource allocation. Our AAM algorithm tackles the problems of (i) workload imbalance and (ii) long recommendation inference latency, both induced by the great variance in the access frequency of embedding vectors. Experimental results show that Rerec achieves a 7.69x speedup compared with a ReRAM-based baseline design. Compared to a CPU and the state-of-the-art recommendation accelerator, Rerec demonstrates 29.26x and 3.48x performance improvements, respectively.}
}
