{"_id":"XbEdPX6Kam4peBY6E","bibbaseid":"chu-ahrefhttpshomesluddyindianaedufc7targetbilankfanchenaspan-xu-wang-recoinalowpowerprocessinginreramarchitecturefordeformableconvolution-2021","author_short":["Chu, C.","<a href=\"https://homes.luddy.indiana.edu/fc7/\" target=\"_bilank\">Fan Chen</a></span>","Xu, D.","Wang, Y."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","author":[{"firstnames":["Cheng"],"propositions":[],"lastnames":["Chu"],"suffixes":[]},{"firstnames":[],"propositions":[],"lastnames":["<a href=\"https://homes.luddy.indiana.edu/fc7/\" target=\"_bilank\">Fan Chen</a></span>"],"suffixes":[]},{"firstnames":["Dawen"],"propositions":[],"lastnames":["Xu"],"suffixes":[]},{"firstnames":["Ying"],"propositions":[],"lastnames":["Wang"],"suffixes":[]}],"title":"RECOIN: A Low-Power Processing-in-ReRAM Architecture for Deformable Convolution","year":"2021","doi":"10.1145/3453688.3461480","abstract":"The recent proposed Deformable Convolutional Networks (DCNs) greatly enhance the performance of conventional Convolutional Neural Networks (CNNs) on vision recognition tasks by allowing flexible input sampling during inference runtime. DCNs introduce an additional convolutional layer for adaptive sampling offset generation, followed by a bilinear interpolation (BLI) algorithm to integerize the generated non-integer offset values. Finally, a regular convolution is performed on the loaded input pixels. Compared with conventional CNNs, DCN demonstrated significantly increased computational complexity and irregular input-dependentmemory access patterns, making it a great challenge for deploying DCNs onto edge devices for real-time computer vision tasks. In this work, we propose RECOIN, a processing-in-memory (PIM) architecture, which supports DCN inference on resistive memory (ReRAM)crossbars, thus making the first DCN inference accelerator possible. We present a novel BLI processing engine that leverage both row-and column-oriented computation for in-situ BLI calculation. Amapping scheme and an address converter are particular designed to accommodate the intensive computation and irregular data access. We implement the DCN inference in a 4-stage pipeline and evaluate the effectiveness of RECOIN on six DCN models. Experimental results show RECOIN achieves respectively 225x and 17.4x improvement in energy efficiency compared to general-purpose CPU and GPU. Compared to two state-of-the-art ASIC accelerators, RECOIN achieve 26.8x and 20.4x speedup respectively.","booktitle":"Proceedings of the 2021 on Great Lakes Symposium on VLSI (GLSVLSI)","pages":"235-240","bibtex":"@inproceedings{glsvlsi2021,\r\nauthor = {Cheng Chu and {<a href=\"https://homes.luddy.indiana.edu/fc7/\" target=\"_bilank\">Fan Chen</a></span>} and\r\n Dawen Xu and\r\n Ying Wang},\r\ntitle = {{RECOIN: A Low-Power Processing-in-ReRAM Architecture for Deformable Convolution}},\r\nyear = {2021},\r\ndoi = {10.1145/3453688.3461480},\r\nabstract = {The recent proposed Deformable Convolutional Networks (DCNs) greatly enhance the performance\r\nof conventional Convolutional Neural Networks (CNNs) on vision recognition tasks by\r\nallowing flexible input sampling during inference runtime. DCNs introduce an additional\r\nconvolutional layer for adaptive sampling offset generation, followed by a bilinear\r\ninterpolation (BLI) algorithm to integerize the generated non-integer offset values.\r\nFinally, a regular convolution is performed on the loaded input pixels. 
Compared with\r\nconventional CNNs, DCN demonstrated significantly increased computational complexity\r\nand irregular input-dependentmemory access patterns, making it a great challenge for\r\ndeploying DCNs onto edge devices for real-time computer vision tasks. In this work,\r\nwe propose RECOIN, a processing-in-memory (PIM) architecture, which supports DCN inference\r\non resistive memory (ReRAM)crossbars, thus making the first DCN inference accelerator\r\npossible. We present a novel BLI processing engine that leverage both row-and column-oriented\r\ncomputation for in-situ BLI calculation. Amapping scheme and an address converter\r\nare particular designed to accommodate the intensive computation and irregular data\r\naccess. We implement the DCN inference in a 4-stage pipeline and evaluate the effectiveness\r\nof RECOIN on six DCN models. Experimental results show RECOIN achieves respectively\r\n225x and 17.4x improvement in energy efficiency compared to general-purpose CPU and\r\nGPU. Compared to two state-of-the-art ASIC accelerators, RECOIN achieve 26.8x and\r\n20.4x speedup respectively.},\r\nbooktitle = {Proceedings of the 2021 on Great Lakes Symposium on VLSI (GLSVLSI)},\r\npages = {235-240},\r\n}\r\n\r\n\r\n","author_short":["Chu, C.","<a href=\"https://homes.luddy.indiana.edu/fc7/\" target=\"_bilank\">Fan Chen</a></span>","Xu, D.","Wang, Y."],"key":"glsvlsi2021","id":"glsvlsi2021","bibbaseid":"chu-ahrefhttpshomesluddyindianaedufc7targetbilankfanchenaspan-xu-wang-recoinalowpowerprocessinginreramarchitecturefordeformableconvolution-2021","role":"author","urls":{},"metadata":{"authorlinks":{}}},"bibtype":"inproceedings","biburl":"https://homes.luddy.indiana.edu/fc7/fan-publication.bib","dataSources":["SzsArXknrJasD2EPT"],"keywords":[],"search_terms":["recoin","low","power","processing","reram","architecture","deformable","convolution","chu","<a href=\"https://homes.luddy.indiana.edu/fc7/\" target=\"_bilank\">fan chen</a></span>","xu","wang"],"title":"RECOIN: A Low-Power Processing-in-ReRAM Architecture for Deformable Convolution","year":2021}
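
As a reading aid for this entry: the BLI step the abstract describes is standard bilinear interpolation, in which a sampling position produced by the offset-generation layer falls between integer pixel locations and the sampled value is blended from its four integer-grid neighbors. Below is a minimal Python sketch of that computation, assuming zero-padding at the feature-map borders; all names (bli_sample, feature, px) are hypothetical illustrations, and this is plain software, not RECOIN's in-ReRAM implementation.

# Minimal sketch of the bilinear interpolation (BLI) step described in the
# abstract above: a deformable convolution samples the input feature map at
# fractional positions (integer grid point + learned offset), and BLI blends
# the four neighboring integer-grid pixels. Names are illustrative only.
import math

def bli_sample(feature, y, x):
    """Bilinearly interpolate feature at fractional coordinates (y, x).

    feature: H x W list of lists of floats.
    Out-of-range neighbors are treated as zero (an assumed convention).
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def px(r, c):
        return feature[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    # Weighted sum of the four surrounding integer-grid pixels.
    return ((1 - dy) * (1 - dx) * px(y0, x0)
            + (1 - dy) * dx * px(y0, x0 + 1)
            + dy * (1 - dx) * px(y0 + 1, x0)
            + dy * dx * px(y0 + 1, x0 + 1))

if __name__ == "__main__":
    fmap = [[0.0, 1.0], [2.0, 3.0]]
    # A learned offset of (+0.5, +0.5) applied to grid point (0, 0):
    print(bli_sample(fmap, 0.5, 0.5))  # 1.5: the average of all four pixels

Per the abstract, RECOIN performs this per-sample computation in situ on ReRAM crossbars through a dedicated BLI processing engine that combines row- and column-oriented operations, rather than in software as sketched here.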