{"_id":"XbEdPX6Kam4peBY6E","bibbaseid":"chu-ahrefhttpshomesluddyindianaedufc7targetbilankfanchenaspan-xu-wang-recoinalowpowerprocessinginreramarchitecturefordeformableconvolution-2021","author_short":["Chu, C.","<a href=\"https://homes.luddy.indiana.edu/fc7/\" target=\"_bilank\">Fan Chen</a></span>","Xu, D.","Wang, Y."],"bibdata":{"bibtype":"inproceedings","type":"inproceedings","author":[{"firstnames":["Cheng"],"propositions":[],"lastnames":["Chu"],"suffixes":[]},{"firstnames":[],"propositions":[],"lastnames":["<a href=\"https://homes.luddy.indiana.edu/fc7/\" target=\"_bilank\">Fan Chen</a></span>"],"suffixes":[]},{"firstnames":["Dawen"],"propositions":[],"lastnames":["Xu"],"suffixes":[]},{"firstnames":["Ying"],"propositions":[],"lastnames":["Wang"],"suffixes":[]}],"title":"RECOIN: A Low-Power Processing-in-ReRAM Architecture for Deformable Convolution","year":"2021","doi":"10.1145/3453688.3461480","abstract":"The recent proposed Deformable Convolutional Networks (DCNs) greatly enhance the performance of conventional Convolutional Neural Networks (CNNs) on vision recognition tasks by allowing flexible input sampling during inference runtime. DCNs introduce an additional convolutional layer for adaptive sampling offset generation, followed by a bilinear interpolation (BLI) algorithm to integerize the generated non-integer offset values. Finally, a regular convolution is performed on the loaded input pixels. Compared with conventional CNNs, DCN demonstrated significantly increased computational complexity and irregular input-dependentmemory access patterns, making it a great challenge for deploying DCNs onto edge devices for real-time computer vision tasks. In this work, we propose RECOIN, a processing-in-memory (PIM) architecture, which supports DCN inference on resistive memory (ReRAM)crossbars, thus making the first DCN inference accelerator possible. We present a novel BLI processing engine that leverage both row-and column-oriented computation for in-situ BLI calculation. Amapping scheme and an address converter are particular designed to accommodate the intensive computation and irregular data access. We implement the DCN inference in a 4-stage pipeline and evaluate the effectiveness of RECOIN on six DCN models. Experimental results show RECOIN achieves respectively 225x and 17.4x improvement in energy efficiency compared to general-purpose CPU and GPU. Compared to two state-of-the-art ASIC accelerators, RECOIN achieve 26.8x and 20.4x speedup respectively.","booktitle":"Proceedings of the 2021 on Great Lakes Symposium on VLSI (GLSVLSI)","pages":"235-240","bibtex":"@inproceedings{glsvlsi2021,\r\nauthor = {Cheng Chu and {<a href=\"https://homes.luddy.indiana.edu/fc7/\" target=\"_bilank\">Fan Chen</a></span>} and\r\n Dawen Xu and\r\n Ying Wang},\r\ntitle = {{RECOIN: A Low-Power Processing-in-ReRAM Architecture for Deformable Convolution}},\r\nyear = {2021},\r\ndoi = {10.1145/3453688.3461480},\r\nabstract = {The recent proposed Deformable Convolutional Networks (DCNs) greatly enhance the performance\r\nof conventional Convolutional Neural Networks (CNNs) on vision recognition tasks by\r\nallowing flexible input sampling during inference runtime. DCNs introduce an additional\r\nconvolutional layer for adaptive sampling offset generation, followed by a bilinear\r\ninterpolation (BLI) algorithm to integerize the generated non-integer offset values.\r\nFinally, a regular convolution is performed on the loaded input pixels. 
Compared with\r\nconventional CNNs, DCN demonstrated significantly increased computational complexity\r\nand irregular input-dependentmemory access patterns, making it a great challenge for\r\ndeploying DCNs onto edge devices for real-time computer vision tasks. In this work,\r\nwe propose RECOIN, a processing-in-memory (PIM) architecture, which supports DCN inference\r\non resistive memory (ReRAM)crossbars, thus making the first DCN inference accelerator\r\npossible. We present a novel BLI processing engine that leverage both row-and column-oriented\r\ncomputation for in-situ BLI calculation. Amapping scheme and an address converter\r\nare particular designed to accommodate the intensive computation and irregular data\r\naccess. We implement the DCN inference in a 4-stage pipeline and evaluate the effectiveness\r\nof RECOIN on six DCN models. Experimental results show RECOIN achieves respectively\r\n225x and 17.4x improvement in energy efficiency compared to general-purpose CPU and\r\nGPU. Compared to two state-of-the-art ASIC accelerators, RECOIN achieve 26.8x and\r\n20.4x speedup respectively.},\r\nbooktitle = {Proceedings of the 2021 on Great Lakes Symposium on VLSI (GLSVLSI)},\r\npages = {235-240},\r\n}\r\n\r\n\r\n","author_short":["Chu, C.","<a href=\"https://homes.luddy.indiana.edu/fc7/\" target=\"_bilank\">Fan Chen</a></span>","Xu, D.","Wang, Y."],"key":"glsvlsi2021","id":"glsvlsi2021","bibbaseid":"chu-ahrefhttpshomesluddyindianaedufc7targetbilankfanchenaspan-xu-wang-recoinalowpowerprocessinginreramarchitecturefordeformableconvolution-2021","role":"author","urls":{},"metadata":{"authorlinks":{}}},"bibtype":"inproceedings","biburl":"https://homes.luddy.indiana.edu/fc7/fan-publication.bib","dataSources":["SzsArXknrJasD2EPT"],"keywords":[],"search_terms":["recoin","low","power","processing","reram","architecture","deformable","convolution","chu","<a href=\"https://homes.luddy.indiana.edu/fc7/\" target=\"_bilank\">fan chen</a></span>","xu","wang"],"title":"RECOIN: A Low-Power Processing-in-ReRAM Architecture for Deformable Convolution","year":2021}
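
As a reading aid for this entry: the BLI step the abstract describes is standard bilinear interpolation, in which a sampling position produced by the offset-generation layer falls between integer pixel locations and the sampled value is blended from its four integer-grid neighbors. Below is a minimal Python sketch of that computation, assuming zero-padding at the feature-map borders; all names (bli_sample, feature, px) are hypothetical illustrations, and this is plain software, not RECOIN's in-ReRAM implementation.

# Minimal sketch of the bilinear interpolation (BLI) step described in the
# abstract above: a deformable convolution samples the input feature map at
# fractional positions (integer grid point + learned offset), and BLI blends
# the four neighboring integer-grid pixels. Names are illustrative only.
import math

def bli_sample(feature, y, x):
    """Bilinearly interpolate feature at fractional coordinates (y, x).

    feature: H x W list of lists of floats.
    Out-of-range neighbors are treated as zero (an assumed convention).
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0

    def px(r, c):
        return feature[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    # Weighted sum of the four surrounding integer-grid pixels.
    return ((1 - dy) * (1 - dx) * px(y0, x0)
            + (1 - dy) * dx * px(y0, x0 + 1)
            + dy * (1 - dx) * px(y0 + 1, x0)
            + dy * dx * px(y0 + 1, x0 + 1))

if __name__ == "__main__":
    fmap = [[0.0, 1.0], [2.0, 3.0]]
    # A learned offset of (+0.5, +0.5) applied to grid point (0, 0):
    print(bli_sample(fmap, 0.5, 0.5))  # 1.5: the average of all four pixels

Per the abstract, RECOIN performs this per-sample computation in situ on ReRAM crossbars through a dedicated BLI processing engine that combines row- and column-oriented operations, rather than in software as sketched here.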