REQ-YOLO: A resource-aware, efficient quantization framework for object detection on FPGAs

REQ-YOLO: A resource-aware, efficient quantization framework for object detection on FPGAs. Ding, C., Wang, S., Liu, N., Xu, K., Wang, Y., & Liang, Y. In FPGA 2019 - Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pages 33-42, February 2019. Association for Computing Machinery, Inc.


Deep neural networks (DNNs), as the basis of object detection, will play a key role in the development of future fully autonomous systems. Such systems require real-time, energy-efficient implementations of DNNs on power-constrained hardware. Two research thrusts are dedicated to improving the performance and energy efficiency of the DNN inference phase: model compression techniques and efficient hardware implementation. Recent works on extremely-low-bit CNNs, such as the binary neural network (BNN) and XNOR-Net, replace traditional floating-point operations with binary bit operations, which significantly reduces memory bandwidth and storage requirements. However, these approaches suffer from non-negligible accuracy loss and underutilize the digital signal processing (DSP) blocks of FPGAs. To overcome these limitations, this paper proposes REQ-YOLO, a resource-aware, systematic weight quantization framework for object detection that considers both algorithm and hardware resource aspects. We adopt the block-circulant matrix method and propose a heterogeneous weight quantization using the Alternating Direction Method of Multipliers (ADMM), an effective optimization technique for general, non-convex optimization problems. To achieve real-time, highly efficient implementations on FPGA, we present the detailed hardware implementation of block-circulant matrices on CONV layers and develop an efficient processing element (PE) structure supporting the heterogeneous weight quantization, CONV dataflow and pipelining techniques, design optimization, and a template-based automatic synthesis framework to optimally exploit hardware resources. Experimental results show that our proposed REQ-YOLO framework can significantly compress the YOLO model while introducing very small accuracy degradation. The related code is available at https://github.com/Anonymous788/heterogeneous_ADMM_YOLO.
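As a rough illustration of the block-circulant matrix method mentioned in the abstract, the sketch below stores each b x b block as a single defining vector of length b (its first column) and multiplies by it without ever building the dense block. It is not taken from the paper or its repository: the function names, the first-column convention, and the toy values are assumptions made for illustration, and the product is computed as a direct circular convolution rather than with the FFT-based kernels that such methods often use.

```python
def circulant_dense(c):
    """Expand defining vector c (taken as the first column) into the full n x n circulant matrix."""
    n = len(c)
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]


def dense_matvec(m, x):
    """Reference dense matrix-vector product, used only for checking."""
    return [sum(a * v for a, v in zip(row, x)) for row in m]


def circulant_matvec(c, x):
    """Multiply the circulant matrix defined by c with x using only c (circular convolution)."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def block_circulant_matvec(blocks, x, b):
    """blocks[p][q] is the length-b defining vector of block (p, q); x has length b * len(blocks[0])."""
    out = []
    for row in blocks:
        acc = [0] * b
        for q, c in enumerate(row):
            part = circulant_matvec(c, x[q * b:(q + 1) * b])
            acc = [a + v for a, v in zip(acc, part)]
        out.extend(acc)
    return out


# Hypothetical 4 x 4 weight matrix made of four 2 x 2 circulant blocks:
# 8 stored values instead of 16.
blocks = [[[1, 2], [3, 4]],
          [[5, 6], [7, 8]]]
x = [1, 0, 0, 1]
y = block_circulant_matvec(blocks, x, 2)
print(y)  # [5, 5, 13, 13]
```

With block size b, each block needs b stored weights instead of b * b, which is where the compression comes from; choosing b per layer is a design decision this sketch does not address.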

@inproceedings{
 title = {REQ-YOLO: A resource-aware, efficient quantization framework for object detection on FPGAs},
 type = {inproceedings},
 year = {2019},
 keywords = {ADMM,Compression,FPGA,Object detection,YOLO},
 pages = {33-42},
 month = {2},
 publisher = {Association for Computing Machinery, Inc},
 day = {20},
 abstract = {Deep neural networks (DNNs), as the basis of object detection, will play a key role in the development of future fully autonomous systems. Such systems require real-time, energy-efficient implementations of DNNs on power-constrained hardware. Two research thrusts are dedicated to improving the performance and energy efficiency of the DNN inference phase: model compression techniques and efficient hardware implementation. Recent works on extremely-low-bit CNNs, such as the binary neural network (BNN) and XNOR-Net, replace traditional floating-point operations with binary bit operations, which significantly reduces memory bandwidth and storage requirements. However, these approaches suffer from non-negligible accuracy loss and underutilize the digital signal processing (DSP) blocks of FPGAs. To overcome these limitations, this paper proposes REQ-YOLO, a resource-aware, systematic weight quantization framework for object detection that considers both algorithm and hardware resource aspects. We adopt the block-circulant matrix method and propose a heterogeneous weight quantization using the Alternating Direction Method of Multipliers (ADMM), an effective optimization technique for general, non-convex optimization problems. To achieve real-time, highly efficient implementations on FPGA, we present the detailed hardware implementation of block-circulant matrices on CONV layers and develop an efficient processing element (PE) structure supporting the heterogeneous weight quantization, CONV dataflow and pipelining techniques, design optimization, and a template-based automatic synthesis framework to optimally exploit hardware resources. Experimental results show that our proposed REQ-YOLO framework can significantly compress the YOLO model while introducing very small accuracy degradation. The related code is available at https://github.com/Anonymous788/heterogeneous_ADMM_YOLO.},
 bibtype = {inproceedings},
 author = {Ding, Caiwen and Wang, Shuo and Liu, Ning and Xu, Kaidi and Wang, Yanzhi and Liang, Yun},
 doi = {10.1145/3289602.3293904},
 booktitle = {FPGA 2019 - Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays}
}
