Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm

Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm. Liu, Z., Wu, B., Luo, W., Yang, X., Liu, W., & Cheng, K.-T. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11219 LNCS:747-763, 2018.

In this work, we study 1-bit convolutional neural networks (CNNs), in which both the weights and the activations are binary. While efficient, current 1-bit CNNs have much lower classification accuracy than their real-valued counterparts on large-scale datasets such as ImageNet. To narrow the performance gap between 1-bit and real-valued CNN models, we propose a novel model, dubbed Bi-Real net, which connects the real activations (after the 1-bit convolution and/or BatchNorm layer, before the sign function) to the activations of the consecutive block through an identity shortcut. Consequently, compared to the standard 1-bit CNN, the representational capability of the Bi-Real net is significantly enhanced, and the additional computational cost is negligible. Moreover, we develop a specific training algorithm for 1-bit CNNs that includes three technical novelties. First, we derive a tight approximation to the derivative of the non-differentiable sign function with respect to the activation. Second, we propose a magnitude-aware gradient with respect to the weight for updating the weight parameters. Third, we pre-train the real-valued CNN model with a clip function, rather than the ReLU function, to better initialize the Bi-Real net. Experiments on ImageNet show that the Bi-Real net with the proposed training algorithm achieves 56.4% and 62.2% top-1 accuracy with 18 layers and 34 layers, respectively. Compared to the state of the art (e.g., XNOR-Net), Bi-Real net achieves up to 10% higher top-1 accuracy with greater memory savings and lower computational cost.
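
The abstract does not spell out the block structure or the form of the derivative approximation, so the following is only a minimal sketch under stated assumptions. It illustrates two of the ideas: a real-valued identity shortcut around a binarized layer, and a smooth surrogate gradient for the sign function. The piecewise-quadratic surrogate, the toy dense layer standing in for a 1-bit convolution, the omission of BatchNorm, and all function names (`sign`, `approx_sign_grad`, `binary_block`) are assumptions made for illustration and are not taken from this entry.

```python
from __future__ import annotations


def sign(x: float) -> float:
    """Binarize a real value to +1 or -1 (0 maps to +1, an assumed convention)."""
    return 1.0 if x >= 0 else -1.0


def approx_sign_grad(x: float) -> float:
    """Derivative of an assumed piecewise-quadratic surrogate for sign(x).

    Surrogate: -1 for x < -1, 2x + x^2 on [-1, 0), 2x - x^2 on [0, 1), 1 otherwise.
    Its derivative is a triangle peaking at x = 0 and vanishing outside [-1, 1).
    """
    if -1.0 <= x < 0.0:
        return 2.0 + 2.0 * x
    if 0.0 <= x < 1.0:
        return 2.0 - 2.0 * x
    return 0.0


def binary_block(x: list[float], w: list[list[float]]) -> list[float]:
    """Toy binarized layer with a real-valued identity shortcut.

    x: real activations of length n.
    w: n x n real latent weights, binarized with sign() before use.
    A dense product stands in for the 1-bit convolution; BatchNorm is omitted.
    """
    xb = [sign(v) for v in x]  # binary activations fed to the 1-bit layer
    out = []
    for i, row in enumerate(w):
        acc = sum(sign(wij) * xj for wij, xj in zip(row, xb))
        out.append(x[i] + acc)  # shortcut carries the real-valued input forward
    return out


if __name__ == "__main__":
    print(binary_block([0.5, -2.0], [[1.0, -1.0], [-0.3, 0.2]]))
    print([approx_sign_grad(v) for v in (-1.5, -0.5, 0.0, 0.5, 1.5)])
```

Without the shortcut, the block output would depend only on the signs of `x`; adding `x[i]` back keeps the real-valued magnitude available to the next block at the cost of one addition per element.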

@article{
 title = {Bi-Real Net: Enhancing the performance of 1-bit CNNs with improved representational capability and advanced training algorithm},
 type = {article},
 year = {2018},
 pages = {747--763},
 volume = {11219 LNCS},
 abstract = {In this work, we study the 1-bit convolutional neural networks (CNNs), of which both the weights and activations are binary. While being efficient, the classification accuracy of the current 1-bit CNNs is much worse compared to their counterpart real-valued CNN models on the large-scale dataset, like ImageNet. To minimize the performance gap between the 1-bit and real-valued CNN models, we propose a novel model, dubbed Bi-Real net, which connects the real activations (after the 1-bit convolution and/or BatchNorm layer, before the sign function) to activations of the consecutive block, through an identity shortcut. Consequently, compared to the standard 1-bit CNN, the representational capability of the Bi-Real net is significantly enhanced and the additional cost on computation is negligible. Moreover, we develop a specific training algorithm including three technical novelties for 1-bit CNNs. Firstly, we derive a tight approximation to the derivative of the non-differentiable sign function with respect to activation. Secondly, we propose a magnitude-aware gradient with respect to the weight for updating the weight parameters. Thirdly, we pre-train the real-valued CNN model with a clip function, rather than the ReLU function, to better initialize the Bi-Real net. Experiments on ImageNet show that the Bi-Real net with the proposed training algorithm achieves 56.4% and 62.2% top-1 accuracy with 18 layers and 34 layers, respectively. Compared to the state-of-the-arts (e.g., XNOR Net), Bi-Real net achieves up to 10% higher top-1 accuracy with more memory saving and lower computational cost.},
 bibtype = {article},
 author = {Liu, Zechun and Wu, Baoyuan and Luo, Wenhan and Yang, Xin and Liu, Wei and Cheng, Kwang-Ting},
 doi = {10.1007/978-3-030-01267-0_44},
 journal = {Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)}
}
