Design of an Always-On Deep Neural Network-Based 1-μW Voice Activity Detector Aided With a Customized Software Model for Analog Feature Extraction

Design of an Always-On Deep Neural Network-Based 1-μW Voice Activity Detector Aided With a Customized Software Model for Analog Feature Extraction. Yang, M., Yeh, C., Zhou, Y., Cerqueira, J. P., Lazar, A. A., & Seok, M. IEEE Journal of Solid-State Circuits, 2019.

This paper presents an ultra-low-power voice activity detector (VAD). It uses analog signal processing for acoustic feature extraction (AFE) directly on the microphone output, approximate event-driven analog-to-digital conversion (ED-ADC), and a digital deep neural network (DNN) for speech/non-speech classification. New circuits, including the low-noise amplifier, bandpass filter, and full-wave rectifier, contribute to the more than 9x normalized power/channel reduction in the feature extraction front-end compared to the best prior art. The digital DNN is a three-hidden-layer binarized multilayer perceptron (MLP) with a 2-neuron output layer and a 48-neuron input layer that receives parallel event streams from the ED-ADCs. To obtain the DNN weights via off-line training, a customized front-end model written in Python is constructed to accelerate feature generation in software emulation, and the model parameters are extracted from Spectre simulations. The chip, fabricated in 0.18-μm CMOS, has a core area of 1.66 x 1.52 mm² and consumes 1 μW. Classification measurements using 1-hour, 10-dB signal-to-noise ratio audio with restaurant background noise show a mean speech/non-speech hit rate of 84.4%/85.4% with a 1.88%/4.65% 1-σ variation across ten dies that are all loaded with the same weights.
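The classifier topology described in the abstract (48 inputs, three hidden layers, 2 outputs, binarized weights) can be illustrated with a minimal Python sketch. This is not the authors' model: the hidden-layer width (`HIDDEN_WIDTH`), the zero thresholds, the random weights, the integer encoding of the input features, and the mapping of output index 1 to "speech" are all assumptions made only for illustration, since the abstract does not state them.

```python
import random

N_INPUTS = 48        # from the abstract: 48-neuron input layer
N_OUTPUTS = 2        # from the abstract: 2-neuron output layer
N_HIDDEN_LAYERS = 3  # from the abstract: three hidden layers
HIDDEN_WIDTH = 60    # ASSUMPTION: hypothetical width, not given in the abstract


def sign(x):
    """Binarize a pre-activation to +1 or -1 (zero maps to +1)."""
    return 1 if x >= 0 else -1


def make_random_mlp(seed=0):
    """Build random +/-1 weight matrices; a real design would load trained weights."""
    rng = random.Random(seed)
    sizes = [N_INPUTS] + [HIDDEN_WIDTH] * N_HIDDEN_LAYERS + [N_OUTPUTS]
    return [
        [[rng.choice((-1, 1)) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def dot(row, x):
    """Integer dot product of one weight row with the layer input."""
    return sum(w * v for w, v in zip(row, x))


def classify(features, layers):
    """Return 1 (assumed to mean speech) or 0 (assumed non-speech) for one feature vector."""
    x = list(features)
    for weights in layers[:-1]:
        # Hidden layers: +/-1 weights followed by a sign activation.
        x = [sign(dot(row, x)) for row in weights]
    # Output layer: compare the two raw scores instead of binarizing them.
    scores = [dot(row, x) for row in layers[-1]]
    return 1 if scores[1] > scores[0] else 0


if __name__ == "__main__":
    layers = make_random_mlp(seed=0)
    # ASSUMPTION: features are small integers, e.g. per-channel event counts.
    rng = random.Random(1)
    features = [rng.randint(0, 7) for _ in range(N_INPUTS)]
    print(classify(features, layers))
```

With +/-1 weights and sign activations, every hidden-layer multiply reduces to a sign flip and every sum to a counter, which is the usual reason binarized networks suit very low-power digital logic.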

@article{yang_design_2019,
	title = {Design of an {Always}-{On} {Deep} {Neural} {Network}-{Based} 1-μ{W} {Voice} {Activity} {Detector} {Aided} {With} a {Customized} {Software} {Model} for {Analog} {Feature} {Extraction}},
	issn = {0018-9200},
	doi = {10.1109/JSSC.2019.2900860},
	abstract = {This paper presents an ultra-low-power voice activity detector (VAD). It uses analog signal processing for acoustic feature extraction (AFE) directly on the microphone output, approximate event-driven analog-to-digital conversion (ED-ADC), and a digital deep neural network (DNN) for speech/non-speech classification. New circuits, including the low-noise amplifier, bandpass filter, and full-wave rectifier, contribute to the more than 9x normalized power/channel reduction in the feature extraction front-end compared to the best prior art. The digital DNN is a three-hidden-layer binarized multilayer perceptron (MLP) with a 2-neuron output layer and a 48-neuron input layer that receives parallel event streams from the ED-ADCs. To obtain the DNN weights via off-line training, a customized front-end model written in Python is constructed to accelerate feature generation in software emulation, and the model parameters are extracted from Spectre simulations. The chip, fabricated in 0.18-μm CMOS, has a core area of 1.66 x 1.52 mm² and consumes 1 μW. Classification measurements using 1-hour, 10-dB signal-to-noise ratio audio with restaurant background noise show a mean speech/non-speech hit rate of 84.4\%/85.4\% with a 1.88\%/4.65\% 1-σ variation across ten dies that are all loaded with the same weights.},
	journal = {IEEE Journal of Solid-State Circuits},
	author = {Yang, M. and Yeh, C. and Zhou, Y. and Cerqueira, J. P. and Lazar, A. A. and Seok, M.},
	year = {2019},
	pages = {1--14}
}

