Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation

Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation. Stöter, F.-R., Chakrabarty, S., Edler, B., & Habets, E. A. P. In ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, volume 2018-April, pages 436-440, 9, 2018. Institute of Electrical and Electronics Engineers Inc.

The task of estimating the maximum number of concurrent speakers from single channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance or auditory scene classification. Building upon powerful machine learning methodology, we develop a Deep Neural Network (DNN) that estimates a speaker count. While DNNs efficiently map input representations to output targets, it remains unclear how to best handle the network output to infer integer source count estimates, as a discrete count estimate can either be tackled as a regression or a classification problem. In this paper, we investigate this important design decision and also address complementary parameter choices such as the input representation. We evaluate a state-of-the-art DNN audio model based on a Bi-directional Long Short-Term Memory network architecture for speaker count estimation. Through experimental evaluations, we aim to identify the best overall strategy for the task and show results for five-second speech segments in mixtures of up to ten speakers.
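
The design decision named in the abstract, reading an integer count from either a classification output or a regression output, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the convention that class index k means k speakers, and the round-and-clip rule are assumptions made for illustration only.

```python
import numpy as np

MAX_SPEAKERS = 10  # the abstract mentions mixtures of up to ten speakers


def count_from_classification(class_scores):
    """Classification view: one score per candidate count.

    Assumption: index k of class_scores stands for "k speakers".
    """
    return int(np.argmax(class_scores))


def count_from_regression(value, max_count=MAX_SPEAKERS):
    """Regression view: a single real-valued output.

    Assumption: round to the nearest integer, then clip to [0, max_count].
    """
    return int(np.clip(np.rint(value), 0, max_count))
```

With these helpers, a score vector peaking at index 2 and a regression output of 2.3 both decode to a count of 2. The two views differ in how errors behave: regression treats counts as ordered, while classification treats each count as an unrelated label.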

@inProceedings{
 title = {Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation},
 type = {inProceedings},
 year = {2018},
 keywords = {Cocktail-party,Number of concurrent speakers,Overlapped speech,Speaker count estimation},
 pages = {436--440},
 volume = {2018-April},
 month = {9},
 publisher = {Institute of Electrical and Electronics Engineers Inc.},
 day = {10},
 abstract = {The task of estimating the maximum number of concurrent speakers from single channel mixtures is important for various audio-based applications, such as blind source separation, speaker diarisation, audio surveillance or auditory scene classification. Building upon powerful machine learning methodology, we develop a Deep Neural Network (DNN) that estimates a speaker count. While DNNs efficiently map input representations to output targets, it remains unclear how to best handle the network output to infer integer source count estimates, as a discrete count estimate can either be tackled as a regression or a classification problem. In this paper, we investigate this important design decision and also address complementary parameter choices such as the input representation. We evaluate a state-of-the-art DNN audio model based on a Bi-directional Long Short-Term Memory network architecture for speaker count estimation. Through experimental evaluations, we aim to identify the best overall strategy for the task and show results for five-second speech segments in mixtures of up to ten speakers.},
 bibtype = {inProceedings},
 author = {St{\"o}ter, Fabian-Robert and Chakrabarty, Soumitro and Edler, Bernd and Habets, Emanuel A. P.},
 booktitle = {ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings}
}
