The Munich LSTM-RNN approach to the MediaEval 2014 "Emotion in Music" Task. Coutinho, E., Weninger, F., Schuller, B., & Scherer, K. R. In CEUR Workshop Proceedings, volume 1263, 2014.
In this paper we describe TUM's approach for the MediaEval "Emotion in Music" task. The goal of this task is to automatically estimate the emotions expressed by music (in terms of Arousal and Valence) in a time-continuous fashion. Our system consists of Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) for dynamic Arousal and Valence regression. We used two different sets of acoustic and psychoacoustic features that have previously proven effective for emotion prediction in music and speech. The best model yielded an average Pearson's correlation coefficient of 0.354 (Arousal) and 0.198 (Valence), and an average Root Mean Squared Error of 0.102 (Arousal) and 0.079 (Valence).
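As an illustration of the kind of model the abstract describes, the following is a minimal sketch of an LSTM network that regresses per-frame Arousal/Valence values from a sequence of acoustic feature vectors. This is not the authors' implementation: the class name EmotionLSTM, the feature dimension, the hidden size, and the toy training step are all illustrative assumptions (PyTorch).

import torch
import torch.nn as nn

class EmotionLSTM(nn.Module):
    # Hypothetical LSTM regressor: acoustic features -> (Arousal, Valence) per frame.
    def __init__(self, n_features=65, hidden_size=128):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 2)  # one (Arousal, Valence) pair per time step

    def forward(self, x):
        # x: (batch, time, n_features) -> predictions: (batch, time, 2)
        h, _ = self.lstm(x)
        return self.out(h)

# Toy usage with assumed sizes: 4 clips, 60 frames, 65 features per frame.
model = EmotionLSTM()
features = torch.randn(4, 60, 65)
targets = torch.zeros(4, 60, 2)               # placeholder time-continuous annotations
loss = nn.MSELoss()(model(features), targets)
loss.backward()                               # standard gradient-based training step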
@inproceedings{Coutinho2014a,
 title = {The Munich LSTM-RNN approach to the MediaEval 2014 "Emotion in Music" Task},
 author = {Coutinho, Eduardo and Weninger, Felix and Schuller, Björn and Scherer, Klaus R.},
 booktitle = {CEUR Workshop Proceedings},
 volume = {1263},
 year = {2014},
 url = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-84909953530&partnerID=40&md5=96b8657503c52119dd83867ccbdc3264},
 abstract = {In this paper we describe TUM's approach for the MediaEval "Emotion in Music" task. The goal of this task is to automatically estimate the emotions expressed by music (in terms of Arousal and Valence) in a time-continuous fashion. Our system consists of Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN) for dynamic Arousal and Valence regression. We used two different sets of acoustic and psychoacoustic features that have previously proven effective for emotion prediction in music and speech. The best model yielded an average Pearson's correlation coefficient of 0.354 (Arousal) and 0.198 (Valence), and an average Root Mean Squared Error of 0.102 (Arousal) and 0.079 (Valence).}
}
