A Fusion of Deep Convolutional Generative Adversarial Networks and Sequence to Sequence Autoencoders for Acoustic Scene Classification. Amiriparian, S., Freitag, M., Cummins, N., Gerczuk, M., Pugachevskiy, S., & Schuller, B. In 2018 26th European Signal Processing Conference (EUSIPCO), pages 977-981, Sep., 2018.
Unsupervised representation learning shows high promise for generating robust features for acoustic scene analysis. In this regard, we propose and investigate a novel combination of features learnt using both a deep convolutional generative adversarial network (DCGAN) and a recurrent sequence to sequence autoencoder (S2SAE). Each of the representation learning algorithms is trained individually on spectral features extracted from audio instances. The learnt representations are: (i) the activations of the discriminator in the case of the DCGAN, and (ii) the activations of a fully connected layer between the encoder and decoder units in the case of the S2SAE. We then train two multilayer perceptron neural networks on the DCGAN and S2SAE feature vectors to predict the class labels. The individual predicted labels are combined in a weighted decision-level fusion to achieve the final prediction. The system is evaluated on the development partition of the acoustic scene classification data set of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2017). In comparison to the baseline, the accuracy on the development set is increased from 74.8 % to 86.4 % using only the DCGAN, to 88.5 % using only the S2SAE, and to 91.1 % after fusion of the individual predictions.
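The weighted decision-level fusion described in the abstract amounts to a convex combination of the two MLPs' class posteriors. The sketch below is a minimal illustration of that idea, not the authors' implementation: the array names, the fusion weight, and the use of 15 classes (the DCASE 2017 acoustic scene labels) are assumptions for the example.

```python
import numpy as np

# Hypothetical class posteriors from the two MLPs (names and values are
# placeholders, not from the paper): each row is one audio instance, each
# column one of the assumed 15 DCASE 2017 acoustic scene classes.
rng = np.random.default_rng(0)
probs_dcgan = rng.dirichlet(np.ones(15), size=4)  # MLP on DCGAN features
probs_s2sae = rng.dirichlet(np.ones(15), size=4)  # MLP on S2SAE features

def fuse(p_a, p_b, w=0.5):
    """Weighted decision-level fusion: combine the class posteriors with
    weight w on the first system and (1 - w) on the second, then take the
    argmax as the final predicted label."""
    combined = w * p_a + (1.0 - w) * p_b
    return combined.argmax(axis=1)

# In practice the fusion weight would be tuned on the development
# partition; 0.5 here is an arbitrary placeholder.
final_labels = fuse(probs_dcgan, probs_s2sae, w=0.5)
print(final_labels)
```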
