Nicolini, M., Simonetta, F., & Ntalampiras, S.
Lightweight Audio-Based Human Activity Classification Using Transfer Learning.
Pages 783–789, March 2023.

@inproceedings{nicolini_lightweight_2023,
  title = {Lightweight {Audio}-{Based} {Human} {Activity} {Classification} {Using} {Transfer} {Learning}},
  copyright = {All rights reserved},
  isbn = {978-989-758-626-2},
  url = {https://www.scitepress.org/Papers/2023/116479/116479.pdf},
  doi = {10.5220/0011647900003411},
  abstract = {This paper employs the acoustic modality to address the human activity recognition (HAR) problem. The cornerstone of the proposed solution is the YAMNet deep neural network, the embeddings of which comprise the input to a fully-connected linear layer trained for HAR. Importantly, the dataset is publicly available and includes the following human activities: preparing coffee, frying egg, no activity, showering, using microwave, washing dishes, washing hands, and washing teeth. The specific set of activities is representative of a standard home environment facilitating a wide range of applications. The performance offered by the proposed transfer learning-based framework surpasses the state of the art, while being able to be executed on mobile devices, such as smartphones, tablets, etc. In fact, the obtained model has been exported and thoroughly tested for real-time HAR on a smartphone device with the input being the audio captured from its microphone.},
  urldate = {2023-03-06},
  author = {Nicolini, Marco and Simonetta, Federico and Ntalampiras, Stavros},
  month = mar,
  year = {2023},
  pages = {783--789},
}

Abstract: This paper employs the acoustic modality to address the human activity recognition (HAR) problem. The cornerstone of the proposed solution is the YAMNet deep neural network, the embeddings of which comprise the input to a fully-connected linear layer trained for HAR. Importantly, the dataset is publicly available and includes the following human activities: preparing coffee, frying egg, no activity, showering, using microwave, washing dishes, washing hands, and washing teeth. The specific set of activities is representative of a standard home environment facilitating a wide range of applications. The performance offered by the proposed transfer learning-based framework surpasses the state of the art, while being able to be executed on mobile devices, such as smartphones, tablets, etc. In fact, the obtained model has been exported and thoroughly tested for real-time HAR on a smartphone device with the input being the audio captured from its microphone.
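
A minimal sketch of the transfer-learning recipe summarized above: frozen YAMNet embeddings (via TensorFlow Hub) feeding a small softmax head over the eight activities listed in the abstract. The head, optimizer, and dummy waveform are illustrative assumptions for the sake of a runnable example, not the authors' exact architecture or training setup.

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

ACTIVITIES = ["preparing coffee", "frying egg", "no activity", "showering",
              "using microwave", "washing dishes", "washing hands", "washing teeth"]

# YAMNet expects a mono float32 waveform sampled at 16 kHz.
yamnet = hub.load("https://tfhub.dev/google/yamnet/1")

def embed(waveform_16k):
    # Returns YAMNet's 1024-dimensional embeddings, one per ~0.48 s frame.
    _scores, embeddings, _spectrogram = yamnet(waveform_16k)
    return embeddings  # shape: (n_frames, 1024)

# Linear classification head trained on top of the frozen embeddings (assumed setup).
head = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1024,)),
    tf.keras.layers.Dense(len(ACTIVITIES), activation="softmax"),
])
head.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# head.fit(train_embeddings, train_labels, ...)  # fit on labelled frames of the HAR dataset

# Inference on a dummy 1-second clip (replace with real microphone audio).
waveform = np.zeros(16000, dtype=np.float32)
frame_probs = head(embed(waveform))  # per-frame class probabilities
clip_label = ACTIVITIES[int(tf.argmax(tf.reduce_mean(frame_probs, axis=0)))]
print(clip_label)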

Cozzatti, M., Simonetta, F., & Ntalampiras, S.
Variational Autoencoders for Anomaly Detection in Respiratory Sounds.
In Artificial Neural Networks and Machine Learning – ICANN 2022, Cham, 2022. Springer Nature Switzerland.

@inproceedings{cozzatti_variational_2022,
  address = {Cham},
  title = {Variational {Autoencoders} for {Anomaly} {Detection} in {Respiratory} {Sounds}},
  copyright = {All rights reserved},
  url = {https://arxiv.org/abs/2208.03326},
  doi = {10.1007/978-3-031-15937-4_28},
  language = {en},
  urldate = {2022-11-05},
  booktitle = {Artificial {Neural} {Networks} and {Machine} {Learning} – {ICANN} 2022},
  publisher = {Springer Nature Switzerland},
  author = {Cozzatti, Michele and Simonetta, Federico and Ntalampiras, Stavros},
  year = {2022},
}

Simonetta, F., Ntalampiras, S., & Avanzini, F.
Acoustics-specific Piano Velocity Estimation.
In Proceedings of the IEEE MMSP 2022, 2022.

@inproceedings{simonetta_acoustics-specific_2022,
  title = {Acoustics-specific {Piano} {Velocity} {Estimation}},
  copyright = {All rights reserved},
  url = {http://arxiv.org/abs/2203.16294},
  doi = {10.1109/mmsp55362.2022.9948719},
  abstract = {Motivated by the state-of-art psychological research, we note that a piano performance transcribed with existing Automatic Music Transcription (AMT) methods cannot be successfully resynthesized without affecting the artistic content of the performance. This is due to 1) the different mappings between MIDI parameters used by different instruments, and 2) the fact that musicians adapt their way of playing to the surrounding acoustic environment. To face this issue, we propose a methodology to build acoustics-specific AMT systems that are able to model the adaptations that musicians apply to convey their interpretation. Specifically, we train models tailored for virtual instruments in a modular architecture that takes as input an audio recording and the relative aligned music score, and outputs the acoustics-specific velocities of each note. We test different model shapes and show that the proposed methodology generally outperforms the usual AMT pipeline which does not consider specificities of the instrument and of the acoustic environment. Interestingly, such a methodology is extensible in a straightforward way since only slight efforts are required to train models for the inference of other piano parameters, such as pedaling.},
  urldate = {2022-04-06},
  booktitle = {Proceedings of the {IEEE} {MMSP} 2022},
  author = {Simonetta, Federico and Ntalampiras, Stavros and Avanzini, Federico},
  year = {2022},
}

Abstract: Motivated by the state-of-art psychological research, we note that a piano performance transcribed with existing Automatic Music Transcription (AMT) methods cannot be successfully resynthesized without affecting the artistic content of the performance. This is due to 1) the different mappings between MIDI parameters used by different instruments, and 2) the fact that musicians adapt their way of playing to the surrounding acoustic environment. To face this issue, we propose a methodology to build acoustics-specific AMT systems that are able to model the adaptations that musicians apply to convey their interpretation. Specifically, we train models tailored for virtual instruments in a modular architecture that takes as input an audio recording and the relative aligned music score, and outputs the acoustics-specific velocities of each note. We test different model shapes and show that the proposed methodology generally outperforms the usual AMT pipeline which does not consider specificities of the instrument and of the acoustic environment. Interestingly, such a methodology is extensible in a straightforward way since only slight efforts are required to train models for the inference of other piano parameters, such as pedaling.
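
The abstract describes a modular model that takes an audio recording and the aligned score and outputs acoustics-specific note velocities. The PyTorch fragment below is a deliberately simplified, hypothetical stand-in for one such module: a small convolutional branch over a log-spectrogram excerpt around each note, concatenated with a few score-derived features, regressing one velocity per note. Layer sizes and the choice of features are assumptions, not the architectures compared in the paper.

import torch
import torch.nn as nn

class NoteVelocityEstimator(nn.Module):
    # Toy regressor: spectrogram excerpt around a note + score features -> MIDI velocity.
    def __init__(self, n_score_feats=4):
        super().__init__()
        self.audio_branch = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((8, 8)), nn.Flatten(),
            nn.Linear(16 * 8 * 8, 64), nn.ReLU(),
        )
        self.head = nn.Linear(64 + n_score_feats, 1)

    def forward(self, spec_excerpt, score_feats):
        # spec_excerpt: (batch, 1, freq_bins, frames); score_feats: (batch, n_score_feats)
        h = self.audio_branch(spec_excerpt)
        return self.head(torch.cat([h, score_feats], dim=1)).squeeze(-1)

model = NoteVelocityEstimator()
spec = torch.randn(2, 1, 128, 32)   # excerpts around two notes
feats = torch.randn(2, 4)           # e.g. pitch, duration, beat position, ... (assumed)
velocities = model(spec, feats)     # shape: (2,)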

Simonetta, F., Ntalampiras, S., & Avanzini, F.
Audio-to-Score Alignment Using Deep Automatic Music Transcription.
In Proceedings of the IEEE MMSP 2021, 2021.

@inproceedings{simonetta_audio--score_2021,
  title = {Audio-to-{Score} {Alignment} {Using} {Deep} {Automatic} {Music} {Transcription}},
  copyright = {Creative Commons Attribution 4.0 International},
  url = {https://arxiv.org/abs/2107.12854},
  doi = {10.1109/mmsp53017.2021.9733531},
  abstract = {Audio-to-score alignment (A2SA) is a multimodal task consisting in the alignment of audio signals to music scores. Recent literature confirms the benefits of Automatic Music Transcription (AMT) for A2SA at the frame-level. In this work, we aim to elaborate on the exploitation of AMT Deep Learning (DL) models for achieving alignment at the note-level. We propose a method which benefits from HMM-based score-to-score alignment and AMT, showing a remarkable advancement beyond the state-of-the-art. We design a systematic procedure to take advantage of large datasets which do not offer an aligned score. Finally, we perform a thorough comparison and extensive tests on multiple datasets.},
  booktitle = {Proceedings of the {IEEE} {MMSP} 2021},
  author = {Simonetta, Federico and Ntalampiras, Stavros and Avanzini, Federico},
  year = {2021},
}

Abstract: Audio-to-score alignment (A2SA) is a multimodal task consisting in the alignment of audio signals to music scores. Recent literature confirms the benefits of Automatic Music Transcription (AMT) for A2SA at the frame-level. In this work, we aim to elaborate on the exploitation of AMT Deep Learning (DL) models for achieving alignment at the note-level. We propose a method which benefits from HMM-based score-to-score alignment and AMT, showing a remarkable advancement beyond the state-of-the-art. We design a systematic procedure to take advantage of large datasets which do not offer an aligned score. Finally, we perform a thorough comparison and extensive tests on multiple datasets.
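
As background for the alignment step, the fragment below runs a plain frame-level DTW between an AMT-style piano roll and a score piano roll using librosa; the arrays are random stand-ins for real features. The paper's method goes beyond this baseline (note-level alignment plus HMM-based score-to-score alignment), so this is only a sketch of the underlying idea.

import numpy as np
import librosa

# Stand-ins: rows are the 128 MIDI pitches, columns are frames.
amt_roll = np.random.rand(128, 500)    # e.g. posteriorgram produced by an AMT model
score_roll = np.random.rand(128, 480)  # e.g. piano roll rendered from the score

# Dynamic time warping over the two feature sequences.
D, wp = librosa.sequence.dtw(X=amt_roll, Y=score_roll, metric="cosine")
alignment = wp[::-1]  # warping path as (audio_frame, score_frame) pairs, start to end
print(alignment[:5])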

Simonetta, F., Ntalampiras, S., & Avanzini, F.
ASMD: an automatic framework for compiling multimodal datasets with audio and scores.
In Proceedings of the 17th Sound and Music Computing Conference, Torino, 2020.

@inproceedings{simonetta_asmd:_2020,
  address = {Torino},
  title = {{ASMD}: an automatic framework for compiling multimodal datasets with audio and scores},
  copyright = {All rights reserved},
  url = {https://air.unimi.it/handle/2434/748917},
  doi = {10.5281/zenodo.3898666},
  abstract = {This paper describes an open-source Python framework for handling datasets for music processing tasks, built with the aim of improving the reproducibility of research projects in music computing and assessing the generalization abilities of machine learning models. The framework enables the automatic download and installation of several commonly used datasets for multimodal music processing. Specifically, we provide a Python API to access the datasets through Boolean set operations based on particular attributes, such as intersections and unions of composers, instruments, and so on. The framework is designed to ease the inclusion of new datasets and the respective ground-truth annotations so that one can build, convert, and extend one's own collection as well as distribute it by means of a compliant format to take advantage of the API. All code and ground-truth are released under suitable open licenses.},
  booktitle = {Proceedings of the 17th {Sound} and {Music} {Computing} {Conference}},
  author = {Simonetta, Federico and Ntalampiras, Stavros and Avanzini, Federico},
  year = {2020},
}

Abstract: This paper describes an open-source Python framework for handling datasets for music processing tasks, built with the aim of improving the reproducibility of research projects in music computing and assessing the generalization abilities of machine learning models. The framework enables the automatic download and installation of several commonly used datasets for multimodal music processing. Specifically, we provide a Python API to access the datasets through Boolean set operations based on particular attributes, such as intersections and unions of composers, instruments, and so on. The framework is designed to ease the inclusion of new datasets and the respective ground-truth annotations so that one can build, convert, and extend one's own collection as well as distribute it by means of a compliant format to take advantage of the API. All code and ground-truth are released under suitable open licenses.
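
To make the attribute-based selection concrete, the snippet below shows the idea of Boolean filtering over dataset metadata in plain Python; the dataset definitions and function names are invented placeholders and do not reproduce the actual ASMD API.

# Conceptual sketch of Boolean filtering over dataset metadata (hypothetical data).
DATASETS = [
    {"name": "dataset_a", "instruments": {"piano"}, "composers": {"Bach", "Chopin"}},
    {"name": "dataset_b", "instruments": {"piano", "violin"}, "composers": {"Mozart"}},
]

def filter_datasets(definitions, instruments=None, composers=None):
    # Keep definitions whose metadata intersects the requested attribute sets.
    selected = []
    for d in definitions:
        if instruments and not d["instruments"] & set(instruments):
            continue
        if composers and not d["composers"] & set(composers):
            continue
        selected.append(d)
    return selected

piano_mozart = filter_datasets(DATASETS, instruments=["piano"], composers=["Mozart"])
print([d["name"] for d in piano_mozart])  # -> ['dataset_b']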

Simonetta, F., Cancino-Chacón, C. E., Ntalampiras, S., & Widmer, G.
A convolutional approach to melody line identification in symbolic scores.
In Proceedings of the 20th International Society for Music Information Retrieval Conference, pages 924–931, Delft, The Netherlands, November 2019. ISMIR.

@inproceedings{simonetta_convolutional_2019,
  address = {Delft, The Netherlands},
  title = {A convolutional approach to melody line identification in symbolic scores},
  copyright = {All rights reserved},
  url = {https://doi.org/10.5281/zenodo.3527966},
  doi = {10.5281/zenodo.3527966},
  booktitle = {Proceedings of the 20th international society for music information retrieval conference},
  publisher = {ISMIR},
  author = {Simonetta, Federico and Cancino-Chacón, Carlos Eduardo and Ntalampiras, Stavros and Widmer, Gerhard},
  month = nov,
  year = {2019},
  note = {tex.venue: Delft, The Netherlands},
  pages = {924--931},
}

Simonetta, F., Carnovalini, F., Orio, N., & Rodà, A.
Symbolic Music Similarity through a Graph-based Representation.
In Proceedings of the Audio Mostly 2018 on Sound in Immersion and Emotion - AM'18, 2018. ACM Press.

@inproceedings{simonetta_symbolic_2018,
  title = {Symbolic {Music} {Similarity} through a {Graph}-based {Representation}},
  copyright = {All rights reserved},
  doi = {10.1145/3243274.3243301},
  abstract = {In this work, a novel representation system for symbolic music is described. The proposed representation system is graph-based and could theoretically represent music both from a horizontal (contrapuntal) and from a vertical (harmonic) point of view, by keeping into account contextual and harmonic information. It could also include relationships between internal variations of motifs and themes. This is achieved by gradually simplifying the melodies and generating layers of reductions that include only the most important notes from a structural and harmonic viewpoint. This representation system has been tested in a music information retrieval task, namely melodic similarity, and compared to another system that performs the same task but does not consider any contextual or harmonic information, showing how the structural information is needed in order to find certain relations between musical pieces. Moreover, a new dataset consisting of more than 5000 leadsheets is presented, with additional meta-musical information taken from different web databases, including author, year of first performance, lyrics, genre and stylistic tags.},
  booktitle = {Proceedings of the {Audio} {Mostly} 2018 on {Sound} in {Immersion} and {Emotion} - {AM}'18},
  publisher = {ACM Press},
  author = {Simonetta, Federico and Carnovalini, Filippo and Orio, Nicola and Rodà, Antonio},
  year = {2018},
}

Abstract: In this work, a novel representation system for symbolic music is described. The proposed representation system is graph-based and could theoretically represent music both from a horizontal (contrapuntal) and from a vertical (harmonic) point of view, by keeping into account contextual and harmonic information. It could also include relationships between internal variations of motifs and themes. This is achieved by gradually simplifying the melodies and generating layers of reductions that include only the most important notes from a structural and harmonic viewpoint. This representation system has been tested in a music information retrieval task, namely melodic similarity, and compared to another system that performs the same task but does not consider any contextual or harmonic information, showing how the structural information is needed in order to find certain relations between musical pieces. Moreover, a new dataset consisting of more than 5000 leadsheets is presented, with additional meta-musical information taken from different web databases, including author, year of first performance, lyrics, genre and stylistic tags.
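
To give a concrete feel for layered melodic reduction, the toy sketch below keeps only progressively longer notes at each layer (a crude stand-in for structural and harmonic importance) and averages a simple sequence-similarity score across layers. This is a conceptual illustration only; it is not the graph-based representation or the reduction rules used in the paper.

from difflib import SequenceMatcher

# Toy melodies as (pitch, duration-in-beats) pairs (illustrative data).
melody_a = [("C4", 1.0), ("D4", 0.5), ("E4", 2.0), ("G4", 0.5), ("E4", 1.0)]
melody_b = [("C4", 1.0), ("E4", 2.0), ("F4", 0.5), ("E4", 1.0)]

def reduce_layer(notes, min_duration):
    # Keep only the pitches of notes at least min_duration long.
    return [pitch for pitch, duration in notes if duration >= min_duration]

def layered_similarity(a, b, thresholds=(0.0, 1.0, 2.0)):
    # Average pitch-sequence similarity across progressively reduced layers.
    scores = [SequenceMatcher(None, reduce_layer(a, t), reduce_layer(b, t)).ratio()
              for t in thresholds]
    return sum(scores) / len(scores)

print(layered_similarity(melody_a, melody_b))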