MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding. Chou, Y., Chen, I., Chang, C., Ching, J., & Yang, Y. 2021.

This paper presents an attempt to employ the masked language modeling approach of BERT to pre-train a 12-layer Transformer model over 4,166 pieces of polyphonic piano MIDI files for tackling a number of symbolic-domain discriminative music understanding tasks. These include two note-level classification tasks, i.e., melody extraction and velocity prediction, as well as two sequence-level classification tasks, i.e., composer classification and emotion classification. We find that, given a pretrained Transformer, our models outperform recurrent neural network based baselines with less than 10 epochs of fine-tuning. Ablation studies show that the pre-training remains effective even if none of the MIDI data of the downstream tasks are seen at the pre-training stage, and that freezing the self-attention layers of the Transformer at the fine-tuning stage slightly degrades performance. All five datasets employed in this work are publicly available, as well as checkpoints of our pre-trained and fine-tuned models. As such, our research can be taken as a benchmark for symbolic-domain music understanding.
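The abstract describes a standard BERT recipe transferred to music tokens: mask a fraction of the tokens in each MIDI-derived sequence, pre-train a 12-layer Transformer to reconstruct them, then fine-tune on the note-level and sequence-level classification tasks. As a rough, illustrative sketch only (not the authors' released implementation), the following Python snippet sets up such masked language modeling over already-tokenized symbolic-music sequences with Hugging Face's BertForMaskedLM; the vocabulary size, hidden size, sequence length, and mask-token ID are placeholder assumptions.

# Illustrative sketch, not the MidiBERT-Piano code: BERT-style masked language
# modeling over symbolic-music token IDs (MIDI assumed already tokenized into
# integers). Model and vocabulary sizes below are placeholder assumptions.
import torch
from transformers import BertConfig, BertForMaskedLM

VOCAB_SIZE = 800          # assumed size of the symbolic-music token vocabulary
MASK_ID = VOCAB_SIZE - 1  # assume the last ID is reserved as the [MASK] token

config = BertConfig(
    vocab_size=VOCAB_SIZE,
    num_hidden_layers=12,        # 12-layer Transformer, as stated in the abstract
    hidden_size=768,
    num_attention_heads=12,
    max_position_embeddings=512,
)
model = BertForMaskedLM(config)

def mask_tokens(ids: torch.Tensor, mask_prob: float = 0.15):
    """Mask ~15% of positions; unmasked positions get label -100 so the loss ignores them."""
    labels = ids.clone()
    masked = torch.rand(ids.shape) < mask_prob
    labels[~masked] = -100
    corrupted = ids.clone()
    corrupted[masked] = MASK_ID
    return corrupted, labels

# One pre-training step on a dummy batch of token sequences.
batch = torch.randint(0, VOCAB_SIZE - 1, (4, 512))
inputs, labels = mask_tokens(batch)
loss = model(input_ids=inputs, labels=labels).loss
loss.backward()

Fine-tuning for the downstream tasks would then swap the masked-LM head for a note-level or sequence-level classification head on top of the same pre-trained encoder.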
@misc{chou_midibert-piano_2021,
title = {{MidiBERT}-{Piano}: {Large}-scale {Pre}-training for {Symbolic} {Music} {Understanding}},
shorttitle = {{MidiBERT}-{Piano}},
abstract = {This paper presents an attempt to employ the masked language modeling approach of BERT to pre-train a 12-layer Transformer model over 4,166 pieces of polyphonic piano MIDI files for tackling a number of symbolic-domain discriminative music understanding tasks. These include two note-level classification tasks, i.e., melody extraction and velocity prediction, as well as two sequence-level classification tasks, i.e., composer classification and emotion classification. We find that, given a pretrained Transformer, our models outperform recurrent neural network based baselines with less than 10 epochs of fine-tuning. Ablation studies show that the pre-training remains effective even if none of the MIDI data of the downstream tasks are seen at the pre-training stage, and that freezing the self-attention layers of the Transformer at the fine-tuning stage slightly degrades performance. All five datasets employed in this work are publicly available, as well as checkpoints of our pre-trained and fine-tuned models. As such, our research can be taken as a benchmark for symbolic-domain music understanding.},
author = {Chou, Yi-Hui and Chen, I.-Chun and Chang, Chin-Jui and Ching, Joann and Yang, Yi-Hsuan},
year = {2021},
keywords = {Performance},
}
{"_id":"t48RejGJg4QsTD6aH","bibbaseid":"chou-chen-chang-ching-yang-midibertpianolargescalepretrainingforsymbolicmusicunderstanding-2021","author_short":["Chou, Y.","Chen, I.","Chang, C.","Ching, J.","Yang, Y."],"bibdata":{"bibtype":"misc","type":"misc","title":"MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding","shorttitle":"MidiBERT-Piano","abstract":"An attempt to employ the mask language modeling approach of BERT to pre-train a 12-layer Transformer model for tackling a number of symbolic-domain discriminative music understanding tasks, finding that, given a pretrained Transformer, the models outperform recurrent neural network based baselines with less than 10 epochs of fine-tuning. This paper presents an attempt to employ the mask language modeling approach of BERT to pre-train a 12-layer Transformer model over 4,166 pieces of polyphonic piano MIDI files for tackling a number of symbolic-domain discriminative music understanding tasks. These include two note-level classification tasks, i.e., melody extraction and velocity prediction, as well as two sequence-level classification tasks, i.e., composer classification and emotion classification. We find that, given a pretrained Transformer, our models outperform recurrent neural network based baselines with less than 10 epochs of fine-tuning. Ablation studies show that the pre-training remains effective even if none of the MIDI data of the downstream tasks are seen at the pre-training stage, and that freezing the self-attention layers of the Transformer at the fine-tuning stage slightly degrades performance. All the five datasets employed in this work are publicly available, as well as checkpoints of our pre-trained and fine-tuned models. As such, our research can be taken as a benchmark for symbolic-domain music understanding.","author":[{"propositions":[],"lastnames":["Chou"],"firstnames":["Yi-Hui"],"suffixes":[]},{"propositions":[],"lastnames":["Chen"],"firstnames":["I.-Chun"],"suffixes":[]},{"propositions":[],"lastnames":["Chang"],"firstnames":["Chin-Jui"],"suffixes":[]},{"propositions":[],"lastnames":["Ching"],"firstnames":["Joann"],"suffixes":[]},{"propositions":[],"lastnames":["Yang"],"firstnames":["Yi-Hsuan"],"suffixes":[]}],"year":"2021","keywords":"Performance","bibtex":"@misc{chou_midibert-piano_2021,\n\ttitle = {{MidiBERT}-{Piano}: {Large}-scale {Pre}-training for {Symbolic} {Music} {Understanding}},\n\tshorttitle = {{MidiBERT}-{Piano}},\n\tabstract = {An attempt to employ the mask language modeling approach of BERT to pre-train a 12-layer Transformer model for tackling a number of symbolic-domain discriminative music understanding tasks, finding that, given a pretrained Transformer, the models outperform recurrent neural network based baselines with less than 10 epochs of fine-tuning. This paper presents an attempt to employ the mask language modeling approach of BERT to pre-train a 12-layer Transformer model over 4,166 pieces of polyphonic piano MIDI files for tackling a number of symbolic-domain discriminative music understanding tasks. These include two note-level classification tasks, i.e., melody extraction and velocity prediction, as well as two sequence-level classification tasks, i.e., composer classification and emotion classification. We find that, given a pretrained Transformer, our models outperform recurrent neural network based baselines with less than 10 epochs of fine-tuning. 
Ablation studies show that the pre-training remains effective even if none of the MIDI data of the downstream tasks are seen at the pre-training stage, and that freezing the self-attention layers of the Transformer at the fine-tuning stage slightly degrades performance. All the five datasets employed in this work are publicly available, as well as checkpoints of our pre-trained and fine-tuned models. As such, our research can be taken as a benchmark for symbolic-domain music understanding.},\n\tauthor = {Chou, Yi-Hui and Chen, I.-Chun and Chang, Chin-Jui and Ching, Joann and Yang, Yi-Hsuan},\n\tyear = {2021},\n\tkeywords = {Performance},\n}\n\n\n\n","author_short":["Chou, Y.","Chen, I.","Chang, C.","Ching, J.","Yang, Y."],"key":"chou_midibert-piano_2021","id":"chou_midibert-piano_2021","bibbaseid":"chou-chen-chang-ching-yang-midibertpianolargescalepretrainingforsymbolicmusicunderstanding-2021","role":"author","urls":{},"keyword":["Performance"],"metadata":{"authorlinks":{}},"html":""},"bibtype":"misc","biburl":"https://bibbase.org/zotero/fsimonetta","dataSources":["pzyFFGWvxG2bs63zP"],"keywords":["performance"],"search_terms":["midibert","piano","large","scale","pre","training","symbolic","music","understanding","chou","chen","chang","ching","yang"],"title":"MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding","year":2021}