Not All Roads Lead to Rome: Pitch Representation and Model Architecture for Automatic Harmonic Analysis. Micchi, G.; Gotham, M.; and Giraud, M. Transactions of the International Society for Music Information Retrieval, 3(1):42–54, 2020.
Automatic harmonic analysis has been an enduring focus of the MIR community, and has enjoyed a particularly vigorous revival of interest in the machine-learning age. We focus here on the specific case of Roman numeral analysis which, by virtue of requiring key/functional information in addition to chords, may be viewed as an acutely challenging use case. We report on three main developments. First, we provide a new meta-corpus bringing together all existing Roman numeral analysis datasets; this offers greater scale and diversity, not only of the music represented, but also of human analytical viewpoints. Second, we examine best practices in the encoding of pitch, time, and harmony for machine learning tasks. The main contribution here is the introduction of full pitch spelling to such a system, an absolute must for the comprehensive study of musical harmony. Third, we devised and tested several neural network architectures and compared their relative accuracy. In the best-performing of these models, convolutional layers gather the local information needed to analyse the chord at a given moment while a recurrent part learns longer-range harmonic progressions. Altogether, our best representation and architecture produce a small but significant improvement on overall accuracy while simultaneously integrating full pitch spelling. This enables the system to retain important information from the musical sources and provide more meaningful predictions for any new input.
@article{micchi.ea2020-not,
  abstract      = {Automatic harmonic analysis has been an enduring
                  focus of the MIR community, and has enjoyed a
                  particularly vigorous revival of interest in the
                  machine-learning age. We focus here on the specific
                  case of Roman numeral analysis which, by virtue of
                  requiring key/functional information in addition to
                  chords, may be viewed as an acutely challenging use
                  case. We report on three main developments. First,
                  we provide a new meta-corpus bringing together all
                  existing Roman numeral analysis datasets; this
                  offers greater scale and diversity, not only of the
                  music represented, but also of human analytical
                  viewpoints. Second, we examine best practices in the
                  encoding of pitch, time, and harmony for machine
                  learning tasks. The main contribution here is the
                  introduction of full pitch spelling to such a
                  system, an absolute must for the comprehensive study
                  of musical harmony. Third, we devised and tested
                  several neural network architectures and compared
                  their relative accuracy. In the best-performing of
                  these models, convolutional layers gather the local
                  information needed to analyse the chord at a given
                  moment while a recurrent part learns longer-range
                  harmonic progressions. Altogether, our best
                  representation and architecture produce a small but
                  significant improvement on overall accuracy while
                  simultaneously integrating full pitch spelling. This
                  enables the system to retain important information
                  from the musical sources and provide more meaningful
                  predictions for any new input.},
  author        = {Micchi, Gianluca and Gotham, Mark and Giraud,
                  Mathieu},
  doi           = {10.5334/tismir.45},
  journal       = {Transactions of the International Society for Music
                  Information Retrieval},
  keywords      = {chords and functional harmony,computational
                  musicology,corpus,functional harmony,machine
                  learning,pitch encoding,roman numeral analysis,tonal
                  harmony},
  mendeley-tags = {computational musicology},
  number        = 1,
  pages         = {42--54},
  title         = {{Not All Roads Lead to Rome: Pitch Representation
                  and Model Architecture for Automatic Harmonic
                  Analysis}},
  volume        = 3,
  year          = 2020
}