MIRDATA: SOFTWARE FOR REPRODUCIBLE USAGE OF DATASETS

MIRDATA: SOFTWARE FOR REPRODUCIBLE USAGE OF DATASETS. Bittner, R. M., Fuentes, M., Rubinstein, D., Jansson, A., Choi, K., & Kell, T. In International Conference on Music Information Retrieval (ISMIR), pages 8, 2019.

There are a number of efforts in the MIR community towards increased reproducibility, such as creating more open datasets, publishing code, and the use of common software libraries, e.g. for evaluation. However, when it comes to datasets, there is usually little guarantee that researchers are using the exact same data in the same way, which among other issues, makes comparisons of different methods on the “same” datasets problematic. In this paper, we first show how (often unknown) differences in datasets can lead to significantly different experimental results. We propose a solution to these problems in the form of an open source library, mirdata, which handles datasets in their current distribution modes, but controls for possible variability. In particular, it contains tools which: (1) validate if the user’s data (e.g. audio, annotations) is consistent with a canonical version of the dataset; (2) load annotations in a consistent manner; (3) download or give instructions for obtaining data; and (4) make it easy to perform track metadata-specific analysis.
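The validation step in item (1) can be illustrated with a checksum comparison. The sketch below is not the mirdata API; it assumes a canonical index that maps each relative file path to an MD5 digest, and the names `md5_of_file`, `validate_against_index`, `data_home`, and `canonical_index` are hypothetical, chosen only for this illustration.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_against_index(data_home, canonical_index):
    """Compare local files with a canonical {relative_path: md5} index.

    Hypothetical helper, not part of mirdata. Returns two sorted lists:
    paths that are missing locally, and paths whose checksum differs.
    """
    missing, mismatched = [], []
    for rel_path, expected in sorted(canonical_index.items()):
        local_path = os.path.join(data_home, rel_path)
        if not os.path.exists(local_path):
            missing.append(rel_path)
        elif md5_of_file(local_path) != expected:
            mismatched.append(rel_path)
    return missing, mismatched


if __name__ == "__main__":
    # Build a throwaway "dataset" with one file and check it against an index
    # that also lists a file we never created.
    with tempfile.TemporaryDirectory() as data_home:
        with open(os.path.join(data_home, "track1.wav"), "wb") as handle:
            handle.write(b"fake audio bytes")
        index = {
            "track1.wav": hashlib.md5(b"fake audio bytes").hexdigest(),
            "track1.csv": hashlib.md5(b"fake annotation").hexdigest(),
        }
        print(validate_against_index(data_home, index))
```

Reporting missing and mismatched files separately lets a user tell an incomplete download apart from a file that differs from the canonical version.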

@inproceedings{bittner_mirdata_2019,
	title = {{MIRDATA}: {SOFTWARE} {FOR} {REPRODUCIBLE} {USAGE} {OF} {DATASETS}},
	abstract = {There are a number of efforts in the MIR community towards increased reproducibility, such as creating more open datasets, publishing code, and the use of common software libraries, e.g. for evaluation. However, when it comes to datasets, there is usually little guarantee that researchers are using the exact same data in the same way, which among other issues, makes comparisons of different methods on the “same” datasets problematic. In this paper, we first show how (often unknown) differences in datasets can lead to significantly different experimental results. We propose a solution to these problems in the form of an open source library, mirdata, which handles datasets in their current distribution modes, but controls for possible variability. In particular, it contains tools which: (1) validate if the user’s data (e.g. audio, annotations) is consistent with a canonical version of the dataset; (2) load annotations in a consistent manner; (3) download or give instructions for obtaining data; and (4) make it easy to perform track metadata-specific analysis.},
	language = {en},
	booktitle = {International {Conference} on {Music} {Information} {Retrieval} ({ISMIR})},
	author = {Bittner, Rachel M and Fuentes, Magdalena and Rubinstein, David and Jansson, Andreas and Choi, Keunwoo and Kell, Thor},
	year = {2019},
	pages = {8},
}
