Characteristics of Open Data CSV Files. Mitlöhner, J., Neumaier, S., Umbrich, J., & Polleres, A. In 2nd International Conference on Open and Big Data, August, 2016. Invited paper
This work analyzes an Open Data corpus containing 200K tabular resources with a total file size of 413GB from a data consumer perspective. Our study shows that ∼10% of the resources in Open Data portals are labelled as tabular data, of which only 50% can be considered CSV files. The study inspects the general shape of these tabular data, reports on column and row distribution, and analyzes the availability of (multiple) header rows and whether a file contains multiple tables. In addition, we inspect and analyze the table column types, detect missing values, and report on the distribution of the values.
@inproceedings{mitl-etal-2016OBD,
	 author = {Mitl{\"o}hner, Johann and Neumaier, Sebastian and Umbrich, J{\"u}rgen and Polleres, Axel},
	 booktitle = {2nd International Conference on Open and Big Data},
	 month = aug,
	 day = {22--24},
	 note = {Invited paper},
	 type = {CONF},
	 abstract = {This work analyzes an Open Data corpus containing 200K tabular resources with a total file size of 413GB from a data consumer perspective. Our study shows that ∼10\% of the resources in Open Data portals are labelled as tabular data, of which only 50\% can be considered CSV files. The study inspects the general shape of these tabular data, reports on column and row distribution, and analyzes the availability of (multiple) header rows and whether a file contains multiple tables. In addition, we inspect and analyze the table column types, detect missing values, and report on the distribution of the values.},
	 title = {Characteristics of Open Data {CSV} Files},
	 year = 2016,
	 url = {http://polleres.net/publications/mitl-etal-2016OBD.pdf},
}