Capturing and Visualizing Provenance From Data Wrangling. Bors, C., Gschwandtner, T., & Miksch, S. IEEE Computer Graphics and Applications, 39(6):61–75, November 2019.
Data quality management and assessment play a vital role for ensuring the trust in the data and its fitness-of-use for subsequent analysis. The transformation history of a data wrangling system is often insufficient for determining the usability of a dataset, lacking information how changes affected the dataset. Capturing workflow provenance along the wrangling process and combining it with descriptive information as data provenance can enable users to comprehend how these changes affected the dataset, and if they benefited data quality. We present DQProv Explorer, a system that captures and visualizes provenance from data wrangling operations. It features three visualization components: allowing the user to explore the provenance graph of operations and the data stream, the development of quality over time for a sequence of wrangling operations applied to the dataset, and the distribution of issues across the entirety of the dataset to determine error patterns.
@article{bors_capturing_2019,
	title = {Capturing and {Visualizing} {Provenance} {From} {Data} {Wrangling}},
	volume = {39},
	issn = {1558-1756},
	doi = {10.1109/MCG.2019.2941856},
	abstract = {Data quality management and assessment play a vital role for ensuring the trust in the data and its fitness-of-use for subsequent analysis. The transformation history of a data wrangling system is often insufficient for determining the usability of a dataset, lacking information how changes affected the dataset. Capturing workflow provenance along the wrangling process and combining it with descriptive information as data provenance can enable users to comprehend how these changes affected the dataset, and if they benefited data quality. We present DQProv Explorer, a system that captures and visualizes provenance from data wrangling operations. It features three visualization components: allowing the user to explore the provenance graph of operations and the data stream, the development of quality over time for a sequence of wrangling operations applied to the dataset, and the distribution of issues across the entirety of the dataset to determine error patterns.},
	number = {6},
	journal = {IEEE Computer Graphics and Applications},
	author = {Bors, Christian and Gschwandtner, Theresia and Miksch, Silvia},
	month = nov,
	year = {2019},
	keywords = {data quality, WHY - Data Wrangling, WHEN - Real-Time Applications, WHY - Evaluation of Tools and Systems, Type of Work: Empirical Study, WHY - Real-time or post-hoc Quantification and Re-Application, NO HOW},
	pages = {61--75},
}
