Data trajectories: tracking reuse of published data for transitive credit attribution. Missier, P. International Journal of Digital Curation, 11(1):1–16, 2016.
Data trajectories: tracking reuse of published data for transitive credit attribution [link]Paper  doi  abstract   bibtex   
The ability to measure the use and impact of published data sets is key to the success of the open data / open science paradigm. A direct measure of impact would require tracking data (re)use in the wild, which however is difficult to achieve. This is therefore commonly replaced by simpler metrics based on data download and citation counts. In this paper we describe a scenario where it is possible to track the trajectory of a dataset after its publication, and we show how this enables the design of accurate models for ascribing credit to data originators. A Data Trajectory (DT) is a graph that encodes knowledge of how, by whom, and in which context data has been re-used, possibly after several generations. We provide a theoretical model of DTs that is grounded in the W3C PROV data model for provenance, and we show how DTs can be used to automatically propagate a fraction of the credit associated with transitively derived datasets, back to original data contributors. We also show this model of transitive credit in action by means of a Data Reuse Simulator. Ultimately, our hope is that, in the longer term, credit models based on direct measures of data reuse will provide further incentives to data publication. We conclude by outlining a research agenda to address the hard questions of creating, collecting, and using DTs systematically across a large number of data reuse instances, in the wild.
@article{missier_data_2016-1,
	title = {Data trajectories: tracking reuse of published data for transitive credit attribution},
	volume = {11},
	url = {http://bibbase.org/network/publication/missier-datatrajectoriestrackingreuseofpublisheddatafortransitivecreditattribution-2016},
	doi = {doi:10.2218/ijdc.v11i1.425},
	abstract = {The ability to measure the use and impact of published data sets is key to the success of the open data / open science paradigm. A direct measure of impact would require tracking data (re)use in the wild, which however is difficult to achieve. This is therefore commonly replaced by simpler metrics based on data download and citation counts. In this paper we describe a scenario where it is possible to track the trajectory of a dataset after its publication, and we show how this enables the design of accurate models for ascribing credit to data originators. A Data Trajectory (DT) is a graph that encodes knowledge of how, by whom, and in which context data has been re-used, possibly after several generations. We provide a theoretical model of DTs that is grounded in the W3C PROV data model for provenance, and we show how DTs can be used to automatically propagate a fraction of the credit associated with transitively derived datasets, back to original data contributors. We also show this model of transitive credit in action by means of a Data Reuse Simulator. Ultimately, our hope is that, in the longer term, credit models based on direct measures of data reuse will provide further incentives to data publication. We conclude by outlining a research agenda to address the hard questions of creating, collecting, and using DTs systematically across a large number of data reuse instances, in the wild.},
	number = {1},
	journal = {International Journal of Digital Curation},
	author = {Missier, Paolo},
	year = {2016},
	keywords = {provenance, data reuse, data trajectories},
	pages = {1--16},
}

Downloads: 0