Provenance Annotation and Analysis to Support Process Re-Computation. Cala, J. & Missier, P. In Procs. IPAW 2018, London, 2018. Springer.
abstract   bibtex   
Many resource-intensive analytics processes evolve over time following new versions of the reference datasets and software dependen- cies they use. We focus on scenarios in which any version change has the potential to affect many outcomes, as is the case for instance in high throughput genomics where the same process is used to analyse large cohorts of patient genomes, or cases. As any version change is unlikely to affect the entire population, an efficient strategy for restoring the cur- rency of the outcomes requires first to identify the scope of a change, i.e., the subset of affected data products. In this paper we describe a generic and reusable provenance-based approach to address this scope discovery problem. It applies to a scenario where the process consists of complex hierarchical components, where different input cases are processed using different version configurations of each component, and where separate provenance traces are collected for the executions of each of the com- ponents. We show how a new data structure, called a restart tree, is computed and exploited to manage the change scope discovery problem.
@inproceedings{cala_provenance_2018,
	address = {London},
	title = {Provenance {Annotation} and {Analysis} to {Support} {Process} {Re}-{Computation}},
	abstract = {Many resource-intensive analytics processes evolve over time following new versions of the reference datasets and software dependen- cies they use. We focus on scenarios in which any version change has the potential to affect many outcomes, as is the case for instance in high throughput genomics where the same process is used to analyse large cohorts of patient genomes, or cases. As any version change is unlikely to affect the entire population, an efficient strategy for restoring the cur- rency of the outcomes requires first to identify the scope of a change, i.e., the subset of affected data products. In this paper we describe a generic and reusable provenance-based approach to address this scope discovery problem. It applies to a scenario where the process consists of complex hierarchical components, where different input cases are processed using different version configurations of each component, and where separate provenance traces are collected for the executions of each of the com- ponents. We show how a new data structure, called a restart tree, is computed and exploited to manage the change scope discovery problem.},
	booktitle = {Procs. {IPAW} 2018},
	publisher = {Springer},
	author = {Cala, Jacek and Missier, Paolo},
	year = {2018},
	keywords = {\#provenance, \#recomputation, process re-computation, provenance annotations},
}

Downloads: 0