Revealing the Detailed Lineage of Script Outputs using Hybrid Provenance. Zhang, Q., Cao, Y., Wang, Q., Vu, D., Thavasimani, P., McPhillips, T., Missier, P., Slaughter, P., Jones, C., Jones, M. B., & Ludäscher, B. In Procs. 11th Intl. Digital Curation Conference (IDCC), Edinburgh, Scotland, UK, 2017. Digital Curation Centre.
We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of retrospective provenance when coupled with prospective provenance. Users provide prospective provenance (i.e., the conceptual workflows latent in scripts) via simple YesWorkflow annotations, embedded as script comments. Runtime observables, hidden in filenames or folder structures, recorded in log files, or automatically captured using tools such as noWorkflow or the DataONE RunManagers, can be linked to prospective provenance via relational views and queries. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open-source repository.
@inproceedings{zhang_revealing_2017,
	address = {Edinburgh, Scotland, UK},
	title = {Revealing the {Detailed} {Lineage} of {Script} {Outputs} using {Hybrid} {Provenance}},
	abstract = {We illustrate how combining retrospective and prospective provenance can yield scientifically meaningful hybrid provenance representations of the computational histories of data produced during a script run. We use scripts from multiple disciplines (astrophysics, climate science, biodiversity data curation, and social network analysis), implemented in Python, R, and MATLAB, to highlight the usefulness of diverse forms of retrospective provenance when coupled with prospective provenance. Users provide prospective provenance (i.e., the conceptual workflows latent in scripts) via simple YesWorkflow annotations, embedded as script comments. Runtime observables, hidden in filenames or folder structures, recorded in log files, or automatically captured using tools such as noWorkflow or the DataONE RunManagers, can be linked to prospective provenance via relational views and queries. The YesWorkflow toolkit, example scripts, and demonstration code are available via an open-source repository.},
	booktitle = {Procs. 11th {Intl}. {Digital} {Curation} {Conference} ({IDCC})},
	publisher = {Digital Curation Centre},
	author = {Zhang, Qian and Cao, Yang and Wang, Qiwen and Vu, Duc and Thavasimani, Priyaa and McPhillips, Tim and Missier, Paolo and Slaughter, Peter and Jones, Christopher and Jones, Matthew B. and Lud{\"a}scher, Bertram},
	year = {2017},
	keywords = {\#provenance},
}
