@book{citeulike:13171746,
    abstract = {In computational science, reproducibility requires that researchers make code and data available to others so that the data can be analyzed in a similar manner as in the original publication. Code must be available to be distributed, data must be accessible in a readable format, and a platform must be available for widely distributing the data and code. In addition, both data and code need to be licensed permissively enough so that others can reproduce the work without a substantial legal burden.

Implementing Reproducible Research covers many of the elements necessary for conducting and distributing reproducible research. It explains how to accurately reproduce a scientific result.

Divided into three parts, the book discusses the tools, practices, and dissemination platforms for ensuring reproducibility in computational science. It describes:

    Computational tools, such as Sweave, knitr, {VisTrails}, Sumatra, {CDE}, and the Declaratron system
    Open source practices, good programming practices, trends in open science, and the role of cloud computing in reproducible research
    Software and methodological platforms, including open source software packages, {RunMyCode} platform, and open access journals

Each part presents contributions from leaders who have developed software and other products that have advanced the field. Supplementary material is available at {www.ImplementingRR.org}.

[Excerpt] From the introduction chapter of Implementing Reproducible Research: Literate statistical programming is a concept introduced by Rossini () that builds on the idea of literate programming as described by Donald Knuth. With literate statistical programming, one combines the description of a statistical analysis and the code for doing the statistical analysis into a single document. Subsequently, one can take the combined document and produce either a human-readable document (e.g., a {PDF}) or a machine-readable code file. An early implementation of this concept was the Sweave system of Leisch, which uses R as its programming language and {\LaTeX} as its documentation language (). Yihui Xie describes his knitr package, which builds substantially on Sweave and incorporates many new ideas developed since the initial development of Sweave. Along these lines, Tanu Malik and colleagues describe the Science Object Linking and Embedding framework for creating interactive publications that allow authors to embed various aspects of computational research in a document, creating a complete research compendium.

A number of systems have recently been developed to track the provenance of data analysis outputs and to manage a researcher's workflow. Juliana Freire and colleagues describe the {VisTrails} system for open source provenance management for scientific workflow creation. {VisTrails} interfaces with existing scientific software and captures the inputs, outputs, and code that produced a particular result, even presenting this workflow in flowchart form. Andrew Davison and colleagues describe the Sumatra toolkit for reproducible research. Their goal is to introduce a tool for reproducible research that minimizes the disruption to scientists' existing workflows, thereby maximizing the uptake by current scientists. Their tool serves as a kind of "backend" to keep track of the code, data, and dependencies as a researcher works. This allows for easily reproducing specific analyses and for sharing with colleagues.

Philip Guo takes the "backend tracking" idea one step further and describes his Code, Data, Environment ({CDE}) package, which is a minimal "virtual machine" for reproducing the environment as well as the analysis. This package keeps track of all files used by a given program (e.g., a statistical analysis program) and bundles everything, including dependencies, into a single package. This approach guarantees that all requirements are included and that a given analysis can be reproduced on another computer.

Peter {Murray-Rust} and Dave {Murray-Rust} introduce the Declaratron, a tool for the precise mapping of mathematical expressions to computational implementations. They present an example from materials science, defining what reproducibility means in this field, in particular for unstable dynamical systems.},
    author = {Aarts, Alexander A. and Alexander, Anita and Attridge, Peter and Bahn\'{\i}k, \v{S}t\v{e}p\'{a}n and Barnett-Cowan, Michael and Bartmess, Elizabeth and Bosco, Frank A. and Braun, Mikio and Brown, Benjamin and Brown, C. Titus and Brown, Kristina and Chandler, Jesse J. and Clay, Russ and Cleary, Hayley and Cohn, Michael and Costantini, Giulio and Crusius, Jan and Davison, Andrew and DeCoster, Jamie and DeGaetano, Michelle and Donohue, Ryan and Dunn, Elizabeth and Edmunds, Scott and Eggleston, Casey and Estel, Vivien and Farach, Frank J. and Fiedler, Susann and Field, James G. and Fitneva, Stanka and Foster, Ian and Foster, Joshua D. and Frazier, Rebecca S. and Freire, Juliana and Galliani, Elisa M. and Giner-Sorolla, Roger and Goellner, Lars and Goss, R. Justin and Graham, Jesse and Grange, James A. and Guo, Philip and Hartshorne, Joshua and Hayes, Timothy B. and Hicks, Grace and Hoefling, Holger and Howe, Bill and Hrynaszkiewicz, Iain and Humphries, Denise and Hurlin, Christophe and Ibanez, Luis and Jahn, Georg and Johnson, Kate and Joy-Gaba, Jennifer A. and Kappes, Heather B. and Lai, Calvin K. and Lakens, Daniel and Lane, Kristin A. and LeBel, Etienne P. and Lee, Minha and Lemm, Kristi and Lewis, Melissa and Lin, Stephanie C. and Li, Peter and Mackinnon, Sean and Mainard, Heather and Malik, Tanu and Mann, Nathaniel and May, Michael and Millman, Jarrod and Moore, Katherine and Motyl, Matt and M\"{u}ller, Stephanie M. and Murray-Rust, Dave and Murray-Rust, Peter and Nosek, Brian A. and Olsson, Catherine and Ong, Cheng S. and Perez, Fernando and Perignon, Christophe and Perugini, Marco and Pham, Quan and Pitts, Michael and Ratliff, Kate and Renkewitz, Frank and Rossini, Anthony and Rutchick, Abraham M. and Sandstrom, Gillian and Selterman, Dylan and Simpson, William and Smith, Colin T. and Spies, Jeffrey R. and Stodden, Victoria and Talhelm, Thomas and Veer, Anna and Vianello, Michelangelo and Xie, Yihui},
    citeulike-article-id = {13171746},
    citeulike-linkout-0 = {http://mfkp.org/INRMM/article/13171746},
    citeulike-linkout-1 = {https://osf.io/s9tya/wiki/home/},
    citeulike-linkout-2 = {http://www.ImplementingRR.org},
    citeulike-linkout-3 = {http://www.worldcat.org/isbn/9781466561595},
    citeulike-linkout-4 = {http://books.google.com/books?vid=ISBN9781466561595},
    citeulike-linkout-5 = {http://www.amazon.com/gp/search?keywords=9781466561595\&index=books\&linkCode=qs},
    citeulike-linkout-6 = {http://www.librarything.com/isbn/9781466561595},
    citeulike-linkout-7 = {http://www.worldcat.org/oclc/859168552},
    editor = {Stodden, Victoria and Leisch, Friedrich and Peng, Roger D.},
    isbn = {9781466561595},
    keywords = {computational-science, data-sharing, free-scientific-knowledge, multiauthor, reproducible-research, scientific-knowledge-sharing, workflow},
    posted-at = {2014-05-15 17:39:36},
    priority = {2},
    publisher = {Chapman \& Hall/CRC},
    series = {The R Series},
    title = {Implementing reproducible research},
    url = {http://mfkp.org/INRMM/article/13171746},
    year = {2014}
}
