An introduction to Docker for reproducible research, with examples from the R environment. Boettiger, C. ACM SIGOPS Operating Systems Review, 49(1):71–79, January 2015. arXiv: 1410.0846.
As computational work becomes more and more integral to many aspects of scientific research, computational reproducibility has become an issue of increasing importance to computer systems researchers and domain scientists alike. Though computational reproducibility seems more straight forward than replicating physical experiments, the complex and rapidly changing nature of computer environments makes being able to reproduce and extend such work a serious challenge. In this paper, I explore common reasons that code developed for one research project cannot be successfully executed or extended by subsequent researchers. I review current approaches to these issues, including virtual machines and workflow systems, and their limitations. I then examine how the popular emerging technology Docker combines several areas from systems research - such as operating system virtualization, cross-platform portability, modular re-usable elements, versioning, and a ‘DevOps’ philosophy, to address these challenges. I illustrate this with several examples of Docker use with a focus on the R statistical environment.
@article{boettiger_introduction_2015,
title = {An introduction to {Docker} for reproducible research, with examples from the {R} environment},
volume = {49},
issn = {0163-5980},
url = {http://arxiv.org/abs/1410.0846},
doi = {10.1145/2723872.2723882},
abstract = {As computational work becomes more and more integral to many aspects of scientific research, computational reproducibility has become an issue of increasing importance to computer systems researchers and domain scientists alike. Though computational reproducibility seems more straight forward than replicating physical experiments, the complex and rapidly changing nature of computer environments makes being able to reproduce and extend such work a serious challenge. In this paper, I explore common reasons that code developed for one research project cannot be successfully executed or extended by subsequent researchers. I review current approaches to these issues, including virtual machines and workflow systems, and their limitations. I then examine how the popular emerging technology Docker combines several areas from systems research - such as operating system virtualization, cross-platform portability, modular re-usable elements, versioning, and a ‘DevOps’ philosophy, to address these challenges. I illustrate this with several examples of Docker use with a focus on the R statistical environment.},
language = {en},
number = {1},
urldate = {2019-05-02},
journal = {ACM SIGOPS Operating Systems Review},
author = {Boettiger, Carl},
month = jan,
year = {2015},
note = {arXiv: 1410.0846},
keywords = {Computer Science - Software Engineering},
pages = {71--79}
}