I'll take that to go: Big data bags and minimal identifiers for exchange of large, complex datasets.
Chard, K.; D'Arcy, M.; Heavner, B.; Foster, I.; Kesselman, C.; Madduri, R.; Rodriguez, A.; Soiland-Reyes, S.; Goble, C.; Clark, K.; Deutsch, E. W.; Dinov, I.; Price, N.; and Toga, A.
In 2016 IEEE International Conference on Big Data (Big Data), pages 319–328, Washington, DC, USA, December 2016. IEEE.
@inproceedings{chard_ill_2016,
	address = {Washington, DC, USA},
	title = {I'll take that to go: {Big} data bags and minimal identifiers for exchange of large, complex datasets},
	isbn = {978-1-4673-9005-7},
	shorttitle = {I'll take that to go},
	url = {http://ieeexplore.ieee.org/document/7840618/},
	doi = {10.1109/BigData.2016.7840618},
	urldate = {2023-12-08},
	booktitle = {2016 {IEEE} {International} {Conference} on {Big} {Data} ({Big} {Data})},
	publisher = {IEEE},
	author = {Chard, Kyle and D'Arcy, Mike and Heavner, Ben and Foster, Ian and Kesselman, Carl and Madduri, Ravi and Rodriguez, Alexis and Soiland-Reyes, Stian and Goble, Carole and Clark, Kristi and Deutsch, Eric W. and Dinov, Ivo and Price, Nathan and Toga, Arthur},
	month = dec,
	year = {2016},
	pages = {319--328},
}
Accelerating Data-driven Discovery with Scientific Asset Management.
Schuler, R.; Kesselman, C.; and Czajkowski, K.
In Proceedings of the 12th IEEE International Conference on eScience, 2016. IEEE.
@inproceedings{schuler_accelerating_2016,
	title = {Accelerating {Data}-driven {Discovery} with {Scientific} {Asset} {Management}},
	url = {https://www.zotero.org/crisaless/collections/SP6RMP59/items/8MC3BI7S/attachment/3FEQG8G8/reader},
	abstract = {Current approaches for ment have failed to keep pace with the needs of increasingly data-intensive science. The overhead and burden of managing data in complex discovery processes, involving experimental protocols with numerous data-producing and computational steps, has become the gating factor that determines the pace of discovery. The lack of comprehensive systems to capture, manage, organize and retrieve data throughout the discovery life cycle leads to significant overheads on scientists' time and effort, reduced productivity, lack of reproducibility, and an absence of data sharing.
In “creative fields” like digital photography and music, digital asset management (DAM) systems for capturing, managing, curating and consuming digital assets like photos and audio recordings, have fundamentally transformed how these data are used. While asset management has not taken hold in eScience applications, we believe that transformation similar to that observed in the creative space could be achieved in scientific domains if appropriate ecosystems of asset management tools existed, tools to capture, manage, and curate data throughout the scientific discovery process. We introduce a framework and infrastructure for asset management in eScience and present initial results from its usage in active research use cases.},
	booktitle = {Proceedings of the 12th {IEEE} {International} {Conference} on {eScience}},
	publisher = {IEEE},
	author = {Schuler, Robert and Kesselman, Carl and Czajkowski, Karl},
	year = {2016},
}
Current approaches for ment have failed to keep pace with the needs of increasingly data-intensive science. The overhead and burden of managing data in complex discovery processes, involving experimental protocols with numerous data-producing and computational steps, has become the gating factor that determines the pace of discovery. The lack of comprehensive systems to capture, manage, organize and retrieve data throughout the discovery life cycle leads to significant overheads on scientists' time and effort, reduced productivity, lack of reproducibility, and an absence of data sharing. In “creative fields” like digital photography and music, digital asset management (DAM) systems for capturing, managing, curating and consuming digital assets like photos and audio recordings, have fundamentally transformed how these data are used. While asset management has not taken hold in eScience applications, we believe that transformation similar to that observed in the creative space could be achieved in scientific domains if appropriate ecosystems of asset management tools existed, tools to capture, manage, and curate data throughout the scientific discovery process. We introduce a framework and infrastructure for asset management in eScience and present initial results from its usage in active research use cases.
ERMrest: an entity-relationship data storage service for web-based, data-oriented collaboration.
Czajkowski, K.; Kesselman, C.; Schuler, R.; and Tangmunarunkit, H.
2016. Publisher: arXiv. Version Number: 1.
@article{czajkowski_ermrest_2016,
	title = {{ERMrest}: an entity-relationship data storage service for web-based, data-oriented collaboration},
	copyright = {arXiv.org perpetual, non-exclusive license},
	shorttitle = {{ERMrest}},
	url = {https://arxiv.org/abs/1610.06044},
	doi = {10.48550/ARXIV.1610.06044},
	abstract = {Scientific discovery is increasingly dependent on a scientist's ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. While the details vary from domain to domain, these data often consist of diverse digital assets (e.g. image files, sequence data, or simulation outputs) that are organized with complex relationships and context which may evolve over the course of an investigation. In addition, discovery is often collaborative, such that sharing of the data and its organizational context is highly desirable. Common systems for managing file or asset metadata hide their inherent relational structures, while traditional relational database systems do not extend to the distributed collaborative environment often seen in scientific investigations. To address these issues, we introduce ERMrest, a collaborative data management service which allows general entity-relationship modeling of metadata manipulated by RESTful access methods. We present the design criteria, architecture, and service implementation, as well as describe an ecosystem of tools and services that we have created to integrate metadata into an end-to-end scientific data life cycle. ERMrest has been deployed to hundreds of users across multiple scientific research communities and projects. We present two representative use cases: an international consortium and an early-phase, multidisciplinary research project.},
	urldate = {2023-12-08},
	author = {Czajkowski, Karl and Kesselman, Carl and Schuler, Robert and Tangmunarunkit, Hongsuda},
	year = {2016},
	note = {Publisher: arXiv. Version Number: 1},
	keywords = {Databases (cs.DB), Digital Libraries (cs.DL), Distributed, Parallel, and Cluster Computing (cs.DC), FOS: Computer and information sciences, Human-Computer Interaction (cs.HC)},
}
\n Scientific discovery is increasingly dependent on a scientist's ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. While the details vary from domain to domain, these data often consist of diverse digital assets (e.g. image files, sequence data, or simulation outputs) that are organized with complex relationships and context which may evolve over the course of an investigation. In addition, discovery is often collaborative, such that sharing of the data and its organizational context is highly desirable. Common systems for managing file or asset metadata hide their inherent relational structures, while traditional relational database systems do not extend to the distributed collaborative environment often seen in scientific investigations. To address these issues, we introduce ERMrest, a collaborative data management service which allows general entity-relationship modeling of metadata manipulated by RESTful access methods. We present the design criteria, architecture, and service implementation, as well as describe an ecosystem of tools and services that we have created to integrate metadata into an end-to-end scientific data life cycle. ERMrest has been deployed to hundreds of users across multiple scientific research communities and projects. We present two representative use cases: an international consortium and an early-phase, multidisciplinary research project.\n
MultiCellDS: a standard and a community for sharing multicellular data.
Friedman, S. H.; Anderson, A. R. A.; Bortz, D. M.; Fletcher, A. G.; Frieboes, H. B.; Ghaffarizadeh, A.; Grimes, D. R.; Hawkins-Daarud, A.; Hoehme, S.; Juarez, E. F.; Kesselman, C.; Merks, R. M. H.; Mumenthaler, S. M.; Newton, P. K.; Norton, K.; Rawat, R.; Rockne, R. C.; Ruderman, D.; Scott, J.; Sindi, S. S.; Sparks, J. L.; Swanson, K.; Agus, D. B.; and Macklin, P.
bioRxiv preprint (Systems Biology), December 2016.
@techreport{friedman_multicellds_2016,
	type = {preprint},
	title = {{MultiCellDS}: a standard and a community for sharing multicellular data},
	shorttitle = {{MultiCellDS}},
	url = {http://biorxiv.org/lookup/doi/10.1101/090696},
	abstract = {Cell biology is increasingly focused on cellular heterogeneity and multicellular systems. To make the fullest use of experimental, clinical, and computational efforts, we need standardized data formats, community-curated “public data libraries”, and tools to combine and analyze shared data. To address these needs, our multidisciplinary community created MultiCellDS (MultiCellular Data Standard): an extensible standard, a library of digital cell lines and tissue snapshots, and support software. With the help of experimentalists, clinicians, modelers, and data and library scientists, we can grow this seed into a community-owned ecosystem of shared data and tools, to the benefit of basic science, engineering, and human health.},
	language = {en},
	urldate = {2022-01-22},
	institution = {Systems Biology},
	author = {Friedman, Samuel H. and Anderson, Alexander R. A. and Bortz, David M. and Fletcher, Alexander G. and Frieboes, Hermann B. and Ghaffarizadeh, Ahmadreza and Grimes, David Robert and Hawkins-Daarud, Andrea and Hoehme, Stefan and Juarez, Edwin F. and Kesselman, Carl and Merks, Roeland M. H. and Mumenthaler, Shannon M. and Newton, Paul K. and Norton, Kerri-Ann and Rawat, Rishi and Rockne, Russell C. and Ruderman, Daniel and Scott, Jacob and Sindi, Suzanne S. and Sparks, Jessica L. and Swanson, Kristin and Agus, David B. and Macklin, Paul},
	month = dec,
	year = {2016},
	doi = {10.1101/090696},
}
Cell biology is increasingly focused on cellular heterogeneity and multicellular systems. To make the fullest use of experimental, clinical, and computational efforts, we need standardized data formats, community-curated “public data libraries”, and tools to combine and analyze shared data. To address these needs, our multidisciplinary community created MultiCellDS (MultiCellular Data Standard): an extensible standard, a library of digital cell lines and tissue snapshots, and support software. With the help of experimentalists, clinicians, modelers, and data and library scientists, we can grow this seed into a community-owned ecosystem of shared data and tools, to the benefit of basic science, engineering, and human health.
The FaceBase Consortium: a comprehensive resource for craniofacial researchers.
Brinkley, J. F.; Fisher, S.; Harris, M. P.; Holmes, G.; Hooper, J. E.; Jabs, E. W.; Jones, K. L.; Kesselman, C.; Klein, O. D.; Maas, R. L.; Marazita, M. L.; Selleri, L.; Spritz, R. A.; van Bakel, H.; Visel, A.; Williams, T. J.; Wysocka, J.; FaceBase Consortium; and Chai, Y.
Development, 143(14): 2677–2688. July 2016.
@article{brinkley_facebase_2016,
	title = {The {FaceBase} {Consortium}: a comprehensive resource for craniofacial researchers},
	volume = {143},
	doi = {10.1242/dev.135434},
	abstract = {The FaceBase Consortium, funded by the National Institute of Dental and Craniofacial Research, National Institutes of Health, is designed to accelerate understanding of craniofacial developmental biology by generating comprehensive data resources to empower the research community, exploring high-throughput technology, fostering new scientific collaborations among researchers and human/computer interactions, facilitating hypothesis-driven research and translating science into improved health care to benefit patients. The resources generated by the FaceBase projects include a number of dynamic imaging modalities, genome-wide association studies, software tools for analyzing human facial abnormalities, detailed phenotyping, anatomical and molecular atlases, global and specific gene expression patterns, and transcriptional profiling over the course of embryonic and postnatal development in animal models and humans. The integrated data visualization tools, faceted search infrastructure, and curation provided by the FaceBase Hub offer flexible and intuitive ways to interact with these multidisciplinary data. In parallel, the datasets also offer unique opportunities for new collaborations and training for researchers coming into the field of craniofacial studies. Here, we highlight the focus of each spoke project and the integration of datasets contributed by the spokes to facilitate craniofacial research.},
	number = {14},
	journal = {Development},
	author = {Brinkley, J. F. and Fisher, S. and Harris, M. P. and Holmes, G. and Hooper, J. E. and Jabs, E. W. and Jones, K. L. and Kesselman, C. and Klein, O. D. and Maas, R. L. and Marazita, M. L. and Selleri, L. and Spritz, R. A. and van Bakel, H. and Visel, A. and Williams, T. J. and Wysocka, J. and {FaceBase Consortium} and Chai, Y.},
	month = jul,
	year = {2016},
	pages = {2677--2688},
}
\n The FaceBase Consortium, funded by the National Institute of Dental and Craniofacial Research, National Institutes of Health, is designed to accelerate understanding of craniofacial developmental biology by generating comprehensive data resources to empower the research community, exploring high-throughput technology, fostering new scientific collaborations among researchers and human/computer interactions, facilitating hypothesis-driven research and translating science into improved health care to benefit patients. The resources generated by the FaceBase projects include a number of dynamic imaging modalities, genome-wide association studies, software tools for analyzing human facial abnormalities, detailed phenotyping, anatomical and molecular atlases, global and specific gene expression patterns, and transcriptional profiling over the course of embryonic and postnatal development in animal models and humans. The integrated data visualization tools, faceted search infrastructure, and curation provided by the FaceBase Hub offer flexible and intuitive ways to interact with these multidisciplinary data. In parallel, the datasets also offer unique opportunities for new collaborations and training for researchers coming into the field of craniofacial studies. Here, we highlight the focus of each spoke project and the integration of datasets contributed by the spokes to facilitate craniofacial research.\n
Predictive Big Data Analytics: A Study of Parkinson’s Disease Using Large, Complex, Heterogeneous, Incongruent, Multi-source and Incomplete Observations.
Dinov, I. D.; Heavner, B.; Tang, M.; Glusman, G.; Chard, K.; Darcy, M.; Madduri, R.; Pa, J.; Spino, C.; Kesselman, C.; and others
PLOS ONE, 11(8): e0157077. 2016.
@article{dinov_predictive_2016,
	title = {Predictive {Big} {Data} {Analytics}: {A} {Study} of {Parkinson}’s {Disease} {Using} {Large}, {Complex}, {Heterogeneous}, {Incongruent}, {Multi}-source and {Incomplete} {Observations}},
	journal = {PLOS ONE},
	volume = {11},
	number = {8},
	author = {Dinov, Ivo D. and Heavner, Ben and Tang, Ming and Glusman, Gustavo and Chard, Kyle and Darcy, Mike and Madduri, Ravi and Pa, Judy and Spino, Cathie and Kesselman, Carl and {others}},
	year = {2016},
	pages = {e0157077},
}