HarpLDA+: Optimizing latent Dirichlet allocation for parallel efficiency.
Peng, B.; Zhang, B.; Chen, L.; Avram, M.; Henschel, R.; Stewart, C.; Zhu, S.; McCallum, E.; Smith, L.; Zahniser, T.; Omer, J.; and Qiu, J.
In Obradovic, Z.; Baeza-Yates, R.; Kepner, J.; Nambiar, R.; Wang, C.; Toyoda, M.; Suzumura, T.; Hu, X.; Cuzzocrea, A.; Tang, J.; Zang, H.; Nie, J.-Y.; and Ghosh, R., editors,
Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017, volume 2018-Janua, pages 243-252, 2018. Institute of Electrical and Electronics Engineers Inc.
@inproceedings{
title = {HarpLDA+: Optimizing latent Dirichlet allocation for parallel efficiency},
type = {inproceedings},
year = {2018},
keywords = {Algorithm optimization,Collective communications,Big data,Learning systems,Statistics},
pages = {243-252},
volume = {2018-Janua},
websites = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85047792604&doi=10.1109%2FBigData.2017.8257932&partnerID=40&md5=634b9836d11831c51e661b9e5e90bcd9},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
id = {ce97ebfa-0470-3e41-8c53-c8c9e7f45275},
created = {2018-06-25T18:22:28.432Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-09-09T19:33:20.152Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Peng2018243},
source_type = {conference},
notes = {cited By 0; Conference of 5th IEEE International Conference on Big Data, Big Data 2017 ; Conference Date: 11 December 2017 Through 14 December 2017; Conference Code:134260},
folder_uuids = {089a8687-5c2e-4a40-91e2-0a855ea1eb95},
private_publication = {false},
abstract = {Latent Dirichlet Allocation (LDA) is a widely used machine learning technique in topic modeling and data analysis. Training large LDA models on big datasets involves dynamic and irregular computation patterns and is a major challenge to both algorithm optimization and system design. In this paper, we present a comprehensive benchmarking of our novel synchronized LDA training system HarpLDA+ based on Hadoop and Java. It demonstrates impressive performance when compared to three other MPI/C++ based state-of-the-art systems, which are LightLDA, F+NomadLDA, and WarpLDA. HarpLDA+ uses optimized collective communication with a timer control for load balance, leading to stable scalability in both shared-memory and distributed systems. We demonstrate in the experiments that HarpLDA+ is effective in reducing synchronization and communication overhead and outperforms the other three LDA training systems. © 2017 IEEE.},
bibtype = {inproceedings},
author = {Peng, B and Zhang, B and Chen, L and Avram, M and Henschel, R and Stewart, C and Zhu, S and McCallum, E and Smith, L and Zahniser, T and Omer, J and Qiu, J},
editor = {Obradovic, Z. and Baeza-Yates, R. and Kepner, J. and Nambiar, R. and Wang, C. and Toyoda, M. and Suzumura, T. and Hu, X. and Cuzzocrea, A. and Tang, J. and Zang, H. and Nie, J.-Y. and Ghosh, R.},
doi = {10.1109/BigData.2017.8257932},
booktitle = {Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017}
}
Latent Dirichlet Allocation (LDA) is a widely used machine learning technique in topic modeling and data analysis. Training large LDA models on big datasets involves dynamic and irregular computation patterns and is a major challenge to both algorithm optimization and system design. In this paper, we present a comprehensive benchmarking of our novel synchronized LDA training system HarpLDA+ based on Hadoop and Java. It demonstrates impressive performance when compared to three other MPI/C++ based state-of-the-art systems, which are LightLDA, F+NomadLDA, and WarpLDA. HarpLDA+ uses optimized collective communication with a timer control for load balance, leading to stable scalability in both shared-memory and distributed systems. We demonstrate in the experiments that HarpLDA+ is effective in reducing synchronization and communication overhead and outperforms the other three LDA training systems. © 2017 IEEE.
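The training step that systems such as HarpLDA+ parallelize is the collapsed Gibbs sampling update for LDA. As general background (a standard form of the sampler, not a formula quoted from this paper), the probability of assigning token i of document d to topic k is

\[ p(z_{di}=k \mid \mathbf{z}_{\neg di}, \mathbf{w}) \;\propto\; \bigl(n_{dk}^{\neg di}+\alpha\bigr)\,\frac{n_{k w_{di}}^{\neg di}+\beta}{n_{k}^{\neg di}+V\beta}, \]

where n_dk is the number of tokens in document d assigned to topic k, n_kw the number of times word w is assigned to topic k, n_k the total number of tokens assigned to topic k, V the vocabulary size, and \alpha, \beta the Dirichlet hyperparameters. The word-topic count tables updated by this step are the shared model state that worker nodes must synchronize, which is where the collective-communication and timer-based load-balancing optimizations act.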
The CSBG - LSU Gateway.
Abeysinghe, E.; Brylinski, M.; Christie, M.; Marru, S.; and Pierce, M.
In
Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, pages 1-4, 7 2018. ACM Press
@inproceedings{
title = {The CSBG - LSU Gateway},
type = {inproceedings},
year = {2018},
keywords = {Apache Airavata,Bioinformatics,Computational System Biology,Science Gateway},
pages = {1-4},
websites = {http://dl.acm.org/citation.cfm?doid=3219104.3229245},
month = {7},
publisher = {ACM Press},
day = {22},
city = {New York, New York, USA},
id = {ad1aabad-bc79-3f51-aa09-14456eb9b34e},
created = {2019-10-01T17:20:13.350Z},
accessed = {2019-08-19},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:45.496Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Abeysinghe2018},
private_publication = {false},
abstract = {Science gateways are identified as an effective way to publish and distribute software for research communities without the burden of learning HPC (High Performance Computer) systems. In the past, researchers were expected to have in-depth knowledge about using HPC systems for computations along with their respective science field in order to do effective research. Science gateways eliminate the need to learn HPC systems and allows the research communities to focus more on their science and let the gateway handle communicating with HPCs. In this poster we are presenting the science gateway project of CSBG (Computational System Biology Group - www.brylinski.org) of Department of Biological Sciences with Center for Computation & Technology at LSU (Louisiana State University). The gateway project was initiated in order to provide CSBG software tools as a service through a science gateway.},
bibtype = {inproceedings},
author = {Abeysinghe, Eroma and Brylinski, Michal and Christie, Marcus and Marru, Suresh and Pierce, Marlon},
doi = {10.1145/3219104.3229245},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18}
}
Science gateways are identified as an effective way to publish and distribute software for research communities without the burden of learning HPC (High Performance Computer) systems. In the past, researchers were expected to have in-depth knowledge about using HPC systems for computations along with their respective science field in order to do effective research. Science gateways eliminate the need to learn HPC systems and allows the research communities to focus more on their science and let the gateway handle communicating with HPCs. In this poster we are presenting the science gateway project of CSBG (Computational System Biology Group - www.brylinski.org) of Department of Biological Sciences with Center for Computation & Technology at LSU (Louisiana State University). The gateway project was initiated in order to provide CSBG software tools as a service through a science gateway.
Supporting Science Gateways Using Apache Airavata and SciGaP Services.
Pierce, M.; Marru, S.; Abeysinghe, E.; Pamidighantam, S.; Christie, M.; and Wannipurage, D.
In
Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, pages 1-4, 7 2018. ACM Press
@inproceedings{
title = {Supporting Science Gateways Using Apache Airavata and SciGaP Services},
type = {inproceedings},
year = {2018},
keywords = {Cyberinfrastructure,Science gateways,Software as a service},
pages = {1-4},
websites = {http://dl.acm.org/citation.cfm?doid=3219104.3229240},
month = {7},
publisher = {ACM Press},
day = {22},
city = {New York, New York, USA},
id = {fecae67f-d76e-338d-ae21-1847ddbea86e},
created = {2019-10-01T17:20:20.323Z},
accessed = {2019-08-19},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:44.031Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Pierce2018},
private_publication = {false},
abstract = {The Science Gateways Platform as a service (SciGaP.org) project provides a rapid development and stable hosting platform for a wide range of science gateways that focus on software as a service. Based on the open source Apache Airavata project, SciGaP services include user management, workflow execution management, computational experiment archiving and access, and sharing services that allow users to share results and other digital artifacts. SciGaP services are multi-tenanted, with clients accessing services through a well-defined, programming language-independent API. SciGaP services can be integrated into web, mobile, and desktop clients. To simplify development for new clients, SciGaP includes the PGA, a generic PHP-based gateway client for SciGaP services that also acts as a reference implementation of the API. Several example gateways using these services are summarized. © 2018 Copyright held by the owner/author(s).},
bibtype = {inproceedings},
author = {Pierce, Marlon and Marru, Suresh and Abeysinghe, Eroma and Pamidighantam, Sudhakar and Christie, Marcus and Wannipurage, Dimuthu},
doi = {10.1145/3219104.3229240},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18}
}
The Science Gateways Platform as a service (SciGaP.org) project provides a rapid development and stable hosting platform for a wide range of science gateways that focus on software as a service. Based on the open source Apache Airavata project, SciGaP services include user management, workflow execution management, computational experiment archiving and access, and sharing services that allow users to share results and other digital artifacts. SciGaP services are multi-tenanted, with clients accessing services through a well-defined, programming language-independent API. SciGaP services can be integrated into web, mobile, and desktop clients. To simplify development for new clients, SciGaP includes the PGA, a generic PHP-based gateway client for SciGaP services that also acts as a reference implementation of the API. Several example gateways using these services are summarized. © 2018 Copyright held by the owner/author(s).
Fracture Advancing Step Tectonics Observed in the Yuha Desert and Ocotillo, CA, Following the 2010 Mw 7.2 El Mayor-Cucapah Earthquake.
Donnellan, A.; Parker, J.; Heflin, M.; Lyzenga, G.; Moore, A.; Ludwig, L., G.; Rundle, J.; Wang, J.; and Pierce, M.
Earth and Space Science, 5(9): 456-472. 9 2018.
@article{
title = {Fracture Advancing Step Tectonics Observed in the Yuha Desert and Ocotillo, CA, Following the 2010 Mw 7.2 El Mayor-Cucapah Earthquake},
type = {article},
year = {2018},
keywords = {GPS,UAVSAR,earthquake,fault,geodetic imaging,stepover},
pages = {456-472},
volume = {5},
websites = {http://doi.wiley.com/10.1029/2017EA000351},
month = {9},
publisher = {Wiley-Blackwell Publishing Ltd},
day = {1},
id = {66121872-12de-3182-a2cd-f1cd035a5237},
created = {2019-10-01T17:20:21.526Z},
accessed = {2019-08-19},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:44.418Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Donnellan2018},
private_publication = {false},
abstract = {©2018. The Authors. Uninhabited aerial vehicle synthetic aperture radar (UAVSAR) observations 2009–2017 of the Yuha Desert area and Global Positioning System (GPS) time series encompassing the region reveal a northward migrating pattern of deformation following the 4 April 2010 Mw7.2 El Mayor-Cucapah (EMC) earthquake. The north end of the EMC rupture exhibits an asymmetric pattern of deformation that is substantial and smooth northeast of the rupture and limited but with surface fracturing slip northwest. The earthquake triggered ~1 cm of surface coseismic slip at the Yuha fault, which continued to slip postseismically. 2.5 cm of Yuha fault slip occurred by the time of the 15 June 2010 Mw5.7 Ocotillo aftershock and 5 cm of slip occurred by 2017 following a logarithmic afterslip decay 16-day timescale. The Ocotillo aftershock triggered 1.4 cm of slip on a northwest trend extending to the Elsinore fault and by 7 years after the EMC earthquake 2.4 cm of slip had accumulated with a distribution following an afterslip function with a 16-day timescale consistent with other earthquakes and a rate strengthening upper crustal sedimentary layer. GPS data show broad coseismic uplift of the Salton Trough and delayed postseismic motion that may be indicative of fluid migration there and subsidence west of the rupture extension, which continues following the earthquake. The data indicate that the Elsinore, Laguna Salada, and EMC ruptures are part of the same fault system. The results also suggest that north-south shortening and east-west extension across the region drove fracture advancing step tectonics north of the EMC earthquake rupture.},
bibtype = {article},
author = {Donnellan, Andrea and Parker, Jay and Heflin, Michael and Lyzenga, Gregory and Moore, Angelyn and Ludwig, Lisa Grant and Rundle, John and Wang, Jun and Pierce, Marlon},
doi = {10.1029/2017EA000351},
journal = {Earth and Space Science},
number = {9}
}
©2018. The Authors. Uninhabited aerial vehicle synthetic aperture radar (UAVSAR) observations 2009–2017 of the Yuha Desert area and Global Positioning System (GPS) time series encompassing the region reveal a northward migrating pattern of deformation following the 4 April 2010 Mw7.2 El Mayor-Cucapah (EMC) earthquake. The north end of the EMC rupture exhibits an asymmetric pattern of deformation that is substantial and smooth northeast of the rupture and limited but with surface fracturing slip northwest. The earthquake triggered ~1 cm of surface coseismic slip at the Yuha fault, which continued to slip postseismically. 2.5 cm of Yuha fault slip occurred by the time of the 15 June 2010 Mw5.7 Ocotillo aftershock and 5 cm of slip occurred by 2017 following a logarithmic afterslip decay 16-day timescale. The Ocotillo aftershock triggered 1.4 cm of slip on a northwest trend extending to the Elsinore fault and by 7 years after the EMC earthquake 2.4 cm of slip had accumulated with a distribution following an afterslip function with a 16-day timescale consistent with other earthquakes and a rate strengthening upper crustal sedimentary layer. GPS data show broad coseismic uplift of the Salton Trough and delayed postseismic motion that may be indicative of fluid migration there and subsidence west of the rupture extension, which continues following the earthquake. The data indicate that the Elsinore, Laguna Salada, and EMC ruptures are part of the same fault system. The results also suggest that north-south shortening and east-west extension across the region drove fracture advancing step tectonics north of the EMC earthquake rupture.
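The "afterslip function with a 16-day timescale" referred to above is, in its conventional geodetic form (stated here as standard background rather than quoted from the paper), a logarithmic decay

\[ s(t) = A\,\ln\!\left(1+\frac{t}{\tau}\right), \qquad \tau \approx 16\ \text{days}, \]

where t is the time elapsed since the triggering event, A scales the cumulative afterslip, and \tau is the decay timescale, so most of the postseismic slip accumulates within the first few months after the earthquake.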
Evaluating NextCloud as a File Storage for Apache Airavata.
Kariyattin, S.; Marru, S.; and Pierce, M.
In
Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, pages 1-4, 7 2018. ACM Press
@inproceedings{
title = {Evaluating NextCloud as a File Storage for Apache Airavata},
type = {inproceedings},
year = {2018},
keywords = {Apache Airavata,File Storage,File Transfer,NextCloud,WebDAV},
pages = {1-4},
websites = {http://dl.acm.org/citation.cfm?doid=3219104.3229270},
month = {7},
publisher = {ACM Press},
day = {22},
city = {New York, New York, USA},
id = {36bce677-451f-3965-a4d4-f542d3b36a5c},
created = {2019-10-01T17:20:23.138Z},
accessed = {2019-08-19},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:43.797Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Kariyattin2018},
private_publication = {false},
abstract = {Science gateways enable researchers from broad communities to access advanced computing and storage resources. The researchers analyze large amounts of data using the compute resources and the generated results, usually files are saved in the storage. Consider a scenario where a researcher has large output data files of historically run experiments on an external server. If the researcher wants to move the data to the gateway storage, then the only way to do it is through data transfer. This task would be cumbersome and time consuming. The paper discusses an approach through which historic or any data existing on a different server or in a cloud storage (Google Drive) or in an object storage (Amazon S3) can be ingested into the existing gateway without actually transferring it to the server. We discuss about a software called NextCloud and how it can be used as a gateway storage by integrating it with Apache Airavata. Airavata currently uses local file storage to store user related data files. On the client side, Airavata clients use different protocols like HTTP and SFTP for file transfer. NextCloud is an open source file share and communication platform that provides a common file access layer through its universal file access to different data sources. Integrating NextCloud with Airavata could solve the problem of providing unified file transfer API across all the Airavata clients. As NextCloud supports various external storages, its integration with Airavata would also enable the data ingestion and importing large data from different storage sources to Airavata.},
bibtype = {inproceedings},
author = {Kariyattin, Sachin and Marru, Suresh and Pierce, Marlon},
doi = {10.1145/3219104.3229270},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18}
}
Science gateways enable researchers from broad communities to access advanced computing and storage resources. The researchers analyze large amounts of data using the compute resources and the generated results, usually files are saved in the storage. Consider a scenario where a researcher has large output data files of historically run experiments on an external server. If the researcher wants to move the data to the gateway storage, then the only way to do it is through data transfer. This task would be cumbersome and time consuming. The paper discusses an approach through which historic or any data existing on a different server or in a cloud storage (Google Drive) or in an object storage (Amazon S3) can be ingested into the existing gateway without actually transferring it to the server. We discuss about a software called NextCloud and how it can be used as a gateway storage by integrating it with Apache Airavata. Airavata currently uses local file storage to store user related data files. On the client side, Airavata clients use different protocols like HTTP and SFTP for file transfer. NextCloud is an open source file share and communication platform that provides a common file access layer through its universal file access to different data sources. Integrating NextCloud with Airavata could solve the problem of providing unified file transfer API across all the Airavata clients. As NextCloud supports various external storages, its integration with Airavata would also enable the data ingestion and importing large data from different storage sources to Airavata.
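As a rough illustration of the "common file access layer" idea, NextCloud exposes user storage over WebDAV, so a gateway component could read or write data with plain HTTP calls. The sketch below uses the Python requests library; the host, user, and credentials are placeholders (and the WebDAV path can vary between NextCloud deployments), so this is illustrative rather than code from the integration described in the paper.

import requests

# Hypothetical NextCloud WebDAV endpoint and credentials (placeholders, not from the paper).
BASE = "https://nextcloud.example.org/remote.php/dav/files/demo-user"
AUTH = ("demo-user", "app-password")

def upload(local_path: str, remote_name: str) -> None:
    """Upload a local file to the user's NextCloud storage via a WebDAV PUT."""
    with open(local_path, "rb") as fh:
        resp = requests.put(f"{BASE}/{remote_name}", data=fh, auth=AUTH)
    resp.raise_for_status()

def download(remote_name: str, local_path: str) -> None:
    """Fetch a file back over the same WebDAV endpoint with a plain GET."""
    resp = requests.get(f"{BASE}/{remote_name}", auth=AUTH)
    resp.raise_for_status()
    with open(local_path, "wb") as fh:
        fh.write(resp.content)

if __name__ == "__main__":
    upload("experiment-output.dat", "experiment-output.dat")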
Scaling JupyterHub Using Kubernetes on Jetstream Cloud.
Sarajlic, S.; Chastang, J.; Marru, S.; Fischer, J.; and Lowe, M.
In
Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, pages 1-4, 7 2018. ACM Press
@inproceedings{
title = {Scaling JupyterHub Using Kubernetes on Jetstream Cloud},
type = {inproceedings},
year = {2018},
keywords = {Cloud Computing,JupyterHub,Kubernetes,Magnum,OpenStack,Unidata,Workforce Development},
pages = {1-4},
websites = {http://dl.acm.org/citation.cfm?doid=3219104.3229249},
month = {7},
publisher = {ACM Press},
day = {22},
city = {New York, New York, USA},
id = {fc553946-212b-396c-83bf-e9b38820388d},
created = {2019-10-01T17:20:23.818Z},
accessed = {2019-08-19},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:43.786Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Sarajlic2018a},
private_publication = {false},
abstract = {Unidata, an NSF funded project that started in 1983, is a diverse community of education and research institutions with the common goal of sharing geoscience data and the tools to access and visualize that data. Unidata provides weather observations and other data, software tools, and support to enhance Earth-system education and research, and continuously examines ways of adapting their workflows for new technologies to maximize the reach of their education and research efforts. In support of Unidata objectives to host workshops for atmospheric data analysis using JupyterHub, we explore a cloud computing approach leveraging Kubernetes coupled with JupyterHub that when combined will provide a solution for researchers and students to pull data from Unidata and burst onto Jetstream cloud by requesting resources dynamically via easy to use JupyterHub. More specifically, on Jetstream, Kubernetes is used for automating deployment and scaling of domain specific containerized applications, and JupyterHub is used for spawning multiple hubs within the same Kubernetes cluster instance that will be used for supporting classroom settings. JupyterHub's modular kernel feature will support dynamic needs of classroom application requirements. The proposed approach will serve as an end-to-end solution for researchers to execute their workflows, with JupyterHub serving as a powerful tool for user training and next-generation workforce development in atmospheric sciences.},
bibtype = {inproceedings},
author = {Sarajlic, Semir and Chastang, Julien and Marru, Suresh and Fischer, Jeremy and Lowe, Mike},
doi = {10.1145/3219104.3229249},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18}
}
Unidata, an NSF funded project that started in 1983, is a diverse community of education and research institutions with the common goal of sharing geoscience data and the tools to access and visualize that data. Unidata provides weather observations and other data, software tools, and support to enhance Earth-system education and research, and continuously examines ways of adapting their workflows for new technologies to maximize the reach of their education and research efforts. In support of Unidata objectives to host workshops for atmospheric data analysis using JupyterHub, we explore a cloud computing approach leveraging Kubernetes coupled with JupyterHub that when combined will provide a solution for researchers and students to pull data from Unidata and burst onto Jetstream cloud by requesting resources dynamically via easy to use JupyterHub. More specifically, on Jetstream, Kubernetes is used for automating deployment and scaling of domain specific containerized applications, and JupyterHub is used for spawning multiple hubs within the same Kubernetes cluster instance that will be used for supporting classroom settings. JupyterHub's modular kernel feature will support dynamic needs of classroom application requirements. The proposed approach will serve as an end-to-end solution for researchers to execute their workflows, with JupyterHub serving as a powerful tool for user training and next-generation workforce development in atmospheric sciences.
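As a minimal sketch of the "automating deployment and scaling" role that Kubernetes plays in such a setup, the official Python kubernetes client can resize a containerized service, for example a pool of user pods behind JupyterHub. The deployment name, namespace, and replica count below are placeholders, and the paper does not prescribe this client code.

from kubernetes import client, config

def scale_deployment(name: str, namespace: str, replicas: int) -> None:
    """Patch a Deployment's replica count so the cluster adds or removes pods."""
    config.load_kube_config()          # use the local kubeconfig credentials
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )

if __name__ == "__main__":
    # Hypothetical names: a JupyterHub-related deployment in a "jhub" namespace.
    scale_deployment("hub-placeholder-pool", "jhub", replicas=10)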
ImageX 3.0: a full stack imaging archive solution.
Young, M., D.; Gopu, A.; and Perigo, R.
In
SPIE Astronomical Telescopes + Instrumentation, 10-15 June 2018, Austin, Texas, United States, pages 46, 7 2018. SPIE-Intl Soc Optical Eng
@inproceedings{
title = {ImageX 3.0: a full stack imaging archive solution},
type = {inproceedings},
year = {2018},
pages = {46},
month = {7},
publisher = {SPIE-Intl Soc Optical Eng},
day = {6},
id = {fd7aed67-c949-3b19-8de5-42e15d1b0f0b},
created = {2019-10-01T17:20:27.583Z},
accessed = {2019-08-19},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2021-04-23T19:54:35.027Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Young2018},
private_publication = {false},
abstract = {Over the past several years we have faced the need to develop a number of solutions to address the challenge of archiving large-format scientific imaging data and seamlessly visualizing that data-irrespective of the image format-on a web browser. ImageX is a ground-up rewrite and synthesis of our solutions to this issue, with a goal of reducing the workload required to transition from simply storing vast amounts of scientific imaging data on disk to securely archiving and sharing that data with the world. The components that make up the ImageX service stack include a secure and scalable back-end data service optimized for providing imaging data, a pre-processor to harvest metadata and intelligently scale and store the imaging data, and a flexible and embeddable front-end visualization web application. Our latest version of the software suite called ImageX 3.0 has been designed to meet the needs of a single user running locally on their own personal computer or scaled up to provide support for the image storage and visualization needs of a modern observatory with the intention of providing a 'Push button' solution to a fully deployed solution. Each ImageX 3.0 component is provided as a Docker container, and can be rapidly and seamlessly deployed to meet demand. In this paper, we describe the ImageX architecture while demonstrating many of its features, including intelligent image scaling with adaptive histograms, load-balancing, and administrative tools. On the user-facing side we demonstrate how the ImageX 3.0 viewer can be embedded into the content of any web application, and explore the astronomy-specific features and plugins we've written into it. The ImageX service stack is fully open-sourced, and is built upon widely-supported industry standards (Node.js, Angular, etc.). Apart from being deployed as a standalone service stack, ImageX components are currently in use or expected to be deployed on: (1) the ODI-PPA portal serving astronomical images taken at the WIYN Observatory in near real-time; (2) the web portal serving microscopy images taken at the IU Electron Microscopy Center; (3) the RADY-SCA portal supporting radiology and medical imaging as well as neuroscience researchers at IU. © 2018 SPIE.},
bibtype = {inproceedings},
author = {Young, Michael D. and Gopu, Arvind and Perigo, Raymond},
doi = {10.1117/12.2313684},
booktitle = {SPIE Astronomical Telescopes + Instrumentation, 10-15 June 2018, Austin, Texas, United States}
}
Over the past several years we have faced the need to develop a number of solutions to address the challenge of archiving large-format scientific imaging data and seamlessly visualizing that data-irrespective of the image format-on a web browser. ImageX is a ground-up rewrite and synthesis of our solutions to this issue, with a goal of reducing the workload required to transition from simply storing vast amounts of scientific imaging data on disk to securely archiving and sharing that data with the world. The components that make up the ImageX service stack include a secure and scalable back-end data service optimized for providing imaging data, a pre-processor to harvest metadata and intelligently scale and store the imaging data, and a flexible and embeddable front-end visualization web application. Our latest version of the software suite called ImageX 3.0 has been designed to meet the needs of a single user running locally on their own personal computer or scaled up to provide support for the image storage and visualization needs of a modern observatory with the intention of providing a 'Push button' solution to a fully deployed solution. Each ImageX 3.0 component is provided as a Docker container, and can be rapidly and seamlessly deployed to meet demand. In this paper, we describe the ImageX architecture while demonstrating many of its features, including intelligent image scaling with adaptive histograms, load-balancing, and administrative tools. On the user-facing side we demonstrate how the ImageX 3.0 viewer can be embedded into the content of any web application, and explore the astronomy-specific features and plugins we've written into it. The ImageX service stack is fully open-sourced, and is built upon widely-supported industry standards (Node.js, Angular, etc.). Apart from being deployed as a standalone service stack, ImageX components are currently in use or expected to be deployed on: (1) the ODI-PPA portal serving astronomical images taken at the WIYN Observatory in near real-time; (2) the web portal serving microscopy images taken at the IU Electron Microscopy Center; (3) the RADY-SCA portal supporting radiology and medical imaging as well as neuroscience researchers at IU. © 2018 SPIE.
Training children aged 5–10 years in compliance control: tracing smaller figures yields better learning not specific to the scale of drawn figures.
Snapp-Childs, W.; Fath, A., J.; and Bingham, G., P.
Experimental Brain Research, 236(10): 2589-2601. 10 2018.
@article{
title = {Training children aged 5–10 years in compliance control: tracing smaller figures yields better learning not specific to the scale of drawn figures},
type = {article},
year = {2018},
keywords = {Compliance control,Manual control,Motor development,Prospective control,Specificity},
pages = {2589-2601},
volume = {236},
month = {10},
publisher = {Springer Verlag},
day = {1},
id = {ab147de2-fdc2-377c-a242-5014535bda00},
created = {2019-10-01T17:20:28.391Z},
accessed = {2019-08-19},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:31.125Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {false},
hidden = {false},
citation_key = {Snapp-Childs2018},
private_publication = {false},
abstract = {© 2018 Springer-Verlag GmbH Germany, part of Springer Nature Previously we developed a method that supports active movement generation to allow practice with improvement of good compliance control in tracing and drawing. We showed that the method allowed children with motor impairments to improve at a 3D tracing task to become as proficient as typically developing children and that the training improved 2D figure copying. In this study, we expanded the training protocol to include a wider variety of ages (5–10-year-olds) and we made the figures traced in training the same as in figure copying, but varied the scale of training and copying figures to assess the generality of learning. Forty-eight children were assigned to groups trained using large or small figures. All were tested before training with a tracing task and a copying task. Then, the children trained over five sessions in the tracing task with either small or large figures. Finally, the tracing and copying tasks were tested again following training. A mean speed measure was used to control for path length variations in the timed task. Performance on both tasks at both baseline and posttest varied as a function of the size of the figure and age. In addition, tracing performance also varied with the level of support. In particular, speeds were higher with more support, larger figures and older children. After training, performance improved. Speeds increased. In tracing, performance improved more for large figures traced by children who trained on large figures. In copying, however, performance only improved significantly for children who had trained on small figures and it improved equally for large and small figures. In conclusion, training by tracing smaller figures yielded better learning that was not, however, specific to the scale of drawn figures. Small figures exhibit greater mean curvature. We infer that it yielded better general improvement.},
bibtype = {article},
author = {Snapp-Childs, Winona and Fath, Aaron J. and Bingham, Geoffrey P.},
doi = {10.1007/s00221-018-5319-y},
journal = {Experimental Brain Research},
number = {10}
}
© 2018 Springer-Verlag GmbH Germany, part of Springer Nature Previously we developed a method that supports active movement generation to allow practice with improvement of good compliance control in tracing and drawing. We showed that the method allowed children with motor impairments to improve at a 3D tracing task to become as proficient as typically developing children and that the training improved 2D figure copying. In this study, we expanded the training protocol to include a wider variety of ages (5–10-year-olds) and we made the figures traced in training the same as in figure copying, but varied the scale of training and copying figures to assess the generality of learning. Forty-eight children were assigned to groups trained using large or small figures. All were tested before training with a tracing task and a copying task. Then, the children trained over five sessions in the tracing task with either small or large figures. Finally, the tracing and copying tasks were tested again following training. A mean speed measure was used to control for path length variations in the timed task. Performance on both tasks at both baseline and posttest varied as a function of the size of the figure and age. In addition, tracing performance also varied with the level of support. In particular, speeds were higher with more support, larger figures and older children. After training, performance improved. Speeds increased. In tracing, performance improved more for large figures traced by children who trained on large figures. In copying, however, performance only improved significantly for children who had trained on small figures and it improved equally for large and small figures. In conclusion, training by tracing smaller figures yielded better learning that was not, however, specific to the scale of drawn figures. Small figures exhibit greater mean curvature. We infer that it yielded better general improvement.
IQ-stations: Advances in state-of-the-art low cost immersive displays for research and development.
Sherman, W., R.; Whiting, E.; Money, J., H.; and Grover, S.
In
Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18), pages 5, 7 2018. Association for Computing Machinery
@inproceedings{
title = {IQ-stations: Advances in state-of-the-art low cost immersive displays for research and development},
type = {inproceedings},
year = {2018},
keywords = {Consumer hardware,IQ-Station,Tracking systems,Virtual reality},
pages = {5},
month = {7},
publisher = {Association for Computing Machinery},
day = {22},
id = {06398430-4c2b-319c-bc41-b748bfff857b},
created = {2019-10-01T17:20:30.200Z},
accessed = {2019-08-27},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:31.190Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {false},
hidden = {false},
citation_key = {Sherman2018},
private_publication = {false},
bibtype = {inproceedings},
author = {Sherman, William R. and Whiting, Eric and Money, James H. and Grover, Shane},
doi = {10.1145/3219104.3219106},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18)}
}
Limited mutation-rate variation within the Paramecium aurelia species complex.
Long, H.; Doak, T., G.; and Lynch, M.
G3: Genes, Genomes, Genetics, 8(7): 2523-2526. 7 2018.
@article{
title = {Limited mutation-rate variation within the Paramecium aurelia species complex},
type = {article},
year = {2018},
keywords = {Ciliated protozoa,Mutation accumulation,Neutral evolution},
pages = {2523-2526},
volume = {8},
month = {7},
publisher = {Genetics Society of America},
day = {1},
id = {e4392222-fcdf-3038-87f3-df208718c8ad},
created = {2019-10-01T17:20:30.695Z},
accessed = {2019-08-20},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:30.668Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Long2018},
private_publication = {false},
abstract = {© 2018 Long et al. Mutation is one of the most fundamental evolutionary forces. Studying variation in the mutation rate within and among closely-related species can help reveal mechanisms of genome divergence, but such variation is unstudied in the vast majority of organisms. Previous studies on ciliated protozoa have found extremely low mutation rates. In this study, using mutation-accumulation techniques combined with deep whole-genome sequencing, we explore the germline base-substitution mutation-rate variation of three cryptic species in the Paramecium aurelia species complex-P. biaurelia, P. sexaurelia, and P. tetraurelia. We find that there is extremely limited variation of the mutation rate and spectrum in the three species and confirm the extremely low mutation rate of ciliates.},
bibtype = {article},
author = {Long, Hongan and Doak, Thomas G. and Lynch, Michael},
doi = {10.1534/g3.118.200420},
journal = {G3: Genes, Genomes, Genetics},
number = {7}
}
© 2018 Long et al. Mutation is one of the most fundamental evolutionary forces. Studying variation in the mutation rate within and among closely-related species can help reveal mechanisms of genome divergence, but such variation is unstudied in the vast majority of organisms. Previous studies on ciliated protozoa have found extremely low mutation rates. In this study, using mutation-accumulation techniques combined with deep whole-genome sequencing, we explore the germline base-substitution mutation-rate variation of three cryptic species in the Paramecium aurelia species complex-P. biaurelia, P. sexaurelia, and P. tetraurelia. We find that there is extremely limited variation of the mutation rate and spectrum in the three species and confirm the extremely low mutation rate of ciliates.
Insights into an Extensively Fragmented Eukaryotic Genome: De Novo Genome Sequencing of the Multinuclear Ciliate Uroleptopsis citrina.
Zheng, W.; Wang, C.; Yan, Y.; Gao, F.; Doak, T., G.; and Song, W.
Genome Biology and Evolution, 10(3): 883-894. 2018.
@article{
title = {Insights into an Extensively Fragmented Eukaryotic Genome: De Novo Genome Sequencing of the Multinuclear Ciliate Uroleptopsis citrina},
type = {article},
year = {2018},
pages = {883-894},
volume = {10},
publisher = {Oxford University Press},
id = {4527e4a4-3cdc-3263-af11-a7f1fbede99b},
created = {2019-10-01T17:20:31.089Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:30.686Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Zheng2018},
source_type = {JOUR},
private_publication = {false},
abstract = {Ciliated protists are a large group of single-celled eukaryotes with separate germline and somatic nuclei in each cell. The somatic genome is developed from the zygotic nucleus through a series of chromosomal rearrangements, including fragmentation, DNA elimination, de novo telomere addition, and DNA amplification. This unique feature makes them perfect models for research in genome biology and evolution. However, genomic research of ciliates has been limited to a few species, owing to problems with DNA contamination and obstacles in cultivation. Here, we introduce a method combining telomere-primer PCR amplification and high-throughput sequencing, which can reduce DNA contamination and obtain genomic data efficiently. Based on this method, we report a draft somatic genome of a multimacronuclear ciliate, Uroleptopsis citrina. 1) The telomeric sequence in U. citrina is confirmed to be C4A4C4A4C4 by directly blunt-end cloning. 2) Genomic analysis of the resulting chromosomes shows a “one-gene one-chromosome” pattern, with a small number of multiple-gene chromosomes. 3) Amino acid usage is analyzed, and reassignment of stop codons is confirmed. 4) Chromosomal analysis shows an obvious asymmetrical GC skew and high bias between A and T in the subtelomeric regions of the sense-strand, with the detection of an 11-bp high AT motif region in the 3′ subtelomeric region. 5) The subtelomeric sequence also has an obvious 40 nt strand oscillation of nucleotide ratio. 6) In the 5′ subtelomeric region of the coding strand, the distribution of potential TATA-box regions is illustrated, which accumulate between 30 and 50 nt. This work provides a valuable reference for genomic research and furthers our understanding of the dynamic nature of unicellular eukaryotic genomes.},
bibtype = {article},
author = {Zheng, Weibo and Wang, Chundi and Yan, Ying and Gao, Feng and Doak, Thomas G and Song, Weibo},
doi = {10.1093/gbe/evy055},
journal = {Genome Biology and Evolution},
number = {3}
}
Ciliated protists are a large group of single-celled eukaryotes with separate germline and somatic nuclei in each cell. The somatic genome is developed from the zygotic nucleus through a series of chromosomal rearrangements, including fragmentation, DNA elimination, de novo telomere addition, and DNA amplification. This unique feature makes them perfect models for research in genome biology and evolution. However, genomic research of ciliates has been limited to a few species, owing to problems with DNA contamination and obstacles in cultivation. Here, we introduce a method combining telomere-primer PCR amplification and high-throughput sequencing, which can reduce DNA contamination and obtain genomic data efficiently. Based on this method, we report a draft somatic genome of a multimacronuclear ciliate, Uroleptopsis citrina. 1) The telomeric sequence in U. citrina is confirmed to be C4A4C4A4C4 by directly blunt-end cloning. 2) Genomic analysis of the resulting chromosomes shows a “one-gene one-chromosome” pattern, with a small number of multiple-gene chromosomes. 3) Amino acid usage is analyzed, and reassignment of stop codons is confirmed. 4) Chromosomal analysis shows an obvious asymmetrical GC skew and high bias between A and T in the subtelomeric regions of the sense-strand, with the detection of an 11-bp high AT motif region in the 3′ subtelomeric region. 5) The subtelomeric sequence also has an obvious 40 nt strand oscillation of nucleotide ratio. 6) In the 5′ subtelomeric region of the coding strand, the distribution of potential TATA-box regions is illustrated, which accumulate between 30 and 50 nt. This work provides a valuable reference for genomic research and furthers our understanding of the dynamic nature of unicellular eukaryotic genomes.
Stilbenoid prenyltransferases define key steps in the diversification of peanut phytoalexins.
Yang, T.; Fang, L.; Sanders, S.; Jayanthi, S.; Rajan, G.; Podicheti, R.; Thallapuranam, S., K.; Mockaitis, K.; and Medina-Bolivar, F.
The Journal of biological chemistry, 293(1): 28-46. 2018.
@article{
title = {Stilbenoid prenyltransferases define key steps in the diversification of peanut phytoalexins.},
type = {article},
year = {2018},
keywords = {Arachidin,hairy root,peanut,plant biochemistry,prenylation,resveratrol,secondary metabolism,small molecule,stilbenoid,transcriptomics},
pages = {28-46},
volume = {293},
websites = {http://www.ncbi.nlm.nih.gov/pubmed/29158266,http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC5766904},
id = {00ebce8a-ca57-333c-baec-dd957b280018},
created = {2019-10-01T17:20:31.445Z},
accessed = {2019-08-27},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:30.835Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Yang2018},
private_publication = {false},
abstract = {Defense responses of peanut (Arachis hypogaea) to biotic and abiotic stresses include the synthesis of prenylated stilbenoids. Members of this compound class show several protective activities in human disease studies, and the list of potential therapeutic targets continues to expand. Despite their medical and biological importance, the biosynthetic pathways of prenylated stilbenoids remain to be elucidated, and the genes encoding stilbenoid-specific prenyltransferases have yet to be identified in any plant species. In this study, we combined targeted transcriptomic and metabolomic analyses to discover prenyltransferase genes in elicitor-treated peanut hairy root cultures. Transcripts encoding five enzymes were identified, and two of these were functionally characterized in a transient expression system consisting of Agrobacterium-infiltrated leaves of Nicotiana benthamiana We observed that one of these prenyltransferases, AhR4DT-1, catalyzes a key reaction in the biosynthesis of prenylated stilbenoids, in which resveratrol is prenylated at its C-4 position to form arachidin-2, whereas another, AhR3'DT-1, added the prenyl group to C-3' of resveratrol. Each of these prenyltransferases was highly specific for stilbenoid substrates, and we confirmed their subcellular location in the plastid by fluorescence microscopy. Structural analysis of the prenylated stilbenoids suggested that these two prenyltransferase activities represent the first committed steps in the biosynthesis of a large number of prenylated stilbenoids and their derivatives in peanut. In summary, we have identified five candidate prenyltransferases in peanut and confirmed that two of them are stilbenoid-specific, advancing our understanding of this specialized enzyme family and shedding critical light onto the biosynthesis of bioactive stilbenoids.},
bibtype = {article},
author = {Yang, Tianhong and Fang, Lingling and Sanders, Sheri and Jayanthi, Srinivas and Rajan, Gayathri and Podicheti, Ram and Thallapuranam, Suresh Kumar and Mockaitis, Keithanne and Medina-Bolivar, Fabricio},
doi = {10.1074/jbc.RA117.000564},
journal = {The Journal of biological chemistry},
number = {1}
}
Defense responses of peanut (Arachis hypogaea) to biotic and abiotic stresses include the synthesis of prenylated stilbenoids. Members of this compound class show several protective activities in human disease studies, and the list of potential therapeutic targets continues to expand. Despite their medical and biological importance, the biosynthetic pathways of prenylated stilbenoids remain to be elucidated, and the genes encoding stilbenoid-specific prenyltransferases have yet to be identified in any plant species. In this study, we combined targeted transcriptomic and metabolomic analyses to discover prenyltransferase genes in elicitor-treated peanut hairy root cultures. Transcripts encoding five enzymes were identified, and two of these were functionally characterized in a transient expression system consisting of Agrobacterium-infiltrated leaves of Nicotiana benthamiana We observed that one of these prenyltransferases, AhR4DT-1, catalyzes a key reaction in the biosynthesis of prenylated stilbenoids, in which resveratrol is prenylated at its C-4 position to form arachidin-2, whereas another, AhR3'DT-1, added the prenyl group to C-3' of resveratrol. Each of these prenyltransferases was highly specific for stilbenoid substrates, and we confirmed their subcellular location in the plastid by fluorescence microscopy. Structural analysis of the prenylated stilbenoids suggested that these two prenyltransferase activities represent the first committed steps in the biosynthesis of a large number of prenylated stilbenoids and their derivatives in peanut. In summary, we have identified five candidate prenyltransferases in peanut and confirmed that two of them are stilbenoid-specific, advancing our understanding of this specialized enzyme family and shedding critical light onto the biosynthesis of bioactive stilbenoids.
Escherichia coli cultures maintain stable subpopulation structure during long-term evolution.
Behringer, M., G.; Choi, B., I.; Miller, S., F.; Doak, T., G.; Karty, J., A.; Guo, W.; and Lynch, M.
Proceedings of the National Academy of Sciences of the United States of America, 115(20): E4642-E4650. 2018.
@article{
title = {Escherichia coli cultures maintain stable subpopulation structure during long-term evolution},
type = {article},
year = {2018},
keywords = {Article; bacterial colonization; bacterial gene; b},
pages = {E4642-E4650},
volume = {115},
websites = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85046974773&doi=10.1073%2Fpnas.1708371115&partnerID=40&md5=f03cd74612020bba927dfd16284a2ae6},
publisher = {National Academy of Sciences},
id = {5cfc23ba-c9dd-30e6-9bc3-522d0ffde292},
created = {2019-10-01T17:20:32.089Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:20:32.089Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Behringer2018E4642},
source_type = {article},
notes = {cited By 0},
private_publication = {false},
abstract = {How genetic variation is generated and maintained remains a central question in evolutionary biology. When presented with a complex environment, microbes can take advantage of genetic variation to exploit new niches. Here we present a massively parallel experiment where WT and repair-deficient (ΔmutL) Escherichia coli populations have evolved over 3 y in a spatially heterogeneous and nutritionally complex environment. Metage-nomic sequencing revealed that these initially isogenic populations evolved and maintained stable subpopulation structure in just 10 mL of medium for up to 10,000 generations, consisting of up to five major haplotypes with many minor haplotypes. We characterized the genomic, transcriptomic, exometabolomic, and phenotypic differences between clonal isolates, revealing subpopulation structure driven primarily by spatial segregation followed by differential utilization of nutrients. In addition to genes regulating the import and catabolism of nutrients, major polymorphisms of note included insertion elements transposing into fimE (regulator of the type I fimbriae) and upstream of hns (global regulator of environmental-change and stress-response genes), both known to regulate biofilm formation. Interestingly, these genes have also been identified as critical to colonization in uro-pathogenic E. coli infections. Our findings illustrate the complexity that can arise and persist even in small cultures, raising the possibility that infections may often be promoted by an evolving and complex pathogen population. © 2018 National Academy of Sciences. All rights reserved.},
bibtype = {article},
author = {Behringer, M G and Choi, B I and Miller, S F and Doak, T G and Karty, J A and Guo, W and Lynch, M},
doi = {10.1073/pnas.1708371115},
journal = {Proceedings of the National Academy of Sciences of the United States of America},
number = {20}
}
How genetic variation is generated and maintained remains a central question in evolutionary biology. When presented with a complex environment, microbes can take advantage of genetic variation to exploit new niches. Here we present a massively parallel experiment where WT and repair-deficient (ΔmutL) Escherichia coli populations have evolved over 3 y in a spatially heterogeneous and nutritionally complex environment. Metage-nomic sequencing revealed that these initially isogenic populations evolved and maintained stable subpopulation structure in just 10 mL of medium for up to 10,000 generations, consisting of up to five major haplotypes with many minor haplotypes. We characterized the genomic, transcriptomic, exometabolomic, and phenotypic differences between clonal isolates, revealing subpopulation structure driven primarily by spatial segregation followed by differential utilization of nutrients. In addition to genes regulating the import and catabolism of nutrients, major polymorphisms of note included insertion elements transposing into fimE (regulator of the type I fimbriae) and upstream of hns (global regulator of environmental-change and stress-response genes), both known to regulate biofilm formation. Interestingly, these genes have also been identified as critical to colonization in uro-pathogenic E. coli infections. Our findings illustrate the complexity that can arise and persist even in small cultures, raising the possibility that infections may often be promoted by an evolving and complex pathogen population. © 2018 National Academy of Sciences. All rights reserved.
Return on Investment for Three Cyberinfrastructure Facilities: A Local Campus Supercomputer, the NSF-Funded Jetstream Cloud System, and XSEDE (the eXtreme Science and Engineering Discovery Environment).
Stewart, C., A.; Hancock, D., Y.; Wernert, J.; Link, M., R.; Wilkins-Diehr, N.; Miller, T.; Gaither, K.; and Snapp-Childs, W.
In
2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC), pages 223-236, 12 2018. IEEE
@inproceedings{
title = {Return on Investment for Three Cyberinfrastructure Facilities: A Local Campus Supercomputer, the NSF-Funded Jetstream Cloud System, and XSEDE (the eXtreme Science and Engineering Discovery Environment)},
type = {inproceedings},
year = {2018},
keywords = {Cost benefit analysis,High performance computing,Scientific computing,Supercomputing},
pages = {223-236},
websites = {https://ieeexplore.ieee.org/document/8603169/},
month = {12},
publisher = {IEEE},
day = {4},
id = {1697f3d0-2337-3ee4-8856-d7a4d04fa5e3},
created = {2019-10-01T17:20:35.064Z},
accessed = {2019-08-14},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-09-09T18:58:47.660Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Stewart2018},
private_publication = {false},
abstract = {The economics of high performance computing are rapidly changing. Commercial cloud offerings, private research clouds, and pressure on the budgets of institutions of higher education and federally-funded research organizations are all contributing factors. As such, it has become a necessity that all expenses and investments be analyzed and considered carefully. In this paper we will analyze the return on investment (ROI) for three different kinds of cyberinfrastructure resources: the eXtreme Science and Engineering Discovery Environment (XSEDE); the NSF-funded Jetstream cloud system; and the Indiana University (IU) Big Red II supercomputer, funded exclusively by IU for use of the IU community and collaborators. We determined the ROI for these three resources by assigning financial values to services by either comparison with commercially available services, or by surveys of value of these resources to their users. In all three cases, the ROI for these very different types of cyberinfrastructure resources was well greater than 1-meaning that investors are getting more than $1 in returned value for every $1 invested. While there are many ways to measure the value and impact of investment in cyberinfrastructure resources, we are able to quantify the short-term ROI and show that it is a net positive for campuses and the federal government respectively.},
bibtype = {inproceedings},
author = {Stewart, Craig A. and Hancock, David Y. and Wernert, Julie and Link, Matthew R. and Wilkins-Diehr, Nancy and Miller, Therese and Gaither, Kelly and Snapp-Childs, Winona},
doi = {10.1109/UCC.2018.00031},
booktitle = {2018 IEEE/ACM 11th International Conference on Utility and Cloud Computing (UCC)}
}
The economics of high performance computing are rapidly changing. Commercial cloud offerings, private research clouds, and pressure on the budgets of institutions of higher education and federally-funded research organizations are all contributing factors. As such, it has become a necessity that all expenses and investments be analyzed and considered carefully. In this paper we will analyze the return on investment (ROI) for three different kinds of cyberinfrastructure resources: the eXtreme Science and Engineering Discovery Environment (XSEDE); the NSF-funded Jetstream cloud system; and the Indiana University (IU) Big Red II supercomputer, funded exclusively by IU for use of the IU community and collaborators. We determined the ROI for these three resources by assigning financial values to services by either comparison with commercially available services, or by surveys of value of these resources to their users. In all three cases, the ROI for these very different types of cyberinfrastructure resources was well greater than 1-meaning that investors are getting more than $1 in returned value for every $1 invested. While there are many ways to measure the value and impact of investment in cyberinfrastructure resources, we are able to quantify the short-term ROI and show that it is a net positive for campuses and the federal government respectively.
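The ROI arithmetic described above is straightforward: assign a dollar value to each service delivered and divide by the investment. The sketch below uses made-up placeholder figures rather than the study's data, purely to show the "value returned per dollar invested" calculation.

# Hypothetical service valuations (placeholder numbers, not the study's data).
service_value = {
    "compute_cycles": 1_200_000.0,   # value of delivered core-hours vs. commercial pricing
    "storage": 300_000.0,            # value of allocated storage
    "user_support": 150_000.0,       # surveyed value of consulting and support
}

annual_investment = 1_000_000.0      # placeholder annual cost of the facility

roi = sum(service_value.values()) / annual_investment
print(f"ROI = {roi:.2f} (dollars of value returned per dollar invested)")
# A result greater than 1.0 corresponds to the paper's finding that investors
# receive more than $1 in value for every $1 invested.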
High Availability on Jetstream: Practices and Lessons Learned.
Lowe, J., M.; Fischer, J.; Sudarshan, S.; Turner, G.; Stewart, C., A.; and Hancock, D., Y.
In
Proceedings of the 9th Workshop on Scientific Cloud Computing (ScienceCloud'18), of
ScienceCloud'18, pages 4:1--4:7, 2018. ACM
Website
doi
link
bibtex
abstract
@inproceedings{
title = {High Availability on Jetstream: Practices and Lessons Learned},
type = {inproceedings},
year = {2018},
keywords = {Atmosphere,XSEDE,acm reference format,atmosphere,availability,cloud,george turner,hpc,jeremy fischer,john michael lowe,research,sanjana sudarshan,xsede},
pages = {4:1--4:7},
websites = {http://doi.acm.org/10.1145/3217880.3217884},
publisher = {ACM},
city = {New York, NY, USA},
series = {ScienceCloud'18},
id = {5d4797f1-4581-3845-971d-edfa984ebcc2},
created = {2019-10-01T17:20:35.320Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:20:35.320Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Lowe:2018:HAJ:3217880.3217884},
source_type = {inproceedings},
private_publication = {false},
abstract = {Research computing has traditionally used high performance computing (HPC) clusters and has been a service not given to high availability without a doubling of computational and storage capacity. System maintenance such as security patching, firmware updates, and other system upgrades generally meant that the system would be unavailable for the duration of the work unless one has redundant HPC systems and storage. While efforts were often made to limit downtimes, when it became necessary, maintenance windows might be one to two hours or as much as an entire day. As the National Science Foundation (NSF) began funding non-traditional research systems, looking at ways to provide higher availability for researchers became one focus for service providers. One of the design elements of Jetstream was to have geographic dispersion to maximize availability. This was the first step in a number of design elements intended to make Jetstream exceed the NSF's availability requirements. We will examine the design steps employed, the components of the system and how the availability for each was considered in deployment, how maintenance is handled, and the lessons learned from the design and implementation of the Jetstream cloud.},
bibtype = {inproceedings},
author = {Lowe, John Michael and Fischer, Jeremy and Sudarshan, Sanjana and Turner, George and Stewart, Craig A and Hancock, David Y},
doi = {10.1145/3217880.3217884},
booktitle = {Proceedings of the 9th Workshop on Scientific Cloud Computing (ScienceCloud'18)}
}
Research computing has traditionally used high performance computing (HPC) clusters and has been a service not amenable to high availability without a doubling of computational and storage capacity. System maintenance such as security patching, firmware updates, and other system upgrades generally meant that the system would be unavailable for the duration of the work unless redundant HPC systems and storage were available. While efforts were often made to limit downtimes, when maintenance became necessary the windows might be one to two hours or as much as an entire day. As the National Science Foundation (NSF) began funding non-traditional research systems, finding ways to provide higher availability for researchers became one focus for service providers. One of the design elements of Jetstream was geographic dispersion to maximize availability. This was the first of a number of design elements intended to make Jetstream exceed the NSF's availability requirements. We examine the design steps employed, the components of the system and how the availability of each was considered in deployment, how maintenance is handled, and the lessons learned from the design and implementation of the Jetstream cloud.
Methodologies and Practices for Adoption of a Novel National Research Environment.
Fischer, J.; Beck, B., W.; Sudarshan, S.; Turner, G.; Snapp-Childs, W.; Stewart, C., A.; and Hancock, D., Y.
In
Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18), of
PEARC '18, pages 21:1--21:7, 2018. ACM
Website
doi
link
bibtex
abstract
@inproceedings{
title = {Methodologies and Practices for Adoption of a Novel National Research Environment},
type = {inproceedings},
year = {2018},
keywords = {XSEDE,cloud,hpc,research},
pages = {21:1--21:7},
websites = {http://doi.acm.org/10.1145/3219104.3219115},
publisher = {ACM},
city = {New York, NY, USA},
series = {PEARC '18},
id = {e1170dc2-8029-3f4d-a796-d6e2b44f8a45},
created = {2019-10-01T17:20:38.579Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:20:38.579Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Fischer:2018:MPA:3219104.3219115},
source_type = {inproceedings},
private_publication = {false},
abstract = {There are numerous domains of science that have been using high performance computing (HPC) systems for decades. Historically, when new HPC resources are introduced, specific variations may require researchers to make minor adjustments to their workflows but the general usage and expectations remain much the same. This consistency means that domain scientists can generally move from system to system as necessary and as new resources come online, they can be fairly easily adopted by these researchers. However, as novel resources, such as cloud computing systems, become available, additional work may be required in order to help researchers find and use the resource. When the goal of a system's funding and deployment is to find non-traditional research groups that have been under-served by the national cyberinfrastructure, a different approach to system adoption and training is required. When Jetstream was funded by the NSF as the first production research cloud, it became clear that to attract non-traditional or under-served researchers, a very proactive approach would be required. Here we show how the Jetstream team 1) developed methods and practices for increasing awareness of the system to both traditional HPC users as well as under-served and non-traditional users of HPC systems, 2) developed training approaches which highlight the capabilities that a cloud system may offer that are different from traditional HPC systems. We also discuss areas of success and failure, and plans for future efforts.},
bibtype = {inproceedings},
author = {Fischer, Jeremy and Beck, Brian W and Sudarshan, Sanjana and Turner, George and Snapp-Childs, Winona and Stewart, Craig A and Hancock, David Y},
doi = {10.1145/3219104.3219115},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18)}
}
There are numerous domains of science that have been using high performance computing (HPC) systems for decades. Historically, when new HPC resources are introduced, specific variations may require researchers to make minor adjustments to their workflows but the general usage and expectations remain much the same. This consistency means that domain scientists can generally move from system to system as necessary and as new resources come online, they can be fairly easily adopted by these researchers. However, as novel resources, such as cloud computing systems, become available, additional work may be required in order to help researchers find and use the resource. When the goal of a system's funding and deployment is to find non-traditional research groups that have been under-served by the national cyberinfrastructure, a different approach to system adoption and training is required. When Jetstream was funded by the NSF as the first production research cloud, it became clear that to attract non-traditional or under-served researchers, a very proactive approach would be required. Here we show how the Jetstream team 1) developed methods and practices for increasing awareness of the system to both traditional HPC users as well as under-served and non-traditional users of HPC systems, 2) developed training approaches which highlight the capabilities that a cloud system may offer that are different from traditional HPC systems. We also discuss areas of success and failure, and plans for future efforts.
XD Metrics on Demand Value Analytics: Visualizing the Impact of Internal Information Technology Investments on External Funding, Publications, and Collaboration Networks.
Scrivner, O.; Singh, G.; Bouchard, S., E.; Hutcheson, S., C.; Fulton, B.; Link, M., R.; and Börner, K.
Frontiers in Research Metrics and Analytics, 2. 1 2018.
Paper
Website
doi
link
bibtex
abstract
@article{
title = {XD Metrics on Demand Value Analytics: Visualizing the Impact of Internal Information Technology Investments on External Funding, Publications, and Collaboration Networks},
type = {article},
year = {2018},
volume = {2},
websites = {http://journal.frontiersin.org/article/10.3389/frma.2017.00010/full},
month = {1},
publisher = {Frontiers Media SA},
day = {29},
id = {251d5ed4-c9ec-381d-b9b4-8db9f0f35c1c},
created = {2019-10-01T17:20:40.312Z},
accessed = {2019-08-26},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:28.198Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Scrivner2018},
private_publication = {false},
abstract = {Many universities invest substantial resources in the design, deployment, and maintenance of campus-based cyberinfrastructure. To justify the expense, it is important that university administrators and others understand and communicate the value of these internal investments in terms of scholarly impact as measured by external funding, publications, and research collaborations. This paper introduces two visualizations and their usage in the Value Analytics (VA) module for Open XD Metrics on Demand (XDMoD). The VA module was developed by Indiana University’s (IU) Research Technologies division in conjunction with IU’s Cyberinfrastructure for Network Science Center (CNS) and the University at Buffalo’s Center for Computational Research (CCR). It interrelates quantitative measures of information technology (IT) usage, external funding, and publications in support of IT strategic decision making. This paper details the data, analysis workflows, and visual mappings used in the two VA visualizations that aim to communicate the value of different IT usage in terms of NSF and NIH funding, resulting publications, and associated research collaborations. To illustrate the feasibility of measuring IT values on research, we measured its financial and academic impact from the period between 2012 and 2017. The financial return on investment (ROI) is measured in terms of the funding, totaling $ 21,016,055 for NIH and NSF projects, and the academic ROI constitutes 1,531 NIH and NSF awards and 968 publications associated with 83 NSF and NIH awards. In addition, the results show that Medical Specialties, Brain Research, and Infectious Diseases are the top three scientific disciplines ranked by their publication records during the given time period.},
bibtype = {article},
author = {Scrivner, Olga and Singh, Gagandeep and Bouchard, Sara E. and Hutcheson, Scott C. and Fulton, Ben and Link, Matthew R. and Börner, Katy},
doi = {10.3389/frma.2017.00010},
journal = {Frontiers in Research Metrics and Analytics}
}
Many universities invest substantial resources in the design, deployment, and maintenance of campus-based cyberinfrastructure. To justify the expense, it is important that university administrators and others understand and communicate the value of these internal investments in terms of scholarly impact as measured by external funding, publications, and research collaborations. This paper introduces two visualizations and their usage in the Value Analytics (VA) module for Open XD Metrics on Demand (XDMoD). The VA module was developed by Indiana University’s (IU) Research Technologies division in conjunction with IU’s Cyberinfrastructure for Network Science Center (CNS) and the University at Buffalo’s Center for Computational Research (CCR). It interrelates quantitative measures of information technology (IT) usage, external funding, and publications in support of IT strategic decision making. This paper details the data, analysis workflows, and visual mappings used in the two VA visualizations that aim to communicate the value of different IT usage in terms of NSF and NIH funding, resulting publications, and associated research collaborations. To illustrate the feasibility of measuring the value of IT for research, we measured its financial and academic impact for the period between 2012 and 2017. The financial return on investment (ROI) is measured in terms of funding, totaling $21,016,055 for NIH and NSF projects, and the academic ROI constitutes 1,531 NIH and NSF awards and 968 publications associated with 83 NSF and NIH awards. In addition, the results show that Medical Specialties, Brain Research, and Infectious Diseases are the top three scientific disciplines ranked by their publication records during the given time period.
A Computational Notebook Approach to Large-scale Text Analysis.
Ruan, G.; Gniady, T.; Kloster, D.; Wernert, E.; and Tuna, E.
In
Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, of
PEARC '18, pages 1-8, 2018. ACM Press
Website
doi
link
bibtex
abstract
@inproceedings{
title = {A Computational Notebook Approach to Large-scale Text Analysis},
type = {inproceedings},
year = {2018},
keywords = {HPC,Spark,computational notebook,interactive analysis,scalability,text analysis},
pages = {1-8},
websites = {http://doi.acm.org/10.1145/3219104.3219153,http://dl.acm.org/citation.cfm?doid=3219104.3219153},
publisher = {ACM Press},
city = {New York, New York, USA},
series = {PEARC '18},
id = {eaa29482-e732-3483-96e7-45c85711cd2e},
created = {2019-10-01T17:20:43.081Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:20:43.081Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Ruan:2018:CNA:3219104.3219153},
source_type = {inproceedings},
private_publication = {false},
abstract = {Large-scale text analysis algorithms are important to many fields as they interrogate reams of textual data to extract evidence, correlations, and trends not readily discoverable by a human reader. Unfortunately, there is often an expertise mismatch between computational researchers who have the technical and programming skills necessary to develop workflows at scale and domain scholars who have knowledge of the literary, historical, scientific, or social factors that can affect data as it is manipulated. Our work focuses on the use of scalable computational notebooks as a model to bridge the accessibility gap for domain scholars, putting the power of HPC resources directly in the hands of the researchers who have scholarly questions. The computational notebook approach offers many benefits, including: fine-grained control through modularized functions, interactive analysis that puts the "human in the loop", scalable analysis that leverages Spark-as-a-Service, and complexity hiding interfaces that minimize the need for HPC expertise. In addition, the notebook approach makes it easy to share, reproduce, and sustain research workflows. We illustrate the applicability of our approach with usage scenarios on HPC systems as well as within a restricted computing environment to access sensitive, in-copyright data, and demonstrate the usefulness of the notebook approach with three examples from three different domains and data sources. These sources include historical topic trends in ten thousand scientific articles, sentiment analysis of tweets, and literary analysis of the copyrighted works of Kurt Vonnegut using non-consumptive techniques.},
bibtype = {inproceedings},
author = {Ruan, Guangchen and Gniady, Tassie and Kloster, David and Wernert, Eric and Tuna, Esen},
doi = {10.1145/3219104.3219153},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18}
}
Large-scale text analysis algorithms are important to many fields as they interrogate reams of textual data to extract evidence, correlations, and trends not readily discoverable by a human reader. Unfortunately, there is often an expertise mismatch between computational researchers who have the technical and programming skills necessary to develop workflows at scale and domain scholars who have knowledge of the literary, historical, scientific, or social factors that can affect data as it is manipulated. Our work focuses on the use of scalable computational notebooks as a model to bridge the accessibility gap for domain scholars, putting the power of HPC resources directly in the hands of the researchers who have scholarly questions. The computational notebook approach offers many benefits, including: fine-grained control through modularized functions, interactive analysis that puts the "human in the loop", scalable analysis that leverages Spark-as-a-Service, and complexity hiding interfaces that minimize the need for HPC expertise. In addition, the notebook approach makes it easy to share, reproduce, and sustain research workflows. We illustrate the applicability of our approach with usage scenarios on HPC systems as well as within a restricted computing environment to access sensitive, in-copyright data, and demonstrate the usefulness of the notebook approach with three examples from three different domains and data sources. These sources include historical topic trends in ten thousand scientific articles, sentiment analysis of tweets, and literary analysis of the copyrighted works of Kurt Vonnegut using non-consumptive techniques.
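To make the notebook workflow described above concrete, here is a minimal sketch of the kind of cell such a notebook might contain: a simple term-count pass over a plain-text corpus using PySpark. It is illustrative only and is not code from the cited paper; the corpus path, application name, and the use of a local Spark session (rather than the paper's Spark-as-a-Service backend) are assumptions.

from pyspark.sql import SparkSession

# Minimal notebook-style cell, assuming a local PySpark installation and a
# plain-text corpus at the hypothetical path below.
spark = SparkSession.builder.appName("corpus-term-counts").getOrCreate()
lines = spark.sparkContext.textFile("corpus/*.txt")   # hypothetical corpus path

counts = (lines.flatMap(lambda line: line.lower().split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# Show the 20 most frequent terms.
for word, n in counts.takeOrdered(20, key=lambda kv: -kv[1]):
    print(f"{word}\t{n}")

spark.stop()

In the paper's setting, a cell like this would run against a managed Spark service and be wrapped in higher-level, complexity-hiding functions so that domain scholars do not interact with the cluster directly.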
Toward sustainable deployment of distributed services on the cloud: dockerized ODI-PPA on Jetstream.
Bao, Y.; Gopu, A.; Perigo, R.; and Young, M., D.
In
SPIE ASTRONOMICAL TELESCOPES + INSTRUMENTATION 10-15 June 2018 Austin, Texas, United States, pages 108, 7 2018. SPIE-Intl Soc Optical Eng
Paper
doi
link
bibtex
abstract
@inproceedings{
title = {Toward sustainable deployment of distributed services on the cloud: dockerized ODI-PPA on Jetstream},
type = {inproceedings},
year = {2018},
pages = {108},
month = {7},
publisher = {SPIE-Intl Soc Optical Eng},
day = {6},
id = {7ac34819-1928-3a63-90f7-91b70849f28e},
created = {2019-10-01T17:20:43.947Z},
accessed = {2019-08-19},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:28.053Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {false},
hidden = {false},
citation_key = {Bao2018},
private_publication = {false},
abstract = {The One Degree Imager - Portal, Pipeline and Archive (ODI-PPA) - a mature and fully developed product - has been a workhorse for astronomers observing on the WIYN ODI. It not only provides access to data stored in a secure archive, it also has a rich search and visualization interface, as well as integrated pipeline capabilities connected with supercomputers at Indiana University in a manner transparent to the user. As part of our ongoing sustainability review process, and given the increasing age of the ODI-PPA codebase, we have considered various approaches to modernization. While industry currently trends toward Node.js based architectures, we concluded that porting an entire legacy PHP and Python-based system like ODI-PPA with its complex and distributed service stack would require too significant an amount of human development/testing/deployment hours. Aging deployment hardware with tight budgets is another issue we identified, a common one especially when deploying complex distributed service stacks. In this paper, we present DockStream (https://jsportal.odi.iu.edu), an elegant solution that addresses both of the aforementioned issues. Using ODI-PPA as a case study, we present a proof of concept solution combining a suite of Docker containers built for each PPA service and a mechanism to acquire cost-free computational and storage resources. The dockerized ODI-PPA services can be deployed on one Dockerenabled host or several depending on the availability of hardware resources and the expected levels of use. In this paper, we describe the process of designing, creating, and deploying such custom containers. The NSF-funded Jetstream led by the Indiana University Pervasive Technology Institute (PTI), provides cloud-based, on-demand computing and data analysis resources, and a pathway to tackle the issue of insufficient hardware refreshment funds. We briefly describe the process to acquiring computational and storage resources on Jetstream, and the use of the Atmosphere web interface to create and maintain virtual machines on Jetstream. Finally, we present a summary of security refinements to a dockerized service stack on the cloud using nginx, custom docker networks, and Linux firewalls that significant decrease the risk of security vulnerabilities and incidents while improving scalability.},
bibtype = {inproceedings},
author = {Bao, Yuanzhi and Gopu, Arvind and Perigo, Raymond and Young, Michael D.},
doi = {10.1117/12.2313647},
booktitle = {SPIE ASTRONOMICAL TELESCOPES + INSTRUMENTATION 10-15 June 2018 Austin, Texas, United States}
}
The One Degree Imager - Portal, Pipeline and Archive (ODI-PPA) - a mature and fully developed product - has been a workhorse for astronomers observing on the WIYN ODI. It not only provides access to data stored in a secure archive, it also has a rich search and visualization interface, as well as integrated pipeline capabilities connected with supercomputers at Indiana University in a manner transparent to the user. As part of our ongoing sustainability review process, and given the increasing age of the ODI-PPA codebase, we have considered various approaches to modernization. While industry currently trends toward Node.js based architectures, we concluded that porting an entire legacy PHP and Python-based system like ODI-PPA, with its complex and distributed service stack, would require too large an investment of development, testing, and deployment hours. Aging deployment hardware with tight budgets is another issue we identified, a common one especially when deploying complex distributed service stacks. In this paper, we present DockStream (https://jsportal.odi.iu.edu), an elegant solution that addresses both of the aforementioned issues. Using ODI-PPA as a case study, we present a proof-of-concept solution combining a suite of Docker containers built for each PPA service and a mechanism to acquire cost-free computational and storage resources. The dockerized ODI-PPA services can be deployed on one Docker-enabled host or several, depending on the availability of hardware resources and the expected levels of use. In this paper, we describe the process of designing, creating, and deploying such custom containers. The NSF-funded Jetstream cloud, led by the Indiana University Pervasive Technology Institute (PTI), provides cloud-based, on-demand computing and data analysis resources, and a pathway to tackle the issue of insufficient hardware refresh funds. We briefly describe the process of acquiring computational and storage resources on Jetstream, and the use of the Atmosphere web interface to create and maintain virtual machines on Jetstream. Finally, we present a summary of security refinements to a dockerized service stack on the cloud using nginx, custom Docker networks, and Linux firewalls that significantly decrease the risk of security vulnerabilities and incidents while improving scalability.
Data Capsule Appliance for Restricted Data in Libraries.
Withana, S.; Kouper, I.; and Plale, B., A.
In
Workshop on Cyberinfrastructure and Machine Learning for Digital Libraries and Archives, in conjunction with Joint Conference on Digital Libraries 2018, 2018.
Website
link
bibtex
@inproceedings{
title = {Data Capsule Appliance for Restricted Data in Libraries},
type = {inproceedings},
year = {2018},
websites = {https://www.tacc.utexas.edu/documents/1084364/1627230/06_Plale-DC-IMLS-CMD18.pdf/792ac021-b8b8-432d-aceb-4a8d8c9a6dac},
city = {Fort Worth, TX},
id = {5110758a-7762-3ad0-b481-57371ba7784a},
created = {2019-10-01T17:20:46.224Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:20:46.224Z},
read = {true},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Withana2018},
private_publication = {false},
bibtype = {inproceedings},
author = {Withana, Sachith and Kouper, Inna and Plale, Beth A.},
booktitle = {Workshop on Cyberinfrastructure and Machine Learning for Digital Libraries and Archives, in conjunction with Joint Conference on Digital Libraries 2018}
}
Subject headings and beyond: Mapping the HathiTrust Digital Library content for wider use.
Edelblute, T.; Zoss, A.; and Kouper, I.
In
HathiTrust Research Center UnCamp 2018, 2018.
Website
link
bibtex
@inproceedings{
title = {Subject headings and beyond: Mapping the HathiTrust Digital Library content for wider use},
type = {inproceedings},
year = {2018},
websites = {https://osf.io/ak9u8/},
city = {Berkeley, CA},
id = {46cd618a-8ea7-34fe-a19d-961d9255b34b},
created = {2019-10-01T17:20:51.873Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:20:51.873Z},
read = {true},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Edelblute2018},
private_publication = {false},
bibtype = {inproceedings},
author = {Edelblute, Trevor and Zoss, Angela and Kouper, Inna},
booktitle = {HathiTrust Research Center UnCamp 2018}
}
Rice Galaxy: an open resource for plant science.
Juanillas, V., M., J.; Dereeper, A.; Beaume, N.; Droc, G.; Dizon, J.; Mendoza, J., R.; Perdon, J., P.; Mansueto, L.; Triplett, L.; and Lang, J.
bioRxiv, 358754. 2018.
link
bibtex
@article{
title = {Rice Galaxy: an open resource for plant science},
type = {article},
year = {2018},
pages = {358754},
publisher = {Cold Spring Harbor Laboratory},
id = {256d5614-4bfa-37b2-9be3-569521c3e09f},
created = {2019-10-01T17:20:52.123Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:33.942Z},
read = {true},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Juanillas2018},
source_type = {JOUR},
private_publication = {false},
bibtype = {article},
author = {Juanillas, Venice Margarette J and Dereeper, Alexis and Beaume, Nicolas and Droc, Gaetan and Dizon, Joshua and Mendoza, John Robert and Perdon, Jon Peter and Mansueto, Locedie and Triplett, Lindsay and Lang, Jillian},
journal = {bioRxiv}
}
Narrative visualization.
Kouper, I.
In
2018 Midwest Big Data Summer School, 2018.
Website
link
bibtex
@inproceedings{
title = {Narrative visualization},
type = {inproceedings},
year = {2018},
websites = {https://www.researchgate.net/publication/325271444_Narrative_Visualization},
city = {Ames, Iowa},
id = {0e727d8b-c226-36eb-9e8a-4b3b120c6755},
created = {2019-10-01T17:20:52.574Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:20:52.574Z},
read = {true},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Kouper2018},
private_publication = {false},
bibtype = {inproceedings},
author = {Kouper, Inna},
booktitle = {2018 Midwest Big Data Summer School}
}
Restricted data types used in secure computing environments.
Kouper, I.; and Mitchell, E.
In
HathiTrust Research Center UnCamp 2018, 2018.
link
bibtex
@inproceedings{
title = {Restricted data types used in secure computing environments},
type = {inproceedings},
year = {2018},
id = {253bf04b-6c9c-34d4-8349-b362f2b472cf},
created = {2019-10-01T17:20:52.807Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:20:52.807Z},
read = {true},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Kouper2018a},
private_publication = {false},
bibtype = {inproceedings},
author = {Kouper, Inna and Mitchell, Erik},
booktitle = {HathiTrust Research Center UnCamp 2018}
}
Structure of pion photoproduction amplitudes.
Mathieu, V.; Nys, J.; Fernández-Ramírez, C.; Blin, A., H.; Jackura, A.; Pilloni, A.; Szczepaniak, A.; and Fox, G.
Physical Review D, 98(1): 014041. 2018.
Website
doi
link
bibtex
@article{
title = {Structure of pion photoproduction amplitudes},
type = {article},
year = {2018},
keywords = {doi:10.1103/PhysRevD.98.014041 url:https://doi.org},
pages = {014041},
volume = {98},
websites = {https://link.aps.org/doi/10.1103/PhysRevD.98.014041},
publisher = {American Physical Society},
id = {b8e6deae-84b9-342a-8fd9-f25d1b4f9a7e},
created = {2019-10-01T17:20:53.992Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.833Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Mathieu2018a},
private_publication = {false},
bibtype = {article},
author = {Mathieu, V. and Nys, J. and Fernández-Ramírez, C. and Blin, A. N. Hiller and Jackura, A. and Pilloni, A. and Szczepaniak, A. P. and Fox, G.},
doi = {10.1103/PhysRevD.98.014041},
journal = {Physical Review D},
number = {1}
}
Machine Learning for Parameter Auto-tuning in Molecular Dynamics Simulations: Efficient Dynamics of Ions near Polarizable Nanoparticles.
Kadupitiya, J.; Fox, G., C.; and Jadhao, V.
Technical Report 2018.
Paper
Website
doi
link
bibtex
abstract
@techreport{
title = {Machine Learning for Parameter Auto-tuning in Molecular Dynamics Simulations: Efficient Dynamics of Ions near Polarizable Nanoparticles},
type = {techreport},
year = {2018},
keywords = {Auto-tuning,Energy Minimization,Hybrid MPI/OpenMP,Machine Learning,Nanoscale Simulations,Parallel Computing},
pages = {15},
websites = {www.sagepub.com/,http://dsc.soic.indiana.edu/publications/Manuscript.IJHPCA.Nov2018.pdf},
id = {93e36f16-de3c-38ac-8569-20194aa0e1dd},
created = {2019-10-01T17:20:54.076Z},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.762Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Kadupitiya2018},
private_publication = {false},
abstract = {Simulating the dynamics of ions near polarizable nanoparticles (NPs) is extremely challenging due to the need to solve the Poisson equation at every simulation timestep. Recently, a molecular dynamics (MD) method based on a dynamical optimization framework bypassed this obstacle by representing the polarization charge density as virtual dynamic variables, and evolving them in parallel with the physical dynamics of ions. We highlight the computational gains accessible with the integration of machine learning (ML) methods for parameter prediction in MD simulations by demonstrating how they were realized in MD simulations of ions near polarizable NPs. An artificial neural network based regression model was integrated with MD and predicted the optimal simulation timestep and critical parameters characterizing the virtual system on-the-fly with 94.3% success. The integration of ML method with hybrid OpenMP/MPI parallelized MD simulations generated accurate dynamics of thousands of ions in the presence of polarizable NPs for over 10 million steps (with a maximum simulated physical time over 30 ns) while reducing the computational time from thousands of hours to tens of hours yielding a maximum speedup of ≈ 3 from ML-only acceleration and a maximum overall speedup of ≈ 600 from ML-hybrid Open/MPI combined method. Extraction of ionic structure in concentrated electrolytes near oil-water emulsions demonstrates the success of the method. The approach can be generalized to select optimal parameters in other molecular dynamics applications and energy minimization problems.},
bibtype = {techreport},
author = {Kadupitiya, Jcs and Fox, Geoffrey C and Jadhao, Vikram},
doi = {10.1177/ToBeAssigned}
}
Simulating the dynamics of ions near polarizable nanoparticles (NPs) is extremely challenging due to the need to solve the Poisson equation at every simulation timestep. Recently, a molecular dynamics (MD) method based on a dynamical optimization framework bypassed this obstacle by representing the polarization charge density as virtual dynamic variables and evolving them in parallel with the physical dynamics of ions. We highlight the computational gains accessible with the integration of machine learning (ML) methods for parameter prediction in MD simulations by demonstrating how they were realized in MD simulations of ions near polarizable NPs. An artificial neural network based regression model was integrated with MD and predicted, on the fly, the optimal simulation timestep and critical parameters characterizing the virtual system with 94.3% success. The integration of the ML method with hybrid OpenMP/MPI-parallelized MD simulations generated accurate dynamics of thousands of ions in the presence of polarizable NPs for over 10 million steps (with a maximum simulated physical time over 30 ns) while reducing the computational time from thousands of hours to tens of hours, yielding a maximum speedup of ≈ 3 from ML-only acceleration and a maximum overall speedup of ≈ 600 from the combined ML and hybrid OpenMP/MPI method. Extraction of ionic structure in concentrated electrolytes near oil-water emulsions demonstrates the success of the method. The approach can be generalized to select optimal parameters in other molecular dynamics applications and energy minimization problems.
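As a rough illustration of the parameter auto-tuning idea described above, the sketch below trains a small neural-network regressor to map simulation parameters to a suggested timestep. All feature names, value ranges, and the synthetic target function are hypothetical placeholders; this is not the paper's data, model, or code.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical training set: each row is (ion concentration, permittivity
# contrast, temperature); the target is the largest stable timestep found by
# earlier trial runs. All values here are synthetic placeholders.
X = rng.uniform([0.1, 1.0, 280.0], [1.0, 10.0, 340.0], size=(500, 3))
y = (0.5 + 0.3 * X[:, 0] - 0.02 * X[:, 1]
     + 0.001 * (X[:, 2] - 300.0)
     + rng.normal(scale=0.02, size=500))

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(32, 32),
                                   max_iter=2000, random_state=0))
model.fit(X, y)

# Query the regressor for an unseen parameter combination before launching a run.
print(model.predict([[0.4, 5.0, 300.0]]))

In practice such a regressor would be trained on results from prior MD runs and queried at setup time (or during a run) to choose the timestep and virtual-system parameters instead of finding them by trial and error.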
Vector meson photoproduction with a linearly polarized beam.
Mathieu, V.; Nys, J.; Fernández-Ramírez, C.; Jackura, A.; Pilloni, A.; Sherrill, N.; Szczepaniak, A., P.; and Fox, G.
Physical Review D, 97(9). 2018.
Website
doi
link
bibtex
abstract
@article{
title = {Vector meson photoproduction with a linearly polarized beam},
type = {article},
year = {2018},
volume = {97},
websites = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85049067436&doi=10.1103%2FPhysRevD.97.094003&partnerID=40&md5=c3e8c7a8d17dcaf2f3e6e356802ef69c},
publisher = {American Physical Society},
id = {fa9c47f0-8641-384e-bdcd-40a289fea8f3},
created = {2019-10-01T17:20:54.176Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:20:54.176Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Mathieu2018},
source_type = {article},
notes = {cited By 0},
private_publication = {false},
abstract = {We propose a model based on Regge theory to describe photoproduction of light vector mesons. We fit the SLAC data and make predictions for the energy and momentum-transfer dependence of the spin-density matrix elements in photoproduction of ω, ρ0 and φ mesons at Eγ∼8.5 GeV, which are soon to be measured at Jefferson Lab. © 2018 authors. Published by the American Physical Society. Published by the American Physical Society under the terms of the »https://creativecommons.org/licenses/by/4.0/» Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI. Funded by SCOAP3.},
bibtype = {article},
author = {Mathieu, V and Nys, J and Fernández-Ramírez, C and Jackura, A and Pilloni, A and Sherrill, N and Szczepaniak, A P and Fox, G},
doi = {10.1103/PhysRevD.97.094003},
journal = {Physical Review D},
number = {9}
}
We propose a model based on Regge theory to describe photoproduction of light vector mesons. We fit the SLAC data and make predictions for the energy and momentum-transfer dependence of the spin-density matrix elements in photoproduction of ω, ρ0 and φ mesons at Eγ∼8.5 GeV, which are soon to be measured at Jefferson Lab. © 2018 authors. Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Funded by SCOAP3.
Crossover analysis and automated layer-tracking assessment of the extracted DEM of the basal topography of the Canadian Arctic Archipelago ice-cap.
Al-Ibadi, M.; Sprick, J.; Athinarapu, S.; Berger, V.; Stumpf, T.; Paden, J.; Leuschen, C.; Rodriguez, F.; Xu, M.; Crandall, D.; Fox, G.; Burgess, D.; Sharp, M.; Copland, L.; and Van Wychen, W.
2018 IEEE Radar Conference, RadarConf 2018, 862-867. 2018.
doi
link
bibtex
abstract
@article{
title = {Crossover analysis and automated layer-tracking assessment of the extracted DEM of the basal topography of the Canadian Arctic Archipelago ice-cap},
type = {article},
year = {2018},
keywords = {DEM,SAR,Synthetic aperture radar imaging,ice,ice-bottom tracking,tomography},
pages = {862-867},
id = {16f65691-dd6e-33eb-ab4a-7fc604924405},
created = {2019-10-01T17:20:54.553Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:33.251Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Al-Ibadi2018},
private_publication = {false},
abstract = {© 2018 IEEE. In 2014, as part of the NASA Operation IceBridge project, the Center for Remote Sensing of Ice Sheets operated a multi-beam synthetic aperture radar depth sounder/imager over the Canadian Arctic Archipelago (CAA) to generate digital elevation models (DEMs) of the glacial basal topography. In this work, we briefly describe the processing steps that led to the generation of these DEMs, algorithm improvements over previously published results, and assess the results from two different perspectives. First, we evaluate the self-consistency of the DEMs where flight paths cross over each other and two measurements are made at the same location. Secondly, we compare the quality of the outputs of the ice-bottom tracker before and after applying manual corrections to the tracker results; the tracker is an algorithm that we implemented to automatically track the ice-bottom. Even though the CAA ice-caps are mountainous areas, where the scenes often have ice and no ice regions, which makes the imaging complicated, the statistical results show good tracking performance and a good match between the overlapped DEMs, where the mean error of the crossover DEMs is 37±9 m.},
bibtype = {article},
author = {Al-Ibadi, Mohanad and Sprick, Jordan and Athinarapu, Sravya and Berger, Victor and Stumpf, Theresa and Paden, John and Leuschen, Carl and Rodriguez, Fernando and Xu, Mingze and Crandall, David and Fox, Geoffrey and Burgess, David and Sharp, Martin and Copland, Luke and Van Wychen, Wesley},
doi = {10.1109/RADAR.2018.8378673},
journal = {2018 IEEE Radar Conference, RadarConf 2018}
}
© 2018 IEEE. In 2014, as part of the NASA Operation IceBridge project, the Center for Remote Sensing of Ice Sheets operated a multi-beam synthetic aperture radar depth sounder/imager over the Canadian Arctic Archipelago (CAA) to generate digital elevation models (DEMs) of the glacial basal topography. In this work, we briefly describe the processing steps that led to the generation of these DEMs and the algorithm improvements over previously published results, and we assess the results from two different perspectives. First, we evaluate the self-consistency of the DEMs where flight paths cross over each other and two measurements are made at the same location. Second, we compare the quality of the outputs of the ice-bottom tracker before and after applying manual corrections to the tracker results; the tracker is an algorithm that we implemented to automatically track the ice bottom. Even though the CAA ice-caps are mountainous areas, where scenes often contain both ice and ice-free regions, which complicates the imaging, the statistical results show good tracking performance and a good match between the overlapping DEMs, with a mean crossover error of 37±9 m.
Object Detection by a Super-Resolution Method and a Convolutional Neural Networks.
Na, B.; and Fox, G., C.
In
2018 IEEE International Conference on Big Data, Big Data, pages 2263-2269, 1 2018. Institute of Electrical and Electronics Engineers Inc.
Paper
doi
link
bibtex
abstract
@inproceedings{
title = {Object Detection by a Super-Resolution Method and a Convolutional Neural Networks},
type = {inproceedings},
year = {2018},
keywords = {CNN,convolution neural networks,deep learning,machine learning,object detection,super-resolution},
pages = {2263-2269},
month = {1},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
day = {22},
id = {452c5a40-913a-3376-9eea-672542e6d6d6},
created = {2019-10-01T17:20:55.050Z},
accessed = {2019-08-21},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:33.323Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Na2018},
private_publication = {false},
abstract = {Recently with many blurless or slightly blurred images, convolutional neural networks classify objects with around 90 percent classification rates, even if there are variable sized images. However, small object regions or cropping of images make object detection or classification difficult and decreases the detection rates. In many methods related to convolutional neural network (CNN), Bilinear or Bicubic algorithms are popularly used to interpolate region of interests. To overcome the limitations of these algorithms, we introduce a super-resolution method applied to the cropped regions or candidates, and this leads to improve recognition rates for object detection and classification. Large object candidates comparable in size of the full image have good results for object detections using many popular conventional methods. However, for smaller region candidates, using our super-resolution preprocessing and region candidates, allows a CNN to outperform conventional methods in the number of detected objects when tested on the VOC2007 and MSO datasets.},
bibtype = {inproceedings},
author = {Na, Bokyoon and Fox, Geoffrey C.},
doi = {10.1109/BigData.2018.8622135},
booktitle = {2018 IEEE International Conference on Big Data, Big Data}
}
Recently, with many blur-free or only slightly blurred images, convolutional neural networks classify objects with classification rates around 90 percent, even when image sizes vary. However, small object regions or cropping of images make object detection or classification difficult and decrease the detection rates. In many methods related to convolutional neural networks (CNNs), bilinear or bicubic algorithms are popularly used to interpolate regions of interest. To overcome the limitations of these algorithms, we introduce a super-resolution method applied to the cropped regions or candidates, which leads to improved recognition rates for object detection and classification. Large object candidates comparable in size to the full image give good object detection results with many popular conventional methods. However, for smaller region candidates, our super-resolution preprocessing of the region candidates allows a CNN to outperform conventional methods in the number of detected objects when tested on the VOC2007 and MSO datasets.
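The sketch below shows the preprocessing step the abstract contrasts against: cropping a candidate region and enlarging it with bicubic interpolation before handing it to a CNN classifier. In the paper this interpolation step is replaced by a learned super-resolution model; the frame, the box, the output size, and the use of OpenCV are illustrative assumptions, not the authors' pipeline.

import numpy as np
import cv2

def upscale_roi(image, box, size=(64, 64)):
    """Crop a candidate region and enlarge it before classification.
    Bicubic interpolation stands in here for the learned super-resolution
    model that the paper uses for small candidates."""
    x, y, w, h = box
    roi = image[y:y + h, x:x + w]
    return cv2.resize(roi, size, interpolation=cv2.INTER_CUBIC)

# Synthetic example: a small bright square in a dark frame.
frame = np.zeros((240, 320), dtype=np.uint8)
frame[100:116, 150:166] = 255
candidate = (150, 100, 16, 16)        # (x, y, width, height) in pixels
patch = upscale_roi(frame, candidate)  # 64x64 input for a CNN classifier
print(patch.shape)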
Evaluating the scientific impact of XSEDE.
Wang, F.; Fox, G., C.; Von Laszewski, G.; Furlani, T., R.; Gallo, S., M.; Whitson, T.; and DeLeon, R., L.
In
Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18), 7 2018. Association for Computing Machinery
Paper
doi
link
bibtex
abstract
@inproceedings{
title = {Evaluating the scientific impact of XSEDE},
type = {inproceedings},
year = {2018},
keywords = {Bibliometrics,H-index,Scientific impact,Technology Audit Service,XDMoD,XSEDE},
month = {7},
publisher = {Association for Computing Machinery},
day = {22},
id = {20b72ad4-d5bb-362b-93c0-b0800fbf4f77},
created = {2019-10-01T17:20:55.350Z},
accessed = {2019-09-03},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:33.270Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {false},
hidden = {false},
citation_key = {Wang2018},
private_publication = {false},
abstract = {We use the bibliometrics approach to evaluate the scientific impact of XSEDE. By utilizing publication data from various sources, e.g., ISI Web of Science and Microsoft Academic Graph, we calculate the impact metrics of XSEDE publications and show how they compare with non-XSEDE publication from the same field of study, or non-XSEDE peers from the same journal issue. We explain the dataset and data soruces involved and how we retrieved, cleaned, and curated millions of related publication entries. We then introduce the metrics we used for evaluation and comparison, and the methods used to calculate them. Detailed analysis results of Field Weighted Citation Impact (FWCI) and the peers comparison will be presented and discussed. We also explain how the same approaches could be used to evaluate publications from a similar organization or institute, to demonstrate the general applicability of the present evaluation approach providing impact even beyond XSEDE.},
bibtype = {inproceedings},
author = {Wang, Fugang and Fox, Geoffrey C. and Von Laszewski, Gregor and Furlani, Thomas R. and Gallo, Steven M. and Whitson, Timothy and DeLeon, Robert L.},
doi = {10.1145/3219104.3219124},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18)}
}
We use a bibliometrics approach to evaluate the scientific impact of XSEDE. Utilizing publication data from various sources, e.g., ISI Web of Science and Microsoft Academic Graph, we calculate the impact metrics of XSEDE publications and show how they compare with non-XSEDE publications from the same field of study, or with non-XSEDE peers from the same journal issue. We explain the dataset and data sources involved and how we retrieved, cleaned, and curated millions of related publication entries. We then introduce the metrics we used for evaluation and comparison, and the methods used to calculate them. Detailed analysis results for Field Weighted Citation Impact (FWCI) and the peer comparison are presented and discussed. We also explain how the same approaches could be used to evaluate publications from a similar organization or institute, demonstrating the general applicability of the present evaluation approach beyond XSEDE.
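For readers unfamiliar with the metric, Field Weighted Citation Impact is essentially a ratio of observed to expected citations, where the expectation comes from comparable papers (same field, publication year, and document type). A minimal sketch follows; the citation counts are purely hypothetical and the peer set would in practice come from a bibliometric database.

from statistics import mean

def field_weighted_citation_impact(paper_citations, peer_citations):
    """FWCI-style ratio: a paper's citations divided by the average citations
    of comparable papers (same field, year, and document type)."""
    expected = mean(peer_citations)
    return paper_citations / expected if expected else float("nan")

# Hypothetical numbers for illustration only.
xsede_paper_citations = 24
field_peer_citations = [3, 7, 11, 5, 9, 14, 2, 6]
print(round(field_weighted_citation_impact(xsede_paper_citations,
                                            field_peer_citations), 2))

A value above 1 indicates the paper is cited more than its field-year peers, which is the sense in which the study compares XSEDE and non-XSEDE publications.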
Evaluation of Production Serverless Computing Environments.
Lee, H.; Satyam, K.; and Fox, G.
In
IEEE International Conference on Cloud Computing, CLOUD, volume 2018-July, pages 442-450, 9 2018. IEEE Computer Society
Paper
doi
link
bibtex
abstract
@inproceedings{
title = {Evaluation of Production Serverless Computing Environments},
type = {inproceedings},
year = {2018},
keywords = {Amazon Lambda,Event-driven Computing,FaaS,Google Functions,IBM OpenWhisk,Microsoft Azure Functions,Serverless},
pages = {442-450},
volume = {2018-July},
month = {9},
publisher = {IEEE Computer Society},
day = {7},
id = {24abffda-00d9-3409-b1ce-c6f2eb1f4a07},
created = {2019-10-01T17:20:55.549Z},
accessed = {2019-09-04},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:33.536Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Lee2018a},
private_publication = {false},
abstract = {Serverless computing provides a small runtime container to execute lines of codes without infrastructure management which is similar to Platform as a Service (PaaS) but a functional level. Amazon started the event-driven compute named Lambda functions in 2014 with a 25 concurrent limitation, but it now supports at least a thousand of concurrent invocation to process event messages generated by resources like databases, storage and system logs. Other providers, i.e., Google, Microsoft, and IBM offer a dynamic scaling manager to handle parallel requests of stateless functions in which additional containers are provisioning on new compute nodes for distribution. However, while functions are often developed for microservices and lightweight workload, they are associated with distributed data processing using the concurrent invocations. We claim that the current serverless computing environments can support dynamic applications in parallel when a partitioned task is executable on a small function instance. We present results of throughput, network bandwidth, a file I/O and compute performance regarding the concurrent invocations. We deployed a series of functions for distributed data processing to address the elasticity and then demonstrated the differences between serverless computing and virtual machines for cost efficiency and resource utilization.},
bibtype = {inproceedings},
author = {Lee, Hyungro and Satyam, Kumar and Fox, Geoffrey},
doi = {10.1109/CLOUD.2018.00062},
booktitle = {IEEE International Conference on Cloud Computing, CLOUD}
}
Serverless computing provides a small runtime container to execute lines of code without infrastructure management, which is similar to Platform as a Service (PaaS) but at the level of individual functions. Amazon started its event-driven compute offering, Lambda functions, in 2014 with a limit of 25 concurrent invocations, but it now supports at least a thousand concurrent invocations to process event messages generated by resources like databases, storage, and system logs. Other providers, i.e., Google, Microsoft, and IBM, offer a dynamic scaling manager to handle parallel requests of stateless functions, in which additional containers are provisioned on new compute nodes for distribution. However, while functions are often developed for microservices and lightweight workloads, they can also be applied to distributed data processing through concurrent invocations. We claim that current serverless computing environments can support dynamic applications in parallel when a partitioned task is executable on a small function instance. We present results for throughput, network bandwidth, file I/O, and compute performance with respect to concurrent invocations. We deployed a series of functions for distributed data processing to address elasticity and then demonstrated the differences between serverless computing and virtual machines in terms of cost efficiency and resource utilization.
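A minimal sketch of the kind of concurrency measurement described above: fire a burst of parallel HTTP invocations at a deployed function and report aggregate throughput and median latency. The endpoint URL and concurrency level are placeholders, and this is not the benchmark harness used in the paper.

import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

# Placeholder: substitute the deployed function's HTTPS trigger URL.
ENDPOINT = "https://example.invalid/my-function"

def invoke(_):
    t0 = time.perf_counter()
    with urlopen(ENDPOINT, timeout=30) as resp:
        resp.read()
    return time.perf_counter() - t0

concurrency = 100
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=concurrency) as pool:
    latencies = list(pool.map(invoke, range(concurrency)))
elapsed = time.perf_counter() - start

print(f"throughput: {concurrency / elapsed:.1f} req/s, "
      f"median latency: {sorted(latencies)[len(latencies) // 2] * 1000:.0f} ms")

Sweeping the concurrency level and plotting throughput against it is one simple way to expose the per-provider scaling behavior the abstract refers to.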
Automated tracking of 2D and 3D ice radar imagery using Viterbi and TRW-S.
Berger, V.; Xu, M.; Chu, S.; Crandall, D.; Paden, J.; and Fox, G., C.
In
International Geoscience and Remote Sensing Symposium (IGARSS), volume 2018-July, pages 4162-4165, 10 2018. Institute of Electrical and Electronics Engineers Inc.
Paper
doi
link
bibtex
abstract
@inproceedings{
title = {Automated tracking of 2D and 3D ice radar imagery using Viterbi and TRW-S},
type = {inproceedings},
year = {2018},
keywords = {Glaciology,Ice thickness,Ice-bottom tracking,Image classification,Radar tomography},
pages = {4162-4165},
volume = {2018-July},
month = {10},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
day = {31},
id = {5011258d-5477-30ff-9625-616395a58f6e},
created = {2019-10-01T17:20:55.671Z},
accessed = {2019-09-04},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:33.744Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Berger2018},
private_publication = {false},
abstract = {We present improvements to existing implementations of the Viterbi and TRW-S algorithms applied to ice-bottom layer tracking on 2D and 3D radar imagery, respectively. Along with an explanation of our modifications and the reasoning behind them, we present a comparison between our results, the results obtained with the original implementations, and those obtained with other proposed methods of performing ice-bottom layer tracking.},
bibtype = {inproceedings},
author = {Berger, Victor and Xu, Mingze and Chu, Shane and Crandall, David and Paden, John and Fox, Geoffrey C.},
doi = {10.1109/IGARSS.2018.8519411},
booktitle = {International Geoscience and Remote Sensing Symposium (IGARSS)}
}
We present improvements to existing implementations of the Viterbi and TRW-S algorithms applied to ice-bottom layer tracking on 2D and 3D radar imagery, respectively. Along with an explanation of our modifications and the reasoning behind them, we present a comparison between our results, the results obtained with the original implementations, and those obtained with other proposed methods of performing ice-bottom layer tracking.
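As background for the 2D case, the sketch below gives a generic Viterbi layer tracker over an echogram-like intensity image: each column contributes a data cost favoring bright returns, and a smoothness penalty discourages large row jumps between adjacent columns. It is a textbook dynamic program under assumed costs, not the CReSIS implementation discussed in the paper; the synthetic echogram and weights are illustrative only.

import numpy as np

def viterbi_layer_track(echogram, smooth_weight=1.0):
    """Return, for each column, the row index of the tracked layer."""
    n_rows, n_cols = echogram.shape
    unary = -echogram.astype(float)                 # bright returns -> low data cost
    rows = np.arange(n_rows)
    pairwise = smooth_weight * np.abs(rows[:, None] - rows[None, :])
    cost = np.empty((n_rows, n_cols))
    back = np.zeros((n_rows, n_cols), dtype=int)
    cost[:, 0] = unary[:, 0]
    for c in range(1, n_cols):
        total = cost[:, c - 1][:, None] + pairwise  # (previous row, current row)
        back[:, c] = np.argmin(total, axis=0)
        cost[:, c] = total.min(axis=0) + unary[:, c]
    path = np.zeros(n_cols, dtype=int)
    path[-1] = int(np.argmin(cost[:, -1]))
    for c in range(n_cols - 1, 0, -1):
        path[c - 1] = back[path[c], c]
    return path

# Synthetic echogram: a bright, gently sloping layer plus noise.
rng = np.random.default_rng(0)
img = rng.normal(0.0, 0.2, size=(50, 80))
true_rows = (20 + 0.1 * np.arange(80)).astype(int)
img[true_rows, np.arange(80)] += 3.0
print(viterbi_layer_track(img, smooth_weight=0.5)[:10])

The 3D case in the paper replaces this chain model with TRW-S message passing over a grid, but the trade-off between data cost and smoothness is the same.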
Finding and counting tree-like subgraphs using MapReduce.
Zhao, Z.; Chen, L.; Avram, M.; Li, M.; Wang, G.; Butt, A.; Khan, M.; Marathe, M.; Qiu, J.; and Vullikanti, A.
IEEE Transactions on Multi-Scale Computing Systems, 4(3): 217-230. 7 2018.
Website
doi
link
bibtex
abstract
@article{
title = {Finding and counting tree-like subgraphs using MapReduce},
type = {article},
year = {2018},
pages = {217-230},
volume = {4},
websites = {https://ieeexplore.ieee.org/document/8090537/},
month = {7},
day = {1},
id = {1abad963-1212-3047-a2d7-667a53ae393e},
created = {2019-10-01T17:20:57.186Z},
accessed = {2019-08-21},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.716Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Zhao2018},
folder_uuids = {36d8ccf4-7085-47fa-8ab9-897283d082c5,3b35931e-fb6d-48f9-8e01-87ee16ef0331},
private_publication = {false},
abstract = {Several variants of the subgraph isomorphism problem, e.g., finding, counting and estimating frequencies of subgraphs in networks arise in a number of real world applications, such as web analysis, disease diffusion prediction and social network analysis. These problems are computationally challenging in having to scale to very large networks with millions of vertices. In this paper, we present SAHAD, a MapReduce algorithm for detecting and counting trees of bounded size using the elegant color coding technique developed by N. Alon et al. SAHAD is a randomized algorithm, and we show rigorous bounds on the approximation quality and the performance of it. SAHAD scales to very large networks comprising of $10^7$-$10^8$ edges and tree-like (acyclic) templates with up to 12 vertices. Further, we extend our results by implementing SAHAD in the Harp framework, which is more of a high performance computing environment. The new implementation gives 100x improvement in performance over the standard Hadoop implementation and achieves better performance than state-of-the-art MPI solutions on larger graphs.},
bibtype = {article},
author = {Zhao, Zhao and Chen, Langshi and Avram, Mihai and Li, Meng and Wang, Guanying and Butt, Ali and Khan, Maleq and Marathe, Madhav and Qiu, Judy and Vullikanti, Anil},
doi = {10.1109/TMSCS.2017.2768426},
journal = {IEEE Transactions on Multi-Scale Computing Systems},
number = {3}
}
Several variants of the subgraph isomorphism problem, e.g., finding, counting, and estimating frequencies of subgraphs in networks, arise in a number of real-world applications, such as web analysis, disease diffusion prediction, and social network analysis. These problems are computationally challenging in having to scale to very large networks with millions of vertices. In this paper, we present SAHAD, a MapReduce algorithm for detecting and counting trees of bounded size using the elegant color coding technique developed by N. Alon et al. SAHAD is a randomized algorithm, and we show rigorous bounds on its approximation quality and performance. SAHAD scales to very large networks comprising $10^7$-$10^8$ edges and tree-like (acyclic) templates with up to 12 vertices. Further, we extend our results by implementing SAHAD in the Harp framework, which is more of a high performance computing environment. The new implementation gives a 100x improvement in performance over the standard Hadoop implementation and achieves better performance than state-of-the-art MPI solutions on larger graphs.
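To give a sense of the color coding technique the abstract refers to, here is a small single-machine sketch that estimates the number of simple paths on k vertices (the simplest tree-shaped template): vertices are colored uniformly at random with k colors, colorful paths are counted exactly by dynamic programming over color sets, and the count is rescaled by the probability k!/k^k that a fixed path receives all-distinct colors. It is illustrative only and is unrelated to the SAHAD/Harp implementations; the example graph is hypothetical.

import math
import random
from collections import defaultdict

def colorful_path_count(adj, coloring, k):
    # layer[v][colors] = number of colorful paths on |colors| vertices ending at v
    layer = {v: {frozenset([coloring[v]]): 1} for v in adj}
    for _ in range(k - 1):
        nxt = {v: defaultdict(int) for v in adj}
        for v, table in layer.items():
            for colors, cnt in table.items():
                for u in adj[v]:
                    cu = coloring[u]
                    if cu not in colors:
                        nxt[u][colors | {cu}] += cnt
        layer = nxt
    total = sum(c for table in layer.values() for c in table.values())
    return total // 2   # each undirected path is counted once from each end

def estimate_simple_paths(adj, k, trials=500, seed=1):
    rng = random.Random(seed)
    scale = k ** k / math.factorial(k)   # 1 / Pr[a fixed path is colorful]
    acc = 0
    for _ in range(trials):
        coloring = {v: rng.randrange(k) for v in adj}
        acc += colorful_path_count(adj, coloring, k)
    return scale * acc / trials

# Hypothetical 5-cycle; it has exactly 5 simple paths on 3 vertices.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(round(estimate_simple_paths(adj, k=3), 2))

SAHAD distributes the same color-set dynamic program across MapReduce (and later Harp) workers and generalizes it from paths to arbitrary bounded-size trees.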
Deep hybrid wavelet network for ice boundary detection in radra imagery.
Kamangir, H.; Rahnemoonfar, M.; Dobbs, D.; Paden, J.; and Fox, G.
In
International Geoscience and Remote Sensing Symposium (IGARSS), volume 2018-July, pages 3449-3452, 10 2018. Institute of Electrical and Electronics Engineers Inc.
Paper
doi
link
bibtex
abstract
@inproceedings{
title = {Deep hybrid wavelet network for ice boundary detection in radra imagery},
type = {inproceedings},
year = {2018},
keywords = {Deep learning,Holistically nested edge detection,Ice Boundary detection,Radar,Wavelet transform},
pages = {3449-3452},
volume = {2018-July},
month = {10},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
day = {31},
id = {93918f87-4ab8-37b7-8607-f8e885d01bc9},
created = {2019-10-01T17:20:57.298Z},
accessed = {2019-09-03},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.568Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Kamangir2018},
private_publication = {false},
abstract = {This paper proposes a deep convolutional neural network approach to detect ice surface and bottom layers from radar imagery. Radar images are capable of penetrating the ice surface and provide us with valuable information from the underlying layers of the ice surface. In recent years, deep hierarchical learning techniques for object detection and segmentation greatly improved the performance of traditional techniques based on hand-crafted feature engineering. We designed a deep convolutional network to produce the images of the surface and bottom ice boundary. Our network takes advantage of the undecimated wavelet transform to provide the highest level of information from radar images, as well as a multilayer and multi-scale optimized architecture. In this work, radar images from the 2009-2016 NASA Operation IceBridge Mission are used to train and test the network. Our network outperformed the state of the art in accuracy.},
bibtype = {inproceedings},
author = {Kamangir, Hamid and Rahnemoonfar, Maryam and Dobbs, Dugan and Paden, John and Fox, Geoffrey},
doi = {10.1109/IGARSS.2018.8518617},
booktitle = {International Geoscience and Remote Sensing Symposium (IGARSS)}
}
This paper proposes a deep convolutional neural network approach to detect ice surface and bottom layers from radar imagery. Radar images are capable of penetrating the ice surface and provide us with valuable information from the underlying layers of the ice surface. In recent years, deep hierarchical learning techniques for object detection and segmentation greatly improved the performance of traditional techniques based on hand-crafted feature engineering. We designed a deep convolutional network to produce the images of the surface and bottom ice boundary. Our network takes advantage of the undecimated wavelet transform to provide the highest level of information from radar images, as well as a multilayer and multi-scale optimized architecture. In this work, radar images from the 2009-2016 NASA Operation IceBridge Mission are used to train and test the network. Our network outperformed the state of the art in accuracy.
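To make the role of the undecimated wavelet transform concrete, here is a minimal sketch assuming the pywt and PyTorch libraries, a Haar wavelet, two decomposition levels, and a far smaller network than the paper's. It stacks stationary-wavelet sub-bands, which keep the original image resolution, as extra input channels for a small convolutional model producing a per-pixel boundary probability.

import numpy as np
import pywt
import torch
import torch.nn as nn

def wavelet_channels(radar_image, wavelet="haar", levels=2):
    # The stationary (undecimated) wavelet transform keeps every sub-band at the
    # original resolution, so sub-bands can be stacked pixel-aligned as channels.
    coeffs = pywt.swt2(radar_image, wavelet, level=levels)
    channels = [radar_image]
    for approx, (horiz, vert, diag) in coeffs:
        channels.extend([approx, horiz, vert, diag])
    return np.stack(channels).astype(np.float32)           # (C, H, W)

class TinyBoundaryNet(nn.Module):
    # A deliberately small stand-in for the paper's network: a few 3x3
    # convolutions mapping the wavelet channels to a boundary probability map.
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

# Toy usage on a random "echogram"; swt2 needs sides divisible by 2**levels.
img = np.random.rand(64, 64)
x = torch.from_numpy(wavelet_channels(img)).unsqueeze(0)   # (1, C, H, W)
boundary_map = TinyBoundaryNet(x.shape[1])(x)              # (1, 1, H, W)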
Big data and extreme-scale computing: Pathways to Convergence-Toward a shaping strategy for a future software and data ecosystem for scientific inquiry.
Asch, M.; Moore, T.; Badia, R.; Beck, M.; Beckman, P.; Bidot, T.; Bodin, F.; Cappello, F.; Choudhary, A.; de Supinski, B.; Deelman, E.; Dongarra, J.; Dubey, A.; Fox, G.; Fu, H.; Girona, S.; Gropp, W.; Heroux, M.; Ishikawa, Y.; Keahey, K.; Keyes, D.; Kramer, W.; Lavignon, J., F.; Lu, Y.; Matsuoka, S.; Mohr, B.; Reed, D.; Requena, S.; Saltz, J.; Schulthess, T.; Stevens, R.; Swany, M.; Szalay, A.; Tang, W.; Varoquaux, G.; Vilotte, J., P.; Wisniewski, R.; Xu, Z.; and Zacharov, I.
Volume 32 2018.
doi
link
bibtex
abstract
@book{
title = {Big data and extreme-scale computing: Pathways to Convergence-Toward a shaping strategy for a future software and data ecosystem for scientific inquiry},
type = {book},
year = {2018},
source = {International Journal of High Performance Computing Applications},
keywords = {Big data,extreme-scale computing,future software,high-end data analysis,traditional HPC},
pages = {435-479},
volume = {32},
issue = {4},
id = {5dffcc0c-bc3c-3cd7-b2bf-2cfcc04edd31},
created = {2019-10-01T17:21:01.192Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:31.903Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Asch2018},
private_publication = {false},
abstract = {© The Author(s) 2018. Over the past four years, the Big Data and Exascale Computing (BDEC) project organized a series of five international workshops that aimed to explore the ways in which the new forms of data-centric discovery introduced by the ongoing revolution in high-end data analysis (HDA) might be integrated with the established, simulation-centric paradigm of the high-performance computing (HPC) community. Based on those meetings, we argue that the rapid proliferation of digital data generators, the unprecedented growth in the volume and diversity of the data they generate, and the intense evolution of the methods for analyzing and using that data are radically reshaping the landscape of scientific computing. The most critical problems involve the logistics of wide-area, multistage workflows that will move back and forth across the computing continuum, between the multitude of distributed sensors, instruments and other devices at the networks edge, and the centralized resources of commercial clouds and HPC centers. We suggest that the prospects for the future integration of technological infrastructures and research ecosystems need to be considered at three different levels. First, we discuss the convergence of research applications and workflows that establish a research paradigm that combines both HPC and HDA, where ongoing progress is already motivating efforts at the other two levels. Second, we offer an account of some of the problems involved with creating a converged infrastructure for peripheral environments, that is, a shared infrastructure that can be deployed throughout the network in a scalable manner to meet the highly diverse requirements for processing, communication, and buffering/storage of massive data workflows of many different scientific domains. Third, we focus on some opportunities for software ecosystem convergence in big, logically centralized facilities that execute large-scale simulations and models and/or perform large-scale data analytics. We close by offering some conclusions and recommendations for future investment and policy review.},
bibtype = {book},
author = {Asch, M. and Moore, T. and Badia, R. and Beck, M. and Beckman, P. and Bidot, T. and Bodin, F. and Cappello, F. and Choudhary, A. and de Supinski, B. and Deelman, E. and Dongarra, J. and Dubey, A. and Fox, G. and Fu, H. and Girona, S. and Gropp, W. and Heroux, M. and Ishikawa, Y. and Keahey, K. and Keyes, D. and Kramer, W. and Lavignon, J. F. and Lu, Y. and Matsuoka, S. and Mohr, B. and Reed, D. and Requena, S. and Saltz, J. and Schulthess, T. and Stevens, R. and Swany, M. and Szalay, A. and Tang, W. and Varoquaux, G. and Vilotte, J. P. and Wisniewski, R. and Xu, Z. and Zacharov, I.},
doi = {10.1177/1094342018778123}
}
© The Author(s) 2018. Over the past four years, the Big Data and Exascale Computing (BDEC) project organized a series of five international workshops that aimed to explore the ways in which the new forms of data-centric discovery introduced by the ongoing revolution in high-end data analysis (HDA) might be integrated with the established, simulation-centric paradigm of the high-performance computing (HPC) community. Based on those meetings, we argue that the rapid proliferation of digital data generators, the unprecedented growth in the volume and diversity of the data they generate, and the intense evolution of the methods for analyzing and using that data are radically reshaping the landscape of scientific computing. The most critical problems involve the logistics of wide-area, multistage workflows that will move back and forth across the computing continuum, between the multitude of distributed sensors, instruments and other devices at the networks edge, and the centralized resources of commercial clouds and HPC centers. We suggest that the prospects for the future integration of technological infrastructures and research ecosystems need to be considered at three different levels. First, we discuss the convergence of research applications and workflows that establish a research paradigm that combines both HPC and HDA, where ongoing progress is already motivating efforts at the other two levels. Second, we offer an account of some of the problems involved with creating a converged infrastructure for peripheral environments, that is, a shared infrastructure that can be deployed throughout the network in a scalable manner to meet the highly diverse requirements for processing, communication, and buffering/storage of massive data workflows of many different scientific domains. Third, we focus on some opportunities for software ecosystem convergence in big, logically centralized facilities that execute large-scale simulations and models and/or perform large-scale data analytics. We close by offering some conclusions and recommendations for future investment and policy review.
Task Scheduling in Big Data - Review, Research Challenges, and Prospects.
Govindarajan, K.; Kamburugamuve, S.; Wickramasinghe, P.; Abeykoon, V.; and Fox, G.
In
2017 9th International Conference on Advanced Computing, ICoAC 2017, pages 165-173, 8 2018. Institute of Electrical and Electronics Engineers Inc.
Paper
doi
link
bibtex
abstract
@inproceedings{
title = {Task Scheduling in Big Data - Review, Research Challenges, and Prospects},
type = {inproceedings},
year = {2018},
keywords = {Big Data,Dataflow,MapReduce,Static and Dynamic Task Scheduling,Task Scheduling Model,Twister2},
pages = {165-173},
month = {8},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
day = {20},
id = {1caf2685-0aa3-3c64-a879-97271e93d71d},
created = {2019-10-01T17:21:01.239Z},
accessed = {2019-09-04},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:31.443Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Govindarajan2018},
private_publication = {false},
abstract = {In Big Data computing, the processing of data requires a large amount of CPU cycles, network bandwidth, and disk I/O. Dataflow is a programming model for processing Big Data in which tasks are organized in a graph structure. Scheduling these tasks is one of the key active research areas and mainly aims to place the tasks on available resources. It is essential to schedule the tasks effectively, in a manner that minimizes task completion time and increases utilization of resources. In recent years, various researchers have discussed and presented different task scheduling algorithms. In this research study, we have investigated the state of the art of various types of task scheduling algorithms, scheduling considerations for batch and streaming processing, and task scheduling algorithms in well-known open-source big data platforms. Furthermore, this study proposes a new task scheduling system to alleviate the problems that persist in existing task scheduling for big data.},
bibtype = {inproceedings},
author = {Govindarajan, Kannan and Kamburugamuve, Supun and Wickramasinghe, Pulasthi and Abeykoon, Vibhatha and Fox, Geoffrey},
doi = {10.1109/ICoAC.2017.8441494},
booktitle = {2017 9th International Conference on Advanced Computing, ICoAC 2017}
}
In Big Data computing, the processing of data requires a large amount of CPU cycles, network bandwidth, and disk I/O. Dataflow is a programming model for processing Big Data in which tasks are organized in a graph structure. Scheduling these tasks is one of the key active research areas and mainly aims to place the tasks on available resources. It is essential to schedule the tasks effectively, in a manner that minimizes task completion time and increases utilization of resources. In recent years, various researchers have discussed and presented different task scheduling algorithms. In this research study, we have investigated the state of the art of various types of task scheduling algorithms, scheduling considerations for batch and streaming processing, and task scheduling algorithms in well-known open-source big data platforms. Furthermore, this study proposes a new task scheduling system to alleviate the problems that persist in existing task scheduling for big data.
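For readers new to the problem being surveyed, the sketch below shows one of the simplest members of this family of algorithms, greedy list scheduling of a dataflow task graph onto a fixed pool of workers. It is only an illustration of the scheduling problem, not one of the algorithms reviewed in the paper or the proposed scheduler, and the task names and durations are invented.

import heapq
from collections import defaultdict, deque

def greedy_schedule(tasks, deps, durations, n_workers):
    # List scheduling: whenever a worker becomes free, give it any task whose
    # dependencies have finished; waiting happens implicitly via max() below.
    indeg = defaultdict(int)
    children = defaultdict(list)
    for t, ds in deps.items():
        for d in ds:
            indeg[t] += 1
            children[d].append(t)
    ready = deque(t for t in tasks if indeg[t] == 0)
    workers = [(0.0, w) for w in range(n_workers)]          # (time free, worker id)
    heapq.heapify(workers)
    finish = {}
    while ready:
        free_at, w = heapq.heappop(workers)
        t = ready.popleft()
        start = max([free_at] + [finish[d] for d in deps.get(t, [])])
        finish[t] = start + durations[t]
        heapq.heappush(workers, (finish[t], w))
        for c in children[t]:                               # release dependents
            indeg[c] -= 1
            if indeg[c] == 0:
                ready.append(c)
    return finish                                           # task -> completion time

tasks = ["load", "clean", "train", "viz", "report"]
deps = {"clean": ["load"], "train": ["clean"], "viz": ["clean"], "report": ["train", "viz"]}
durations = {"load": 2.0, "clean": 1.0, "train": 5.0, "viz": 2.0, "report": 0.5}
print(greedy_schedule(tasks, deps, durations, n_workers=2))  # makespan = max value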
Features of πΔ photoproduction at high energies.
Nys, J.; Mathieu, V.; Fernández-Ramírez, C.; Jackura, A.; Mikhasenko, M.; Pilloni, A.; Sherrill, N.; Ryckebusch, J.; Szczepaniak, A., P.; Fox, G., C.; and Center, J., P., A.
Physics Letters, Section B: Nuclear, Elementary Particle and High-Energy Physics, 779: 77-81. 2018.
Website
doi
link
bibtex
abstract
@article{
title = {Features of πΔ photoproduction at high energies},
type = {article},
year = {2018},
pages = {77-81},
volume = {779},
websites = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85044396630&doi=10.1016%2Fj.physletb.2018.01.075&partnerID=40&md5=12e6f3f9ea386dbf28749cd0713aa855},
publisher = {Elsevier B.V.},
id = {d8476aea-069c-3fcb-ae84-b03c75d5bbca},
created = {2019-10-01T17:21:01.444Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:21:01.444Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Nys201877},
source_type = {article},
notes = {cited By 0},
private_publication = {false},
abstract = {Hybrid/exotic meson spectroscopy searches at Jefferson Lab require the accurate theoretical description of the production mechanism in peripheral photoproduction. We develop a model for πΔ photoproduction at high energies (5≤Elab≤16 GeV) that incorporates both the absorbed pion and natural-parity cut contributions. We fit the available observables, providing a good description of the energy and angular dependencies of the experimental data. We also provide predictions for the photon beam asymmetry of charged pions at Elab=9 GeV which is expected to be measured by GlueX and CLAS12 experiments in the near future. © 2018 The Author},
bibtype = {article},
author = {Nys, J and Mathieu, V and Fernández-Ramírez, C and Jackura, A and Mikhasenko, M and Pilloni, A and Sherrill, N and Ryckebusch, J and Szczepaniak, A P and Fox, Geoffrey Charles and Center, Joint Physics Analysis},
doi = {10.1016/j.physletb.2018.01.075},
journal = {Physics Letters, Section B: Nuclear, Elementary Particle and High-Energy Physics}
}
Hybrid/exotic meson spectroscopy searches at Jefferson Lab require the accurate theoretical description of the production mechanism in peripheral photoproduction. We develop a model for πΔ photoproduction at high energies (5≤Elab≤16 GeV) that incorporates both the absorbed pion and natural-parity cut contributions. We fit the available observables, providing a good description of the energy and angular dependencies of the experimental data. We also provide predictions for the photon beam asymmetry of charged pions at Elab=9 GeV which is expected to be measured by GlueX and CLAS12 experiments in the near future. © 2018 The Author
Analyticity Constraints for Hadron Amplitudes: Going High to Heal Low Energy Issues.
Mathieu, V.; Nys, J.; Pilloni, A.; Fernández-Ramírez, C.; Jackura, A.; Mikhasenko, M.; Pauk, V.; Szczepaniak, A., P.; and Fox, G.
Europhysics Letters, 122(4): 41001-p1-p5. 2018.
Website
link
bibtex
abstract
@article{
title = {Analyticity Constraints for Hadron Amplitudes: Going High to Heal Low Energy Issues},
type = {article},
year = {2018},
pages = {41001-p1-p5},
volume = {122},
websites = {http://arxiv.org/abs/1708.07779},
id = {ab2c1e5a-8274-3bb1-adbc-3c5e90009426},
created = {2019-10-01T17:21:01.590Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.147Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Mathieu2018b},
private_publication = {false},
abstract = {Analyticity constitutes a rigid constraint on hadron scattering amplitudes. This property is used to relate models in different energy regimes. Using meson photoproduction as a benchmark, we show how to test contemporary low energy models directly against high energy data. This method pinpoints deficiencies of the models and treads a path to further improvement. The implementation of this technique enables one to produce more stable and reliable partial waves for future use in hadron spectroscopy and new physics searches.},
bibtype = {article},
author = {Mathieu, V. and Nys, J. and Pilloni, A. and Fernández-Ramírez, C. and Jackura, A. and Mikhasenko, M. and Pauk, V. and Szczepaniak, A. P. and Fox, G.},
journal = {Europhysics Letters},
number = {4}
}
Analyticity constitutes a rigid constraint on hadron scattering amplitudes. This property is used to relate models in different energy regimes. Using meson photoproduction as a benchmark, we show how to test contemporary low energy models directly against high energy data. This method pinpoints deficiencies of the models and treads a path to further improvement. The implementation of this technique enables one to produce more stable and reliable partial waves for future use in hadron spectroscopy and new physics searches.
Contributions to High-Performance Big Data Computing.
Fox, G.; Qiu, J.; Crandall, D.; Laszewski, G., V.; Beckstein, O.; Paden, J.; Paraskevakos, I.; Jha, S.; Wang, F.; Marathe, M.; Vullikanti, A.; and Cheatham, T.
Technical Report 2018.
Paper
Website
link
bibtex
abstract
@techreport{
title = {Contributions to High-Performance Big Data Computing},
type = {techreport},
year = {2018},
keywords = {Big Data,Biomolecular simulations,Clouds,Graph Analytics,HPC,MIDAS,Network Science,Pathology,Polar Science,SPIDAL},
websites = {http://dsc.soic.indiana.edu/publications/FormattedSPIDALPaperJune2019.pdf,https://www.researchgate.net/publication/328090399_Contributions_to_High-Performance_Big_Data_Computing},
id = {18ff2b58-d289-3380-8a8c-61e4d8ece87b},
created = {2019-10-01T17:21:01.803Z},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.348Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {false},
hidden = {false},
citation_key = {Fox2018},
private_publication = {false},
abstract = {Our project is at the interface of Big Data and HPC-High-Performance Big Data computing and this paper describes a collaboration between 7 collaborating Universities at Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of High-performance and Big Data computing with several different application areas or communities driving the requirements for software systems and algorithms. We describe the base architecture, including the HPC-ABDS, High-Performance Computing enhanced Apache Big Data Stack, and an application use case study identifying key features that determine software and algorithm requirements. We summarize middleware including Harp-DAAL collective communication layer, Twister2 Big Data toolkit, and pilot jobs. Then we present the SPIDAL Scalable Parallel Interoperable Data Analytics Library and our work for it in core machine-learning, image processing and the application communities, Network science, Polar Science, Biomolecular Simulations, Pathology, and Spatial systems. We describe basic algorithms and their integration in end-to-end use cases.},
bibtype = {techreport},
author = {Fox, Geoffrey and Qiu, Judy and Crandall, David and Laszewski, Gregor Von and Beckstein, Oliver and Paden, John and Paraskevakos, Ioannis and Jha, Shantenu and Wang, Fusheng and Marathe, Madhav and Vullikanti, Anil and Cheatham, Thomas}
}
Our project is at the interface of Big Data and HPC-High-Performance Big Data computing and this paper describes a collaboration between 7 collaborating Universities at Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of High-performance and Big Data computing with several different application areas or communities driving the requirements for software systems and algorithms. We describe the base architecture, including the HPC-ABDS, High-Performance Computing enhanced Apache Big Data Stack, and an application use case study identifying key features that determine software and algorithm requirements. We summarize middleware including Harp-DAAL collective communication layer, Twister2 Big Data toolkit, and pilot jobs. Then we present the SPIDAL Scalable Parallel Interoperable Data Analytics Library and our work for it in core machine-learning, image processing and the application communities, Network science, Polar Science, Biomolecular Simulations, Pathology, and Spatial systems. We describe basic algorithms and their integration in end-to-end use cases.
Detecting ice layers in radar images with deep learning.
Kamangir, H.; Rahnemoonfar, M.; Dobbs, D.; Paden, J.; and Fox, G.
Technical Report 2018.
Website
link
bibtex
abstract
@techreport{
title = {Detecting ice layers in radar images with deep learning},
type = {techreport},
year = {2018},
pages = {2-5},
issue = {April},
websites = {https://pdfs.semanticscholar.org/e24d/e190e0e01e0e53003fa83daeb4557859f9f6.pdf},
id = {fb72b8b1-5f54-3eb5-96b9-2e0835f2b601},
created = {2019-10-01T17:21:01.973Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.071Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {HamidKamangirMaryamRahnemoonfarDuganDobbsJPaden2018},
private_publication = {false},
abstract = {This paper proposes a Deep Convolutional Neural Network approach to detect ice surface and bottom layers from radar imagery. Radar images are capable of penetrating the earth's surface and provide us with valuable information from the underlying layers of the ice surface. In recent years, deep hierarchical learning techniques for object detection and segmentation greatly improved the performance of traditional techniques based on hand-crafted feature engineering. We designed a hybrid Deep Convolutional Network to produce the images of surface and bottom ice boundary as outputs. Our network takes advantage of the undecimated wavelet transform to provide the highest level of information from radar images, as well as a multi-layer and multi-scale optimized architecture. In this work, radar images from the 2009-2016 NASA Operation IceBridge Mission are used to train and test the network. Our network outperformed the state of the art in accuracy.},
bibtype = {techreport},
author = {Kamangir, Hamid and Rahnemoonfar, Maryam and Dobbs, Dugan and Paden, J and Fox, Geoffrey}
}
This paper proposes a Deep Convolutional Neural Network approach to detect ice surface and bottom layers from radar imagery. Radar images are capable of penetrating the earth's surface and provide us with valuable information from the underlying layers of the ice surface. In recent years, deep hierarchical learning techniques for object detection and segmentation greatly improved the performance of traditional techniques based on hand-crafted feature engineering. We designed a hybrid Deep Convolutional Network to produce the images of surface and bottom ice boundary as outputs. Our network takes advantage of the undecimated wavelet transform to provide the highest level of information from radar images, as well as a multi-layer and multi-scale optimized architecture. In this work, radar images from the 2009-2016 NASA Operation IceBridge Mission are used to train and test the network. Our network outperformed the state of the art in accuracy.
Multi-task Spatiotemporal Neural Networks for Structured Surface Reconstruction.
Xu, M.; Fan, C.; Paden, J., D.; Fox, G., C.; and Crandall, D., J.
In
Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, volume 2018-Janua, pages 1273-1282, 5 2018. Institute of Electrical and Electronics Engineers Inc.
Paper
doi
link
bibtex
abstract
@inproceedings{
title = {Multi-task Spatiotemporal Neural Networks for Structured Surface Reconstruction},
type = {inproceedings},
year = {2018},
pages = {1273-1282},
volume = {2018-Janua},
month = {5},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
day = {3},
id = {8d825f5a-82a4-3345-885e-4258ee92181b},
created = {2019-10-01T17:21:02.013Z},
accessed = {2019-09-04},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.100Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Xu2018},
private_publication = {false},
abstract = {Deep learning methods have surpassed the performance of traditional techniques on a wide range of problems in computer vision, but nearly all of this work has studied consumer photos, where precisely correct output is often not critical. It is less clear how well these techniques may apply on structured prediction problems where fine-grained output with high precision is required, such as in scientific imaging domains. Here we consider the problem of segmenting echogram radar data collected from the polar ice sheets, which is challenging because segmentation boundaries are often very weak and there is a high degree of noise. We propose a multi-task spatiotemporal neural network that combines 3D ConvNets and Recurrent Neural Networks (RNNs) to estimate ice surface boundaries from sequences of tomographic radar images. We show that our model outperforms the state-of-the-art on this problem by (1) avoiding the need for hand-tuned parameters, (2) extracting multiple surfaces (ice-air and ice-bed) simultaneously, (3) requiring less non-visual metadata, and (4) being about 6 times faster.},
bibtype = {inproceedings},
author = {Xu, Mingze and Fan, Chenyou and Paden, John D. and Fox, Geoffrey C. and Crandall, David J.},
doi = {10.1109/WACV.2018.00144},
booktitle = {Proceedings - 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018}
}
Deep learning methods have surpassed the performance of traditional techniques on a wide range of problems in computer vision, but nearly all of this work has studied consumer photos, where precisely correct output is often not critical. It is less clear how well these techniques may apply on structured prediction problems where fine-grained output with high precision is required, such as in scientific imaging domains. Here we consider the problem of segmenting echogram radar data collected from the polar ice sheets, which is challenging because segmentation boundaries are often very weak and there is a high degree of noise. We propose a multi-task spatiotemporal neural network that combines 3D ConvNets and Recurrent Neural Networks (RNNs) to estimate ice surface boundaries from sequences of tomographic radar images. We show that our model outperforms the state-of-the-art on this problem by (1) avoiding the need for hand-tuned parameters, (2) extracting multiple surfaces (ice-air and ice-bed) simultaneously, (3) requiring less non-visual metadata, and (4) being about 6 times faster.
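The general pattern of pairing a 3D convolutional encoder with a recurrent network over the frame sequence can be sketched in a few lines of PyTorch. The toy model below is not the paper's multi-task architecture; the layer sizes, the GRU, the fixed 64x64 frame size, and the single (rather than multiple) output surface are all assumptions made to keep the sketch short.

import torch
import torch.nn as nn

class SpatioTemporalSurfaceNet(nn.Module):
    # 3D convolutions extract spatial features from each frame jointly with its
    # neighbours in time; a GRU then smooths the per-frame features so that the
    # predicted boundary varies coherently across the image sequence.
    def __init__(self, width=64, hidden=32):
        super().__init__()
        self.encoder = nn.Conv3d(1, 8, kernel_size=3, padding=1)
        self.gru = nn.GRU(input_size=8 * width, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, width)       # one boundary row per image column

    def forward(self, x):                          # x: (B, 1, T, H, W)
        feats = torch.relu(self.encoder(x))        # (B, 8, T, H, W)
        feats = feats.mean(dim=3)                  # pool over depth H -> (B, 8, T, W)
        b, c, t, w = feats.shape
        seq = feats.permute(0, 2, 1, 3).reshape(b, t, c * w)
        out, _ = self.gru(seq)                     # temporal smoothing across frames
        return self.head(out)                      # (B, T, W) estimated boundary rows

net = SpatioTemporalSurfaceNet()
pred = net(torch.randn(2, 1, 5, 64, 64))           # two sequences of five radar frames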
Twister: Net - Communication Library for Big Data Processing in HPC and Cloud Environments.
Kamburugamuve, S.; Wickramasinghe, P.; Govindarajan, K.; Uyar, A.; Gunduz, G.; Abeykoon, V.; and Fox, G.
In
IEEE International Conference on Cloud Computing, CLOUD, volume 2018-July, pages 383-391, 9 2018. IEEE Computer Society
Paper
doi
link
bibtex
abstract
@inproceedings{
title = {Twister: Net - Communication Library for Big Data Processing in HPC and Cloud Environments},
type = {inproceedings},
year = {2018},
keywords = {Big-data,Collectives,HPC,MPI,Streaming},
pages = {383-391},
volume = {2018-July},
month = {9},
publisher = {IEEE Computer Society},
day = {7},
id = {5805ab79-5f24-3379-ac55-473415607664},
created = {2019-10-01T17:21:02.072Z},
accessed = {2019-09-04},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.126Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Kamburugamuve2018},
private_publication = {false},
abstract = {Streaming processing and batch data processing are the dominant forms of big data analytics today, with numerous systems such as Hadoop, Spark, and Heron designed to process the ever-increasing explosion of data. Generally, these systems are developed as single projects with aspects such as communication, task management, and data management integrated together. By contrast, we take a component-based approach to big data by developing the essential features of a big data system as independent components with polymorphic implementations to support different requirements. Consequently, we recognize the requirements of both dataflow used in popular Apache Systems and the Bulk Synchronous Processing communication style common in High-Performance Computing (HPC) for different applications. Message Passing Interface (MPI) implementations are dominant in HPC but there are no such standard libraries available for big data. Twister:Net is a stand-alone, highly optimized dataflow style parallel communication library which can be used by big data systems or advanced users. Twister:Net can work both in cloud environments using TCP or HPC environments using MPI implementations. This paper introduces Twister:Net and compares it with existing systems to highlight its design and performance. © 2018 IEEE.},
bibtype = {inproceedings},
author = {Kamburugamuve, Supun and Wickramasinghe, Pulasthi and Govindarajan, Kannan and Uyar, Ahmet and Gunduz, Gurhan and Abeykoon, Vibhatha and Fox, Geoffrey},
doi = {10.1109/CLOUD.2018.00055},
booktitle = {IEEE International Conference on Cloud Computing, CLOUD}
}
Streaming processing and batch data processing are the dominant forms of big data analytics today, with numerous systems such as Hadoop, Spark, and Heron designed to process the ever-increasing explosion of data. Generally, these systems are developed as single projects with aspects such as communication, task management, and data management integrated together. By contrast, we take a component-based approach to big data by developing the essential features of a big data system as independent components with polymorphic implementations to support different requirements. Consequently, we recognize the requirements of both dataflow used in popular Apache Systems and the Bulk Synchronous Processing communication style common in High-Performance Computing (HPC) for different applications. Message Passing Interface (MPI) implementations are dominant in HPC but there are no such standard libraries available for big data. Twister:Net is a stand-alone, highly optimized dataflow style parallel communication library which can be used by big data systems or advanced users. Twister:Net can work both in cloud environments using TCP or HPC environments using MPI implementations. This paper introduces Twister:Net and compares it with existing systems to highlight its design and performance. © 2018 IEEE.
Anatomy of machine learning algorithm implementations in MPI, Spark, and Flink.
Kamburugamuve, S.; Wickramasinghe, P.; Ekanayake, S.; and Fox, G., C.
International Journal of High Performance Computing Applications, 32(1): 61-73. 2018.
Website
doi
link
bibtex
abstract
@article{
title = {Anatomy of machine learning algorithm implementations in MPI, Spark, and Flink},
type = {article},
year = {2018},
keywords = {Artificial intelligence; Big data; Data flow analy,Data flow modeling; Flink; High performance compu,Learning algorithms},
pages = {61-73},
volume = {32},
websites = {https://www.scopus.com/inward/record.uri?eid=2-s2.0-85039854240&doi=10.1177%2F1094342017712976&partnerID=40&md5=0a1048e69609d95f438e0b2f01466624},
publisher = {SAGE Publications Inc.},
id = {fee0ce1f-2b00-3b30-aa0a-c17641db8593},
created = {2019-10-01T17:21:02.665Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:21:02.665Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Kamburugamuve201861},
source_type = {article},
notes = {cited By 1},
private_publication = {false},
abstract = {With the ever-increasing need to analyze large amounts of data to get useful insights, it is essential to develop complex parallel machine learning algorithms that can scale with data and number of parallel processes. These algorithms need to run on large data sets as well as they need to be executed with minimal time in order to extract useful information in a time-constrained environment. Message passing interface (MPI) is a widely used model for developing such algorithms in high-performance computing paradigm, while Apache Spark and Apache Flink are emerging as big data platforms for large-scale parallel machine learning. Even though these big data frameworks are designed differently, they follow the data flow model for execution and user APIs. Data flow model offers fundamentally different capabilities than the MPI execution model, but the same type of parallelism can be used in applications developed in both models. This article presents three distinct machine learning algorithms implemented in MPI, Spark, and Flink and compares their performance and identifies strengths and weaknesses in each platform. © 2017, © The Author(s) 2017.},
bibtype = {article},
author = {Kamburugamuve, S and Wickramasinghe, P and Ekanayake, S and Fox, Geoffrey Charles},
doi = {10.1177/1094342017712976},
journal = {International Journal of High Performance Computing Applications},
number = {1}
}
With the ever-increasing need to analyze large amounts of data to get useful insights, it is essential to develop complex parallel machine learning algorithms that can scale with data and number of parallel processes. These algorithms need to run on large data sets as well as they need to be executed with minimal time in order to extract useful information in a time-constrained environment. Message passing interface (MPI) is a widely used model for developing such algorithms in high-performance computing paradigm, while Apache Spark and Apache Flink are emerging as big data platforms for large-scale parallel machine learning. Even though these big data frameworks are designed differently, they follow the data flow model for execution and user APIs. Data flow model offers fundamentally different capabilities than the MPI execution model, but the same type of parallelism can be used in applications developed in both models. This article presents three distinct machine learning algorithms implemented in MPI, Spark, and Flink and compares their performance and identifies strengths and weaknesses in each platform. © 2017, © The Author(s) 2017.
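The observation that the same parallelism can be expressed in both execution models is easy to see on a small example. The sketch below, a generic data-parallel K-Means step written with mpi4py and NumPy, is not code from the paper, and the data, cluster count, and iteration count are arbitrary. It computes local partial sums on every rank and combines them with an allreduce, the same aggregation a dataflow engine such as Spark or Flink would express as a grouped reduction.

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

rng = np.random.default_rng(rank)
points = rng.random((1000, 2))                      # each rank owns one data partition
centers = comm.bcast(rng.random((4, 2)) if rank == 0 else None, root=0)

for _ in range(10):
    # Local step: assign each point to its nearest center.
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Local partial sums and counts per cluster.
    sums = np.zeros_like(centers)
    counts = np.zeros(len(centers))
    for k in range(len(centers)):
        sums[k] = points[labels == k].sum(axis=0)
        counts[k] = (labels == k).sum()
    # Global step: one allreduce replaces the shuffle/groupBy of a dataflow engine.
    global_sums = np.empty_like(sums)
    global_counts = np.empty_like(counts)
    comm.Allreduce(sums, global_sums, op=MPI.SUM)
    comm.Allreduce(counts, global_counts, op=MPI.SUM)
    centers = global_sums / np.maximum(global_counts, 1.0)[:, None]

Run with, for example, mpirun -n 4 python kmeans_allreduce.py (the file name is illustrative); with a single process it degenerates to sequential K-Means.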
Global analysis of charge exchange meson production at high energies.
Nys, J.; Hiller Blin, A., N.; Mathieu, V.; Fernández-Ramírez, C.; Jackura, A.; Pilloni, A.; Ryckebusch, J.; Szczepaniak, A., P.; and Fox, G.
. 2018.
Paper
doi
link
bibtex
abstract
@article{
title = {Global analysis of charge exchange meson production at high energies},
type = {article},
year = {2018},
keywords = {doi:10.1103/PhysRevD.98.034020 url:https://doi.org},
id = {2f92bdd6-79a9-3829-84ca-a27996b929c2},
created = {2019-10-01T17:21:02.857Z},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.387Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {false},
hidden = {false},
citation_key = {Nys2018},
private_publication = {false},
abstract = {Many experiments that are conducted to study the hadron spectrum rely on peripheral resonance production. Hereby, the rapidity gap allows the process to be viewed as an independent fragmentation of the beam and the target, with the beam fragmentation dominated by production and decays of meson resonances. We test this separation by determining the kinematic regimes that are dominated by factorizable contributions, indicating the most favorable regions to perform this kind of experiments. In doing so, we use a Regge model to analyze the available world data of charge exchange meson production with beam momentum above 5 GeV in the laboratory frame that are not dominated by either pion or Pomeron exchanges. We determine the Regge residues and point out the kinematic regimes which are dominated by factorizable contributions.},
bibtype = {article},
author = {Nys, J and Hiller Blin, A N and Mathieu, V and Fernández-Ramírez, C and Jackura, A and Pilloni, A and Ryckebusch, J and Szczepaniak, A P and Fox, G},
doi = {10.1103/PhysRevD.98.034020}
}
Many experiments that are conducted to study the hadron spectrum rely on peripheral resonance production. Hereby, the rapidity gap allows the process to be viewed as an independent fragmentation of the beam and the target, with the beam fragmentation dominated by production and decays of meson resonances. We test this separation by determining the kinematic regimes that are dominated by factorizable contributions, indicating the most favorable regions to perform this kind of experiments. In doing so, we use a Regge model to analyze the available world data of charge exchange meson production with beam momentum above 5 GeV in the laboratory frame that are not dominated by either pion or Pomeron exchanges. We determine the Regge residues and point out the kinematic regimes which are dominated by factorizable contributions.
Task-parallel analysis of molecular dynamics trajectories.
Paraskevakos, I.; Chantzialexiou, G.; Luckow, A.; Cheatham, T., E.; Khoshlessan, M.; Beckstein, O.; Fox, G., C.; and Jha, S.
In
Proceedings of the 47th International Conference on Parallel Processing (ICPP 2018), 8 2018. Association for Computing Machinery
Paper
doi
link
bibtex
abstract
@inproceedings{
title = {Task-parallel analysis of molecular dynamics trajectories},
type = {inproceedings},
year = {2018},
keywords = {Data analytics,MD analysis,MD simulations analysis,Task-parallel},
month = {8},
publisher = {Association for Computing Machinery},
day = {13},
id = {ec60cc2e-ca18-3e16-8fac-313503fbbe74},
created = {2019-10-01T17:21:02.886Z},
accessed = {2019-09-03},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.324Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {false},
hidden = {false},
citation_key = {Paraskevakos2018},
private_publication = {false},
abstract = {Different parallel frameworks for implementing data analysis applications have been proposed by the HPC and Big Data communities. In this paper, we investigate three task-parallel frameworks: Spark, Dask and RADICAL-Pilot with respect to their ability to support data analytics on HPC resources and compare them with MPI. We investigate the data analysis requirements of Molecular Dynamics (MD) simulations which are significant consumers of supercomputing cycles, producing immense amounts of data. A typical large-scale MD simulation of a physical system of O(100k) atoms over μsecs can produce from O(10) GB to O(1000) GBs of data. We propose and evaluate different approaches for parallelization of a representative set of MD trajectory analysis algorithms, in particular the computation of path similarity and leaflet identification. We evaluate Spark, Dask and RADICAL-Pilot with respect to their abstractions and runtime engine capabilities to support these algorithms. We provide a conceptual basis for comparing and understanding different frameworks that enable users to select the optimal system for each application. We also provide a quantitative performance analysis of the different algorithms across the three frameworks.},
bibtype = {inproceedings},
author = {Paraskevakos, Ioannis and Chantzialexiou, George and Luckow, Andre and Cheatham, Thomas E. and Khoshlessan, Mahzad and Beckstein, Oliver and Fox, Geoffrey C. and Jha, Shantenu},
doi = {10.1145/3225058.3225128},
booktitle = {Proceedings of the 47th International Conference on Parallel Processing (ICPP 2018)}
}
Different parallel frameworks for implementing data analysis applications have been proposed by the HPC and Big Data communities. In this paper, we investigate three task-parallel frameworks: Spark, Dask and RADICAL-Pilot with respect to their ability to support data analytics on HPC resources and compare them with MPI. We investigate the data analysis requirements of Molecular Dynamics (MD) simulations which are significant consumers of supercomputing cycles, producing immense amounts of data. A typical large-scale MD simulation of a physical system of O(100k) atoms over μsecs can produce from O(10) GB to O(1000) GBs of data. We propose and evaluate different approaches for parallelization of a representative set of MD trajectory analysis algorithms, in particular the computation of path similarity and leaflet identification. We evaluate Spark, Dask and RADICAL-Pilot with respect to their abstractions and runtime engine capabilities to support these algorithms. We provide a conceptual basis for comparing and understanding different frameworks that enable users to select the optimal system for each application. We also provide a quantitative performance analysis of the different algorithms across the three frameworks.
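As a concrete illustration of the task-parallel pattern discussed here, the sketch below builds one independent Dask task per trajectory pair and computes a Hausdorff-style path similarity for each. It assumes NumPy, SciPy, and Dask, uses random arrays as stand-ins for MD trajectories, and is not the authors' RADICAL-Pilot, Spark, or Dask implementation.

import numpy as np
import dask
from scipy.spatial.distance import directed_hausdorff

def path_similarity(traj_a, traj_b):
    # Treat every frame as one point in configuration space (atoms flattened),
    # then take the symmetric Hausdorff distance between the two frame sets.
    a = traj_a.reshape(len(traj_a), -1)
    b = traj_b.reshape(len(traj_b), -1)
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

rng = np.random.default_rng(0)
trajs = [rng.random((100, 50, 3)) for _ in range(6)]        # 6 toy "trajectories"

# One delayed task per pair: an embarrassingly parallel task graph that Dask
# (or any task-parallel runtime) can spread across cores or nodes.
tasks = {(i, j): dask.delayed(path_similarity)(trajs[i], trajs[j])
         for i in range(len(trajs)) for j in range(i + 1, len(trajs))}
distances = dask.compute(tasks)[0]                           # {(i, j): distance}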
Searching the Sequence Read Archive Using Jetstream and Wrangler.
Levi, K.; Rynge, M.; Abeysinghe, E.; and Edwards, R., A.
In
Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18), of
PEARC '18, pages 50:1--50:7, 7 2018. ACM
Paper
Website
doi
link
bibtex
abstract
@inproceedings{
title = {Searching the Sequence Read Archive Using Jetstream and Wrangler},
type = {inproceedings},
year = {2018},
keywords = {Apache Airavata,Bacteriophage,Credential Store,Jetstream,Metagenomics,Metagenomics Discovery Challenge,SRA,SRA Gateway,SciGaP,Search SRA,Sequence Read Archive,Wrangler},
pages = {50:1--50:7},
websites = {http://doi.acm.org/10.1145/3219104.3229278},
month = {7},
publisher = {ACM},
day = {22},
city = {New York, NY, USA},
series = {PEARC '18},
id = {0aa8664d-88c2-338b-bb19-e0153c638275},
created = {2019-10-01T17:21:06.731Z},
accessed = {2019-09-12},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:30:53.978Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Levi:2018:SSR:3219104.3229278},
source_type = {inproceedings},
private_publication = {false},
abstract = {The Sequence Read Archive (SRA), the world’s largest database of sequences, hosts approximately 10 petabases (10^16 bp) of sequence data and is growing at the alarming rate of 10 TB per day. Yet this rich trove of data is inaccessible to most researchers: searching through the SRA requires large storage and computing facilities that are beyond the capacity of most laboratories. Enabling scientists to analyze existing sequence data will provide insight into ecology, medicine, and industrial applications. In this project we specifically focus on metagenomic sequences (whole community data sets from different environments). We are developing a set of tools to enable biologists to mine the metagenomes in the SRA using the NSF-funded cloud computing resources, Jetstream and Wrangler. We have developed a proof-of-principle pipeline to demonstrate the feasibility of the approach. We are leveraging our existing infrastructure to enable all scientists to access the SRA metagenomes regardless of their computational ability and are working to create a stable pipeline with a science gateway portal that is accessible to all researchers.},
bibtype = {inproceedings},
author = {Levi, Kyle and Rynge, Mats and Abeysinghe, Eroma and Edwards, Robert A},
doi = {10.1145/3219104.3229278},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18)}
}
The Sequence Read Archive (SRA), the world’s largest database of sequences, hosts approximately 10 petabases (10^16 bp) of sequence data and is growing at the alarming rate of 10 TB per day. Yet this rich trove of data is inaccessible to most researchers: searching through the SRA requires large storage and computing facilities that are beyond the capacity of most laboratories. Enabling scientists to analyze existing sequence data will provide insight into ecology, medicine, and industrial applications. In this project we specifically focus on metagenomic sequences (whole community data sets from different environments). We are developing a set of tools to enable biologists to mine the metagenomes in the SRA using the NSF-funded cloud computing resources, Jetstream and Wrangler. We have developed a proof-of-principle pipeline to demonstrate the feasibility of the approach. We are leveraging our existing infrastructure to enable all scientists to access the SRA metagenomes regardless of their computational ability and are working to create a stable pipeline with a science gateway portal that is accessible to all researchers.
Grid Technology for Supporting Health Education and Measuring the Health Outcome.
Sukhija, N.; Datta, A., K.; Sevin, S.; Coulter, E.; Datta, A., K.; and Coulter, E.
In
Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18), of
PEARC '18, pages 89:1--89:4, 7 2018. ACM
Paper
Website
doi
link
bibtex
abstract
@inproceedings{
title = {Grid Technology for Supporting Health Education and Measuring the Health Outcome},
type = {inproceedings},
year = {2018},
keywords = {Community health,Cyberinfrastructure,Data grid,Data integration,Grid computing,Health education,IRODS,Mobile technology,Portal,Virtual,XSEDE,community health,cyberinfrastructure,data grid,data integration,grid computing,health education,iRODS,mobile technology,portal,virtual},
pages = {89:1--89:4},
websites = {http://doi.acm.org/10.1145/3219104.3229247},
month = {7},
publisher = {ACM},
day = {22},
city = {New York, NY, USA},
series = {PEARC '18},
id = {b451500f-8b78-390a-adc2-ac93018d72dc},
created = {2019-10-01T17:21:06.910Z},
accessed = {2019-08-26},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:30:52.752Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Sukhija:2018:GTS:3219104.3229247},
source_type = {inproceedings},
private_publication = {false},
abstract = {In this paper, we present our developed health-IT solution that addresses these challenges: developing strategies not only to store such a vast amount of data but also to make it available to researchers for further analysis that can measure the outcome on the participant's health. The developed community health grid (C-Grid) solution for storing, managing, and sharing large amounts of these instruction materials and participants' health-related data, where the remote management and analysis of this data grid is performed using iRODS, the Integrated Rule-Oriented Data System, is discussed and presented in this paper.},
bibtype = {inproceedings},
author = {Sukhija, Nitin and Datta, Arun K. and Sevin, Sonny and Coulter, Eric and Datta, Arun K. and Coulter, Eric},
doi = {10.1145/3219104.3229247},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18)}
}
In this paper, we present our developed health-IT solution that addresses these challenges: developing strategies not only to store such a vast amount of data but also to make it available to researchers for further analysis that can measure the outcome on the participant's health. The developed community health grid (C-Grid) solution for storing, managing, and sharing large amounts of these instruction materials and participants' health-related data, where the remote management and analysis of this data grid is performed using iRODS, the Integrated Rule-Oriented Data System, is discussed and presented in this paper.
The Report of the 2018 NSF Cybersecurity Summit for Large Facilities and Cyberinfrastructure.
Adams, A.; Dopheide, J.; Krenz, M.; Marsteller, J.; Welch, V.; and Zage, J.
Technical Report 2018.
Paper
Website
link
bibtex
@techreport{
title = {The Report of the 2018 NSF Cybersecurity Summit for Large Facilities and Cyberinfrastructure},
type = {techreport},
year = {2018},
websites = {http://hdl.handle.net/2022/22588},
id = {d046dd10-b0f1-3ffb-a184-366947174419},
created = {2019-10-01T17:21:18.861Z},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:34.471Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {AndrewAdamsJeannetteDopheideMarkKrenzJamesMarsteller2018},
private_publication = {false},
bibtype = {techreport},
author = {Adams, Andrew and Dopheide, Jeannette and Krenz, Mark and Marsteller, James and Welch, Von and Zage, John}
}
Trusted CI Annual Report for 2018.
Adams, A.; Avila, K.; Atkins, J.; Basney, J.; Bohland, L.; Borecky, D.; Cowles, R.; Dopheide, J.; Fleury, T.; Harbour, G.; Heymann, E.; Hudson, F.; Jackson, C.; Kiser, R.; Krenz, M.; Marsteller, J.; Miller, B.; Raquel, W.; Ruff, P.; Russell, S.; Shah, Z.; Shankar, A.; Sons, S.; Welch, V.; and Zage, J.
Technical Report 2018.
Paper
Website
link
bibtex
@techreport{
title = {Trusted CI Annual Report for 2018},
type = {techreport},
year = {2018},
websites = {http://hdl.handle.net/2022/22597},
id = {8521c330-0ce3-30de-87d8-ceb3ed5be4d5},
created = {2019-10-01T17:21:21.337Z},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:34.283Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Adams2018},
private_publication = {false},
bibtype = {techreport},
author = {Adams, Andrew and Avila, Kay and Atkins, Joel and Basney, Jim and Bohland, Leslee and Borecky, Diana and Cowles, Robert and Dopheide, Jeannette and Fleury, Terry and Harbour, Grayson and Heymann, Elisa and Hudson, Florence and Jackson, Craig and Kiser, Ryan and Krenz, Mark and Marsteller, Jim and Miller, Barton and Raquel, Warren and Ruff, Preston and Russell, Scott and Shah, Zalak and Shankar, Anurag and Sons, Susan and Welch, Von and Zage, John}
}
A New Science Gateway to Provide Decision Support on Carbon Capture and Storage Technologies.
Wang, Y.; Pamidighantam, S.; Yaw, S.; Abeysinghe, E.; Marru, S.; Christie, M.; Ellett, K.; Pierce, M.; and Middleton, R.
In
Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, of
PEARC '18, pages 1-3, 2018. ACM Press
Website
doi
link
bibtex
abstract
@inproceedings{
title = {A New Science Gateway to Provide Decision Support on Carbon Capture and Storage Technologies},
type = {inproceedings},
year = {2018},
keywords = {Carbon capture,Science gateways},
pages = {1-3},
websites = {http://doi.acm.org/10.1145/3219104.3229244,http://dl.acm.org/citation.cfm?doid=3219104.3229244},
publisher = {ACM Press},
city = {New York, New York, USA},
series = {PEARC '18},
id = {9a7802b5-afcb-3485-bcc1-6a50f5bb5320},
created = {2019-10-01T17:21:22.663Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:21:22.663Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Wang:2018:NSG:3219104.3229244},
source_type = {inproceedings},
private_publication = {false},
abstract = {Carbon dioxide capture and storage (CCS) is a promising technology for mitigating climate change, and its implementation is considered critical to meeting threshold targets for global warming in the 21st century. We have developed a new science gateway application for the successful modeling software known as SimCCS that is used for evaluating complex, integrated CCS infrastructure. Using the Apache Airavata middleware and high-performance computing resources made available by the Extreme Science and Engineering Discovery Environment, we built the SimCCS Gateway to expand the tool's scalability for decision support and risk assessment. Case studies developed for evaluating a proposed CCS technology at Duke Energy's Gibson Station coal-fired power plant in southwest Indiana demonstrate its improved ability in data analysis as well as risk assessment at various uncertainty levels. Further work is continuing to expand the functionality of both web and desktop clients, and to develop an active user group community in research and industry via the SimCCS Gateway interface.},
bibtype = {inproceedings},
author = {Wang, Yinzhi and Pamidighantam, Sudhakar and Yaw, Sean and Abeysinghe, Eroma and Marru, Suresh and Christie, Marcus and Ellett, Kevin and Pierce, Marlon and Middleton, Richard},
doi = {10.1145/3219104.3229244},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18}
}
Carbon dioxide capture and storage (CCS) is a promising technology for mitigating climate change, and its implementation is considered critical to meeting threshold targets for global warming in the 21st century. We have developed a new science gateway application for the successful modeling software known as SimCCS that is used for evaluating complex, integrated CCS infrastructure. Using the Apache Airavata middleware and high-performance computing resources made available by the Extreme Science and Engineering Discovery Environment, we built the SimCCS Gateway to expand the tool's scalability for decision support and risk assessment. Case studies developed for evaluating a proposed CCS technology at Duke Energy's Gibson Station coal-fired power plant in southwest Indiana demonstrate its improved ability in data analysis as well as risk assessment at various uncertainty levels. Further work is continuing to expand the functionality of both web and desktop clients, and to develop an active user group community in research and industry via the SimCCS Gateway interface.
Science Gateway Implementation at the University of South Dakota.
Madison, J., D.; Abeysinghe, E.; Pamidighantam, S.; Marru, S.; Christie, M.; Jennewein, D., M.; and Pierce, M.
In
Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, pages 1-4, 7 2018. ACM Press
Paper
Website
doi
link
bibtex
abstract
@inproceedings{
title = {Science Gateway Implementation at the University of South Dakota},
type = {inproceedings},
year = {2018},
keywords = {Apache Airavata,Gateway,Keycloak,SciGaP},
pages = {1-4},
websites = {http://dl.acm.org/citation.cfm?doid=3219104.3229265},
month = {7},
publisher = {ACM Press},
day = {22},
city = {New York, New York, USA},
id = {c9d11ec4-7bfb-3ded-8c6c-ab4bc1411d9d},
created = {2019-10-01T17:21:23.986Z},
accessed = {2019-08-19},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:27.546Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Madison2018},
private_publication = {false},
abstract = {Science Gateways are virtual environments that accelerate scientific discovery by enabling scientific communities to more easily and effectively utilize distributed computing and data resources. Successful Science Gateways provide access to sophisticated and powerful resources, while shielding their users from the underlying complexities. Here we present work completed by the University of South Dakota (USD) Research Computing Group in conjunction with the Science Gateways Community Institute (SGCI) [1] and Indiana University on setting up a Science Gateway to access USD's high-performance computing resources. These resources are now available to both faculty and students and allow ease of access and use of USD's distributed computing and data resources. The implementation of this gateway project has been multifaceted and has included placement of federated user login, user facilitation and outreach, and integration of USD's cyberinfrastructure resources. We present this project as an example for other research computing groups so that they may learn from our successes and the challenges that we have overcome in providing this user resource. Additionally, this project serves to exemplify the importance of creating a broad user base of research computing infrastructure through the development of alternative user interfaces such as Science Gateways.},
bibtype = {inproceedings},
author = {Madison, Joseph D. and Abeysinghe, Eroma and Pamidighantam, Sudhakar and Marru, Suresh and Christie, Marcus and Jennewein, Douglas M. and Pierce, Marlon},
doi = {10.1145/3219104.3229265},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18}
}
Science Gateways are virtual environments that accelerate scientific discovery by enabling scientific communities to more easily and effectively utilize distributed computing and data resources. Successful Science Gateways provide access to sophisticated and powerful resources, while shielding their users from the underlying complexities. Here we present work completed by the University of South Dakota (USD) Research Computing Group in conjunction with the Science Gateways Community Institute (SGCI) [1] and Indiana University on setting up a Science Gateway to access USD's high-performance computing resources. These resources are now available to both faculty and students and allow ease of access and use of USD's distributed computing and data resources. The implementation of this gateway project has been multifaceted and has included placement of federated user login, user facilitation and outreach, and integration of USD's cyberinfrastructure resources. We present this project as an example for other research computing groups so that they may learn from our successes and the challenges that we have overcome in providing this user resource. Additionally, this project serves to exemplify the importance of creating a broad user base of research computing infrastructure through the development of alternative user interfaces such as Science Gateways.
Django Content Management System Evaluation and Integration with Apache Airavata.
Adithela, S., P.; Christie, M.; Marru, S.; and Pierce, M.
In
Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, pages 1-4, July 2018. ACM Press
Paper
Website
doi
link
bibtex
abstract
@inproceedings{
title = {Django Content Management System Evaluation and Integration with Apache Airavata},
type = {inproceedings},
year = {2018},
keywords = {ACM proceedings,Apache Airavata,Django Framework,Science Gateway,Text tagging,Wagtail CMS},
pages = {1-4},
websites = {http://dl.acm.org/citation.cfm?doid=3219104.3229272},
month = {7},
publisher = {ACM Press},
day = {22},
city = {New York, New York, USA},
id = {c6abbe69-ea24-3a68-bae1-288f60799003},
created = {2019-10-01T17:21:24.843Z},
accessed = {2019-08-19},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:27.127Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Adithela2018},
private_publication = {false},
abstract = {Apache Airavata is an open-source software framework that enables scientific researchers to compose, manage, execute and monitor large-scale applications and workflows on distributed computing resources. Airavata is currently leveraged by many science gateways to perform computations on shared clusters. Currently, Gateway Administrators managing content on their websites require the assistance of the Airavata Developer Team to make even the slightest change to their websites. This paper addresses this challenge by presenting the benefits of integrating a content management system. It also briefly evaluates the various options available for choosing a Content Management Platform that complies with the Airavata Architecture Standards. This feature will enable researchers with minimal web design knowledge to easily manage content across their gateway. It is also poised to drastically increase the productivity of the Airavata developer team and the gateway administrators. © 2018 Association for Computing Machinery.},
bibtype = {inproceedings},
author = {Adithela, Stephen Paul and Christie, Marcus and Marru, Suresh and Pierce, Marlon},
doi = {10.1145/3219104.3229272},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18}
}
Apache Airavata is an open-source software framework that enables scientific researchers to compose, manage, execute and monitor large-scale applications and workflows on distributed computing resources. Airavata is currently leveraged by many science gateways to perform computations on shared clusters. Currently, Gateway Administrators managing content on their websites require the assistance of the Airavata Developer Team to make even the slightest change to their websites. This paper addresses this challenge by presenting the benefits of integrating a content management system. It also briefly evaluates the various options available for choosing a Content Management Platform that complies with the Airavata Architecture Standards. This feature will enable researchers with minimal web design knowledge to easily manage content across their gateway. It is also poised to drastically increase the productivity of the Airavata developer team and the gateway administrators. © 2018 Association for Computing Machinery.
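The paper itself contains no code; as a rough illustration of the integration it describes (its keywords mention the Wagtail CMS on the Django framework), the following is a minimal sketch of a Wagtail 2.x page model of the kind such a gateway content management integration might define, letting administrators edit page content through the CMS admin interface instead of requesting changes from developers. The class name GatewayPage and its fields are hypothetical and are not part of Apache Airavata or the paper.

# Minimal sketch (hypothetical, not from the paper): a Wagtail 2.x page model
# that a Django-based gateway could register so that gateway administrators
# edit page content themselves through the Wagtail admin UI.
from wagtail.core.models import Page
from wagtail.core.fields import RichTextField
from wagtail.admin.edit_handlers import FieldPanel

class GatewayPage(Page):
    """A generic content page editable by gateway administrators."""
    body = RichTextField(blank=True)  # rich-text content shown on the page

    # Expose the body field in the Wagtail page editor alongside the title.
    content_panels = Page.content_panels + [
        FieldPanel('body'),
    ]

Under these assumptions, registering one such model in a Django app is enough for non-developers to create and edit gateway pages through the CMS.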
Radar Determination of Fault Slip and Location in Partially Decorrelated Images.
Parker, J.; Glasscoe, M.; Donnellan, A.; Stough, T.; Pierce, M.; and Wang, J.
Earthquakes and Multi-hazards Around the Pacific Rim, Vol. I, pages 101-116. Springer, 2018.
link
bibtex
@inbook{
type = {inbook},
year = {2018},
pages = {101-116},
publisher = {Springer},
id = {e0e289b6-bcaa-38e7-9be3-96b2820b9089},
created = {2019-10-01T17:21:26.366Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:27.268Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Parker2018},
source_type = {CHAP},
private_publication = {false},
bibtype = {inbook},
author = {Parker, Jay and Glasscoe, Margaret and Donnellan, Andrea and Stough, Timothy and Pierce, Marlon and Wang, Jun},
chapter = {Radar Determination of Fault Slip and Location in Partially Decorrelated Images},
title = {Earthquakes and Multi-hazards Around the Pacific Rim, Vol. I}
}
Your Good Health is a Workforce Issue.
Stewart, C., A.; and Krefeldt, M.
In
Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18), of
PEARC '18, pages 75:1--75:8, 2018. ACM
Website
doi
link
bibtex
abstract
@inproceedings{
title = {Your Good Health is a Workforce Issue},
type = {inproceedings},
year = {2018},
keywords = {SAD,Standard American Diet,acm reference format,cancer,health,life balance,preventive testing,sad,standard american diet,stress,work,work/life balance,workforce development},
pages = {75:1--75:8},
websites = {http://doi.acm.org/10.1145/3219104.3219107},
publisher = {ACM},
city = {New York, NY, USA},
series = {PEARC '18},
id = {5ce71fed-56ff-3ad9-b68f-36fcd7e5e191},
created = {2019-10-01T17:21:27.662Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:21:27.662Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Stewart:2018:YGH:3219104.3219107},
source_type = {inproceedings},
private_publication = {false},
abstract = {The high performance computing (HPC), cyberinfrastructure, and research and academic information technology communities are small - too small to fulfill current needs for such professionals in the US. Members of this community are also often under a lot of stress, and with that can come health problems. The senior author was diagnosed with Stage IV cancer in early 2017. In this paper, we share what we have learned about health management in general and dealing with cancer in particular, focusing on lessons that are portable to other members of the HPC, cyberinfrastructure, and research and academic information technology communities. We also make recommendations to the National Science Foundation regarding changes the NSF could make to reduce some of the stress this community feels on a day-in, day-out basis. The key point of this report is to provide information to members of the cyberinfrastructure community that they might not already have - and might not receive from their primary care physicians - that will help them live longer and healthier lives. While our own experiences are based on one of the authors' diagnosis of cancer, the information presented here should be of general value to all in terms of strategies for reducing and detecting long-term health risks. Our hope is that this information will help you be as healthy as possible until you reach retirement age and then healthy during a well-deserved and long period of retirement!},
bibtype = {inproceedings},
author = {Stewart, Craig A and Krefeldt, Marion},
doi = {10.1145/3219104.3219107},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18)}
}
The high performance computing (HPC), cyberinfrastructure, and research and academic information technology communities are small - too small to fulfill current needs for such professionals in the US. Members of this community are also often under a lot of stress, and with that can come health problems. The senior author was diagnosed with Stage IV cancer in early 2017. In this paper, we share what we have learned about health management in general and dealing with cancer in particular, focusing on lessons that are portable to other members of the HPC, cyberinfrastructure, and research and academic information technology communities. We also make recommendations to the National Science Foundation regarding changes the NSF could make to reduce some of the stress this community feels on a day-in, day-out basis. The key point of this report is to provide information to members of the cyberinfrastructure community that they might not already have - and might not receive from their primary care physicians - that will help them live longer and healthier lives. While our own experiences are based on one of the authors' diagnosis of cancer, the information presented here should be of general value to all in terms of strategies for reducing and detecting long-term health risks. Our hope is that this information will help you be as healthy as possible until you reach retirement age and then healthy during a well-deserved and long period of retirement!
Using a Science Gateway to Deliver SimVascular Software As a Service for Classroom Instruction.
Wilson, N., M.; Marru, S.; Abeysinghe, E.; Christie, M., A.; Maher, G., D.; Updegrove, A., R.; Pierce, M.; and Marsden, A., L.
In
Proceedings of the Practice and Experience on Advanced Research Computing, of
PEARC '18, pages 102:1--102:4, 2018. ACM
Website
doi
link
bibtex
abstract
@inproceedings{
title = {Using a Science Gateway to Deliver SimVascular Software As a Service for Classroom Instruction},
type = {inproceedings},
year = {2018},
keywords = {ACM proceedings,Apache Airavata,Science Gateway},
pages = {102:1--102:4},
websites = {http://doi.acm.org/10.1145/3219104.3229242},
publisher = {ACM},
city = {New York, NY, USA},
series = {PEARC '18},
id = {da3e301a-0208-3822-ba51-ece0d36db76f},
created = {2019-10-01T17:21:29.609Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:21:29.609Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Wilson:2018:USG:3219104.3229242},
source_type = {inproceedings},
private_publication = {false},
abstract = {SimVascular (http://www.simvascular.org) is open source software enabling users to construct image-based, patient-specific anatomic models and perform realistic blood flow simulation useful in disease research, medical device design, and surgical planning. The software consists of two core executables: a front-end application and a flow solver. The front-end application enables users to create patient-specific anatomic models from imaging data, generate finite-element meshes, prescribe boundary conditions, and set up an analysis. The finite-element based blood flow solver utilizes MPI and is massively scalable. SimVascular has been successfully integrated into graduate level courses on cardiovascular modeling at multiple institutions including Stanford, UC Berkeley, Purdue, and Marquette to introduce state-of-the-art modeling to the students and provide a basis for hands-on projects. While the front-end application can be installed and run on a laptop, the flow solver requires high performance computing (HPC) for realistic problem sizes. This provides a significant challenge for instructors as many students are unfamiliar with HPC, and local resources might be limited or difficult to administer. There is also a need to provide user and group management capabilities for courses: students should authenticate using campus credentials, instructors should be able to access students' work, and students' access to computing allocations should be limited. Our poster will detail an Apache Airavata-based science gateway to address these needs. XSEDE's Comet provides the backend computing power. This approach allows the SimVascular team to provision HPC resources and install and maintain the software providing students access at institutions across the country. The science gateway interface provides access to SimVascular's flow solver, while allowing students to use SimVascular's desktop interfaces.},
bibtype = {inproceedings},
author = {Wilson, Nathan M and Marru, Suresh and Abeysinghe, Eroma and Christie, Marcus A and Maher, Gabriel D and Updegrove, Adam R and Pierce, Marlon and Marsden, Alison L},
doi = {10.1145/3219104.3229242},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing}
}
SimVascular (http://www.simvascular.org) is open source software enabling users to construct image-based, patient-specific anatomic models and perform realistic blood flow simulation useful in disease research, medical device design, and surgical planning. The software consists of two core executables: a front-end application and a flow solver. The front-end application enables users to create patient-specific anatomic models from imaging data, generate finite-element meshes, prescribe boundary conditions, and set up an analysis. The finite-element based blood flow solver utilizes MPI and is massively scalable. SimVascular has been successfully integrated into graduate level courses on cardiovascular modeling at multiple institutions including Stanford, UC Berkeley, Purdue, and Marquette to introduce state-of-the-art modeling to the students and provide a basis for hands-on projects. While the front-end application can be installed and run on a laptop, the flow solver requires high performance computing (HPC) for realistic problem sizes. This provides a significant challenge for instructors as many students are unfamiliar with HPC, and local resources might be limited or difficult to administer. There is also a need to provide user and group management capabilities for courses: students should authenticate using campus credentials, instructors should be able to access students' work, and students' access to computing allocations should be limited. Our poster will detail an Apache Airavata-based science gateway to address these needs. XSEDE's Comet provides the backend computing power. This approach allows the SimVascular team to provision HPC resources and install and maintain the software providing students access at institutions across the country. The science gateway interface provides access to SimVascular's flow solver, while allowing students to use SimVascular's desktop interfaces.
PHASTA Science Gateway for High Performance Computational Fluid Dynamics.
Smith, C., W.; Abeysinghe, E.; Marru, S.; and Jansen, K., E.
In
Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, pages 1-4, July 2018. ACM Press
Paper
Website
doi
link
bibtex
abstract
@inproceedings{
title = {PHASTA Science Gateway for High Performance Computational Fluid Dynamics},
type = {inproceedings},
year = {2018},
keywords = {Apache Airavata,CFD,Paralllel Unstructured Mesh,Pervasive Technology Institute,Science Gateway,Science Gateways Research Center},
pages = {1-4},
websites = {http://dl.acm.org/citation.cfm?doid=3219104.3229243},
month = {7},
publisher = {ACM Press},
day = {22},
city = {New York, New York, USA},
id = {467c8437-2ddb-3d8e-96f0-d91b2dcf4881},
created = {2019-10-01T17:21:30.353Z},
accessed = {2019-09-12},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:46.407Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Smith2018},
private_publication = {false},
abstract = {The Parallel Hierarchic Adaptive Stabilized Transient Analysis (PHASTA) software supports modeling compressible or incompressible, laminar or turbulent, steady or unsteady flows in 3D using unstructured grids. PHASTA has been applied to industrial and academic flows on complex, as-designed geometric models with over one billion mesh elements using upwards of one million compute cores. The PHASTA Science Gateway (phasta.scigap.org) brings these increasingly critical technologies to a larger user base by providing a central hub for simulation execution, simulation data management, and documentation. Researchers and engineers using the gateway can easily define and execute simulations on the TACC Stampede2 Skylake and Knights Landing nodes without being burdened by the details of remote access, the job scheduler, and filesystem configuration. In addition to simplifying the simulation execution process, the gateway creates a searchable archive of past jobs that can be shared with other users to support reproducibility and increase productivity. Our poster presents the construction of the gateway with Apache Airavata, the simulation definition process, applications it currently supports, and our ongoing efforts to expand functionality, the user base, and the community.},
bibtype = {inproceedings},
author = {Smith, Cameron W. and Abeysinghe, Eroma and Marru, Suresh and Jansen, Kenneth E.},
doi = {10.1145/3219104.3229243},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18}
}
The Parallel Hierarchic Adaptive Stabilized Transient Analysis (PHASTA) software supports modeling compressible or incompressible, laminar or turbulent, steady or unsteady flows in 3D using unstructured grids. PHASTA has been applied to industrial and academic flows on complex, as-designed geometric models with over one billion mesh elements using upwards of one million compute cores. The PHASTA Science Gateway (phasta.scigap.org) brings these increasingly critical technologies to a larger user base by providing a central hub for simulation execution, simulation data management, and documentation. Researchers and engineers using the gateway can easily define and execute simulations on the TACC Stampede2 Skylake and Knights Landing nodes without being burdened by the details of remote access, the job scheduler, and filesystem configuration. In addition to simplifying the simulation execution process, the gateway creates a searchable archive of past jobs that can be shared with other users to support reproducibility and increase productivity. Our poster presents the construction of the gateway with Apache Airavata, the simulation definition process, applications it currently supports, and our ongoing efforts to expand functionality, the user base, and the community.
Simplifying Access to Campus Resources at Southern Illinois University with a Science Gateway.
Sunkara, S., S.; Langin, C.; Pierce, M.; Abeysinghe, E.; Pamidighantam, S.; and Marru, S.
In
Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18, of
PEARC '18, pages 1-4, 2018. ACM Press
Website
doi
link
bibtex
abstract
@inproceedings{
title = {Simplifying Access to Campus Resources at Southern Illinois University with a Science Gateway},
type = {inproceedings},
year = {2018},
keywords = {Apache Airavata,MaSuRCA,Science Gateway},
pages = {1-4},
websites = {http://doi.acm.org/10.1145/3219104.3229252,http://dl.acm.org/citation.cfm?doid=3219104.3229252},
publisher = {ACM Press},
city = {New York, New York, USA},
series = {PEARC '18},
id = {520a034a-d0f2-3f63-9666-5aae497e287b},
created = {2019-10-01T17:21:30.464Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:21:30.464Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Sunkara:2018:SAC:3219104.3229252},
source_type = {inproceedings},
private_publication = {false},
abstract = {Not all researchers are comfortable using High Performance Computing (HPC) systems. Southern Illinois University's Office of Information Technology (OIT) Research Computing helps our researchers get onto these systems and use them for their research activities. One such use case at SIU involves a group of researchers from Life Sciences who were trying to use MaSuRCA [3], a genome-sequencing tool, for their research. Although the leaders of this research are well versed in using BigDog [4] (SIU's HPC cluster), other members of the group had difficulty using the cluster for their work. It was time to look for efficient ways of enabling them to use the cluster. We examined using science gateways, which can help our researchers use the computational cluster without logging on to a Linux-based HPC system. OIT is currently collaborating with the Science Gateways Research Center at Indiana University (IU) on the use of Apache Airavata [1] as a Science Gateway framework for the MaSuRCA user community at SIU. The IU team members provide hosting and operations for the Apache Airavata middleware as part of the SciGaP.org project. IU collaborators also provide a basic science gateway user interface, the PGA. The SIU gateway, although hosted off campus, is integrated with SIU's BigDog cluster.},
bibtype = {inproceedings},
author = {Sunkara, Sai Susheel and Langin, Chet and Pierce, Marlon and Abeysinghe, Eroma and Pamidighantam, Sudhakar and Marru, Suresh},
doi = {10.1145/3219104.3229252},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing - PEARC '18}
}
Not all researchers are comfortable using High Performance Computing (HPC) systems. Southern Illinois University's Office of Information Technology (OIT) Research Computing helps our researchers get onto these systems and use them for their research activities. One such use case at SIU involves a group of researchers from Life Sciences who were trying to use MaSuRCA [3], a genome-sequencing tool, for their research. Although the leaders of this research are well versed in using BigDog [4] (SIU's HPC cluster), other members of the group had difficulty using the cluster for their work. It was time to look for efficient ways of enabling them to use the cluster. We examined using science gateways, which can help our researchers use the computational cluster without logging on to a Linux-based HPC system. OIT is currently collaborating with the Science Gateways Research Center at Indiana University (IU) on the use of Apache Airavata [1] as a Science Gateway framework for the MaSuRCA user community at SIU. The IU team members provide hosting and operations for the Apache Airavata middleware as part of the SciGaP.org project. IU collaborators also provide a basic science gateway user interface, the PGA. The SIU gateway, although hosted off campus, is integrated with SIU's BigDog cluster.
Building a Science Gateway For Processing and Modeling Sequencing Data Via Apache Airavata.
Wang, Z.; Christie, M., A.; Abeysinghe, E.; Chu, T.; Marru, S.; Pierce, M.; and Danko, C., G.
In
Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18), of
PEARC '18, pages 39:1--39:7, 2018. ACM
Website
doi
link
bibtex
abstract
@inproceedings{
title = {Building a Science Gateway For Processing and Modeling Sequencing Data Via Apache Airavata},
type = {inproceedings},
year = {2018},
keywords = {Apache Airavata,Next Generation Sequencing,Science gateway,cloud computing,sequencing data,software-as-a-service},
pages = {39:1--39:7},
websites = {http://doi.acm.org/10.1145/3219104.3219141},
publisher = {ACM},
city = {New York, NY, USA},
series = {PEARC '18},
id = {dc6648d0-be5e-3642-8535-a6360c187f25},
created = {2019-10-01T17:21:30.492Z},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2019-10-01T17:21:30.492Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Wang:2018:BSG:3219104.3219141},
source_type = {inproceedings},
private_publication = {false},
abstract = {The amount of DNA sequencing data has been exponentially growing during the past decade due to advances in sequencing technology. Processing and modeling large amounts of sequencing data can be computationally intractable for desktop computing platforms. High performance computing (HPC) resources offer advantages in terms of computing power, and can be a general solution to these problems. Using HPCs directly for computational needs requires skilled users who know their way around HPCs, and acquiring such skills takes time. Science gateways act as the middle layer between users and HPCs, providing users with the resources to accomplish compute-intensive tasks without requiring specialized expertise. We developed a web-based computing platform for genome biologists by customizing the PHP Gateway for Airavata (PGA) framework that accesses publicly accessible HPC resources via Apache Airavata. This web computing platform takes advantage of the Extreme Science and Engineering Discovery Environment (XSEDE), which provides the resources for gateway development, including access to CPU, GPU, and storage resources. We used this platform to develop a gateway for the dREG algorithm, an online computing tool for finding functional regions in mammalian genomes using nascent RNA sequencing data. The dREG gateway provides its users a free, powerful, and user-friendly GPU computing resource based on XSEDE, circumventing the need for specialized knowledge about installation, configuration, and execution on an HPC for biologists. The dREG gateway is available at: https://dREG.dnasequence.org/.},
bibtype = {inproceedings},
author = {Wang, Zhong and Christie, Marcus A and Abeysinghe, Eroma and Chu, Tinyi and Marru, Suresh and Pierce, Marlon and Danko, Charles G},
doi = {10.1145/3219104.3219141},
booktitle = {Proceedings of the Practice and Experience on Advanced Research Computing (PEARC '18)}
}
The amount of DNA sequencing data has been exponentially growing during the past decade due to advances in sequencing technology. Processing and modeling large amounts of sequencing data can be computationally intractable for desktop computing platforms. High performance computing (HPC) resources offer advantages in terms of computing power, and can be a general solution to these problems. Using HPCs directly for computational needs requires skilled users who know their way around HPCs, and acquiring such skills takes time. Science gateways act as the middle layer between users and HPCs, providing users with the resources to accomplish compute-intensive tasks without requiring specialized expertise. We developed a web-based computing platform for genome biologists by customizing the PHP Gateway for Airavata (PGA) framework that accesses publicly accessible HPC resources via Apache Airavata. This web computing platform takes advantage of the Extreme Science and Engineering Discovery Environment (XSEDE), which provides the resources for gateway development, including access to CPU, GPU, and storage resources. We used this platform to develop a gateway for the dREG algorithm, an online computing tool for finding functional regions in mammalian genomes using nascent RNA sequencing data. The dREG gateway provides its users a free, powerful, and user-friendly GPU computing resource based on XSEDE, circumventing the need for specialized knowledge about installation, configuration, and execution on an HPC for biologists. The dREG gateway is available at: https://dREG.dnasequence.org/.
Big provenance stream processing for data intensive computations.
Suriarachchi, I.; Withana, S.; and Plale, B.
In
Proceedings - IEEE 14th International Conference on eScience, e-Science 2018, pages 245-255, December 2018. Institute of Electrical and Electronics Engineers Inc.
doi
link
bibtex
abstract
@inproceedings{
title = {Big provenance stream processing for data intensive computations},
type = {inproceedings},
year = {2018},
keywords = {Big Data,Big Provenance,Stream Processing},
pages = {245-255},
month = {12},
publisher = {Institute of Electrical and Electronics Engineers Inc.},
day = {24},
id = {02faffe9-74a8-370f-9c62-67aef6795cd5},
created = {2020-04-22T21:44:56.973Z},
accessed = {2020-04-21},
file_attached = {false},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-05-11T14:43:32.831Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {false},
hidden = {false},
citation_key = {Suriarachchi2018},
private_publication = {false},
abstract = {In the business and research landscape of today, data analysis consumes public and proprietary data from numerous sources, and utilizes any one or more of popular data-parallel frameworks such as Hadoop, Spark and Flink. In the Data Lake setting these frameworks co-exist. Our earlier work has shown that data provenance in Data Lakes can aid with both traceability and management. The sheer volume of fine-grained provenance generated in a multi-framework application motivates the need for on-the-fly provenance processing. We introduce a new parallel stream processing algorithm that reduces fine-grained provenance while preserving backward and forward provenance. The algorithm is resilient to provenance events arriving out-of-order. It is evaluated using several strategies for partitioning a provenance stream. The evaluation shows that the parallel algorithm performs well in processing out-of-order provenance streams, with good scalability and accuracy.},
bibtype = {inproceedings},
author = {Suriarachchi, Isuru and Withana, Sachith and Plale, Beth},
doi = {10.1109/eScience.2018.00039},
booktitle = {Proceedings - IEEE 14th International Conference on eScience, e-Science 2018}
}
In the business and research landscape of today, data analysis consumes public and proprietary data from numerous sources, and utilizes any one or more of popular data-parallel frameworks such as Hadoop, Spark and Flink. In the Data Lake setting these frameworks co-exist. Our earlier work has shown that data provenance in Data Lakes can aid with both traceability and management. The sheer volume of fine-grained provenance generated in a multi-framework application motivates the need for on-the-fly provenance processing. We introduce a new parallel stream processing algorithm that reduces fine-grained provenance while preserving backward and forward provenance. The algorithm is resilient to provenance events arriving out-of-order. It is evaluated using several strategies for partitioning a provenance stream. The evaluation shows that the parallel algorithm performs well in processing out-of-order provenance streams, with good scalability and accuracy.
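The abstract above describes the reduction algorithm only at a high level. As an illustrative sketch only (assumed event format; not the authors' algorithm), the following shows one way fine-grained provenance events can be folded on the fly into coarse input-to-output lineage edges while tolerating out-of-order arrival: each "used" or "generated" event is paired with whichever counterpart events for the same activity have already been seen, so backward and forward lineage survives even when events arrive in the wrong order.

# Illustrative sketch only (not the algorithm from the paper): fold a stream of
# fine-grained provenance events into coarse input->output lineage edges while
# tolerating out-of-order arrival. The event format is an assumption made for
# this sketch: ("used", activity_id, input_entity) or
# ("generated", activity_id, output_entity).
from collections import defaultdict

def reduce_provenance(events):
    used = defaultdict(set)       # activity -> input entities seen so far
    generated = defaultdict(set)  # activity -> output entities seen so far
    edges = set()                 # reduced lineage: (input_entity, output_entity)

    for kind, activity, entity in events:
        if kind == "used":
            used[activity].add(entity)
            # pair the new input with any outputs that already arrived
            for out in generated[activity]:
                edges.add((entity, out))
        elif kind == "generated":
            generated[activity].add(entity)
            # pair the new output with any inputs that already arrived,
            # even if the matching "used" events came earlier in the stream
            for inp in used[activity]:
                edges.add((inp, entity))
    return edges

# Example: "generated" arrives before "used" for activity a1, yet the
# backward/forward edge raw.csv -> clean.csv is still recovered.
stream = [
    ("generated", "a1", "clean.csv"),
    ("used", "a1", "raw.csv"),
    ("used", "a2", "clean.csv"),
    ("generated", "a2", "model.bin"),
]
print(sorted(reduce_provenance(stream)))
# [('clean.csv', 'model.bin'), ('raw.csv', 'clean.csv')]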
ABI Sustaining: The National Center for Genome Analysis Support 2018 Annual Report.
Doak, T., G.; Stewart, C., A.; and Michaels, S., D.
Technical Report, September 2018.
Paper
Website
link
bibtex
abstract
@techreport{
title = {ABI Sustaining: The National Center for Genome Analysis Support 2018 Annual Report},
type = {techreport},
year = {2018},
keywords = {NCGAS,National Science Foundation},
websites = {http://creativecommons.org/licenses/by/4.0/.},
month = {9},
day = {10},
id = {91895bbb-5bce-3750-8a88-83f0d62b156c},
created = {2020-09-09T20:50:48.696Z},
accessed = {2020-09-09},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-09-15T22:44:01.372Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {true},
hidden = {false},
citation_key = {Doak2018},
folder_uuids = {3b35931e-fb6d-48f9-8e01-87ee16ef0331},
private_publication = {false},
abstract = {National Science Foundation ABI-1458641},
bibtype = {techreport},
author = {Doak, Thomas G. and Stewart, Craig A. and Michaels, Scott D}
}
National Science Foundation ABI-1458641
Summary of the National Center for Genome Analysis Support (NCGAS) 2018 de Novo Transcriptome Workflow and Workshop.
Sanders, S., A.; Ganote, C., L.; Papudeshi, B.; Stewart, C., A.; and Doak, T., G.
Technical Report, June 2018.
Paper
Website
link
bibtex
abstract
@techreport{
title = {Summary of the National Center for Genome Analysis Support (NCGAS) 2018 de Novo Transcriptome Workflow and Workshop},
type = {techreport},
year = {2018},
keywords = {NCGAS,NSF Report,Transcriptome Assembly,Workshop},
websites = {https://scholarworks.iu.edu/dspace/handle/2022/22254},
month = {6},
day = {7},
id = {d81eb260-ae70-3b34-929a-07aca2b774bc},
created = {2020-09-09T21:03:42.928Z},
accessed = {2020-09-09},
file_attached = {true},
profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
last_modified = {2020-09-15T22:44:01.385Z},
read = {false},
starred = {false},
authored = {true},
confirmed = {false},
hidden = {false},
citation_key = {Sanders2018},
folder_uuids = {3b35931e-fb6d-48f9-8e01-87ee16ef0331},
private_publication = {false},
abstract = {The National Center for Genome Analysis Support (NCGAS) held a workshop entitled "de novo Assembly of Transcriptomes using HPC Resources" from April 30 through May 1, 2018. This workshop served NCGAS's mission of enabling the biological research community to analyze, understand, and make use of the genomic information now available, packaging our seven years of experience assisting with de novo transcriptome assemblies and running High Performance Computing (HPC) resources into a documented, easily approachable workflow for our users. The workshop covered common questions and problems that our users have had in HPC (such as job handling, resource availability, data management, and troubleshooting) and in the construction of transcriptomes (such as software choices, combination of assemblies, and downstream analyses). The two-day workshop also highlighted the resources available to US scientists, concentrating heavily on available XSEDE resources for analyses, visualization, and archiving of data.},
bibtype = {techreport},
author = {Sanders, Sheri A and Ganote, Carrie L and Papudeshi, Bhavya and Stewart, Craig A and Doak, Thomas G}
}
The National Center for Genome Analysis Support (NCGAS) held a workshop entitled "de novo Assembly of Transcriptomes using HPC Resources" from April 30 through May 1, 2018. This workshop served NCGAS's mission of enabling the biological research community to analyze, understand, and make use of the genomic information now available, packaging our seven years of experience assisting with de novo transcriptome assemblies and running High Performance Computing (HPC) resources into a documented, easily approachable workflow for our users. The workshop covered common questions and problems that our users have had in HPC (such as job handling, resource availability, data management, and troubleshooting) and in the construction of transcriptomes (such as software choices, combination of assemblies, and downstream analyses). The two-day workshop also highlighted the resources available to US scientists, concentrating heavily on available XSEDE resources for analyses, visualization, and archiving of data.