Contributions to High-Performance Big Data Computing. Fox, G., Qiu, J., Crandall, D., von Laszewski, G., Beckstein, O., Paden, J., Paraskevakos, I., Jha, S., Wang, F., Marathe, M., Vullikanti, A., & Cheatham, T. Technical Report, 2018.
Our project is at the interface of Big Data and HPC (High-Performance Big Data computing), and this paper describes a collaboration among seven universities: Arizona State, Indiana (lead), Kansas, Rutgers, Stony Brook, Virginia Tech, and Utah. It addresses the intersection of high-performance and Big Data computing, with several application areas or communities driving the requirements for software systems and algorithms. We describe the base architecture, including HPC-ABDS (the High-Performance Computing enhanced Apache Big Data Stack), and an application use case study identifying key features that determine software and algorithm requirements. We summarize middleware, including the Harp-DAAL collective communication layer, the Twister2 Big Data toolkit, and pilot jobs. We then present SPIDAL (the Scalable Parallel Interoperable Data Analytics Library) and our work for it in core machine learning, image processing, and the application communities: network science, polar science, biomolecular simulations, pathology, and spatial systems. We describe basic algorithms and their integration in end-to-end use cases.
