Twister:Net - Communication Library for Big Data Processing in HPC and Cloud Environments. Kamburugamuve, S., Wickramasinghe, P., Govindarajan, K., Uyar, A., Gunduz, G., Abeykoon, V., & Fox, G. In IEEE International Conference on Cloud Computing, CLOUD, volume 2018-July, pages 383-391, 9, 2018. IEEE Computer Society.
Stream processing and batch data processing are the dominant forms of big data analytics today, with numerous systems such as Hadoop, Spark, and Heron designed to process the ever-increasing volume of data. Generally, these systems are developed as single projects with aspects such as communication, task management, and data management integrated together. By contrast, we take a component-based approach to big data by developing the essential features of a big data system as independent components with polymorphic implementations to support different requirements. Consequently, we recognize the requirements of both the dataflow style used in popular Apache systems and the Bulk Synchronous Processing communication style common in High-Performance Computing (HPC) for different applications. Message Passing Interface (MPI) implementations are dominant in HPC, but there are no such standard libraries available for big data. Twister:Net is a stand-alone, highly optimized dataflow-style parallel communication library which can be used by big data systems or advanced users. Twister:Net can work both in cloud environments using TCP and in HPC environments using MPI implementations. This paper introduces Twister:Net and compares it with existing systems to highlight its design and performance. © 2018 IEEE.
