Survey of Distributed Computing Frameworks for Supporting Big Data Analysis. Sun, X., He, Y., Wu, D., & Huang, J. Z. Big Data Mining and Analytics, 6(2):154–169, June, 2023.
Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or cloud. The size of big data increases at a pace that is faster than the increase in the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks which often require running complex analytical algorithms on extremely big data sets in terabytes. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limit, and limited analytical algorithms because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to conquer these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.
@article{sun_survey_2023,
title = {Survey of {Distributed} {Computing} {Frameworks} for {Supporting} {Big} {Data} {Analysis}},
volume = {6},
issn = {2096-0654},
url = {https://ieeexplore.ieee.org/abstract/document/10026506},
doi = {10.26599/BDMA.2022.9020014},
abstract = {Distributed computing frameworks are the fundamental component of distributed computing systems. They provide an essential way to support the efficient processing of big data on clusters or cloud. The size of big data increases at a pace that is faster than the increase in the big data processing capacity of clusters. Thus, distributed computing frameworks based on the MapReduce computing model are not adequate to support big data analysis tasks which often require running complex analytical algorithms on extremely big data sets in terabytes. In performing such tasks, these frameworks face three challenges: computational inefficiency due to high I/O and communication costs, non-scalability to big data due to memory limit, and limited analytical algorithms because many serial algorithms cannot be implemented in the MapReduce programming model. New distributed computing frameworks need to be developed to conquer these challenges. In this paper, we review MapReduce-type distributed computing frameworks that are currently used in handling big data and discuss their problems when conducting big data analysis. In addition, we present a non-MapReduce distributed computing framework that has the potential to overcome big data analysis challenges.},
number = {2},
urldate = {2023-11-14},
journal = {Big Data Mining and Analytics},
author = {Sun, Xudong and He, Yulin and Wu, Dingming and Huang, Joshua Zhexue},
month = jun,
year = {2023},
note = {Conference Name: Big Data Mining and Analytics},
keywords = {Survey, Distributed computing, Framework},
pages = {154--169},
}