A hierarchical framework for cross-domain MapReduce execution

A hierarchical framework for cross-domain MapReduce execution. Luo, Y., Guo, Z., Sun, Y., Plale, B., Qiu, J., & Li, W. In ECMLS'11 - Proceedings of the 2nd International Workshop on Emerging Computational Methods for the Life Sciences, 2011.
The MapReduce programming model provides an easy way to execute pleasantly parallel applications. Many data-intensive life science applications fit this programming model and benefit from the scalability it delivers. One such application is AutoDock, a suite of automated tools for predicting the bound conformations of flexible ligands to macromolecular targets. However, researchers also need sufficient computation and storage resources to fully enjoy the benefits of MapReduce. For example, a typical AutoDock-based virtual screening experiment consists of a very large number of docking processes from multiple ligands and is often time-consuming to run on a single MapReduce cluster. Although commercial clouds can provide virtually unlimited computation and storage resources on demand, financial, security, and possibly other concerns lead many researchers to still run experiments on a number of small clusters, each with a limited number of nodes, that cannot unleash the full power of MapReduce. In this paper, we present a hierarchical MapReduce framework that gathers computation resources from different clusters and runs MapReduce jobs across them. The global controller in our framework splits the data set, dispatches the partitions to multiple "local" MapReduce clusters, and balances the workload by assigning tasks in accordance with the capabilities of each cluster and of each node. The local results are then returned to the global controller for global reduction. Our experimental evaluation using AutoDock over MapReduce shows that our load-balancing algorithm yields a promising workload distribution across multiple clusters and thus minimizes the overall time span of the entire MapReduce execution. © Copyright 2011 ACM.
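The control flow the abstract describes (split the data set, dispatch partitions to local clusters according to capability, then reduce the local results globally) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of a single integer "capacity" weight per cluster, and the proportional split rule are all assumptions made for illustration, and each local cluster is simulated in-process rather than running on real MapReduce infrastructure.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def partition_by_capacity(items: List, capacities: Dict[str, int]) -> Dict[str, List]:
    """Split items across clusters in proportion to a capacity weight.

    The weights are hypothetical (for example, available cores); the
    abstract does not say how cluster capability is measured.
    """
    total = sum(capacities.values())
    if total <= 0:
        raise ValueError("total capacity must be positive")
    shares: Dict[str, List] = {}
    start = 0
    cumulative = 0
    for name, capacity in capacities.items():
        cumulative += capacity
        # Cumulative rounding keeps the share sizes summing to len(items).
        end = (len(items) * cumulative) // total
        shares[name] = items[start:end]
        start = end
    return shares


def local_mapreduce(
    records: Iterable,
    map_fn: Callable[[object], Iterable[Tuple[str, object]]],
    reduce_fn: Callable[[str, List], object],
) -> Dict[str, object]:
    """Stand-in for one local cluster: map, group by key, then reduce."""
    grouped: Dict[str, List] = {}
    for record in records:
        for key, value in map_fn(record):
            grouped.setdefault(key, []).append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def hierarchical_mapreduce(items, capacities, map_fn, reduce_fn, global_reduce_fn):
    """Assumed global-controller flow: split, run locally, reduce globally."""
    shares = partition_by_capacity(items, capacities)
    local_results = [
        local_mapreduce(share, map_fn, reduce_fn) for share in shares.values()
    ]
    # Global reduction: collect each key's local results and combine them.
    merged: Dict[str, List] = {}
    for result in local_results:
        for key, value in result.items():
            merged.setdefault(key, []).append(value)
    return {key: global_reduce_fn(key, values) for key, values in merged.items()}


if __name__ == "__main__":
    # Toy workload: sum the integers 1..10 by parity across three clusters.
    totals = hierarchical_mapreduce(
        list(range(1, 11)),
        {"cluster_a": 1, "cluster_b": 2, "cluster_c": 2},
        map_fn=lambda n: [("even" if n % 2 == 0 else "odd", n)],
        reduce_fn=lambda key, values: sum(values),
        global_reduce_fn=lambda key, values: sum(values),
    )
    print(totals)
```

The toy job uses summation for both the local and the global reduce because it is associative, so combining partial sums gives the same answer as a single reduce over all the data. A reduction without that property would need a different global reduce function.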

@inproceedings{Luo2011,
 title = {A hierarchical framework for cross-domain MapReduce execution},
 author = {Luo, Y. and Guo, Z. and Sun, Y. and Plale, B. and Qiu, J. and Li, W.W.},
 booktitle = {ECMLS'11 - Proceedings of the 2nd International Workshop on Emerging Computational Methods for the Life Sciences},
 year = {2011},
 doi = {10.1145/1996023.1996026}
}
