Using XDMoD to facilitate XSEDE operations, planning and analysis. Furlani, T., Schneider, B., Jones, M., Towns, J., Hart, D., Gallo, S., Deleon, R., Lu, C., Ghadersohi, A., Gentner, R., Patra, A., Laszewski, G., Wang, F., Palmer, J., & Simakov, N. In ACM International Conference Proceeding Series, 2013.
doi  abstract   bibtex   
The XDMoD auditing tool provides, for the first time, a comprehensive tool to measure both utilization and performance of high-end cyberinfrastructure (CI), with initial focus on XSEDE. Here, we demonstrate, through several case studies, its utility for providing important metrics regarding resource utilization and performance of TeraGrid/XSEDE that can be used for detailed analysis and planning as well as improving operational efficiency and performance. Measuring the utilization of high-end cyberinfrastructure such as XSEDE helps provide a detailed understanding of how a given CI resource is being utilized and can lead to improved performance of the resource in terms of job throughput or any number of desired job characteristics. In the case studies considered here, a detailed historical analysis of XSEDE usage data using XDMoD clearly demonstrates the tremendous growth in the number of users, overall usage, and scale of the simulations routinely carried out. Not surprisingly, physics, chemistry, and the engineering disciplines are shown to be heavy users of the resources. However, as the data clearly show, molecular biosciences are now a significant and growing user of XSEDE resources, accounting for more than 20 percent of all SUs consumed in 2012. XDMoD shows that the resources required by the various scientific disciplines are very different. Physics, Astronomical sciences, and Atmospheric sciences tend to solve large problems requiring many cores. Molecular biosciences applications on the other hand, require many cycles but do not employ core counts that are as large. Such distinctions are important in guiding future cyberinfrastructure design decisions. XDMoD's implementation of a novel application kernel-based auditing system to measure overall CI system performance and quality of service is shown, through several examples, to provide a useful means to automatically detect under performing hardware and software. This capability is especially critical given the complex composition of today's advanced CI. Examples include an application kernel based on a widely used quantum chemistry program that uncovered a software bug in the I/O stack of a commercial parallel file system, which was subsequently fixed by the vendor in the form of a software patch that is now part of their standard release. This error, which resulted in dramatically increased execution times as well as outright job failure, would likely have gone unnoticed for sometime and was only uncovered as a result of implementation of XDMoD's suite of application kernels. © 2013 by the Association for Computing Machinery, Inc.
@inproceedings{
 title = {Using XDMoD to facilitate XSEDE operations, planning and analysis},
 type = {inproceedings},
 year = {2013},
 id = {9d4d5742-0a86-3a1b-837f-8e9204c0ab4f},
 created = {2019-10-01T17:20:58.173Z},
 file_attached = {false},
 profile_id = {42d295c0-0737-38d6-8b43-508cab6ea85d},
 last_modified = {2019-10-01T17:24:36.175Z},
 read = {false},
 starred = {false},
 authored = {true},
 confirmed = {true},
 hidden = {false},
 citation_key = {Furlani2013a},
 folder_uuids = {82975498-107c-4bb3-bb76-f87bce3e9f6e},
 private_publication = {false},
 abstract = {The XDMoD auditing tool provides, for the first time, a comprehensive tool to measure both utilization and performance of high-end cyberinfrastructure (CI), with initial focus on XSEDE. Here, we demonstrate, through several case studies, its utility for providing important metrics regarding resource utilization and performance of TeraGrid/XSEDE that can be used for detailed analysis and planning as well as improving operational efficiency and performance. Measuring the utilization of high-end cyberinfrastructure such as XSEDE helps provide a detailed understanding of how a given CI resource is being utilized and can lead to improved performance of the resource in terms of job throughput or any number of desired job characteristics. In the case studies considered here, a detailed historical analysis of XSEDE usage data using XDMoD clearly demonstrates the tremendous growth in the number of users, overall usage, and scale of the simulations routinely carried out. Not surprisingly, physics, chemistry, and the engineering disciplines are shown to be heavy users of the resources. However, as the data clearly show, molecular biosciences are now a significant and growing user of XSEDE resources, accounting for more than 20 percent of all SUs consumed in 2012. XDMoD shows that the resources required by the various scientific disciplines are very different. Physics, Astronomical sciences, and Atmospheric sciences tend to solve large problems requiring many cores. Molecular biosciences applications on the other hand, require many cycles but do not employ core counts that are as large. Such distinctions are important in guiding future cyberinfrastructure design decisions. XDMoD's implementation of a novel application kernel-based auditing system to measure overall CI system performance and quality of service is shown, through several examples, to provide a useful means to automatically detect under performing hardware and software. This capability is especially critical given the complex composition of today's advanced CI. Examples include an application kernel based on a widely used quantum chemistry program that uncovered a software bug in the I/O stack of a commercial parallel file system, which was subsequently fixed by the vendor in the form of a software patch that is now part of their standard release. This error, which resulted in dramatically increased execution times as well as outright job failure, would likely have gone unnoticed for sometime and was only uncovered as a result of implementation of XDMoD's suite of application kernels. © 2013 by the Association for Computing Machinery, Inc.},
 bibtype = {inproceedings},
 author = {Furlani, T.R. and Schneider, B.L. and Jones, M.D. and Towns, J. and Hart, D.L. and Gallo, S.M. and Deleon, R.L. and Lu, C.-D. and Ghadersohi, A. and Gentner, R.J. and Patra, A.K. and Laszewski, G.V. and Wang, F. and Palmer, J.T. and Simakov, N.},
 doi = {10.1145/2484762.2484763},
 booktitle = {ACM International Conference Proceeding Series}
}

Downloads: 0