One network metric datastore to track them all: the OSG network metric service. Quick, R., Babik, M., Fajardo, E. M., Gross, K., Hayashi, S., Krenz, M., Lee, T., McKee, S., Pipes, C., & Teige, S. Journal of Physics: Conf. Series, 898:82044, 2017.
The Open Science Grid (OSG) relies upon the network as a critical part of the distributed infrastructures it enables. In 2012, OSG added a new focus area in networking with the goal of becoming the primary source of network information for its members and collaborators. This includes gathering, organizing, and providing network metrics to guarantee effective network usage and prompt detection and resolution of any network issues, including connection failures, congestion, and traffic routing. In September 2015, this service was deployed into the OSG production environment. We will report on the creation, implementation, testing, and deployment of the OSG Networking Service. From organizing the deployment of perfSONAR toolkits within OSG and its partners, to the challenges of orchestrating regular testing between sites, to reliably gathering the resulting network metrics and making them available to users, virtual organizations, and higher-level services, all aspects of the implementation will be reviewed. In particular, several higher-level services were developed to bring the OSG network service to its full potential. These include a web-based mesh configuration system, which allows central scheduling and management of all the network tests performed by the instances; a set of probes that continually gather metrics from the remote instances and publish them to different destinations; a central network datastore (esmond), which provides interfaces to access the network monitoring information both in close to real time and historically (up to a year), giving the state of the tests; and a perfSONAR infrastructure monitoring system, which ensures that the current perfSONAR instances are correctly configured and operating as intended. We will also describe the challenges we encountered in ongoing operations of the network service and how we have evolved our procedures to address those challenges. Finally, we will describe our plans for future extensions and improvements to the service.
