EPJ Web Conf.
Volume 245, 202024th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019)
|Number of page(s)||6|
|Section||3 - Middleware and Distributed Computing|
|Published online||16 November 2020|
Big data solutions for CMS computing monitoring and analytics
Universidad de los Andes, Colombia
2 Cornell University, Ithaca NY, 14850 USA
3 Istituto Nazionale di Fisica Nucleare, via Pietro Giuria 1, 10125 Torino, Italy
* e-mail: firstname.lastname@example.org
Published online: 16 November 2020
The CMS computing infrastructure is composed of several subsystems that accomplish complex tasks such as workload and data management, transfers, submission of user and centrally managed production requests. Till recently, most subsystems were monitored through custom tools and web applications, and logging information was scattered over several sources and typically accessible only by experts. In the last year, CMS computing fostered the adoption of common big data solutions based on open-source, scalable, and no-SQL tools, such as Hadoop, InfluxDB, and ElasticSearch, available through the CERN IT infrastructure. Such systems allow for the easy deployment of monitoring and accounting applications using visualisation tools such as Kibana and Grafana. Alarms can be raised when anomalous conditions in the monitoring data are met, and the relevant teams are automatically notified. Data sources from different subsystems are used to build complex workflows and predictive analytics (such as data popularity, smart caching, transfer latency), and for performance studies. We describe the full software architecture and data flow, the CMS computing data sources and monitoring applications, and show how the stored data can be used to gain insights into the various subsystems by exploiting scalable solutions based on Spark.
© The Authors, published by EDP Sciences, 2020
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.