EPJ Web Conf.
Volume 214, 2019: 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018)
Number of page(s): 8
Section: T8 - Networks & facilities
Published online: 17 September 2019
MONIT: Monitoring the CERN Data Centres and the WLCG Infrastructure
1 CERN, European Laboratory for Particle Physics, Geneva, Switzerland
2 Universidad de Oviedo, Oviedo, Spain
3 Wroclaw University of Science and Technology, Wroclaw, Poland
* e-mail: email@example.com
The new unified monitoring architecture (MONIT) for the CERN Data Centres (DC) and the WLCG Infrastructure is based on established open source technologies to collect, stream, store and access monitoring data. The previous solutions, based on in-house development and commercial software, have been replaced with widely recognized technologies such as Collectd, Kafka, Spark, Elasticsearch, InfluxDB and Grafana. The monitoring infrastructure, running entirely on CERN cloud resources, covers the whole workflow of the monitoring data: from collecting and validating metrics and logs to making them available for dashboards, reports and alarms. The production deployment of the new DC and WLCG monitoring is well under way, and this contribution summarizes the progress made, the hurdles met and the lessons learned in using these open source technologies. It also focuses on the choices made to achieve the required levels of stability, scalability and performance of the MONIT service.
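The abstract describes a workflow in which incoming metrics and logs are validated and then made available to different storage and access back ends. The snippet below is a minimal, illustrative sketch of that validate-then-route step, not MONIT's actual code. It assumes a hypothetical JSON envelope with `producer`, `type`, `timestamp` (integer epoch milliseconds) and `data` fields, and it replaces the real transport (Kafka), processing (Spark) and storage (InfluxDB, Elasticsearch) layers with plain in-memory Python lists. All field names, function names and sample messages are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical envelope schema; the real MONIT document format is not given here.
REQUIRED_FIELDS = ("producer", "type", "timestamp", "data")
ALLOWED_TYPES = {"metric", "log"}


def validate(raw: str) -> Optional[dict]:
    """Parse a JSON document and return it if it passes basic checks, else None."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(doc, dict):
        return None
    if any(name not in doc for name in REQUIRED_FIELDS):
        return None
    if doc["type"] not in ALLOWED_TYPES:
        return None
    # Assumed: timestamps are integer epoch milliseconds (bools are rejected explicitly).
    ts = doc["timestamp"]
    if not isinstance(ts, int) or isinstance(ts, bool):
        return None
    return doc


@dataclass
class Sinks:
    """In-memory stand-ins for the storage back ends (hypothetical)."""
    timeseries: list = field(default_factory=list)  # stands in for a metrics store such as InfluxDB
    search: list = field(default_factory=list)      # stands in for a log store such as Elasticsearch
    rejected: list = field(default_factory=list)    # invalid raw messages kept for inspection


def route(raw_messages, sinks: Sinks) -> Sinks:
    """Validate each raw message and append it to the sink matching its type."""
    for raw in raw_messages:
        doc = validate(raw)
        if doc is None:
            sinks.rejected.append(raw)
        elif doc["type"] == "metric":
            sinks.timeseries.append(doc)
        else:
            sinks.search.append(doc)
    return sinks


if __name__ == "__main__":
    messages = [
        '{"producer": "host-monitor", "type": "metric", "timestamp": 1537000000000, "data": {"load": 0.7}}',
        '{"producer": "app-logs", "type": "log", "timestamp": 1537000000500, "data": {"msg": "disk full"}}',
        '{"producer": "host-monitor", "type": "metric", "data": {}}',  # missing timestamp
        "not json",
    ]
    out = route(messages, Sinks())
    print(len(out.timeseries), len(out.search), len(out.rejected))  # 1 1 2
```

Rejected messages are kept rather than silently dropped, so that malformed input from a producer can be inspected later. In a streaming deployment the same checks would run inside the processing layer, with the sinks replaced by real topics or databases.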
© The Authors, published by EDP Sciences, 2019
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.