EPJ Web Conf.
Volume 214, 2019
23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018)
Number of page(s): 5
Section: T8 - Networks & facilities
Published online: 17 September 2019
Evolution of monitoring, accounting and alerting services at INFN-CNAF Tier-1
2 Università di Bologna, Sede di Cesena, Italy
CNAF is the national center of INFN (Italian National Institute for Nuclear Physics) for IT technology services. The Tier-1 data center operated at CNAF provides computing and storage resources to scientific communities such as those working on the four LHC (Large Hadron Collider) experiments at CERN and on 30 other experiments in which INFN is involved. In past years, monitoring and alerting for Tier-1 resources relied on several software tools, such as LEMON (developed at CERN and customized for the characteristics of data centers managing scientific data), Nagios (used especially for alerting) and a system based on the Graphite database together with other ad-hoc services and web pages. By 2015, a task force had been organized to define and deploy a common infrastructure (based on Sensu, InfluxDB and Grafana) to be used by the different CNAF departments. Once the new infrastructure was deployed, the next major task was to adapt all the monitoring and alerting services to it. We present the steps the Tier-1 group followed to accomplish a full migration, which is now complete, with all the new services in production. In particular, we show the redesign of the monitoring sensors and alerting checks to adapt them to the Sensu-based infrastructure, the creation of web dashboards for data presentation, and the porting of historical data from LEMON/Graphite to InfluxDB.
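One step mentioned above is porting historical data from Graphite to InfluxDB. The abstract does not describe the conversion tool or the target schema, so the following is only a minimal sketch under stated assumptions: it turns one record in Graphite's plaintext format (`path value timestamp`) into one record in InfluxDB line protocol. The four-component path template (`site.host.measurement.field`) and the function names `graphite_to_line` and `_escape` are hypothetical, chosen purely for illustration.

```python
import math
from typing import Optional, Sequence

# Hypothetical template (not from the paper): how a dot-separated Graphite
# path is split into InfluxDB tags, the measurement name and the field key.
TEMPLATE = ("site", "host", "measurement", "field")


def _escape(token: str, chars: str) -> str:
    """Backslash-escape characters that are special in line protocol."""
    for ch in chars:
        token = token.replace(ch, "\\" + ch)
    return token


def graphite_to_line(line: str, template: Sequence[str] = TEMPLATE) -> Optional[str]:
    """Convert one Graphite plaintext record ("path value timestamp") into
    one InfluxDB line-protocol record, or return None if it cannot be mapped."""
    parts = line.split()
    if len(parts) != 3:
        return None
    path, raw_value, raw_ts = parts
    components = path.split(".")
    if len(components) != len(template):
        return None  # path does not match the assumed template
    try:
        value = float(raw_value)
        ts_seconds = int(float(raw_ts))
    except ValueError:
        return None  # e.g. an empty Graphite slot exported as "None"
    if not math.isfinite(value):
        return None  # line protocol cannot store NaN or infinity
    named = dict(zip(template, components))
    measurement = _escape(named.pop("measurement"), ", ")
    field = _escape(named.pop("field"), ",= ")
    tags = ",".join(
        f"{_escape(k, ',= ')}={_escape(v, ',= ')}" for k, v in sorted(named.items())
    )
    series = f"{measurement},{tags}" if tags else measurement
    # Graphite timestamps are in seconds; line protocol defaults to nanoseconds.
    ts_ns = ts_seconds * 1_000_000_000
    return f"{series} {field}={value!r} {ts_ns}"
```

For example, `graphite_to_line("tier1.wn-01.cpu.load 0.75 1536000000")` yields `cpu,host=wn-01,site=tier1 load=0.75 1536000000000000000`. The resulting lines could then be written to InfluxDB in batches. Historical LEMON data would need its own reader in front of a converter like this one.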
© The Authors, published by EDP Sciences, 2019
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.