| Issue | EPJ Web of Conf., Volume 295, 2024: 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023) |
|---|---|
| Article Number | 04002 |
| Number of page(s) | 7 |
| Section | Distributed Computing |
| DOI | https://doi.org/10.1051/epjconf/202429504002 |
| Published online | 06 May 2024 |
Accounting and monitoring tools enhancement for Run 3 in the ATLAS distributed computing
1 University of Texas at Arlington, TX, USA
2 Università e INFN Genova, Italy
3 Czech Academy of Sciences, Czech Republic
* e-mail: aleksandr.alekseev@cern.ch
The ATLAS experiment at the LHC uses complex, multi-component distributed systems for data processing (the PanDA workload management system) and data management (Rucio). Building monitoring and accounting tools that can adapt quickly to this dynamic environment is difficult for three main reasons: the complex relationships between components, the volume of data being processed, and the continuous development of new functionality in these critical systems. To address these challenges, ATLAS has used the Unified Monitoring Architecture (UMA) provided by CERN IT since 2018. UMA collects information from distributed data sources and makes it available to the various ATLAS distributed computing user groups through Grafana dashboards. According to the information they present, the dashboards fall into groups such as “data transfers”, “site accounting” and “jobs accounting”. ATLAS members use these monitoring tools daily to spot and fix issues. In addition, LHC Run 3 required significant changes to the monitoring and accounting infrastructure so that it could collect and process the data gathered by ATLAS during the run. This paper describes the recent enhancements to the UMA-based monitoring and accounting dashboards.
© The Authors, published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.