| Issue | EPJ Web Conf., Volume 245, 2020: 24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019) |
|---|---|
| Article Number | 04032 |
| Number of page(s) | 6 |
| Section | 4 - Data Organisation, Management and Access |
| DOI | https://doi.org/10.1051/epjconf/202024504032 |
| Published online | 16 November 2020 |
An Information Aggregation and Analytics System for ATLAS Frontier
1 Université Paris-Saclay, CEA/Saclay IRFU, 91191 Gif-sur-Yvette, France
2 University of Texas at Arlington, Department of Physics, Arlington Texas 76019, USA
3 Ecole Nationale Supérieure d’Informatique, Alger Oued Smar 16309, Algeria
4 University of Valencia, Instituto de Física Corpuscular, Parque Científico, E-46980 Paterna, Spain
5 University of Oxford, Denys Wilkinson Bldg, Keble Rd, Oxford OX1 3RH, UK
6 University of Chicago, Enrico Fermi Institute, 933 East 56th Street, Chicago IL 60637, USA
* e-mail: Andrea.Formica@cea.fr
** e-mail: Nurcan.Ozturk@cern.ch
*** e-mail: em_si_amer@esi.dz
**** e-mail: Julio.Lozano.Bahilo@cern.ch
† e-mail: Elizabeth.Gallas@physics.ox.ac.uk
‡ e-mail: Ilija.Vukotic@cern.ch
§ Copyright 2020 CERN for the benefit of the ATLAS Collaboration. CC-BY-4.0 license.
ATLAS event processing requires access to centralized database systems where information about calibrations, detector status and data-taking conditions is stored. This processing runs at more than 150 computing sites on a worldwide computing grid, which access the database through the Squid-Frontier system. Some processing workflows have been found to overload the Frontier system because of the Conditions data model currently in use: some Conditions data requests have a low caching efficiency. The underlying cause is that requests which are distinct as far as the cache is concerned actually retrieve a much smaller number of unique payloads. While ATLAS undertakes an adiabatic transition, during LHC Long Shutdown 2 and Run 3, from the current COOL Conditions data model to a new data model called CREST for Run 4, it is important to identify the problematic Conditions queries with low caching efficiency and to work with the detector subsystems to improve how such data are stored within the current data model. For this purpose ATLAS has put together an information aggregation and analytics system, built on data aggregated from the Squid-Frontier logs with Elasticsearch. This paper§ describes the components of this analytics system, from the Flask/Celery-based server to the user interface, and explains how Spark SQL is used to filter data for plotting, how the caching efficiency results are stored in an Elasticsearch database, and how the package is deployed in a Docker container.
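The key diagnostic in the abstract is that requests which look distinct to the cache resolve to far fewer unique payloads. The sketch below illustrates that measurement in plain Python rather than the Spark SQL pipeline the paper uses. It assumes each aggregated log record carries a Conditions folder name, the full request string (the cache key) and a digest of the returned payload; the `FrontierRequest` schema, the `caching_efficiency` metric (unique payloads per distinct request), the threshold and the sample data are all hypothetical illustrations, not the system's actual definitions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FrontierRequest:
    """One aggregated Squid-Frontier log entry (hypothetical schema)."""
    folder: str        # Conditions folder the request reads from
    query: str         # full request string, which is what the cache keys on
    payload_hash: str  # digest of the payload the server returned


def caching_efficiency(records):
    """Per folder: unique payloads divided by distinct request strings.

    1.0 means every distinct request fetched a distinct payload. Values near 0
    mean many requests that the cache treats as different actually returned
    the same few payloads. This is an illustrative metric, not the paper's.
    """
    queries = defaultdict(set)
    payloads = defaultdict(set)
    for rec in records:
        queries[rec.folder].add(rec.query)
        payloads[rec.folder].add(rec.payload_hash)
    return {folder: len(payloads[folder]) / len(queries[folder]) for folder in queries}


def flag_low_efficiency(records, threshold=0.5):
    """Folders whose efficiency is below threshold, worst first."""
    eff = caching_efficiency(records)
    low = [(folder, value) for folder, value in eff.items() if value < threshold]
    return sorted(low, key=lambda item: item[1])


# Hypothetical sample: folderA issues four distinct requests (differing only in
# a timestamp) that resolve to two payloads; folderB is cache-friendly.
logs = [
    FrontierRequest("folderA", "q?run=1&t=100", "p1"),
    FrontierRequest("folderA", "q?run=1&t=101", "p1"),
    FrontierRequest("folderA", "q?run=1&t=102", "p1"),
    FrontierRequest("folderA", "q?run=1&t=103", "p2"),
    FrontierRequest("folderB", "q?iov=5", "p3"),
    FrontierRequest("folderB", "q?iov=5", "p3"),  # identical request: counted once
    FrontierRequest("folderB", "q?iov=6", "p4"),
]

print(caching_efficiency(logs))        # {'folderA': 0.5, 'folderB': 1.0}
print(flag_low_efficiency(logs, 0.8))  # [('folderA', 0.5)]
```

Identical repeated requests are counted once, because those are exactly the ones a cache can serve; only request strings that differ while returning the same payload lower the score. In a Spark-based setup the same grouping could be written as a `GROUP BY folder` with `COUNT(DISTINCT ...)` on the query and payload columns; the plain-Python version only shows the arithmetic.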
© The Authors, published by EDP Sciences, 2020
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.