The HSF Conditions Database Reference Implementation

Ruslan Mashinistov; Lino Gerlach; Paul Laycock; Andrea Formica; Giacomo Govi; Chris Pinkenburg

doi:10.1051/epjconf/202429501051

Open Access

Issue: EPJ Web of Conf., Volume 295, 2024. 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023)
Article Number: 01051
Number of page(s): 8
Section: Data and Metadata Organization, Management and Access
DOI: https://doi.org/10.1051/epjconf/202429501051
Published online: 06 May 2024

Ruslan Mashinistov¹, Lino Gerlach¹, Paul Laycock¹, Andrea Formica, Giacomo Govi and Chris Pinkenburg¹

¹ Brookhaven National Laboratory, USA

Abstract

Conditions data is the subset of non-event data that is necessary to process event data. It poses a unique set of challenges, namely a heterogeneous structure and high access rates from distributed computing. The HSF Conditions Databases activity is a forum for cross-experiment discussions that invites as broad a participation as possible. It grew out of the HSF Community White Paper work to study conditions data access, where experts from ATLAS, Belle II, and CMS converged on a common language and proposed a schema that represents best practice. Following discussions with a broader community, including nuclear physics (NP) as well as high energy physics (HEP) experiments, a core set of use cases, functionality and behaviour was defined with the aim of describing a core conditions database API. This paper describes the reference implementation of both the conditions database service and the client, which together encapsulate HSF best-practice conditions data handling.

Django was chosen for the service implementation, which uses an ORM rather than direct SQL for all but one method. The simple relational database schema used to organise conditions data is implemented in PostgreSQL. The task of storing the conditions data payloads themselves is outsourced to any POSIX-compliant filesystem, allowing for transparent relocation and redundancy. Crucially, this design provides a clear separation between retrieving the metadata describing which conditions data is needed for a data processing job and retrieving the actual payloads from storage. The service deployment using Helm on OKD is described together with scaling tests and operations experience from the sPHENIX experiment, which runs more than 25k cores at BNL.
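
The separation between metadata lookup and payload retrieval can be illustrated with a small sketch. This is not the reference implementation: it uses Python's built-in sqlite3 module in place of Django and PostgreSQL so that it runs standalone. The table name, column names, the run-number based validity rule, and the function names are all hypothetical and chosen only for illustration.

```python
import sqlite3
import tempfile
from pathlib import Path


def build_metadata_db() -> sqlite3.Connection:
    """Create a toy metadata table (hypothetical layout, not the HSF schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE payload_iov ("
        " global_tag TEXT, payload_type TEXT,"
        " start_run INTEGER, payload_path TEXT)"
    )
    rows = [
        ("prod_v1", "beam_spot", 0, "beam_spot/a.dat"),
        ("prod_v1", "beam_spot", 100, "beam_spot/b.dat"),
        ("prod_v1", "alignment", 0, "alignment/c.dat"),
    ]
    conn.executemany("INSERT INTO payload_iov VALUES (?, ?, ?, ?)", rows)
    return conn


def lookup_payload_paths(conn: sqlite3.Connection, global_tag: str, run: int) -> dict:
    """Step 1, metadata only: for each payload type, select the entry with
    the largest start_run that does not exceed the requested run
    (an assumed validity rule)."""
    query = (
        "SELECT p.payload_type, p.payload_path FROM payload_iov AS p"
        " WHERE p.global_tag = ? AND p.start_run = ("
        "   SELECT MAX(q.start_run) FROM payload_iov AS q"
        "   WHERE q.global_tag = p.global_tag"
        "     AND q.payload_type = p.payload_type"
        "     AND q.start_run <= ?)"
    )
    return dict(conn.execute(query, (global_tag, run)).fetchall())


def read_payload(storage_root: Path, relative_path: str) -> bytes:
    """Step 2, payload only: read the file from whichever filesystem
    storage_root points at. The database is not involved."""
    return (storage_root / relative_path).read_bytes()


def demo(run: int = 150):
    """Populate a temporary directory with toy payloads and resolve them."""
    conn = build_metadata_db()
    files = {
        "beam_spot/a.dat": b"A",
        "beam_spot/b.dat": b"B",
        "alignment/c.dat": b"C",
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for relative_path, content in files.items():
            target = root / relative_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(content)
        paths = lookup_payload_paths(conn, "prod_v1", run)
        payloads = {name: read_payload(root, path) for name, path in paths.items()}
    return paths, payloads
```

Because the database stores only relative paths in this sketch, the payload files could be moved or replicated by changing `storage_root` alone, which is one way to read the "transparent relocation and redundancy" described above.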

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
