Optimizing access to conditions data in ATLAS event data processing

∗e-mail: rinaldi@bo.infn.it ∗∗e-mail: gallas@cern.ch
Copyright 2018 CERN for the benefit of the ATLAS Collaboration. CC-BY-4.0 license.

Abstract. The processing of ATLAS event data requires access to conditions data which are stored in database systems. This data includes, for example, alignment, calibration, and configuration information, which may be characterized by large volumes, diverse content, and/or values which evolve over time as refinements are made in those conditions. Additional layers of complexity are added by the need to provide this information across the worldwide ATLAS computing grid and by the sheer number of simultaneously executing processes on the grid, each demanding a unique set of conditions to proceed. Distributing this data efficiently to all the processes that require it has proven to be an increasing challenge as the needs and numbers of event-wise tasks have grown. In this presentation, we briefly describe the systems in which we have collected information about the database content and about the use of conditions in event data processing. We then explain how this information has been used not only to refine reconstruction software and job configuration, but also to guide modifications of the underlying conditions data configuration and, in some cases, rewrites of the data in the database into a more harmonious form for offline usage in the processing of both real and simulated data.

1 The ATLAS Conditions Database
All processing of event data collected by the ATLAS [1] experiment at the CERN LHC is performed using a complementary set of data in the ATLAS Conditions Database (DB), which describes the configuration, state, calibration and alignment of the detector sub-components during the time interval in which the event data was taken. The database design is based on the LCG Conditions Database, and the database is accessed by clients using the COOL API, both of which were developed by the CERN IT department for the LCG [2]. The cardinal repositories of the data reside in two Oracle DBMS instances at CERN: one for online data taking and another for offline processing. The offline instance is replicated to three Tier 1 sites (TRIUMF in Canada, IN2P3 in France, and RAL in the UK) to provide global access to the data on the WLCG. Conditions data are in general not event-wise, but extend over longer time intervals, from minutes to hours in duration. The data collectively characterize the states of all ATLAS subsystems, organized by "Intervals of Validity" (IOVs), where an IOV is the span of time over which a set of data is valid. The data is collected from many sources such as the Detector Control System (DCS), trigger and DAQ, Data Quality, the ATLAS sub-detectors, and the LHC accelerator complex.
Conditions data are stored in several Oracle schemas according to subsystem and according to whether the data is used online or exclusively offline. This division is historic, dating from before the start of LHC Run 1 data taking, and was introduced to cleanly separate the data of the different subsystems and its online usage. Within each schema, the data is logically grouped into "folders" so that clients may selectively access a folder's data when needed.
Folders are classified as single-version (SV) when the data has a single, fixed value in each IOV (e.g. a configuration setting). A multi-version (MV) folder, on the other hand, is used when more than one set of data can apply in a given IOV (e.g. a calibration or alignment). Online applications save conditions which have been observed during a run, such as configuration, run parameters, and measured physical parameters. This type of data is generally stored in SV folders, while data produced offline (such as calibrations resulting from offline studies) is stored in MV folders, leaving open the possibility of refining it in the future. MV folder data is always associated with a version, called a "folder tag". When event data processing requires MV folders, the folder tag for each folder requested by the client must be specified: sets of folder tags are grouped into a "global tag", which can be cited more succinctly in client task configuration. All such data is stored in the ATLAS Conditions DB, which currently contains data in over 2300 folders, distributed over 32 schemas. Conditions data are accessed in all ATLAS event data processing activities, in many phases of analysis, as well as in monitoring and calibration studies by subsystems.
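The tag resolution described above can be illustrated with a minimal sketch. All names, tags, and payloads below are invented for illustration; the real COOL data model is relational and far richer, but the two-step lookup (global tag → folder tag, then folder tag + event time → payload valid in that IOV) is the same idea.

```python
from bisect import bisect_right

# Hypothetical global tag mapping each MV folder to one folder tag.
GLOBAL_TAGS = {
    "CONDTAG-EXAMPLE-01": {
        "/Indet/Align": "InDetAlign-v2",
        "/LAr/Calib":   "LArCalib-v5",
    },
}

# Per folder tag: sorted IOV start times and the payload valid from each.
FOLDER_DATA = {
    "InDetAlign-v2": ([0, 1000, 5000], ["align-a", "align-b", "align-c"]),
    "LArCalib-v5":   ([0, 2500],       ["calib-x", "calib-y"]),
}

def lookup(global_tag, folder, event_time):
    """Resolve the global tag to a folder tag, then return the payload
    whose interval of validity (IOV) contains event_time."""
    folder_tag = GLOBAL_TAGS[global_tag][folder]
    since_list, payloads = FOLDER_DATA[folder_tag]
    i = bisect_right(since_list, event_time) - 1  # last IOV starting <= event_time
    return payloads[i]
```

For example, `lookup("CONDTAG-EXAMPLE-01", "/Indet/Align", 1200)` selects the payload whose IOV began at time 1000, i.e. `"align-b"`.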
A folder's "payload" is the set of values stored in the relational database columns of that folder. The LCG Conditions Database software leaves considerable freedom in the choice of the payload structure and of the column data types. Allowed data types include the standard numeric and string types, along with large objects (Character or Binary Large Object types, i.e. CLOBs and BLOBs). In addition, special string-type columns are used to designate references to external data files (file GUIDs and internal file pointers) registered in the ATLAS DDM/Rucio system [3].
In offline distributed computing, Athena [4] jobs are efficiently provided with concurrent access to conditions data in the ATLAS Conditions DB via a caching system which mediates client requests: the Frontier [5] system. If a client queries data from a folder with the special external references, Athena methods provide the referenced data from a distributed CVMFS [6] file system. For clients operating in environments without network access to Frontier, the required conditions are extracted into SQLite files which can be provided to the processing server so that jobs can access the data file locally.
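The SQLite fallback path can be sketched as follows. This is illustrative only and does not reproduce the actual COOL schema or the real extraction tools: the table layout, column names, and payload format are assumptions chosen to show the principle of extracting the needed IOV rows once, then reading them locally without any network access.

```python
import sqlite3

def extract_to_sqlite(path, rows):
    """Write the conditions rows a job will need (illustrative schema:
    iov_since, iov_until, payload) into a local SQLite file."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS cond "
                "(iov_since INTEGER, iov_until INTEGER, payload TEXT)")
    con.executemany("INSERT INTO cond VALUES (?, ?, ?)", rows)
    con.commit()
    con.close()

def read_payload(path, event_time):
    """Read back the payload whose IOV contains event_time, offline."""
    con = sqlite3.connect(path)
    cur = con.execute(
        "SELECT payload FROM cond WHERE iov_since <= ? AND ? < iov_until",
        (event_time, event_time))
    row = cur.fetchone()
    con.close()
    return row[0] if row else None
```

A job configured this way ships the SQLite file alongside its inputs and calls `read_payload` instead of issuing Frontier requests.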
In these proceedings, we describe how we have built upon existing tools and resources to gain an improved understanding of database usage by clients and of the structure and content of the database itself. We further describe how we used this knowledge to (1) direct subsystems toward more efficient utilization of the Conditions DB infrastructure for their data storage and (2) develop improved tools for building collections of conditions data into SQLite files for use by specific data processing tasks. This knowledge also puts us in a better position to make further improvements in how we store and use this data.

2 Conditions DB Content
The LCG Conditions Database and COOL API were designed and implemented in the years before the start of LHC Run 1 for generic use by experiments. The design factored in experience from previous experiments as well as feedback from the experiments that planned to use it (including ATLAS). Knowing that some database tables could become large and that all queries via the COOL API would be qualified with filters on key columns such as IOVs and tag names, the database design includes indexes on those columns, and the COOL API includes SQL hints with every query: these measures considerably improve the read performance of the database for clients by instructing the database how best to execute each query (e.g. whether or not to use the index).
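The effect of indexing the filter columns can be demonstrated with a small sketch. The real Conditions DB runs on Oracle and the COOL API embeds Oracle-specific SQL hints, but the same principle is visible in SQLite: with an index on the qualified columns (here an invented `(tag, iov_since)` table), the query planner reports an index search rather than a full table scan.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE iovs (tag TEXT, iov_since INTEGER, payload TEXT)")
# Index the columns every COOL-style query filters on: tag name and IOV.
con.execute("CREATE INDEX idx_tag_iov ON iovs (tag, iov_since)")
con.executemany("INSERT INTO iovs VALUES (?, ?, ?)",
                [("Align-v2", t, "payload-%d" % t) for t in range(0, 10000, 100)])

# Ask the planner how it would execute a typical qualified lookup.
plan = con.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT payload FROM iovs WHERE tag = ? AND iov_since <= ?",
    ("Align-v2", 1234)).fetchall()
print(plan)  # the plan shows a SEARCH using idx_tag_iov, not a SCAN
```

Without the index, the same `EXPLAIN QUERY PLAN` reports a scan of the whole table, which is exactly the behavior the COOL indexing and hints were designed to avoid at scale.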
The above design considerations served the experiments using COOL well throughout Run 1, provided, it was thought, that the table volumes were well regulated. In order to manage volumes, ATLAS deployed a "COOL Monitor" application which stores daily a set of metrics associated with each folder and generates a simple HTML report based on that data. The report is intended for subsystem and database experts to use to view folder-wise insert rates and volumes. A snapshot of this report is shown in Figure 1. Specifically, this web page shows, for each COOL DB schema and instance, the folder name and type (SV or MV), the average row length of the payload tables, and the number of rows, channels and tags in the last day, week and month. This interface worked well enough during the Run 1 start-up to recognize and reduce unnecessarily high insert rates, but it gradually fell into disuse once data entry procedures stabilized.
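The kind of folder-wise aggregation such a monitor performs can be sketched as follows. The input format (`folder, age_in_days, payload_bytes`) and folder names are assumptions for illustration; the real COOL Monitor derives its metrics from the Oracle tables themselves.

```python
from collections import defaultdict

def folder_metrics(inserts, window_days):
    """For each folder, count the rows inserted within the last
    window_days and compute their average row length in bytes."""
    counts = defaultdict(int)
    bytes_total = defaultdict(int)
    for folder, age_days, nbytes in inserts:
        if age_days <= window_days:
            counts[folder] += 1
            bytes_total[folder] += nbytes
    return {f: (counts[f], bytes_total[f] / counts[f]) for f in counts}

inserts = [("/LAr/Calib", 0.5, 200),
           ("/LAr/Calib", 3.0, 400),
           ("/Indet/Align", 0.2, 100)]
```

Here `folder_metrics(inserts, 1)` yields the last-day view (one 200-byte row for `/LAr/Calib`), while `folder_metrics(inserts, 7)` yields the weekly view (two `/LAr/Calib` rows averaging 300 bytes), mirroring the day/week/month columns of the report.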
With the start of Run 2, the application was updated to include new folders as they were created over time. There was a slight degradation in read performance which was not immediately recognized, obscured as it was by improvements in Oracle storage capacity. As Run 2 evolved, however, demands for conditions grew with an ever-increasing number of jobs on the grid, creating a significant degradation in conditions delivery per job. The COOL Monitor itself was found to have a number of limitations: it made misleading assumptions in the use of its input data, and its implementation offered limited scope for improvement. We realized that we needed better tools to understand not only job requests but also the data content in a more thorough and systematic way, not only for current operations but also to design a better system for Run 3 [7].
The COMA application, which had already aggregated metadata about Conditions DB folders and tags [8], was a natural place to add more detailed information about each folder.