Performance of the Belle II Conditions Database

The Belle II experiment at KEK observed its first collisions in the summer of 2018. Processing the large amounts of data that will be produced requires conditions data to be readily available to systems worldwide in a fast and efficient manner that is straightforward for both the user and the maintainer. This was accomplished by relying on industry-standard tools and methods: the conditions database is built as an HTTP REST service using tools such as Swagger for the API interface development, Payara for the Java EE application server, and Squid for the caching proxy. This article presents the design of the Belle II conditions database production environment as well as details about the capabilities and performance during both Monte Carlo campaigns and data reprocessing.

* Corresponding author: lynn.wood@pnnl.gov
© The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/). EPJ Web of Conferences 214, 04050 (2019), https://doi.org/10.1051/epjconf/201921404050, CHEP 2018


The Belle II Experiment
The Belle II experiment is part of a broad-based search for new physics at the Intensity Frontier, focused on precisely measuring branching fractions, angular distributions, CP asymmetries, forward-backward asymmetries, and a host of other observables, and comparing them with theory. It is the successor to the highly successful Belle experiment, which provided the experimental foundation for the 2008 Nobel Prize in Physics for work on CP violation [1]. The SuperKEKB accelerator upgrade will provide asymmetric electron-positron beams tuned to produce large quantities of B/anti-B meson pairs at 40 times the instantaneous luminosity of KEKB, yielding 50 times the data taken with Belle. Initial "Phase 2" data-taking started in May 2018, concurrent with beam-tuning efforts, and "Phase 3" full running starts in 2019.
Figure 1 presents the various interactions that the conditions database (DB) needs to support. The primary purpose of the conditions database is to support data processing across the international Belle II computing grid, shown as the WAN connection at lower left that connects to the KEK computing center and Tier 1 computing centers, as well as interactive users. Data enters the database via the calibration cycle, which uses data collected by the Belle II DAQ at KEK and processes it into the appropriate format for the conditions database. Currently this is done offline, but it is envisioned to become significantly more automated as Belle II matures.

Fig. 1. Interactions of the Belle II conditions database with production nodes, interactive users, and the Belle II data acquisition system.

Concepts
A conditions database holds the time-dependent status of the detectors needed for data processing and reprocessing. Its contents are either constants or time-varying parameters such as calibration settings, geometry, and alignment.
User access to the Belle II database is through a REST API, meaning that all accesses take the form of HTTP requests. For example, a single HTTP request can return the list of payloads for a given global tag, experiment, and run number. This type of interface provides several key advantages to Belle II: it restricts DB access to allowed operations only, and it makes the service easy to scale with industry-standard HTTP tools. Responses can be formatted in either XML or JSON.
Payloads are kept as external files outside of the database; a request for a payload returns a partial URL of the payload file, which can be combined with a hostname to access a file server containing the payload. This method avoids a database bottleneck for large payload files, and allows for different scenarios for file storage and transfer.
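The two steps described above, querying the REST API and then resolving the returned partial URL against a file server, can be sketched in Python as follows. This is only an illustration: the hostnames, the `payloads` endpoint, and the query parameter names are hypothetical placeholders, since the actual API paths are not given in the text.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical hosts and endpoint; the real values are not given in the text.
API_BASE = "https://conditions.example.org/api/"
PAYLOAD_HOST = "https://payloads.example.org/"


def payload_list_url(global_tag, experiment, run):
    """Build a GET URL requesting the payloads for one global tag, experiment and run."""
    query = urlencode({"global_tag": global_tag, "experiment": experiment, "run": run})
    return urljoin(API_BASE, "payloads") + "?" + query


def payload_file_url(partial_url):
    """Combine the partial URL returned by the database with the file-server hostname."""
    return urljoin(PAYLOAD_HOST, partial_url.lstrip("/"))


print(payload_list_url("example_tag", 3, 120))
print(payload_file_url("/files/example_payload.root"))
```

Keeping the hostname out of the database response is what allows the same payload record to be served from different file servers or mirrors.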

Global Tags and Intervals of Validity
The data in the Belle II conditions database is organized using several structures. Intervals of Validity (IOVs) specify the starting and ending experiments and runs for which a payload is valid. A given payload may have multiple IOVs assigned to it if it is valid for different periods of time.
A Global Tag contains a list of IOV-payload relationships, and is used to select a complete set of conditions for a given reprocessing effort. The IOVs and payloads can be reused in different global tags, and the global tags can be classified by type (development, release, online, data, etc.) and have evolving status over time (new, published, invalid, etc.).
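The relationship between IOVs, payloads, and global tags can be illustrated with a minimal in-memory model. The class and field names below are assumptions chosen for illustration and do not reflect the actual database schema; the sketch only shows how an (experiment, run) pair selects one payload per name within a global tag.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class IOV:
    """Interval of validity, inclusive on both ends (hypothetical field names)."""
    first_exp: int
    first_run: int
    last_exp: int
    last_run: int

    def contains(self, exp, run):
        # Tuples compare lexicographically: experiment first, then run.
        return (self.first_exp, self.first_run) <= (exp, run) <= (self.last_exp, self.last_run)


@dataclass
class GlobalTag:
    name: str
    tag_type: str   # e.g. "development", "release", "online", "data"
    status: str     # e.g. "new", "published", "invalid"
    entries: list = field(default_factory=list)  # (payload name, IOV, payload file)

    def payloads_for(self, exp, run):
        """Return {payload name: payload file} for every entry valid at (exp, run)."""
        return {name: f for name, iov, f in self.entries if iov.contains(exp, run)}


tag = GlobalTag("example_tag", "development", "new", [
    ("BeamParameters", IOV(3, 1, 3, 500), "beam_rev1.root"),
    ("BeamParameters", IOV(3, 501, 4, 100), "beam_rev2.root"),
    ("Alignment", IOV(3, 1, 4, 100), "align_rev1.root"),
])
print(tag.payloads_for(3, 120))
```

Because the entries only reference payloads and IOVs, the same payload file can appear in several global tags without being duplicated.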

Service Layers
When operating at the expected full processing capacity, the Belle II computing grid will generate ~100 database requests per second, and can burst to significantly higher rates. To attain the required performance, various features of commercial tools designed for scalability are used.
For example, Squid HTTP caches [2] handle repeated requests for the same query. They are configured as reverse proxies, i.e. optimized for many clients and few servers, and serve to cache the most common global tags in Belle II. Requests that are not found in the Squid caches reach the "b2s" Belle II service layer. This layer is implemented in Java using Payara Micro [3], a microservice Java EE server, and translates REST requests into SQL queries. The REST API is built using Swagger [4], which can auto-generate client and server code as well as provide interactive online documentation of the API. The storage back-end behind these layers is the PostgreSQL [5] relational database, which resides on the IBM General Parallel File System (GPFS) [6].
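The effect of placing a cache in front of the service layer can be sketched with a simple read-through cache. This is a conceptual illustration in plain Python, not Squid configuration or the b2s implementation; the class name and the backend function are hypothetical.

```python
class ReadThroughCache:
    """Serve repeated requests from memory; only misses reach the backend."""

    def __init__(self, backend):
        self._backend = backend   # callable standing in for the service layer
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            self.hits += 1
        else:
            self.misses += 1
            self._store[url] = self._backend(url)
        return self._store[url]


backend_calls = []


def fake_service_layer(url):
    # Stand-in for the REST-to-SQL translation done by the service layer.
    backend_calls.append(url)
    return f"response for {url}"


cache = ReadThroughCache(fake_service_layer)
for _ in range(3):
    cache.get("/payloads?global_tag=example_tag")
print(cache.hits, cache.misses, len(backend_calls))
```

When many grid jobs request the same global tag, nearly all of them follow the hit path, which is why caching the most common global tags relieves the database.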

Belle II Database Server Architecture
As mentioned previously, the database and payloads are kept on separate servers for improved performance and support. Both servers are built with multiple instances of all components, using Docker [7] containers to provide easy support, greater reliability, and improved performance. As shown at the top of Figure 2, the database service consists of multiple nodes behind an HTTP load balancer. Each node has its own dedicated Squid cache and multiple instances of the b2s service, each of which connects to the single database service. The database service contains both a primary database and a hot standby backup database to minimize downtime.
As shown at the bottom of Figure 2, the payload file service is designed similarly, with multiple nodes behind an HTTP load balancer. Each node contains multiple instances of the NGINX [8] high-performance HTTP server, and the payload files themselves are stored in the IBM GPFS accessible by all nodes.
The primary client interface to the conditions database is the Belle II Analysis Software Framework [9], known as basf2. To simplify efforts, several assumptions are made about the conditions data. First, payload IOV granularity is assumed to be at the run level, as opposed to time or event granularity. This is expected to be sufficient for the majority of the conditions data in Belle II, but if sub-run granularity is required, basf2 supports storing an event-number-based dependence inside a single run payload. Second, all payloads are assumed to be stored as ROOT [10] objects. This is not a requirement of the database system itself, since the payloads are simply files stored on a file server, but it makes the analysis software more manageable.
The DBStore service inside basf2 manages storage and automatic updates of payloads as data is processed. This is handled through C++ template classes that always provide the user with a pointer to the currently correct payload. The DBStore service ensures that the pointer is updated when a new IOV becomes valid for a payload, making this run dependence transparent to the user. basf2 also supports accessing IOVs and payloads from either a remote database or a local directory, allowing development and testing of new conditions data payloads before loading them into the global database. This is handled by creating a "chain" of database locations to search for a given payload in the basf2 steering file. The relationship between the conditions database components in basf2 is shown in Figure 3.
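The idea of a "chain" of database locations can be sketched as an ordered list of lookup functions in which the first location that knows the payload wins. The code below is a generic Python illustration and does not use the basf2 API; all function names, the `.root` file naming, and the remote table are hypothetical.

```python
import os


def local_directory_source(directory):
    """Look for '<name>.root' in a local directory (hypothetical naming scheme)."""
    def lookup(name, exp, run):
        path = os.path.join(directory, f"{name}.root")
        return path if os.path.exists(path) else None
    return lookup


def mapping_source(table):
    """Stand-in for a remote database: a plain dict from payload name to location."""
    def lookup(name, exp, run):
        return table.get(name)
    return lookup


def find_payload(chain, name, exp, run):
    """Search each location in order and return the first match."""
    for source in chain:
        location = source(name, exp, run)
        if location is not None:
            return location
    raise LookupError(f"no payload '{name}' for experiment {exp}, run {run}")
```

Putting the local directory first in the chain lets a developer override a single payload for testing while every other payload still comes from the remote database.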
Due to the easy accessibility of the HTTP API, several other interfaces are also available. The Swagger tool used to develop the API provides online documentation of all queries, and allows the user to interactively send queries and view responses. A command-line client written in Python [11] is also available, and is currently the primary method of adding new payloads to the database.

Monitoring
The Belle II conditions database is a sophisticated system with many interactions between the different service layers. Extensive monitoring of all layers is implemented using Prometheus [12] and Grafana [13]; example parameters to be monitored include CPU and RAM usage, disk latency, input HTTP request rate, HTTP response types, and so on.

Database Performance
The current Monte Carlo campaigns and the reprocessing of data taken in 2018 do not put a significant load on the DB and payload file servers. The current biggest stressor is users submitting large numbers of non-grid-based jobs, which cause a significant spike in requests in a short period of time. Directed tests are therefore necessary to confirm that performance is sufficient for the expected full processing load.
Multiple types of tests are performed to verify proper database behavior, using Monte Carlo job submissions to the grid, scripting, and the Gatling HTTP load tester [14]. These tests include verification of basf2 software releases as well as specific scripts to generate "problem" workflows, such as the non-grid user submissions mentioned above.
In one example of performance testing, the Gatling tool was configured to generate 400 requests per second for a period of 15 minutes. The requests were randomly selected from a list of 21,000 actual request types recorded from HTTP logs. As the test ran, the network bandwidth and CPU usage were monitored, as well as the actual rates of requests submitted and responses received. No performance issues were seen during this test, and all parameters showed steady values, indicating that sufficient resources were available for this type of workload.
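The shape of this workload can be reproduced with a short Python sketch that draws requests at random from a recorded list at a fixed rate. This is not Gatling configuration (Gatling simulations are written in its own DSL); the function name and the placeholder request strings are assumptions, and only the rate, duration, and list size come from the text.

```python
import random


def build_request_plan(recorded_requests, rate_per_s, duration_s, seed=0):
    """Return (send time in seconds, request) pairs at a constant request rate."""
    rng = random.Random(seed)          # fixed seed so the plan is reproducible
    total = rate_per_s * duration_s
    interval = 1.0 / rate_per_s
    return [(i * interval, rng.choice(recorded_requests)) for i in range(total)]


# Placeholder strings standing in for the 21,000 recorded request types.
recorded = [f"/recorded-request/{i}" for i in range(21000)]
plan = build_request_plan(recorded, rate_per_s=400, duration_s=15 * 60)
print(len(plan))
```

At 400 requests per second for 15 minutes the plan contains 360,000 requests, so each recorded request type is drawn about 17 times on average, which exercises both the cache-miss and the cache-hit paths.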

Upcoming Development
While the conditions database is already in use by the Belle II experiment, a number of improvements are planned or in progress.
Authentication and authorization are currently not present for database access. The expectation is that all reads will be open, but write access needs to be authenticated and authorized per user. Several methods of implementing this functionality are being investigated, including leveraging the existing X.509 certificate authentication used for Belle II grid processing. It has been proposed to support three roles for database users: "user", which would allow read-only access; "developer", which would support adding data to existing global tags; and "coordinator", which would allow creating and manipulating global tags. These access rights would also vary with the type of global tag, to better control the update process.
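The proposed roles can be sketched as a simple permission table. Since this functionality is still under investigation, everything below is a hypothetical illustration: the action names are invented, the roles are assumed to be cumulative, and the variation of rights with global tag type is not modeled.

```python
# Hypothetical action names; roles are assumed to be cumulative.
ROLE_PERMISSIONS = {
    "user": {"read"},
    "developer": {"read", "add_to_global_tag"},
    "coordinator": {"read", "add_to_global_tag", "create_global_tag", "modify_global_tag"},
}


def is_allowed(role, action):
    """Reads are open to everyone; any other action requires a role that grants it."""
    if action == "read":
        return True
    return action in ROLE_PERMISSIONS.get(role, set())


print(is_allowed("developer", "add_to_global_tag"))   # True
print(is_allowed("developer", "create_global_tag"))   # False
```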
While the implemented servers are fully capable of supporting Belle II during full production, the loss of network access to the single location at Brookhaven National Laboratory would prevent proper operation of data reprocessing. To this end, replication of the system to other sites is being investigated. An intermediate solution currently in use is replicating the database contents and payload files on a distributed file system (CVMFS [15]), but as the database contents grow this may become a burden on the distributed grid site nodes. PostgreSQL supports standard replication such as streaming or log-based methods, although replication to remote sites is more complicated. Because of this, persistent remote Squid caching with long cache invalidation times is also being considered as a solution.