Testing and verification of the LHCb Simulation

Monte-Carlo simulation is a fundamental tool for high-energy physics experiments, from the design phase to data analysis. In recent years its relevance has increased due to the ever-growing precision of measurements. Accuracy and reliability are essential features of simulation and are particularly important in the current phase of the LHCb experiment, where physics analysis and preparation for data taking with the upgraded detector need to be performed at the same time. In this paper we give an overview of the full chain of tests and procedures implemented for the LHCb Simulation software stack to ensure the quality of its results. The tests comprise simple checks to validate new software contributions in a nightlies system, as well as more elaborate checks that probe simple physics and software quantities for performance and regression verification. Commissioning a new major version of the simulation software for production also implies validating its impact on a few physics analyses. A new system for Simulation Data Quality (SimDQ), which is being put in place to help in the first phase of commissioning and for fast verification of all samples produced, is also discussed.


Introduction
High-energy physics experiments rely on Monte-Carlo simulations for many tasks, from understanding detector effects and performance to sanity checks of analyses. All these studies require a high level of accuracy in the simulations and thus, implicitly, high quality in the software tools which produce them. The central pillar of the simulation software for LHCb is the Gauss application [1], which we will use as the main example in this paper.
In the past several years the LHCb collaboration has developed an efficient and smooth workflow for developing, testing and deploying its software [2]. For an LHCb simulation application to be put into production, this procedure consists of several distinct milestones: software commissioning, performance and regression testing [3], production validation, and an additional step, called Simulation Data Quality [4], which is planned for the near future.

Software Commissioning
The commissioning stage of the LHCb Simulation corresponds to the phase of active development of a new simulation software stack. A typical example is the development of a new major release of the LHCb simulation application Gauss, which behind the scenes may require migration to more recent versions of Monte-Carlo generators, an upgrade of the underlying software framework (Gaudi [5]) and of the simulation toolkit (GEANT4 [6]). It may also bring a more modern set of development tools, e.g. recent versions of compilers and build tools.
The main goal of the commissioning phase is to have the new version of the simulation software stack compile, execute and finalise successfully, without technical faults, and to rule out gross mistakes. The LHCb nightly build system [7] is utilised to streamline the development process. The different simulation software stacks under development are handled by their corresponding nightly build slots, which automatically configure the required build environment, apply patches and take care of building the projects. Compilation, linking and other problems occurring in the build process are reported to the developers. If the build finalises successfully, nightly tests are executed for the software stack.
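The gating logic described above can be sketched as follows. This is a purely illustrative example, not the actual nightly build system's code: the `BuildResult` type, `process_slot` function and slot name are all hypothetical.

```python
# Hypothetical sketch: gate nightly-test execution on a successful build,
# reporting build problems back to the developers. All names are illustrative.

from dataclasses import dataclass, field


@dataclass
class BuildResult:
    slot: str                        # e.g. "lhcb-gauss-dev" (made-up name)
    ok: bool                         # did compilation and linking succeed?
    errors: list = field(default_factory=list)


def process_slot(result: BuildResult) -> str:
    """Decide what happens after the nightly build of one slot."""
    if not result.ok:
        # Compilation, linking and other problems are reported to developers.
        return f"report {len(result.errors)} error(s) for {result.slot}"
    # Only a successfully built stack proceeds to the nightly tests.
    return f"run nightly tests for {result.slot}"


print(process_slot(BuildResult("lhcb-gauss-dev", ok=True)))
```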

Nightly tests
The nightly tests are designed to be simple and fast to execute: rather than validating a spectrum of observables, each ideally checks just a single property. Their main goal is to verify that the built software works: the application starts, runs and finalises successfully, no external libraries are missing, and the underlying frameworks and toolkits that are picked up are exactly the versions the developers intended to utilise. Often these tests are expected to fail, for example when the tuning of a Monte-Carlo generator is updated, in order to report that there was a change in the validated observable. This is usually the case during heavy development of the simulation software, when significant portions of the software stack are touched or replaced altogether. One would naturally expect to see changes in the simulation application's output when migrating to a more recent version of a Monte-Carlo generator, e.g. PYTHIA 8 [8], or upgrading the mechanism which handles the propagation and interaction of particles with matter, such as GEANT4. The nightly tests are designed with a time constraint in mind: they are expected to take from a couple of minutes to at most half an hour to execute, so as not to stall the nightly build system.

Performance and Regression tests
The active development phase concludes once the main breaking changes have been implemented and the nightly tests have verified that the new version of the simulation software stack builds and works from a purely technical point of view; the verification procedure then transitions into the performance and regression (PR) testing phase. The nightly build system is configured to execute PR tests on the commissioned simulation software. These tests are much more sophisticated than the nightly tests and often implement full physics analyses to output plots of distributions of a wide spectrum of physics observables. The PR tests are flexibly set up and support different configurations, such as execution of separate standalone parts of the simulation software stack (e.g. GEANT4 standalone applications), launching only the generator phase to monitor the behaviour of the Monte-Carlo generators, or running the full simulation with propagation of particles in the detector [9]. The LHCb Simulation group uses many PR tests (see Table 1) with various data-taking conditions and Monte-Carlo generators to validate the simulation software during this phase. Compared to the nightly tests, PR tests require much more time to execute. Typically, for the tests which validate the results of GEANT4 standalone applications or Monte-Carlo generators, the size of the samples produced is limited to O(10^4) simulated events.
In the case of running the full simulation with propagation of particles in the detector, the size of the data samples produced is lowered to O(10^3) events, which on a single core of a modern CPU may take between four and six hours, depending on the simulated beam conditions, such as the primary interacting particles, their energies, etc.
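The configurations and sample sizes above could be parameterised along the following lines. This is a minimal sketch, not the actual PR-test configuration format; the keys and structure are assumptions, while the event counts come from the text.

```python
# Illustrative parameterisation of PR-test configurations: standalone
# toolkit tests and generator-only runs use O(10^4) events, while the
# full simulation is limited to O(10^3) events because of its CPU cost.

PR_CONFIGS = {
    "geant4-standalone": {"events": 10_000, "phases": ["simulation"]},
    "generator-only":    {"events": 10_000, "phases": ["generation"]},
    "full-simulation":   {"events": 1_000,
                          "phases": ["generation", "simulation"]},
}


def events_for(config: str) -> int:
    """Look up the sample size used for a given PR-test configuration."""
    return PR_CONFIGS[config]["events"]
```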

LHCbPR web interface
The results produced by the PR tests are automatically processed, stored and made available for examination in a web interface (the LHCbPR front-end). Here users can select the results of different tests, filtering them by target platform (e.g. OS and compiler versions), configuration (e.g. LHCb data-taking conditions or specific Monte-Carlo generators), nightly build version, etc. When PR tests store ROOT [10] files, it is possible to browse the files and draw their content (histograms, graphs, etc.). For PR tests of the same category, the web interface allows users to display the selected plots from different dates side-by-side, present them superimposed, or show the ratio of two histograms, which makes it easier to spot discrepancies.
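The ratio view mentioned above is the simplest way to spot a regression: bins where the ratio deviates from 1 stand out immediately. The sketch below illustrates the idea with plain Python lists standing in for ROOT histograms; it is not the LHCbPR implementation.

```python
# Sketch of the kind of comparison the LHCbPR ratio view enables: the
# bin-by-bin ratio of a test histogram to a reference one. Deviations
# from 1.0 flag a potential regression. Plain lists stand in for ROOT
# histograms in this illustration.

def histogram_ratio(test_bins, ref_bins):
    """Return the per-bin ratio test/reference (None where the
    reference bin is empty)."""
    return [t / r if r else None for t, r in zip(test_bins, ref_bins)]


reference = [100, 250, 400, 250, 100]
test      = [ 98, 255, 396, 260,  99]
print(histogram_ratio(test, reference))  # values close to 1.0: no regression
```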

Simulation production validation
Addressing all the issues revealed by the nightly and PR tests opens the door to production validation. The new simulation software stack is utilised in simulation jobs on the Grid [11] to produce larger data samples (O(10^6) events). Some problems are spotted only at runtime in live, ongoing productions, such as technical problems with large productions on the Grid, problems with reconstruction, discrepancies in physics distributions, etc.
Given the significantly larger data samples produced at this stage compared to the previous ones, the LHCb Physics Performance Working Groups (PWGs) are involved in the verification of the results at this point. Bigger samples allow the PWGs to use the produced data in their physics analyses, explore physics processes which are impossible to analyse on smaller data samples, spot discrepancies in physics distributions, and verify the detector modelling and its performance.

Simulation Data Quality
The three milestones described help to achieve a better quality of the LHCb simulation software, but they are not an absolute guarantee of the absence of problems. Launching a large production campaign of hundreds of millions of simulated events only to discover in the end that the produced data are useless due to a human error or a technical problem is a huge loss of computing and human resources. For this reason the LHCb collaboration is working to introduce an additional phase in its validation process for simulation: the Simulation Data Quality for Monte-Carlo (SimDQ).
The purpose of SimDQ is to automatically verify the quality of any new sample in an ongoing large-scale simulation production by checking configurations and sampling the results of a subset of the production jobs at every stage (simulation, reconstruction, etc.). Experts from the PWGs will be able to define a set of distributions of interest, which may depend on beam settings, specific event topologies, etc., and provide references for them. The references will be associated with specific steps of the productions. At runtime during these steps, distributions will be produced and compared by shifters to the associated references. Later this comparison procedure may be automated. In case of a significant mismatch, the production will be put on hold and the issue will be reported to experts, who will be able to investigate the produced distributions in a web interface and decide either to continue the production or to halt it completely.
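The automated part of this decision step could look like the following sketch. It is hypothetical: SimDQ's actual statistical test and thresholds are not specified in this paper, so a crude chi-square per bin under Poisson errors stands in here.

```python
# Hypothetical sketch of the SimDQ decision step: compare a produced
# distribution against its reference and put the production on hold when
# the disagreement is significant. The chi-square-per-bin test and the
# threshold value are stand-ins, not SimDQ's actual criteria.

def compatible(produced, reference, threshold=3.0):
    """Crude chi-square per degree of freedom, assuming Poisson errors
    on the reference bins; empty reference bins are skipped."""
    chi2 = sum((p - r) ** 2 / r
               for p, r in zip(produced, reference) if r > 0)
    ndof = sum(1 for r in reference if r > 0)
    return chi2 / ndof < threshold


def simdq_decision(produced, reference):
    """Continue the production, or hold it and alert the experts."""
    if compatible(produced, reference):
        return "continue production"
    return "hold production and report to experts"
```

Shifters (and later the automated check) would apply this kind of comparison to each expert-defined distribution, per production step.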