Fast Calorimeter Simulation in ATLAS

Abstract. The ATLAS physics program at the LHC relies on very large samples of simulated events. Most of these samples are produced with Geant4, which provides a highly detailed and accurate simulation of the ATLAS detector. However, this accuracy comes at a high price in CPU, and the sensitivity of many physics analyses is already limited by the available Monte Carlo statistics and will be even more so in the future as datasets grow. To solve this problem, sophisticated fast simulation tools are being developed, and they will become the default tools in ATLAS production in Run 3 and beyond. The slowest component of the full simulation is the simulation of the calorimeter showers. In the fast simulation, these showers are replaced by a new parametrised description of the longitudinal and lateral energy deposits, including machine learning approaches, achieving a fast but accurate description. In this talk we describe the new tool for fast calorimeter simulation that has been developed by ATLAS, review its technical and physics performance, and demonstrate its potential to transform physics analyses.


Introduction
Due to its complexity, full simulation of the ATLAS [1] detector at the LHC (in particular the calorimeter system) with Geant4 is very expensive. With the need for increased simulation statistics for Run 3 and the HL-LHC, an effort must be made to develop a simulation tool that is lightweight in terms of CPU requirements while still preserving a high quality of physics modelling.
FastCaloSim is a fast simulation tool for the ATLAS calorimeter system which relies on a parametrisation of the particle shower development to realistically model the detector response. This parametrisation can be split into two parts: a longitudinal piece, which handles the outward propagation of energy from the interaction point through each calorimeter layer, and a lateral piece, which deals with the shape a particle shower takes within each layer.
FastCaloSim has been used in the ATLAS fast simulation framework since Run 1 [2]. However, a variety of improvements to this tool have been made. We discuss here the baseline methods of the new fast calorimeter simulation, FastCaloSimV2, and describe ongoing developments in improving the modelling of hadronic showers. Full details of the ATLAS detector, including the calorimeter system, can be found in [1].

Energy Parametrisation and PCA
The longitudinal energy parametrisation describes the energy deposited in each calorimeter layer. Energy deposits in layers are highly correlated, making this difficult to model. FastCaloSimV2 thus relies on a technique called Principal Component Analysis (PCA) [3] to de-correlate the layers, aiding parametrisation.
The PCA chain transforms N energy inputs into N Gaussians and projects these Gaussians onto the eigenvectors of the corresponding covariance matrix. This results in N de-correlated components, as the eigenvectors are orthogonal. The component of the PCA decomposition with the largest corresponding eigenvalue is then used to define bins, in which showers demonstrate similar patterns of energy deposition across the calorimeter layers. To further de-correlate the inputs, the PCA chain is repeated on the showers within each such bin. This full process is reversed for the particle simulation. A full description of the method can be found in [4].
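The PCA chain described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the FastCaloSimV2 implementation: the rank-based map to a Gaussian, the function names, and the choice of five equal-population bins are all assumptions made for the example.

```python
import numpy as np
from statistics import NormalDist

# Inverse CDF of the standard normal, applied element-wise.
_inv_cdf = np.vectorize(NormalDist().inv_cdf)


def to_gaussian(x):
    """Map one input (e.g. the energy in one layer) to a standard normal
    through its empirical cumulative distribution (assumed transform)."""
    n = len(x)
    ranks = np.argsort(np.argsort(x))      # rank 0..n-1 of every shower
    u = (ranks + 0.5) / n                  # strictly inside (0, 1)
    return _inv_cdf(u)


def pca_decorrelate(energies):
    """energies: array of shape (n_showers, n_layers).
    Returns the de-correlated components, eigenvalues and eigenvectors."""
    gauss = np.column_stack([to_gaussian(col) for col in energies.T])
    cov = np.cov(gauss, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)          # ascending order
    order = np.argsort(eigvals)[::-1]               # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = (gauss - gauss.mean(axis=0)) @ eigvecs
    return components, eigvals, eigvecs


def leading_component_bins(components, n_bins=5):
    """Assign each shower to a bin of the leading principal component.
    Equal-population bins and n_bins=5 are illustrative choices."""
    lead = components[:, 0]
    edges = np.quantile(lead, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.searchsorted(edges, lead, side="right") - 1
    return np.clip(idx, 0, n_bins - 1)
```

In this sketch, the second PCA pass mentioned in the text would correspond to calling `pca_decorrelate` again on the showers selected by each value of `leading_component_bins`.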

Shape Parametrisation
Modelling of the lateral shower shape makes use of 2D histograms filled with Geant4 hit energies in each layer and PCA bin. Binned in polar α − R coordinates in a local plane tangential to the surface of the calorimeter system, these histograms represent the spatial distribution of energy deposits for a given particle shower. One such histogram is constructed for each of a number of Geant4 events, and each event histogram is normalised to the total energy deposited in the given layer. The average of these normalised histograms is then taken, and is referred to here as the "average shape".
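The construction of the average shape can be written compactly. The following is a sketch under the assumption that the per-event histograms for one layer and PCA bin are already available as a NumPy array; the function name and the treatment of events with no energy in the layer are illustrative choices, not taken from the ATLAS code.

```python
import numpy as np


def average_shape(event_hists):
    """event_hists: array of shape (n_events, n_alpha, n_r) holding the
    Geant4 hit energies of each event, binned in alpha and R.
    Returns the mean of the per-event histograms after each one has been
    normalised to the total energy deposited in the layer."""
    hists = np.asarray(event_hists, dtype=float)
    totals = hists.sum(axis=(1, 2), keepdims=True)
    keep = totals[:, 0, 0] > 0          # assumption: skip empty events
    normalised = hists[keep] / totals[keep]
    return normalised.mean(axis=0)
```

Because every event histogram is normalised before averaging, the result sums to one and can be used directly as a probability distribution.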
In simulation, these average shape histograms are used as probability distributions, from which a finite number of equal energy hits are drawn. This finite drawing of hits induces a statistical fluctuation about the average shape which is tuned to match the expected calorimeter sampling uncertainty.
As an example, the intrinsic resolution of the ATLAS Liquid Argon calorimeter has a sampling term of $\sigma_{\mathrm{samp}} \approx 10\%/\sqrt{E}$ [5]. The number of hits to be drawn for each layer, $N^{\mathrm{layer}}_{\mathrm{hits}}$, is thus taken from a Poisson distribution with mean $1/\sigma^{2}_{\mathrm{samp}}$, where the energy assigned to each hit is then just $E_{\mathrm{hit}} = E_{\mathrm{layer}}/N^{\mathrm{layer}}_{\mathrm{hits}}$. This induces a fluctuation of the order of $10\%/\sqrt{E_{\mathrm{bin}}}$ for each bin in the average shape.
Figure 1 shows a comparison of energy and weta2 [6], defined as the energy weighted lateral width of a shower in the second electromagnetic calorimeter layer, for 16 GeV photons simulated with the current FastCaloSimV2 and with full Geant4 simulation. The agreement is quite good, with FastCaloSimV2 matching the Geant4 mean to within 0.3 and 0.03 percent respectively. Similar results are seen for other photon energies and η points.
Figure 2 shows the ratio of calorimeter cell energies for single Geant4 photon and pion events to the corresponding cell energies in their respective average shapes. While the photon event is quite close to the corresponding average, the pion event shows a deviation from the average which is much larger and has a non-trivial structure, reflecting the different natures of electromagnetic and hadronic showering.
Figure 1. Energy and lateral shower width variable, weta2, for 16 GeV photons with full simulation (G4) and FastCaloSimV2 (FCSV2) [4].
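The hit-drawing procedure of this section can be sketched as follows. The example assumes that the energy entering the sampling term is the layer energy expressed in GeV and that the average shape is supplied as a 2D array; the function name and the guard against zero hits are illustrative choices.

```python
import numpy as np


def draw_hits(avg_shape, e_layer_gev, sampling_term=0.10, rng=None):
    """Draw equal-energy hits from the average shape of one layer.

    avg_shape     : 2D array used as a probability distribution
    e_layer_gev   : energy assigned to the layer, in GeV (assumed unit)
    sampling_term : 0.10 corresponds to the 10%/sqrt(E) quoted in the text
    Returns the deposited energy per bin and the number of hits drawn."""
    rng = np.random.default_rng() if rng is None else rng
    sigma = sampling_term / np.sqrt(e_layer_gev)       # relative resolution
    n_hits = max(1, rng.poisson(1.0 / sigma**2))       # Poisson mean 1/sigma^2
    e_hit = e_layer_gev / n_hits                       # equal energy per hit
    prob = avg_shape.ravel() / avg_shape.sum()
    counts = rng.multinomial(n_hits, prob).reshape(avg_shape.shape)
    return counts * e_hit, n_hits
```

For a 16 GeV layer energy the Poisson mean is 1/(0.1/4)^2 = 1600 hits. A bin that receives n of them fluctuates by roughly 1/sqrt(n) in relative terms, which reproduces the 10%/sqrt(E_bin) behaviour stated above.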

Fluctuation Modelling
While the shape parametrisation described in Section 3 is thus sufficient for describing electromagnetic showers, we will demonstrate below that it is not sufficient for describing hadronic showers (Figures 5 and 6). In Sections 6, 7, and 8, we therefore present and validate methods to improve this hadronic shower modelling.

Modelling Methods and Results
Two methods for modelling deviations from the average shape have been studied: (1) a neural network based approach using a Variational Autoencoder (VAE) [8] and (2) a map through cumulative distributions to an n-dimensional Gaussian. With both methods, the shape simulation then proceeds as described in Section 3, with the drawing of hits according to the average shape. However, these hits no longer have equal energy, but have weights applied to increase or decrease their energy depending on their spatial position. This application of weights is designed to mimic a realistic shower structure and to encode correlations between energy deposits.
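The re-weighting of hits can be illustrated with a short sketch. It assumes the weights are supplied on a grid in (α, R) with known bin edges, that hits outside the grid keep unit weight, and that the weighted hits are rescaled so the total layer energy is unchanged. The rescaling step and all names are assumptions for illustration and are not stated in the text.

```python
import numpy as np


def apply_voxel_weights(hit_alpha, hit_r_mm, hit_energy, weights,
                        alpha_edges, r_edges_mm):
    """Scale each hit by the weight of the (alpha, R) bin it falls in.

    weights has shape (len(alpha_edges) - 1, len(r_edges_mm) - 1)."""
    ia = np.searchsorted(alpha_edges, hit_alpha, side="right") - 1
    ia = np.clip(ia, 0, len(alpha_edges) - 2)
    ir = np.searchsorted(r_edges_mm, hit_r_mm, side="right") - 1
    inside = ir < len(r_edges_mm) - 1           # hits beyond the last R edge
    w = np.ones_like(hit_energy, dtype=float)   # keep weight 1 (assumption)
    w[inside] = weights[ia[inside], ir[inside]]
    weighted = hit_energy * w
    # Assumption: preserve the total energy assigned to the layer.
    return weighted * hit_energy.sum() / weighted.sum()
```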
Both methods are trained on ratios of energy in binned units called voxels. This voxelization is performed in the same polar α − R coordinates as the average shape, with a 5 mm core in R and 20 mm binning thereafter. There are a total of 8 α bins from 0 to 2π and 8 additional R bins from 5 mm to 165 mm. The 5 mm core is filled with the average value of core voxels across the 8 α bins when creating the parametrisation. However, during simulation, each of these 8 core bins is treated independently. The outputs of both methods mimic these energy ratios and are used in the shape simulation as the weights described above. In contrast to an approach based on, e.g., calorimeter cells, using voxels allows for flexibility in tuning the binning used in creating the parametrisation. Further, due to their relatively large size, using calorimeter cells is subject to "edge effects", where the splitting of energy between cells has a non-trivial effect on the observed energy ratio. The binning used here is of the order of half of a cell size, mitigating this effect.
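The voxel grid described above can be written down explicitly. The sketch below builds the 8 α bins and the 9 R bins (the 5 mm core plus eight 20 mm bins up to 165 mm), fills a grid from hits, and forms the ratio to the average shape. The handling of empty average voxels and the function names are assumptions, and the core averaging reflects one reading of the text.

```python
import numpy as np

# 8 alpha bins covering 0 to 2*pi.
ALPHA_EDGES = np.linspace(0.0, 2.0 * np.pi, 9)
# R edges in mm: 0, 5, 25, 45, ..., 165 (5 mm core + eight 20 mm bins).
R_EDGES_MM = np.concatenate(([0.0], np.arange(5.0, 166.0, 20.0)))


def voxelise(alpha, r_mm, energy):
    """Sum hit energies into the (alpha, R) voxel grid, shape (8, 9).
    Hits with R above 165 mm fall outside the grid and are dropped."""
    grid, _, _ = np.histogram2d(np.mod(alpha, 2.0 * np.pi), r_mm,
                                bins=[ALPHA_EDGES, R_EDGES_MM],
                                weights=energy)
    return grid


def energy_ratio(event_grid, average_grid, average_core=True):
    """Ratio of an event's voxel energies to the average shape.
    Voxels where the average is zero get ratio 1 (assumption)."""
    ratio = np.divide(event_grid, average_grid,
                      out=np.ones_like(event_grid, dtype=float),
                      where=average_grid > 0)
    if average_core:
        # Fill the 8 core voxels with their mean, as done when
        # creating the parametrisation.
        ratio[:, 0] = ratio[:, 0].mean()
    return ratio
```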
The Gaussian method operates by using cumulative distributions to map Geant4 energy ratios to a multidimensional Gaussian distribution. New events are generated by randomly sampling from this Gaussian distribution.
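A minimal sketch of such a map is given below, assuming the empirical cumulative distribution of each voxel is used to transform it to a standard normal and that correlations are captured by the covariance matrix in the Gaussian space. The function names and the interpolation used to invert the map are illustrative choices, not the ATLAS implementation.

```python
import numpy as np
from statistics import NormalDist

_norm = NormalDist()
_ppf = np.vectorize(_norm.inv_cdf)   # standard normal inverse CDF
_cdf = np.vectorize(_norm.cdf)       # standard normal CDF


def fit_gaussian_map(ratios):
    """ratios: (n_events, n_voxels) array of Geant4 energy ratios.
    Returns the sorted values per voxel (the empirical CDFs) and the
    covariance matrix of the Gaussian-transformed ratios."""
    n = ratios.shape[0]
    sorted_vals = np.sort(ratios, axis=0)
    ranks = np.argsort(np.argsort(ratios, axis=0), axis=0)
    z = _ppf((ranks + 0.5) / n)
    return sorted_vals, np.cov(z, rowvar=False)


def generate(sorted_vals, cov, n_new, rng=None):
    """Sample the multidimensional Gaussian and map each coordinate back
    to an energy ratio through the inverse empirical CDF."""
    rng = np.random.default_rng() if rng is None else rng
    n, n_vox = sorted_vals.shape
    z = rng.multivariate_normal(np.zeros(n_vox), cov, size=n_new)
    u = _cdf(z)
    probs = (np.arange(n) + 0.5) / n
    cols = [np.interp(u[:, j], probs, sorted_vals[:, j]) for j in range(n_vox)]
    return np.column_stack(cols)
```

By construction the generated ratios stay within the range of the training ratios for each voxel, while correlations between voxels come from the Gaussian covariance.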
For the VAE method, a system of two linked neural networks is trained to generate events. The first "encoder" neural network maps input Geant4 energy ratios to a lower dimensional latent space. A second "decoder" neural network then samples from that latent space and tries to reproduce the inputs. In simulation, events are generated by taking random samples from the latent space and passing them through the trained decoder.
Figure 3 shows the distributions of input Geant4 and Gaussian method generated energy ratios in the grid of voxels. Figure 4 shows the correlation coefficient between the core voxel from α = 0 to 2π/8 and each of the other voxels, for input Geant4 events and for events generated with the Gaussian and VAE fluctuation methods. Agreement is good throughout.
Figure 4. Correlation coefficient of ratios of voxel energy in single events to the corresponding voxel energy in the average shape, examined between the core bin from α = 0 to 2π/8 and each of the other voxels, for Gaussian generated, VAE generated, and Geant4 input events. The periodic structure represents the binning in α, and the increasing numbers in each of these periods correspond to increasing R, where the eight points with correlation coefficient 1 are the eight core bins. Both the Gaussian and VAE generated toy events are able to reproduce the major correlation structures for 65 GeV central pions in EMB2 [7].

Fluctuation Validation
Validation of the Gaussian and VAE fluctuation methods was performed within FastCaloSimV2. Figure 5 shows the energy ratio of cells for a given simulation to the corresponding cells in the average shape as a function of the distance from the shower centre.
The mean for all simulation methods is expected to be around 1, with deviation from the average (the RMS fluctuation) shown by the error bars. The Gaussian method RMS (red) and VAE method RMS (green) both match the Geant4 RMS (yellow) better than the case without correlated fluctuations (blue) for a variety of energies, η points, and layers, often reproducing 80 − 100 % of the Geant4 RMS magnitude, compared to the 5 − 30 % observed in the no correlated fluctuations case.

Application in ATLAS Simulation

Conclusions
FastCaloSimV2 is a crucial part of the future of simulation for the ATLAS Experiment at the LHC. The per event simulation time of the full detector with Geant4, calculated over 100 $t\bar{t}$ events, is 228.9 s. Using FastCaloSimV2 for the calorimeter simulation reduces this to 26.6 s, of which FastCaloSimV2 itself accounts for only 0.015 s. Good physics modelling is achieved, and the modelling of hadronic showers is improved with the correlated fluctuations method.