Fast simulation of electromagnetic particle showers in high granularity calorimeters

Abstract. The need for simulated events by the LHC experiments and their High Luminosity upgrades is expected to increase by one or two orders of magnitude in the future. As a consequence, research on new fast simulation solutions, including deep Generative Models, is very active and initial results look promising. We have previously reported on a prototype, based on a three-dimensional convolutional Generative Adversarial Network, that simulates particle showers in high-granularity calorimeters. In this contribution we present improved results on a more realistic simulation. Detailed validation studies show very good agreement with Monte Carlo simulation. In particular, we show how increasing the network representational power, introducing physics-based constraints and using a transfer-learning approach for training improve the level of agreement over a large energy range.


Introduction
High Energy Physics (HEP) relies heavily on Monte Carlo simulation to model complex processes and describe detector response. The classical Monte Carlo approach can reproduce theoretical expectations with a high level of precision, but it is both time- and resource-intensive [1]. Existing fast simulation techniques are mostly based on parametrization [2][3][4] or look-up-table approaches [5], providing different levels of accuracy. Deep Neural Networks (DNN) are also being investigated as alternative fast simulation solutions [6][7][8][9]. Calorimeters are among the most time-consuming detectors to simulate. Their output can be regarded as a pattern of energy depositions that can be interpreted as pixel intensities in an image. These 3D images are characteristic of the particle type, its energy and its incident angle with respect to the normal to the calorimeter cell face; these variables are therefore inputs to the simulation process. The work presented in [9] demonstrated the benefits of using the primary particle energy to condition the training of a DNN. Here we introduce a more realistic use case by also conditioning our model on the incident angle: the network thus learns a joint distribution of primary energy and incident angle. With respect to [9], this model introduces new features to reach a higher level of accuracy; in particular, we found that domain knowledge is useful for tuning the optimization procedure.

Previous Work
Generative Adversarial Networks (GAN) [10] implement the idea of adversarial training using two neural networks: a generator that learns to reproduce the true data distribution and a discriminator, typically a classifier that distinguishes generated examples from true ones. Training proceeds as a minimax optimisation that ideally reaches a saddle point corresponding to a minimum for the generator and a maximum for the discriminator (Nash equilibrium). The Auxiliary Classifier GAN (ACGAN) follows a semi-supervised approach and demonstrates that introducing a label results in faster convergence and stable performance [11]. Many variations of GAN have been proposed recently: our work combines an ACGAN-like approach with physics-derived constraints. The LAGAN [12] and CaloGAN [8] models represent the first applications of GAN to HEP simulation: particle showers in simplified calorimeters are simulated as a set of two-dimensional images. Further examples are described in [6,7]. Simulating highly granular calorimeters with true 3D convolutions, which fully exploit the correlations in the volumetric space, achieves promising results [9]. Additional examples of Generative Models applied to the simulation of detector output can be found in other contributions to these proceedings.
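For reference, the standard GAN value function of [10] makes the minimax structure explicit: D(x) is the discriminator's probability that x is real and G(z) is the generator output for latent noise z drawn from p_z. The last two lines give the ACGAN objectives of [11] in their usual form, with S the real/fake source and C the class label. These are the textbook formulations, not the exact 3DGAN loss, which adds an energy regression and physics-based terms.

```latex
\begin{aligned}
\min_G \max_D \; V(D, G) &= \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\!\left[\log\left(1 - D(G(z))\right)\right] \\[4pt]
L_S &= \mathbb{E}\!\left[\log P(S = \mathrm{real} \mid X_{\mathrm{real}})\right]
  + \mathbb{E}\!\left[\log P(S = \mathrm{fake} \mid X_{\mathrm{fake}})\right] \\
L_C &= \mathbb{E}\!\left[\log P(C = c \mid X_{\mathrm{real}})\right]
  + \mathbb{E}\!\left[\log P(C = c \mid X_{\mathrm{fake}})\right]
\end{aligned}
```

In ACGAN the discriminator is trained to maximize L_S + L_C and the generator to maximize L_C - L_S, so the label term helps both networks while the source term remains adversarial.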

The 3D convolutional GAN
In the present work we focus on the simulation of a high granularity electromagnetic calorimeter (ECAL) designed in the context of the detector studies for the CLIC accelerator project [13]. It consists of a regular grid of 5.1 mm³ cells with an inner calorimeter radius of 1.5 m. The corresponding data were generated as part of an effort to provide a common, realistic data set to foster the development of different Deep Learning and Machine Learning applications [14]. Data samples were generated using a detailed Monte Carlo approach (with the Geant4 toolkit [15]). Here, we show results obtained using 400,000 single electrons with a primary energy (E_p) range of 2-500 GeV and an incident angle (θ) uniformly distributed between 60° and 120°. These data are pre-processed into three-dimensional 51 × 51 × 25 pixelized images centered around the barycenter of the energy depositions. 2D projections for an example event are shown in Figure 1.
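The pre-processing step can be sketched as follows, assuming the raw event is a dense NumPy array of cell energies with the depth (layer) axis last and already 25 layers deep, so that only the two transverse axes are recentered. The function names, the raw array size in the usage note and the zero-padding at the borders are hypothetical choices, not the pre-processing code used for [14].

```python
import numpy as np

def barycenter(event):
    """Energy-weighted mean (x, y, z) index of a 3D array of cell energies."""
    total = event.sum()
    if total <= 0:
        raise ValueError("event has no energy deposits")
    # np.indices gives one integer grid per axis, each shaped like `event`.
    grids = np.indices(event.shape)
    return tuple(float((g * event).sum() / total) for g in grids)

def crop_around_barycenter(event, size_xy=51):
    """Cut a size_xy x size_xy transverse window centered on the barycenter.

    Assumptions (hypothetical): all depth layers (last axis) are kept, and
    cells that fall outside the raw array are filled with zeros.
    """
    cx, cy, _ = barycenter(event)
    half = size_xy // 2
    ix, iy = int(round(cx)), int(round(cy))
    # Zero-pad the transverse axes so the window never runs off the edge.
    padded = np.pad(event, ((half, half), (half, half), (0, 0)))
    # Original index i sits at i + half after padding, so the window
    # [i - half, i + half] in original indices starts at i in padded indices.
    return padded[ix:ix + size_xy, iy:iy + size_xy, :]
```

For a hypothetical raw event of shape (70, 70, 25), `crop_around_barycenter(raw)` returns a (51, 51, 25) image whose central transverse cell holds the rounded barycenter position.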
The 3DGAN model, described in [9], is a first proof of concept that 3D convolutional GANs can simulate high granularity calorimeters. This work extends the scope of [9] to more realistic simulations, with particles entering the detector at a variable incident angle and images that are 4 times larger. The training is conditioned on both the incident angle and the energy of the incoming particle; therefore, 3DGAN learns an angle-energy multivariate distribution. The network architecture and the corresponding loss functions are modified to take these new features into account and to introduce physics-based constraints.
Figure 2 shows the 3DGAN architecture. The generator input is obtained by concatenating a latent vector of 254 random numbers drawn from a Gaussian distribution with the primary particle energy and the incident angle. This input then goes through a set of up-sampling operations, grouped together before any convolution is applied. A faithful representation of the dependence of the energy shower on the incident angle is obtained with a setup in which the generator is stronger (seven convolutional layers) than the discriminator (four layers). Further details on the network architecture can be found in [16]. The discriminator output is two-fold: a sigmoid neuron predicts the typical GAN real/fake probability and a linear neuron implements a regression on the primary particle energy. In addition, the total deposited energy, a binned pixel intensity distribution and the incident angle are calculated from the images using lambda functions and constrained at training time.
Given the large pixel dynamic range, shown in Figure 3, we do not apply the standard normalization-rescaling procedure. Instead, to slightly reduce the pixel dynamic range, we raise the pixel intensities to a power smaller than one. The exponent is treated as a hyper-parameter and set to 0.85 through a trial-and-error procedure.
The training process is split into two steps: we first restrict the primary particle energy range to 100-200 GeV to reduce sample variability, then extend it to the full 2-500 GeV range using a transfer learning approach. The discriminator and the generator are trained alternately for the same number of steps. The architecture is implemented using Keras 1.2.2 [17] (with TensorFlow 1.0.0 [18] as a backend). Training for one epoch on a single NVIDIA GeForce GTX 1080 card takes about two hours, and convergence is reached after 60 epochs.
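The sketch below illustrates, in NumPy rather than as Keras layers, three ingredients of the 3DGAN description: assembling the generator input from 254 Gaussian latent values plus E_p and θ, computing two of the image-derived quantities used as constraints (total deposited energy and a binned pixel-intensity distribution), and the power-law intensity transform with exponent 0.85. It is a sketch under stated assumptions, not the 3DGAN code: the concatenation order, the bin edges and all function names are hypothetical, the incident-angle estimate is omitted because its definition is not given in the text, and the inverse transform assumes generated images are mapped back to energies with the reciprocal exponent.

```python
import numpy as np

LATENT_DIM = 254  # Gaussian latent values; with E_p and theta: 256 inputs
POWER = 0.85      # exponent of the pixel-intensity transform

def generator_input(energy_gev, angle_rad, rng):
    """Concatenate a Gaussian latent vector with (E_p, theta).

    Hypothetical ordering: [latent..., E_p, theta]; no rescaling is applied.
    """
    energy = np.atleast_1d(np.asarray(energy_gev, dtype=np.float32))
    angle = np.atleast_1d(np.asarray(angle_rad, dtype=np.float32))
    latent = rng.standard_normal((energy.shape[0], LATENT_DIM)).astype(np.float32)
    return np.concatenate([latent, energy[:, None], angle[:, None]], axis=1)

# Hypothetical bin edges (GeV) for the binned pixel-intensity distribution.
BIN_EDGES = np.array([0.0, 1e-4, 1e-3, 1e-2, 1e-1, np.inf])

def total_energy(images):
    """Total deposited energy per event; images have shape (N, X, Y, Z)."""
    return images.sum(axis=(1, 2, 3))

def binned_pixel_counts(images, edges=BIN_EDGES):
    """Count non-empty cells per event in each (lo, hi] intensity bin."""
    flat = images.reshape(images.shape[0], -1)
    return np.stack(
        [((flat > lo) & (flat <= hi)).sum(axis=1)
         for lo, hi in zip(edges[:-1], edges[1:])],
        axis=1,
    )

def compress(images, power=POWER):
    """Reduce the pixel dynamic range with x -> x**power (power < 1)."""
    return np.power(images, power)

def decompress(images, power=POWER):
    """Map generated images back to energies: y -> y**(1/power)."""
    return np.power(np.clip(images, 0.0, None), 1.0 / power)

def dynamic_range(images):
    """Ratio between the largest and the smallest non-zero pixel."""
    nonzero = images[images > 0]
    return nonzero.max() / nonzero.min()
```

For example, `generator_input([100.0, 250.0], np.radians([60.0, 90.0]), np.random.default_rng(0))` returns an array of shape (2, 256). For hypothetical pixels spanning 10⁻⁶ to 10² GeV, the transform reduces the max/min ratio from 10⁸ to 10^(8 × 0.85) ≈ 6.3 × 10⁶, a modest reduction consistent with the intent of only slightly compressing the range. Comparing such quantities between Geant4 and generated images is one way the constraints can enter the loss; how these terms are weighted in 3DGAN is not specified in this sketch.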

Results and Discussion
We validate the 3DGAN results by performing a detailed comparison to Monte Carlo simulated data. Figures 1, 4 and 5 present results obtained by training the network on the full energy range. Figure 1 shows an example of 2D sections of energy showers from Geant4 and from the GAN for particles entering the calorimeter with the same incident angle and energy, while Figure 4 presents the energy shower profiles along the x, y and z axes for different angles, in both linear scale (for a 90° incident angle) and log scale. The agreement is very good: the network correctly reproduces the spatial distribution of energy deposits as a function of the incident angle across a large dynamic range. The largest discrepancies appear at the edges of the simulated volumes, where 3DGAN predicts, on average, smaller energy depositions. It should be noted, however, that the amount of energy expected in these regions is very small (well below 10⁻⁴ GeV). We obtain a similar agreement in terms of the sampling fraction (Figure 5): the network correctly reproduces the Geant4 behaviour over the entire energy range. Training the optimized architecture on the full energy range (from 2 to 500 GeV) for seven additional epochs on a larger sample (400,000 events) shows that 3DGAN successfully generalizes the results obtained on the smaller range to the larger one. Figure 3 shows the good level of agreement on the
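Both observables used in this comparison can be computed directly from the images. A minimal sketch, assuming images of shape (N, 51, 51, 25) with the depth (z) axis last, and taking the sampling fraction to be the ratio between the total energy deposited in the image and the primary energy E_p; the function names are illustrative. In practice the Geant4 and GAN samples would be compared at matching (E_p, θ) values.

```python
import numpy as np

def shower_profiles(images):
    """Mean energy per slice along x, y and z for images of shape (N, X, Y, Z)."""
    mean_image = images.mean(axis=0)
    profile_x = mean_image.sum(axis=(1, 2))  # sum over y and z
    profile_y = mean_image.sum(axis=(0, 2))  # sum over x and z
    profile_z = mean_image.sum(axis=(0, 1))  # sum over x and y
    return profile_x, profile_y, profile_z

def sampling_fraction(images, primary_energy):
    """Deposited energy divided by primary energy, one value per event."""
    deposited = images.sum(axis=(1, 2, 3))
    return deposited / np.asarray(primary_energy, dtype=float)

def relative_difference(reference, candidate, eps=1e-12):
    """Per-bin relative difference, e.g. GAN profile vs Geant4 profile."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    return (candidate - reference) / (np.abs(reference) + eps)
```

The small `eps` avoids division by zero in empty bins; near the volume edges, where the expected energy is tiny, relative differences can look large even when absolute differences are negligible, which is why the profiles are also worth inspecting in log scale.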

Automatic deployment of 3DGAN distributed training on the cloud
In general terms, simulating samples with generative models is much faster than with a Monte Carlo approach; in our case we observe a speed-up of several orders of magnitude. We measured the time to simulate a single electron shower with 3DGAN on an NVIDIA GeForce GTX 1080 GPU to be about 2 milliseconds.
The training process, however, is very time-consuming: an entire week is needed to train the model to convergence on a single gaming GPU, such as the NVIDIA GeForce GTX 1080, and a distributed training approach is therefore essential [19]. To reduce the training time, we have interfaced 3DGAN with several distributed frameworks, including Horovod [20] and mpi-learn [21], and benchmarked the parallel training process on different HPC systems [19,22].
Here we report updated results on the 3DGAN scaling performance on public clouds. Several initiatives aim at understanding how the scientific community can integrate public clouds into its computing models. The European Commission-funded Helix Nebula Science Cloud (HNSciCloud) project [23], for example, explored a hybrid cloud model linking commercial cloud service providers with research organisations' in-house resources to support the growing computing needs of the research community. We have created an mpi-learn-based Docker [24] image and integrated it with Kubernetes [25] and Kubeflow [26] in order to smoothly deploy our workload on commercial cloud providers via the HNSciCloud project. Results are shown in Figure 6: we have tested different deployment configurations and observed no overhead due to the additional Docker, Kubernetes or Kubeflow layers. We compared the speed-up measured on Exoscale (equipped with NVIDIA P100 GPUs, blue) with that measured on a small local set of GPUs available in CERN OpenStack (green) and observed no difference in timing. Training time is significantly reduced, but the current speed-up is not linear. A possible explanation is that the workload of each worker is too small with respect to the communication time and to the time the master needs to process the weight updates. Analysis and optimisation of resource usage are part of our ongoing work.
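A simple Amdahl-style toy model illustrates why a small per-worker workload limits the speed-up. Assume, for illustration only, that an epoch consists of a compute time T_c that splits evenly over N workers plus a time T_s (communication and weight-update processing by the master) that does not shrink with N, so T(N) = T_c/N + T_s and the speed-up is S(N) = T(1)/T(N). The function names and the numbers below are hypothetical and are not the measurements of Figure 6.

```python
def epoch_time(n_workers, compute_time, serial_time):
    """Toy model: parallelizable compute plus a part that does not shrink with N."""
    return compute_time / n_workers + serial_time

def speedup(n_workers, compute_time, serial_time):
    """Speed-up relative to a single worker under the same toy model."""
    return epoch_time(1, compute_time, serial_time) / epoch_time(
        n_workers, compute_time, serial_time)

def efficiency(n_workers, compute_time, serial_time):
    """Parallel efficiency: 1.0 means perfectly linear scaling."""
    return speedup(n_workers, compute_time, serial_time) / n_workers

if __name__ == "__main__":
    # Arbitrary illustrative values (hypothetical), NOT the Figure 6 results.
    compute, serial = 100.0, 5.0
    for n in (1, 2, 4, 8, 16):
        print(f"N={n:2d}  speed-up={speedup(n, compute, serial):5.2f}  "
              f"efficiency={efficiency(n, compute, serial):4.2f}")
```

With T_c = 100 and T_s = 5 (arbitrary units), S(8) = 105/17.5 = 6.0 and the efficiency drops to about 0.58 at N = 16. The larger T_s is relative to T_c/N, that is, the smaller the workload of each worker, the sooner the speed-up saturates; with T_s = 0 the model scales linearly.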

Conclusions
The 3DGAN model is capable of realistically reproducing single particle showers in high granularity calorimeters, of the kind that will be operated in the coming years at the High Luminosity LHC [27] and at next generation particle accelerators. Detailed validation studies show that 3DGAN images reproduce the classical Monte Carlo behaviour within a few percent over a very large dynamic range. We intend to continue this work to bring the 3DGAN prototype to production-level quality, following two main R&D directions: an investigation of the generalization capabilities of the model (whether the architecture parameters can be tuned to simulate different calorimeters) and a deeper study of the 3DGAN performance in terms of dataset mixing and phase space coverage, sample diversity and size of the generator support space.