Fast simulation of the electromagnetic calorimeter response using Self-Attention Generative Adversarial Networks

Simulation is one of the key components of high energy physics. Historically, it relies on Monte Carlo methods, which require a tremendous amount of computational resources. These methods may have difficulties meeting the expected demands of the High Luminosity Large Hadron Collider, so the experiment is in urgent need of new fast simulation techniques. The application of Generative Adversarial Networks is a promising solution to speed up the simulation while providing the necessary physics performance. In this paper we propose the Self-Attention Generative Adversarial Network as a possible improvement of the network architecture. The application is demonstrated on the task of generating responses of an LHCb-type electromagnetic calorimeter.


Introduction
The Large Hadron Collider (LHC), built by the European Organization for Nuclear Research (CERN), is the world's largest particle collider. The Large Hadron Collider beauty (LHCb) experiment is a specialized b-physics experiment designed to measure the parameters of CP violation in the interactions of b-hadrons. The detector consists of several parts: the vertex detector VELO, a Ring Imaging Cherenkov detector (RICH), the main tracking system followed by another RICH, the electromagnetic and hadronic calorimeters, and the muon system. The calorimeters specialize in measuring specific parameters of passing particles, providing measurements of the energy of electrons, photons, and hadrons. These data allow the reconstruction of the initial collision, so simulating the detector's response for a given input becomes the challenge. The baseline simulation software Geant4 [9] requires a tremendous number of CPU hours, making simulation the most computationally expensive part of the experimental program.
It was shown [1,2] that it is possible to simulate the calorimeter response using deep learning techniques and Generative Adversarial Networks, but in order to apply GANs in practice their performance should be properly evaluated using quality metrics and improved with additional techniques where possible. In this work, we propose adding a Self-Attention module to the previously published architecture to improve the quality of generated objects, and we compare previous results with the new ones produced by the Self-Attention Generative Adversarial Network in terms of the Area Under the Precision-Recall Distribution Curve (PRD-AUC).

Generative neural networks
Generative adversarial networks (GANs) are a type of generative model consisting of two networks that compete against each other in order to reproduce a given distribution. In this section a brief overview of Conditional GANs is given. We then review the Self-Attention approach, since the SA-module helps Convolutional Neural Networks (CNNs) model long-range and multi-level dependencies across image regions [3], and analyze the possibility of applying the Self-Attention Generative Adversarial Network (SAGAN) [3] to reproduce the response of the calorimeters.

Conditional Generative Adversarial Networks
The key idea of generative adversarial networks is a contest between two neural networks, a generator (G) and a discriminator (D). The first one generates samples and the other one evaluates them, so the generator's training objective is to increase the error of the discriminative network. In essence, a GAN learns to map a latent space to the data distribution, so its input can be represented by noise drawn from a simple distribution, e.g. a Gaussian. Formally, the two networks play the following minimax game:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))],$$

where $p_{data}$ is the true data distribution and $p_z$ is the noise distribution. By adding a conditional vector alongside the noise as an input of the GAN, it is possible to perform the transition to conditional generation. The generator then utilizes additional information, either a class label in the case of a multi-class dataset or another continuous input, in order to generate a sample appropriate to the given condition. The discriminator uses the conditional input to classify real and generated objects as well.
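As a minimal sketch of the conditional setup, the generator input is simply the latent noise concatenated with the condition vector (the sizes below are illustrative, not those of the actual model):

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_input(noise, condition):
    """Concatenate latent noise with the conditional vector to form
    the generator input of a conditional GAN."""
    return np.concatenate([noise, condition], axis=-1)

# Illustrative sizes: a 64-dim Gaussian latent vector plus a 5-dim
# condition vector (momentum projections and entry coordinates).
noise = rng.standard_normal((8, 64))       # batch of latent vectors
condition = rng.standard_normal((8, 5))    # batch of conditions
gen_input = conditional_input(noise, condition)
assert gen_input.shape == (8, 69)
```

The discriminator receives the same condition vector alongside the (real or generated) sample, so both networks are aware of the conditioning.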

Self-Attention Generative Adversarial Networks
Previous CaloGAN architectures [1,2] used many convolutional layers, but common CNN GANs have difficulties capturing geometric patterns that occur consistently under certain fixed cluster conditions. We noticed outliers appearing in objects generated by the WGAN (Fig. 3). One possible explanation is the limited local receptive field of the neurons: a convolution has only a local receptive field, so long-distance dependencies cannot be processed unless several convolutional layers are stacked or the size of the convolution kernel is increased. Such architectural changes sacrifice the computational and statistical efficiency gained by using a CNN structure. Increasing the receptive field at a small computational cost thus becomes the challenge, and an attention-based approach may be the solution.
A recent work [3] proposed adding Self-Attention modules to the convolutional layers of neural networks in order to improve model performance. The SA-module computes the response at a position as a weighted sum of the features at all positions, where the weights can be calculated at a small cost, allowing networks to model relationships between widely separated spatial regions.
According to the algorithm, the features from the previous hidden layer $x \in \mathbb{R}^{C \times N}$ ($C$ is the number of channels, $N$ is the number of feature locations) are transformed into two feature spaces $f(x) = W_f x$ and $g(x) = W_g x$ to calculate the attention:

$$\beta_{j,i} = \frac{\exp(s_{ij})}{\sum_{i=1}^{N} \exp(s_{ij})}, \qquad s_{ij} = f(x_i)^{T} g(x_j),$$

where $\beta_{j,i}$ indicates the extent to which the model attends to the $i$-th location when synthesizing the $j$-th region. The output of the SA layer can then be represented as

$$o_j = \gamma \, v\Big(\sum_{i=1}^{N} \beta_{j,i}\, h(x_i)\Big) + x_j, \qquad h(x) = W_h x, \quad v(x) = W_v x.$$

All the matrices $W$ are learned during training, as is the output scale parameter $\gamma$. It is initialized at zero, so the SA-module does not affect the output at the beginning of training; this allows the convolutional layers to learn close-range dependencies first and only then increases the complexity of the task.
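A minimal dense sketch of this module over a flattened feature map (real implementations use 1×1 convolutions for the projection matrices; the dimensions here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(s, axis):
    e = np.exp(s - s.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wf, Wg, Wh, Wv, gamma):
    """Self-attention over feature locations.

    x : (C, N) feature map flattened over N spatial locations.
    Returns (C, N): gamma-scaled attention output plus the input.
    """
    f, g, h = Wf @ x, Wg @ x, Wh @ x   # query / key / value projections
    s = f.T @ g                        # s[i, j] = f(x_i)^T g(x_j)
    beta = softmax(s, axis=0)          # normalize over locations i
    o = Wv @ (h @ beta)                # weighted sum, then re-projection
    return gamma * o + x

C, C_inner, N = 8, 4, 16
Wf, Wg = rng.standard_normal((C_inner, C)), rng.standard_normal((C_inner, C))
Wh, Wv = rng.standard_normal((C_inner, C)), rng.standard_normal((C, C_inner))
x = rng.standard_normal((C, N))

# With gamma initialized at zero, the module is an identity map:
assert np.allclose(self_attention(x, Wf, Wg, Wh, Wv, 0.0), x)
```

The final assertion illustrates why the zero initialization of γ is safe: early in training the layer passes features through unchanged.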

GANs for electromagnetic calorimeter response simulation
The application of Generative Adversarial Networks to simulation in High Energy Physics was proposed by Paganini et al. [4]. It was shown that a DCGAN implementation could be used for calorimeter simulation; however, improved accuracy was required in order to replace Geant4 [9]. The idea was further developed in [2] with the use of a Wasserstein CGAN [6]. In addition to the previous DCGAN implementation, particle coordinates and momentum projections were used as inputs, while the particle energy was the output. Later, Sergeev et al. [1] compared the performance of a Conditional Variational Autoencoder (CVAE), an improved Conditional GAN, and a combined CVAE+CGAN architecture, finding that the CGAN achieved the best results and boosted the previous accuracy even further.
In this paper, we improve the quality of generated samples by adding Self-Attention layers to the previous best-performing architecture, helping the CNN-based generator and discriminator process long-range relationships between image regions during training, as claimed in [3], with the expectation of generating clusters whose shape better matches the given conditions. Spectral Normalization [5] was applied during training to improve the training dynamics. We test our model in the same way as in [1], improving the PRD metrics evaluated on the generated samples.
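Spectral Normalization divides each weight matrix by an estimate of its largest singular value, obtained cheaply by power iteration. A minimal sketch (deep learning frameworks typically run a single iteration per training step and reuse the vectors; here we iterate to convergence for clarity):

```python
import numpy as np

rng = np.random.default_rng(2)

def spectral_normalize(W, n_iter=200):
    """Divide a weight matrix by an estimate of its largest singular
    value, computed with power iteration, so that the normalized
    matrix has spectral norm ~1 (a Lipschitz constraint on the layer)."""
    u = rng.standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimate of the largest singular value
    return W / sigma

W = rng.standard_normal((16, 32))   # illustrative layer weight matrix
W_sn = spectral_normalize(W)
```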

SAGAN for calorimeter response generation
In this section we review the given dataset and data preprocessing, describe the architecture of the model and discuss the metrics we use to evaluate generated samples.

Dataset
The dataset contains data on electron interactions inside an electromagnetic calorimeter inspired by the LHCb detector at the CERN LHC [8]. The electromagnetic calorimeter uses the "shashlik" technology of alternating layers of lead and scintillator plates. The prototype consists of 5 × 5 blocks of size 12 cm × 12 cm; the cell granularity corresponds to each block containing 6 × 6 cells of size 2 cm × 2 cm. As an electron enters the calorimeter, it collides with a metal plate, producing a few secondary photons and electrons. The produced particles hit further metal plates, creating more particles in turn. This chain reaction produces a stream of electrons and photons called an electromagnetic shower. All energies deposited in the scintillator layers of one cell are summed up.
Calorimeter cells and the energy stored in them are represented as a 30 × 30 energy distribution matrix, which is the target output of the generation process. We use the momentum of the particles $p = (p_x, p_y, p_z)$ and the coordinates at which they enter the calorimeter $r = (x, y)$ as the conditional input of the model. All the energy distributions for the given $r$, $p$ were precomputed using the Geant4 framework. Following [1], we apply a logarithmic transformation to the cell energies to achieve better performance.

Figure 1: Visualisation of the dataset.
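The logarithmic preprocessing can be sketched as follows; the exact transform used in [1] is not reproduced here, and log(1 + E) is assumed as one common choice that keeps empty cells at exactly zero:

```python
import numpy as np

def to_log(energy):
    """Compress the dynamic range of the cell energies: log(1 + E)."""
    return np.log1p(energy)

def from_log(logged):
    """Invert the transform before computing physics metrics on the
    real energy scale."""
    return np.expm1(logged)

cells = np.array([[0.0, 0.5], [10.0, 300.0]])  # toy energy matrix
assert np.allclose(from_log(to_log(cells)), cells)  # exact round trip
```

Energy deposits span several orders of magnitude, so without such a compression the network would be dominated by the few highest-energy cells.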

Model
We added two Self-Attention layers to the architecture introduced in [1]. During the experiments, in order to stabilize and speed up the training process, we applied spectral normalization to the discriminator only and to both networks, achieving the best performance in the second case. In contrast to the Wasserstein loss of the baseline, the hinge version of the adversarial loss was minimized during SAGAN training:

$$L_D = -\mathbb{E}_{(x,y) \sim p_{data}}[\min(0, -1 + D(x, y))] - \mathbb{E}_{z \sim p_z,\, y \sim p_{data}}[\min(0, -1 - D(G(z, y), y))],$$
$$L_G = -\mathbb{E}_{z \sim p_z,\, y \sim p_{data}}[D(G(z, y), y)].$$
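As a sketch, the hinge adversarial losses can be computed in a few lines; the discriminator scores below are toy values, not model outputs:

```python
import numpy as np

def d_hinge_loss(d_real, d_fake):
    """Discriminator hinge loss: push scores above +1 on real samples
    and below -1 on generated ones."""
    return (np.mean(np.maximum(0.0, 1.0 - d_real))
            + np.mean(np.maximum(0.0, 1.0 + d_fake)))

def g_hinge_loss(d_fake):
    """Generator hinge loss: raise the discriminator score on fakes."""
    return -np.mean(d_fake)

# Toy discriminator outputs for a batch of real and generated samples.
d_real = np.array([1.5, 0.2, 2.0])
d_fake = np.array([-1.2, -0.5, 0.3])
```

Unlike the Wasserstein loss, the hinge loss saturates once a sample is confidently classified, so well-separated samples stop contributing gradients to the discriminator.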

Metric
To evaluate the quality of the generated samples and the performance of the models, PRD-AUC [7] is used. For distributions $P$ and $Q$ defined on a finite state space $\Omega$, the precision-recall curve is given by the set of points $(\alpha(\lambda), \beta(\lambda))$ with

$$\alpha(\lambda) = \sum_{\omega \in \Omega} \min(\lambda P(\omega), Q(\omega)), \qquad \beta(\lambda) = \sum_{\omega \in \Omega} \min\Big(P(\omega), \frac{Q(\omega)}{\lambda}\Big), \qquad \lambda \in (0, \infty),$$

and PRD-AUC is the area under this curve. It is worth noting that the objects generated by the models can be treated as images, so the same generation mechanisms apply, but physics metrics should be taken into account during the evaluation as well. Thus, we use the minimum of two PRD-AUC scores, one evaluated over raw images and one over a set of physical statistics:
• shower asymmetry along and across the direction of inclination;
• shower width along and across the direction of inclination;
• the number of cells with energies above a certain threshold (sparsity level).
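Simple proxies for such statistics can be computed directly from the 30 × 30 energy matrix; the definitions below are illustrative sketches, not the exact ones used in the evaluation:

```python
import numpy as np

def shower_width(image, axis):
    """Energy-weighted standard deviation of the cell coordinates
    along one axis -- a simple proxy for shower width."""
    coords = np.arange(image.shape[axis])
    marginal = image.sum(axis=1 - axis)     # project energy onto axis
    w = marginal / marginal.sum()
    mean = (w * coords).sum()
    return np.sqrt((w * (coords - mean) ** 2).sum())

def sparsity(image, threshold):
    """Fraction of cells with deposited energy above the threshold."""
    return float((image > threshold).mean())

img = np.zeros((30, 30))
img[14:16, 14:16] = 5.0   # a compact 2x2 cluster in the center
assert sparsity(img, 1.0) == 4 / 900
```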
This approach allows us to verify that the generated samples are not only similar in terms of images but also have similar distributions of physical metrics, which is more appropriate for HEP applications.
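A minimal sketch of the PRD computation for discrete distributions, following [7]; the λ grid parameterization used here is one common choice, and the area is integrated with a hand-rolled trapezoidal rule:

```python
import numpy as np

def prd_curve(p, q, num_angles=201):
    """Precision-recall pairs (alpha(lam), beta(lam)) for two discrete
    distributions: p (reference) and q (generated)."""
    lambdas = np.tan(np.linspace(1e-3, np.pi / 2 - 1e-3, num_angles))
    alphas = np.array([np.minimum(lam * p, q).sum() for lam in lambdas])
    betas = np.array([np.minimum(p, q / lam).sum() for lam in lambdas])
    return alphas, betas

def prd_auc(alphas, betas):
    """Area under the precision-recall curve (trapezoidal rule)."""
    order = np.argsort(betas)
    a, b = alphas[order], betas[order]
    return float(np.sum((b[1:] - b[:-1]) * (a[1:] + a[:-1]) / 2.0))

# Identical distributions trace out (almost) the full unit square.
p = np.array([0.25, 0.25, 0.25, 0.25])
alphas, betas = prd_curve(p, p)
```

In practice the samples are first clustered (e.g. in a feature space) to obtain the discrete histograms p and q that this computation operates on.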

Results and discussion
All the models were trained to reproduce logarithmic energy distributions. We compared the PRD-AUC scores on the test dataset, computing them on the generated samples and the ground truth after the inverse transformation. The best-performing training parameters are:
• batch size of 512;
• Adam optimizers with learning rates 4 · 10^-4 and 1 · 10^-4 for the discriminator and generator, respectively;
• 3 discriminator iterations per 1 generator iteration.
The SAGAN architecture without normalization performed too poorly, so we do not take it into account. Using Spectral Normalization (SN) only for the discriminator changes the behaviour of the model; however, normalizing both networks led to the most stable training process as well as the best results.
The most notable differences between the samples from the WGAN and SAGAN models are the cells on the edge of the matrix and the shape of the generated clusters. The WGAN samples occasionally contain outliers with relatively high-energy cells (Fig. 3), whereas such samples are smoother in the SAGAN case (Fig. 4) and closer to the original energy distribution according to the PRD-AUC computed from the shower widths along and across the direction of inclination. Comparing the PRD-AUC scores, the SAGAN model with spectral normalization applied to both the generator and the discriminator achieved the best result, improving on the previously published best value. The PRD-AUC scores computed on the set of raw images for SAGAN and WGAN were almost the same; however, the physics-based scores differ.

Conclusion
In this paper, we proposed the Self-Attention Generative Adversarial Network as a possible efficient architecture for reproducing the energy distribution in calorimeters, using the case of the electromagnetic calorimeter of the LHCb experiment. Generative Adversarial Networks show promising results for fast simulation, and this research shows the importance of adding and studying new techniques and architectural approaches to improve performance. The quality of the generated results also relies on an appropriate training procedure with proper parameters. Adding Self-Attention layers made it possible to improve the shape of the generated clusters, boosting the PRD-AUC score of the model evaluated on physics-specific metrics.
Having proper evaluation measures is also important. PRD-AUC can be used as a basic solution, but it does not take any physics metrics into account by itself and requires proxy evaluations of parameters chosen in advance. It is still an open question whether currently existing evaluation metrics generalize across different domains, so different ways of evaluating the performance of GANs for simulation should be studied as well.
Future work will focus on the search for proper evaluation metrics and on improving the quality of generated objects. The Transformer GAN [10] seems a natural next architecture to study because of its attention-based design, while density and coverage metrics [11] may solve some of the problems of the precision-recall approach.