Next Generation Generative Neural Networks for HEP

Initial studies have suggested generative adversarial networks (GANs) have promise as fast simulations within HEP. These studies, while promising, have been insufficiently precise and also, like GANs in general, suffer from stability issues. We apply GANs to generate full particle physics events (not individual physics objects), explore conditioning of generated events based on physics theory parameters, and evaluate the precision and generalization of the produced datasets. We apply this to SUSY mass parameter interpolation and pileup generation. We also discuss recent developments in convergence and representations that match the structure of the detector better than images. In addition we describe ongoing work making use of large-scale distributed resources on the Cori supercomputer at NERSC, and developments to control distributed training via interactive Jupyter notebook sessions. This will allow tackling high-resolution detector data, model selection, and hyper-parameter tuning in a productive yet scalable deep learning environment.


Introduction
Simulation of physics processes, particle propagation, and detector response are essential components of physics analyses in High Energy Physics (HEP) experiments such as those at the Large Hadron Collider (LHC) [1]. High fidelity simulation software toolkits such as Geant4 [2] are heavily used by experiments to model their building-sized detectors with extremely fine detail. However, producing accurate simulations of such complex physics and detectors requires considerable computing resources. As a result, considerable personpower effort is spent in the HEP experiments to develop fast simulation solutions that can supplement or replace the full-fidelity simulation.
One promising direction in the area of fast simulation development lies in the growing field of Deep Learning [3]. Deep generative models such as Generative Adversarial Networks (GANs) [4] have shown significant promise in recent studies in Cosmology [5] and Particle Physics [6][7][8] to supplement or replace existing simulation codes. These models can be trained to learn a data distribution, from which new samples can then be drawn quickly. In the GAN framework, the learning problem is posed as a two-player game between a generator network (tasked with producing realistic looking samples) and a discriminator network (tasked with distinguishing real from generated samples).
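For reference, the two-player game is commonly written as the minimax objective of the original GAN formulation [4]. The notation here is an assumption for illustration: G is the generator, D the discriminator, z the input noise with prior p_z, and p_data the distribution of real samples. The loss used for the models in this paper may differ in detail.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The discriminator maximizes V by assigning high probability to real samples and low probability to generated ones, while the generator minimizes V by producing samples that the discriminator classifies as real.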
In this paper we extend the work of using GANs to emulate physics data to full HEP detector images of collision events. We demonstrate, in section 4, the capability of GANs to learn to produce physics-realistic samples from a new-physics theory. We then show, in section 5, how the model can be extended to produce samples conditional on the new physics theory parameters. In section 6 we turn to a different application and explore the capability of GANs to learn to produce the underlying collision event background known as "pileup". Finally in section 7 we describe ongoing computing work at NERSC to run these models at scale on the Cori supercomputer and to provide a productive interface via Jupyter as well as discussing possible future directions for GAN modelling.

Related work
This work is built on a number of related contributions in HEP and Cosmology. First, in [9], the full-detector image representation for HEP collisions was developed and used with Convolutional Neural Networks (CNNs) trained to distinguish new physics samples from background. This work showed the potential in low-level image representations of entire collision events for physics analysis. CosmoGAN [5] is an application of Deep Convolutional GANs (DCGANs) to the problem of simulating weak-lensing convergence maps for Cosmology. In that work, GANs were shown to be able to produce images of very high fidelity with very little physics knowledge imposed on the model. In HEP, there have been studies [6][7][8] that demonstrate the ability of customized GAN models to produce 2D or 3D particle shower signatures in calorimeter detectors. These models worked on the level of individual particles and could be conditioned on the particle kinematics.

Datasets
For this study, we take as a use case new massive supersymmetric ('RPV-SUSY') particles in multi-jet final states at the LHC. We focus on 'gluino-cascade decays' with varied gluino masses. We use the Pythia event generator [10] interfaced to the Delphes fast detector simulation [11] with the default Delphes ATLAS detector configuration. We generate events for the RPV-SUSY signal and also a minimum-bias soft-QCD sample for pileup generation studies.
To produce the jet variables we use a standard jet algorithm used in the physics analyses of these signals (with radius R = 1 and transverse momentum p_T > 200 GeV), applied to the final calorimeter images (described below) using 'FastJet' [12] via pyjet [13].
Data from the surface of the cylindrical detector is represented as a 2D image with coordinates corresponding to azimuthal angle φ and pseudorapidity η. For the pixel intensity in this image, we use the overall energy deposited in the combined calorimeter. We choose to bin the energy into uniform 64 × 64 bins, which correspond to the approximately 0.1 × 0.1 (η × φ) resolution of the ATLAS hadronic calorimeter.
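The image construction described above can be sketched as an energy-weighted 2D histogram. This is a minimal illustration, not the code used for the study: the function name towers_to_image, the η range of ±3.2 (chosen so that 64 bins give a width of 0.1), and the toy tower values are all assumptions.

```python
import numpy as np

def towers_to_image(eta, phi, energy, n_bins=64, eta_max=3.2):
    """Bin calorimeter deposits into an (eta, phi) image.

    Each pixel holds the summed energy of the deposits falling in it.
    The eta range is an assumed value, not taken from the paper.
    """
    image, _, _ = np.histogram2d(
        eta, phi,
        bins=n_bins,
        range=[[-eta_max, eta_max], [-np.pi, np.pi]],
        weights=energy,
    )
    return image

# Toy deposits: the first two fall in the same pixel and are summed.
eta = np.array([0.05, 0.05, -1.0])
phi = np.array([0.05, 0.05, 2.0])
energy = np.array([10.0, 5.0, 20.0])
image = towers_to_image(eta, phi, energy)
```

With 64 bins over 2π in φ the bin width is about 0.098, consistent with the approximate 0.1 × 0.1 granularity quoted above.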

RPV whole-detector GAN
The architecture employed for this study is based on the original DCGAN topology [14]. The generator network consists of five transposed convolutional layers with batch-normalization, rectified linear unit (ReLU) hidden activations, and a final thresholded output sigmoid activation which ensures sparsity in the generated samples. The discriminator network consists of five convolutional layers with batch normalization, leaky ReLU hidden activations, and a final sigmoid activation on the output. As in the original DCGAN approach, the generator takes as input a vector of random noise (with values sampled from the standard normal distribution), produces fake detector images, and is trained to try and fool the discriminator. The discriminator takes images as input and is trained to classify them as real or fake. We use random label flipping to regularize and stabilize the training. The model is trained using the Adam optimization algorithm [15] with learning rate, random vector size, number of convolutional filters, and label flip rate treated as hyper-parameters.
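Random label flipping can be illustrated with a short sketch. The function name flip_labels, the flip rate of 0.05, and the choice to flip each label independently are assumptions for illustration; in the study the flip rate is a tuned hyper-parameter and the exact scheme is not specified here.

```python
import numpy as np

def flip_labels(labels, flip_rate, rng):
    """Flip each binary label (0 or 1) independently with probability flip_rate."""
    flip = rng.random(labels.shape) < flip_rate
    return np.where(flip, 1.0 - labels, labels)

rng = np.random.default_rng(0)
real_labels = np.ones(1000)                      # targets for real images
noisy_labels = flip_labels(real_labels, 0.05, rng)  # hypothetical flip rate
```

Training the discriminator against the noisy targets prevents it from becoming overconfident early in training, which is one common motivation for this kind of regularization.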
To identify the best values of the hyper-parameters we use a random search over the parameter space. To evaluate and select the best configuration, a physics quality metric is computed for a validation set of generated and real samples at every epoch of training. First, jets are reconstructed from both sets of samples. Then, we compare the jet kinematic distributions with Kolmogorov-Smirnov (KS) tests [16]. The final evaluation metric is the sum of the negative log of the KS test p-values for three jet quantities: the jet multiplicity, the transverse momentum, and the summed jet mass. The model and epoch which minimize this quantity are selected as the best.
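The selection metric can be sketched as follows. This is a self-contained approximation, not the implementation used in the study: the p-value uses the standard asymptotic Kolmogorov series with a small-sample correction, where a library routine such as scipy.stats.ks_2samp would normally be used. The function names, the dictionary keys, the floor on the p-value, and the toy distributions are all assumptions.

```python
import math
import numpy as np

def ks_pvalue(sample_a, sample_b):
    """Two-sample KS test p-value from the asymptotic Kolmogorov series."""
    a = np.sort(np.asarray(sample_a, dtype=float))
    b = np.sort(np.asarray(sample_b, dtype=float))
    n, m = len(a), len(b)
    grid = np.concatenate([a, b])
    # Empirical CDFs of both samples evaluated at every observed value.
    cdf_a = np.searchsorted(a, grid, side="right") / n
    cdf_b = np.searchsorted(b, grid, side="right") / m
    d = float(np.max(np.abs(cdf_a - cdf_b)))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.1:
        return 1.0  # the series converges slowly here; the p-value is ~1
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, 101))
    return min(1.0, max(0.0, 2.0 * series))

def physics_metric(real, generated):
    """Sum of -log(p) over the jet quantities; lower means better agreement."""
    total = 0.0
    for name in real:
        p = max(ks_pvalue(real[name], generated[name]), 1e-300)  # avoid log(0)
        total += -math.log(p)
    return total

def toy_jets(rng, n_mean, pt_scale, mass_mean, size=2000):
    """Hypothetical stand-ins for the three reconstructed jet quantities."""
    return {
        "jet_multiplicity": rng.poisson(n_mean, size),
        "jet_pt": 200.0 + rng.exponential(pt_scale, size),
        "sum_jet_mass": rng.normal(mass_mean, 100.0, size),
    }

rng = np.random.default_rng(0)
real = toy_jets(rng, 4, 100.0, 500.0)
good = toy_jets(rng, 4, 100.0, 500.0)   # same distributions as "real"
bad = toy_jets(rng, 6, 150.0, 600.0)    # shifted distributions
score_good = physics_metric(real, good)
score_bad = physics_metric(real, bad)
```

In a random search, the configuration and epoch with the smallest score would be kept, so the shifted sample here would be rejected in favor of the matching one.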
Example real and generated samples are shown in figure 1. The generated samples can be seen to exhibit qualitatively similar structure and sparsity to the real ones. A comparison of the reconstructed jet variables is shown in figure 2. The generated samples produce realistic jet multiplicities and kinematics without having those distributions explicitly used in training.

Conditional RPV GAN
We extend the GAN architecture used in the last section to incorporate the gluino and neutralino masses in training. This allows the GAN to learn the conditional data distributions. The generator is augmented to take the two mass parameters as additional input, and for the discriminator the mass values are included in the input image by filling them in new image channels.
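The conditioning scheme can be sketched in terms of input shapes alone. The function names, the channel-first layout, the noise dimension of 64, the mass values, and the rescaling by 1000 GeV are assumptions for illustration and are not taken from the study.

```python
import numpy as np

def generator_input(noise, masses):
    """Append the two mass parameters to the generator's noise vector."""
    return np.concatenate([noise, masses], axis=1)

def discriminator_input(images, masses):
    """Add one constant-valued image channel per mass parameter.

    images: (batch, 1, height, width); masses: (batch, n_masses).
    """
    batch, _, height, width = images.shape
    mass_channels = np.broadcast_to(
        masses[:, :, None, None], (batch, masses.shape[1], height, width)
    )
    return np.concatenate([images, mass_channels], axis=1)

rng = np.random.default_rng(0)
batch = 4
noise = rng.standard_normal((batch, 64))             # assumed noise dimension
masses = np.tile([1400.0, 850.0], (batch, 1)) / 1000.0  # hypothetical masses in TeV
images = rng.random((batch, 1, 64, 64))

g_in = generator_input(noise, masses)
d_in = discriminator_input(images, masses)
```

Rescaling the masses to order one keeps them on a similar scale to the noise and pixel values, which is a common practical choice rather than a requirement of the method.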
For this study, we generate RPV-SUSY samples for a range of gluino and neutralino masses and train on a mixture of them. Once the model is trained, we can sample from the generator for specific mass values and see that the reconstructed physics distributions change accordingly. In figure 3 we show that the conditional dependence is also learned and that the shift in sum of jet masses is reproduced in GAN generated data. Such an approach could be used to supplement full simulation samples in physics analysis. One can use the high fidelity [...] generated sample was used to train a model which is then employed for fast, on-the-fly pileup sampling.
To test the feasibility of this approach we train a GAN on min-bias events. The training, validation and test samples are generated using the Pythia event generator for soft-QCD events, passed through the same Delphes detector simulation and image creation described above in section 3. N image histograms are then summed to produce images representing a pileup of µ = N, on which a GAN is trained with the same DCGAN architecture as described in section 4, though no hyper-parameter optimization is performed. A WGAN implementation [17,18] was also used and achieved similar results to those shown here. We then evaluate the effects on reconstructed object kinematics by overlaying the original and generated pileup on the RPV-SUSY images used above, reconstructing jets on these overlaid images and comparing the resultant shifts in the variables used by the RPV-SUSY analyses. Figure 4 shows a comparison of the images generated by the GAN and those in a validation set for µ = 20, and figure 5 shows distributions for two key jet variables. It can be seen that the shift in jet variables due to pileup is modelled by the GAN generated events. However, we have observed that at higher pileup the distributions are not modelled with the required level of precision, so further work will be required to tune models for this purpose.
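The construction of pileup training images and the overlay step can be sketched as below. The function names, sampling min-bias images with replacement, and the toy inputs are assumptions for illustration; the text above specifies only that N image histograms are summed to represent µ = N.

```python
import numpy as np

def make_pileup_images(minbias_images, mu, n_images, rng):
    """Sum mu randomly chosen min-bias images to form each pileup image.

    minbias_images: (n_events, height, width). Sampling with replacement
    is an assumption made to keep the sketch short.
    """
    idx = rng.integers(0, len(minbias_images), size=(n_images, mu))
    return minbias_images[idx].sum(axis=1)

def overlay(signal_images, pileup_images):
    """Overlay pileup on signal by adding the energy deposits pixel by pixel."""
    return signal_images + pileup_images

rng = np.random.default_rng(0)
minbias = np.ones((500, 64, 64))   # toy min-bias images with unit energy
signal = np.zeros((10, 64, 64))    # toy signal images
pileup = make_pileup_images(minbias, mu=20, n_images=10, rng=rng)
overlaid = overlay(signal, pileup)
```

Jets would then be reconstructed on the overlaid images for both the original and the GAN-generated pileup, and the resulting jet distributions compared.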

Discussion
The results of the previous sections demonstrate some new capabilities of GANs to supplement HEP simulation, but challenges remain in developing sufficiently sophisticated solutions for use by experiments. In this section we describe some of the remaining challenges and potential solutions.
GANs infamously suffer from instabilities and other difficulties in training. Many augmentations have been proposed to stabilize GAN training [17,19,20] and prevent issues like mode-collapse or otherwise poor performance. Some of these solutions, such as Wasserstein GAN, have been explored in this work and in the related HEP studies from section 2. Other studies (e.g. CaloGAN) have leveraged additional physics knowledge or constraints to improve GAN results. While it's clear that such approaches can help, there is still more to be done to evaluate and compare different techniques.
Figure 7. Weak scaling of the Cosmology DCGAN network using the Horovod [22] and CrayPE [23] MPI libraries with TensorFlow at NERSC.
Most modern Deep Learning applications require large compute resources because of the large datasets and complex models needed to solve tasks. HPC facilities are particularly well suited to address this demand and work has already been done at NERSC to study GANs on large-scale HPC systems [21]. In figure 7 we show that we are able to scale GAN architectures up to 1000s of compute nodes with reasonable efficiency using modern MPI libraries. However, training GANs at scale generally exacerbates the instability issues, and how to resolve this is still very much an open question.
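Weak-scaling efficiency, the quantity usually plotted in studies of this kind, can be computed as in the sketch below. In weak scaling the work per node is held fixed, so ideal throughput grows linearly with the node count. The function name and the throughput numbers are hypothetical and are not measurements from this work.

```python
def weak_scaling_efficiency(throughput_by_nodes):
    """Per-node throughput relative to the smallest run.

    throughput_by_nodes maps node count to total samples per second.
    An efficiency of 1.0 corresponds to ideal linear scaling.
    """
    base_nodes = min(throughput_by_nodes)
    base_rate = throughput_by_nodes[base_nodes] / base_nodes
    return {
        nodes: (rate / nodes) / base_rate
        for nodes, rate in sorted(throughput_by_nodes.items())
    }

# Hypothetical throughputs in samples per second.
measured = {1: 100.0, 64: 5800.0, 1024: 80000.0}
efficiency = weak_scaling_efficiency(measured)
```

Note that this measures throughput only; it says nothing about whether training at the larger effective batch size remains stable, which is the open question raised above.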
To effectively utilize HPC resources for GAN applications, it is helpful to have infrastructure which enables high productivity and interactive, iterative development. At NERSC we are developing Jupyter notebook solutions for deploying distributed Deep Learning applications on HPC [24] which we anticipate will be useful for large-scale GAN training as well.
Finally, HEP experiments use complex detector geometry. The layout of detector sensors gives data which cannot always be mapped into 2D or 3D image formats without lossy transformations. There is, however, a growing area of research into deep learning methods for graph- and manifold-structured data known as Geometric Deep Learning [25]. Deep generative models for graph-structured HEP data may be effective but haven't yet been studied to our knowledge.

Conclusion
In this paper we demonstrate the applicability of Generative Adversarial Networks to generating low-level LHC collision events. We extend previous work to consider whole detector events; define a systematic KS-based procedure for more stable performance and hyper-parameter optimization; and add the capability to condition on physics parameters of interest. We apply this to key problems that face major computational challenges in future LHC running: theory parameter interpolation and pileup generation. We demonstrate that our DCGAN architectures are able to learn min-bias and RPV SUSY events and reproduce the reconstructed jet features used in these analyses, and the effect of pileup, without having been explicitly trained on those features.
We also discuss how the computational challenges related to training these large and unstable models are beginning to be addressed on HPC facilities at NERSC, scaling to thousands of nodes and harnessing the power of those resources through productive Jupyter interfaces. This work presents an important next step in pushing the computational and methodological frontier of generative networks for HEP.