The use of Convolutional Neural Networks for signal-background classification in Particle Physics experiments

The success of Convolutional Neural Networks (CNNs) in image classification has prompted efforts to study their use for classifying image data obtained in particle physics experiments. Here, we discuss our efforts to apply CNNs to 2D and 3D image data from particle physics experiments to separate signal from background. We present an extensive convolutional neural architecture search that achieves high accuracy for signal/background discrimination in a HEP classification use case based on simulated data from the IceCube neutrino observatory and an ATLAS-like detector. Among other things, we demonstrate that CNNs with fewer parameters can match the accuracy of complex ResNet architectures, and we compare computational requirements, training times and inference times.


Introduction
Particle physics experiments have been incredibly successful in improving our understanding of nature. An important aim of many of these experiments is to search for elusive particles with interesting properties. Evidence for these particles comes from rare events recorded in the detectors of these experiments, termed signal. These are accompanied by a large number of other events from known and well-tested particle interactions, called background. To extract the interesting physics, it is essential to separate the signal from the background. This process of distinguishing between the two is known as signal-background classification and lies at the heart of experimental particle physics. Currently this is achieved by applying selections on derived high-level physics variables.
Given the success of deep learning methods in classifying real-world images, these methods have also been applied to the problem of signal-background classification ([1], [2] and references therein). In this work, we describe the use of Convolutional Neural Networks (CNNs) to classify signal from background for two use cases: a simulation dataset for the IceCube experiment and one for the ATLAS experiment.

Convolutional Neural Networks
Convolutional Neural Networks (CNNs) are a class of neural networks specialized for image classification. They are designed to capture features of images at different scales. Fig. 1 shows the basic structure of a CNN. It typically starts with a convolutional layer, which scans through the input image in small blocks and performs convolution operations to create feature maps. The resulting layers have large dimensions, which are reduced by the subsequent subsampling layers. Stacking many such blocks sequentially enables the network to learn a variety of features at different scales. Eventually the neurons are combined using one or more fully connected layers. The output of the final layer predicts the image class. CNNs have been very successful in classifying 2D images [3,4].

Dataset 1: IceCube

The IceCube experiment
IceCube is a neutrino observatory located at the South Pole, looking for high energy (> 100 GeV) astrophysical neutrinos [5,6]. Interactions of these high energy neutrinos with nuclei produce secondary charged particles. These emit Cherenkov light, which is detected using an array of Digital Optical Modules (DOMs) placed deep within the ice. The experimental setup is depicted in Fig. 2a.
While different particles can contribute to the Cherenkov light seen by the DOMs, this dataset only considers the contribution of muons. The Cherenkov radiation from muons produced by astrophysical neutrinos forms the signal for this dataset. The background consists of the light contribution from atmospheric muons. This dataset uses the high energy down-going region of IceCube detection, a region not used in most analyses due to its high background component. The physical way to distinguish signal and background in this region is to use the stochasticity of energy deposition. Signal muons obtained from reactions of astrophysical neutrinos are single muons and hence lose energy stochastically. This results in uneven light emission along the track. Atmospheric muon events typically consist of hundreds of muons, so their light emission averages out, resulting in a more even distribution. This is shown in Fig. 2b.
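As a rough illustration of the stochasticity argument above, the unevenness of a light profile along a track can be summarized by its coefficient of variation (standard deviation divided by mean). This is only a sketch: the function name `coefficient_of_variation` and the toy profiles are hypothetical, and this is not the physics benchmark used in [1].

```python
import math


def coefficient_of_variation(profile):
    """Return std/mean of a light-deposition profile (hypothetical measure)."""
    if not profile:
        raise ValueError("profile is empty")
    mean = sum(profile) / len(profile)
    if mean == 0:
        raise ValueError("profile has zero total light")
    variance = sum((x - mean) ** 2 for x in profile) / len(profile)
    return math.sqrt(variance) / mean


# Toy profiles with the same total light (made-up numbers, not IceCube data).
even_profile = [1.0, 1.0, 1.0, 1.0]    # bundle-like: losses average out
bursty_profile = [0.0, 0.0, 0.0, 4.0]  # single-muon-like: one large loss

print(coefficient_of_variation(even_profile))    # 0.0
print(coefficient_of_variation(bursty_profile))  # sqrt(3), about 1.73
```

A larger value indicates a more uneven, and in this picture more signal-like, light pattern.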

Data set
The input data samples consist of events, each containing the timings and the total charge deposited on each DOM. Each raw input image has dimensions 86 x 60 laid out on a hexagonal grid, and is mapped onto an orthogonal grid to create an image of size 10 x 20 x 60. More details about the dataset can be found in [1]. This study did not use information from the Deep Core DOMs, which are specialized DOMs placed near the center of the IceCube grid.
In [1], some of the authors of this paper explored the potential of Graph Neural Networks (GNNs) and CNNs for signal-background classification. Their results showed that both GNNs and CNNs performed better than the physics benchmarks, with the GNNs achieving better performance than the CNNs used in that paper (ResNet). In this work, we perform a more thorough exploration of CNN architectures. Another aspect that differentiates this work is the exclusion of a set of events called High Energy Starting Events (HESE) [7], which are neutrino events where the interaction starts inside the detector and which can be identified in a pre-selection stage using existing IceCube analyses.
For training and validation, we used 130989 samples, with a validation ratio of 33% and a signal to background ratio of 16.2%. The test dataset had 737715 samples with a signal to background ratio of 1.92%.
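The text does not spell out how the hexagonal layout is mapped onto the orthogonal grid. One common way to embed a hexagonal lattice in a rectangular array is the "doubled-width" scheme, sketched below. Whether this is the mapping actually used is an assumption; the function name `hex_to_orthogonal` and the axial coordinates `(q, r)` are hypothetical and assumed to be pre-shifted so that all indices are non-negative and fit the grid.

```python
def hex_to_orthogonal(hits, n_rows=10, n_cols=20, n_depth=60):
    """Embed hits given in axial hex coordinates into a dense 3D grid.

    hits: iterable of (q, r, dom, charge), where (q, r) are hypothetical
    axial coordinates of a string and dom is the DOM index along it.
    """
    image = [[[0.0] * n_depth for _ in range(n_cols)] for _ in range(n_rows)]
    for q, r, dom, charge in hits:
        # Doubled-width coordinates: same-row neighbours are two columns
        # apart, neighbours in adjacent rows are one column apart.
        row, col = r, 2 * q + r
        if not (0 <= row < n_rows and 0 <= col < n_cols and 0 <= dom < n_depth):
            raise ValueError(f"hit ({q}, {r}, {dom}) falls outside the grid")
        image[row][col][dom] += charge
    return image


# Three made-up hits: two on neighbouring strings in row 0, one in row 1.
image = hex_to_orthogonal([(0, 0, 0, 1.5), (1, 0, 0, 2.0), (0, 1, 5, 0.5)])
print(image[0][0][0], image[0][2][0], image[1][1][5])  # 1.5 2.0 0.5
```

In this scheme `row + col` is always even, so half of the cells are zero padding. That is the price paid for a regular array that standard convolution layers can consume.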
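To make these numbers concrete, the short calculation below converts the quoted ratios into approximate sample counts and signal fractions. It assumes that "signal to background ratio" literally means S/B and that the validation ratio is a fraction of the 130989 samples; since 33% may itself be a rounded value, the counts are approximate. The helper names are hypothetical.

```python
def signal_fraction(s_over_b):
    """Convert a signal-to-background ratio S/B into S / (S + B)."""
    return s_over_b / (1.0 + s_over_b)


def split_counts(n_total, validation_ratio):
    """Return (n_train, n_validation) for a given validation ratio."""
    n_val = round(n_total * validation_ratio)
    return n_total - n_val, n_val


n_train, n_val = split_counts(130989, 0.33)
print(n_train, n_val)                      # 87763 43226
print(round(signal_fraction(0.162), 3))    # 0.139 (training and validation)
print(round(signal_fraction(0.0192), 4))   # 0.0188 (test)
```

Under these assumptions, signal makes up roughly 14% of the training and validation events but under 2% of the test events, so the test set is considerably more imbalanced.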

Analysis and Results
The study in [1] utilized a ResNet CNN (similar to the ones in [8]) with a very high number of parameters. In this analysis, we performed a more thorough architecture search, specifically looking for compact, layered 3D CNNs with fewer parameters. Scanning CNN architectures built from combinations of convolution, pooling and dropout layers, we identified a few networks that achieved better performance than the previous ResNet model.
A good way to assess the performance of models is to look at their Receiver Operating Characteristic (ROC) curves. Fig. 3a shows the ROC curves for three models: ResNet, the best CNN in this work, termed Compact CNN, and a CNN with very few parameters, termed Simple CNN. Table 1 gives a comparison of these models. Comparing the true positive rate (tpr) values for the three models at a false positive rate (fpr) of 3 × 10^-6 (the value used for comparison with the physics cuts in [1]), it is clear that the Compact CNN performs better than ResNet, while having only a tenth of the parameters. Its structure is shown in Fig. 3b. Comparing the training times per epoch for the models in Table 1, it can be seen that the training time for the Compact CNN is slightly higher than that of ResNet. Given that it has only a tenth of the parameters, this is somewhat surprising. It is possible that ResNet is faster due to the connectivity between its different layers, which is absent in our models. Nevertheless, we have developed a CNN with better classification performance than ResNet, while reducing the number of parameters by an order of magnitude.
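When searching for compact networks, it helps to see where the parameters of a stacked 3D CNN come from. The sketch below counts trainable parameters for a small, hypothetical architecture on a 10 x 20 x 60 input. The filter counts, kernel sizes and dense width are illustrative assumptions, not the architectures found in this work. It assumes "same" padding, so that convolutions preserve the spatial shape, and 2 x 2 x 2 pooling with flooring.

```python
def conv3d_params(kernel, in_channels, out_channels):
    """Weights plus biases of a 3D convolution layer."""
    kd, kh, kw = kernel
    return kd * kh * kw * in_channels * out_channels + out_channels


def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def pool(shape, factor=2):
    """Spatial shape after non-overlapping pooling (floor division)."""
    return tuple(s // factor for s in shape)


shape, channels, total = (10, 20, 60), 1, 0
for filters in (8, 16):  # two conv + pool blocks (hypothetical sizes)
    total += conv3d_params((3, 3, 3), channels, filters)
    shape, channels = pool(shape), filters
    # Pooling and dropout layers add no trainable parameters.

flat = shape[0] * shape[1] * shape[2] * channels
total += dense_params(flat, 32) + dense_params(32, 1)
print(shape, flat, total)  # (2, 5, 15) 2400 80561
```

In this toy configuration the first dense layer holds 76832 of the 80561 parameters. This is why aggressive subsampling before the fully connected layers is an effective way to keep a layered CNN compact.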
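The comparison at a fixed false positive rate can be sketched as follows: choose the score threshold that lets through at most the allowed number of background events, then measure the fraction of signal events above it. This is a minimal illustration with a hypothetical function name and made-up scores, not the evaluation code used for Table 1.

```python
import math


def tpr_at_fpr(scores, labels, target_fpr):
    """True positive rate at the tightest cut with fpr <= target_fpr.

    scores: classifier outputs (higher means more signal-like).
    labels: 1 for signal, 0 for background.
    """
    signal = [s for s, y in zip(scores, labels) if y == 1]
    background = sorted(
        (s for s, y in zip(scores, labels) if y == 0), reverse=True
    )
    if not signal or not background:
        raise ValueError("need at least one signal and one background event")
    allowed = math.floor(target_fpr * len(background))  # tolerated false positives
    # Events are accepted only if they score strictly above the threshold,
    # so at most `allowed` background events pass.
    threshold = background[allowed] if allowed < len(background) else -math.inf
    return sum(s > threshold for s in signal) / len(signal)


scores = [0.95, 0.8, 0.5, 0.25, 0.9, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(tpr_at_fpr(scores, labels, 0.25))  # 0.75
print(tpr_at_fpr(scores, labels, 0.0))   # 0.25
```

If the quoted 1.92% for the test set is read as S/B, the 737715 test samples contain roughly 724,000 background events, so an fpr of 3 × 10^-6 corresponds to only about two background events passing the cut. The tpr at this working point is therefore sensitive to a handful of events.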

ATLAS experiment
The ATLAS experiment is one of the major experiments at the Large Hadron Collider (LHC) at CERN, Switzerland. It was one of the two LHC experiments involved in the discovery of the Higgs boson in 2012. One of the main goals of the LHC is to look for evidence of physics beyond the Standard Model of particle physics, such as Supersymmetry. In this work, we use a dataset that is an input for analyses searching for new

Data set
The dataset consists of simulated data obtained using the Pythia event generator [10] interfaced to the Delphes fast detector simulation [11,12]. The signal events are RPV-SUSY events, while the background is QCD. The images are 2D, with dimensions 64 x 64, and each image pixel represents the energy deposited in the calorimeter. For training, validation and testing, we used 412416, 137471 and 137471 samples respectively, with a signal to background ratio of about 43%.
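To illustrate how calorimeter deposits become a 64 x 64 image, the sketch below bins a list of energy deposits into a 2D grid. The binning in pseudorapidity (eta) and azimuthal angle (phi), the acceptance ranges and the function name `calorimeter_image` are all assumptions made for illustration; the text above only states the image size and that each pixel holds deposited energy.

```python
import math


def calorimeter_image(deposits, n_bins=64,
                      eta_range=(-2.5, 2.5),
                      phi_range=(-math.pi, math.pi)):
    """Sum energy deposits (eta, phi, energy) into an n_bins x n_bins grid.

    The eta/phi ranges are assumed values, not taken from the dataset.
    """
    image = [[0.0] * n_bins for _ in range(n_bins)]
    eta_lo, eta_hi = eta_range
    phi_lo, phi_hi = phi_range
    for eta, phi, energy in deposits:
        if not (eta_lo <= eta < eta_hi and phi_lo <= phi < phi_hi):
            continue  # outside the assumed acceptance
        i = int((eta - eta_lo) / (eta_hi - eta_lo) * n_bins)
        j = int((phi - phi_lo) / (phi_hi - phi_lo) * n_bins)
        # Guard against floating-point rounding at the upper edge.
        image[min(i, n_bins - 1)][min(j, n_bins - 1)] += energy
    return image


# Made-up deposits: two in the central pixel, one in a corner, one outside.
deposits = [(0.0, 0.0, 10.0), (0.0, 0.0, 5.0),
            (-2.5, -math.pi, 1.0), (3.0, 0.0, 7.0)]
pixels = calorimeter_image(deposits)
print(pixels[32][32], pixels[0][0])  # 15.0 1.0
```

Deposits that fall into the same pixel are summed, and deposits outside the assumed acceptance are dropped.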

Analysis and Results
In [2], the potential of CNNs for signal-background classification was explored with this dataset. A few simple CNNs with a large number of parameters were studied and were found to achieve better performance than the physics benchmarks. The best model had around 34 million parameters. The main aim of this work was to perform an architecture search to identify CNNs with better performance and fewer parameters. As in the previous case, we explored network architectures using combinations of convolution, pooling and dropout layers. Table 2 and Fig. 4a show a comparison of the best performing model, termed Compact CNN, with the model used in the previous work and with ResNet. From the ROC curves for these three models in Fig. 4a, it is clear that the Compact CNN performs better than the other two models. It is also much simpler in structure (almost 1/800th the number of parameters) than the other two models, while having a significantly lower training time per epoch, as seen in Table 2. The structural details of the Compact CNN are given in Fig. 4b. Thus, we have developed a substantially more compact network with better performance, reducing the number of parameters by almost three orders of magnitude.

Summary
We have demonstrated the effectiveness of compact, layered Convolutional Neural Networks in classifying signal from background for simulated datasets from two different particle physics experiments. In both cases, by performing an architecture search, we have identified compact neural networks that achieve better performance than previous studies with substantially fewer parameters. Given this success of CNNs in signal-background classification, they may also be applicable to classification problems in fields beyond particle physics.

Computational details
All computations were performed at NERSC. The CNNs were implemented in Keras [13].
During the course of this work, we developed a package for training general stacked CNN models, together with visualization tools. These should be of use for general purpose CNN training and visualization.

Figure 1: The general structure of a CNN with convolutional, subsampling and fully connected layers stacked together.

Figure 2: (a) The setup of the IceCube Observatory. Approximately 5000 Digital Optical Modules (DOMs) are placed within the Antarctic ice. (b) The pattern of light deposition for signal and background. The colored bubbles indicate the relative time of arrival of light, with red being the earliest and blue the latest. The size of the bubbles is proportional to the number of photons. In the left panel, the pattern is more uniformly distributed, which is indicative of a background event produced by atmospheric muons. The pattern in the right panel has a large stochastic energy deposition, which is indicative of a signal due to a muon produced by an astrophysical neutrino.
Figure 3: (a) ROC curves for three chosen models. The X axis denotes the false positive rate (fpr) and the Y axis denotes the true positive rate (tpr). The dotted line denotes fpr = 3 × 10^-6. The Compact CNN from this work achieves the best performance everywhere. We also compare these with a simpler model with very few parameters, termed Simple CNN. (b) Structure of the best performing model. It has ∼2 million parameters.

Figure 4: (a) ROC curves for three chosen models. The X axis denotes the false positive rate (fpr) and the Y axis denotes the true positive rate (tpr). The dotted line denotes fpr = 3 × 10^-6. The Compact CNN achieves the best performance everywhere, excelling especially in the low fpr region to the left. (b) Structure of the best performing model. It has ∼43,000 parameters.

Table 1: Comparison of selected trained models. tpr and fpr stand for true positive rate and false positive rate respectively, while AUC stands for area under the ROC curve. Simple CNN is a layered CNN with very few parameters, Compact CNN is the best performing model and ResNet is the model used in the previous paper [1]. The training times were obtained by running on a Titan X (Pascal) GPU.

Table 2: Comparison of selected trained models. tpr and fpr stand for true positive rate and false positive rate respectively. Old CNN denotes the CNN used in the previous work, and Compact CNN is the best performing CNN in this work. We also compare these with a ResNet model. The training times were obtained by running on a Titan X (Pascal) GPU.