Using Machine Learning for Precision Measurements

The use of machine learning techniques for classification is well established. They are applied widely to improve the signal-to-noise ratio and the sensitivity of searches for new physics at colliders. In this study I explore the use of machine learning for optimizing the output of high-precision experiments by selecting the variables most sensitive to the quantity being measured. The precise determination of the electroweak mixing angle at the Large Hadron Collider using linear or deep neural network regressors is developed as a test case.

1 Physics Motivation

The forward-backward asymmetry AFB of lepton pairs at the Large Hadron Collider (LHC) around the Z peak is sensitive to the electroweak mixing angle. The potential of the LHC for a precise measurement of this angle was recognized early on [1]. Traditionally AFB is measured from the cosθ distribution of the electron or negative muon of the lepton pair in the Collins-Soper frame [2]. This entails transforming the kinematic variables to this frame, producing the distribution, and then typically fitting it to extract the asymmetry, which in turn is used for the electroweak angle measurement. This contribution asks: can we extract AFB directly from the experimentally measured quantities, bypassing the standard procedure, by applying machine learning (ML) techniques? In other words, can we bypass our knowledge of how high energy physics has been done for decades, and replace it with a neural network regressor?

2 Monte Carlo Simulations and Setup

Events are generated using the very popular Monte Carlo generator PYTHIA [3–5], version 8.210. The parton density functions used are from the CTEQ61 set [6]. The events are generated in the typical acceptance of a generic LHC detector at 13 TeV at different invariant masses of the final-state lepton (dielectron or dimuon) pair, around 70, 91, 200 and 500 GeV.
The leptons are required to have pseudorapidity |η| < 2.4 and transverse momentum pT > 25 GeV. These masses are chosen to follow the change of the asymmetry AFB with mass, from negative values below the Z peak to positive values above the peak, as exemplified in figure 1 by the measurement of the CMS collaboration at 8 TeV. Two thousand events are generated at each selected mass. The samples are split in three independent parts:

• 75% for training

∗e-mail: dimi@ufl.edu

EPJ Web of Conferences 214, 06022 (2019), https://doi.org/10.1051/epjconf/201921406022, CHEP 2018
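The acceptance requirements above can be sketched as a simple per-event selection. This is a minimal illustration with the cut values taken from the text; the function name and signature are not from the original study:

```python
def passes_acceptance(eta1, pt1, eta2, pt2, eta_max=2.4, pt_min=25.0):
    """Generic LHC-style acceptance: both leptons must be central
    (|eta| < 2.4) and above the pT threshold (pT > 25 GeV)."""
    return (abs(eta1) < eta_max and pt1 > pt_min and
            abs(eta2) < eta_max and pt2 > pt_min)

print(passes_acceptance(1.0, 40.0, -2.0, 30.0))  # -> True
print(passes_acceptance(2.5, 40.0, -2.0, 30.0))  # -> False (lepton 1 too forward)
```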

Figure 1. Example of the observed forward-backward asymmetry AFB at the LHC for |y| < 1, as measured by the CMS collaboration at 8 TeV [7].
The details of the generation and the number of events for the following studies are summarized in table 1. In total slightly below half of the events pass the acceptance cuts and are retained for the analysis. As expected, the acceptance grows for higher masses of the dilepton system. The following input variables are considered for the ML regressor: dilepton invariant mass m (not used), transverse momentum pT and rapidity y of the dilepton system, and pseudorapidity η1,2, transverse momentum pT1,2 and azimuthal angle ϕ1,2 for each of the two decay leptons. The mass is not used because it relates directly to the forward-backward asymmetry AFB, and the goal is to extract the asymmetry from the decay kinematics and the angles "hidden" inside it, not from the mass.
The target variable is the observed forward-backward asymmetry AFB. Here a comment is in order: the asymmetry is traditionally extracted from the angle between the incoming quark initiating the hard interaction of dilepton production (the Drell-Yan process [8]) and the outgoing electron or negative muon. While the charge of the outgoing leptons is measured with high precision at the LHC, there is no information to distinguish which proton provided the quark and which the anti-quark. In other words, at the LHC the initial state is symmetric. At rapidity y = 0 there is no way to measure AFB. When |y| grows, i.e. the dilepton system gets a "boost", the probability that the higher-momentum initiating parton is a valence quark, and that the lower-momentum parton is a sea anti-quark, grows as well. Correspondingly, the tagging of quarks and anti-quarks improves when |y| increases away from the center of the detectors. The bottom line is that events with sizable values of |y| are sensitive to AFB. Still, this method of tagging has its limits, so the observed values of AFB, as shown in figure 1, are typically 2–3 times lower than the real values that would be obtained with perfect knowledge of the initial state. In other words, the observed asymmetry is "diluted" compared to the real one, thus reducing the sensitivity to the electroweak mixing angle and requiring many more events to reach the target precision.
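The traditional extraction mentioned here counts forward and backward events using the standard definition AFB = (NF − NB) / (NF + NB), where an event is "forward" when cosθ > 0 in the Collins-Soper frame. A minimal sketch (the helper name is illustrative, not from the original study):

```python
import numpy as np

def afb_from_counts(cos_theta):
    """Counting estimate of the forward-backward asymmetry:
    A_FB = (N_F - N_B) / (N_F + N_B), with 'forward' meaning
    cos(theta) > 0 in the Collins-Soper frame."""
    cos_theta = np.asarray(cos_theta)
    n_f = np.count_nonzero(cos_theta > 0)
    n_b = np.count_nonzero(cos_theta < 0)
    return (n_f - n_b) / (n_f + n_b)

print(afb_from_counts([0.3, 0.7, -0.2, 0.5]))  # -> 0.5 (3 forward, 1 backward)
```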
In this study linear or deep neural network regressors (LR or DNN regressors) are used to extract AFB directly from the decay kinematics. The TensorFlow library [9] provides all the needed functionality and is applied to "learn" AFB directly from the simulated data.

3 Linear Regressor - Results
For all regressors a normalization of the input features is applied first. All input variables are transformed to span the range [-1, 1]. This linear scaling helps the minimization procedure (e.g. stochastic gradient descent) to perform more smoothly for multi-dimensional problems.
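The linear scaling described above can be sketched as a per-feature min-max transformation onto [-1, 1]. This is a minimal illustration; the function name is not from the original study:

```python
import numpy as np

def scale_to_minus_one_one(x):
    """Linearly map a 1-D array of feature values onto [-1, 1]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:                      # constant feature: map to 0
        return np.zeros_like(x)
    return 2.0 * (x - lo) / (hi - lo) - 1.0

# Example: dilepton pT values in GeV (illustrative numbers)
pt = np.array([0.0, 25.0, 50.0, 100.0])
print(scale_to_minus_one_one(pt))   # -> [-1.  -0.5  0.   1. ]
```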
The linear regressor encounters difficulties when trying to extract the asymmetry AFB directly from the input kinematic variables. This is related to the discussion in the previous section. Experimentation shows that the most sensitive variables are the rapidities, and that they work better when converted to derived features (also called synthetic features in the TensorFlow documentation), such as the absolute value of the rapidity, |y|, and of the pseudorapidities, |η1| and |η2|. These features match the symmetric nature of the LHC initial state (the observed AFB is stronger when the dilepton system has a higher boost, regardless of its direction) and help the regressor to learn from the data, as measured by the root mean squared error (RMSE) between predictions and targets for all events. This is illustrated in figure 2. For each run the training of the model is performed over two thousand epochs. In the discussion section the performance is quantified further for the asymmetry measurements in the four mass bins. As the starting conditions are randomized in TensorFlow, the results vary slightly from run to run. For each case ten runs are performed, and the final performance is taken as the average over the ten runs.
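Why the synthetic feature |y| helps a linear model can be seen on a toy example: a target that depends on |y| (as the observed asymmetry does) cannot be captured by a model linear in y, because y is symmetric around zero. The sketch below uses made-up data and a plain least-squares fit in place of the TensorFlow regressor of the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (not the study's events): target grows with |y|, plus small noise.
y = rng.uniform(-2.4, 2.4, size=2000)
target = 0.05 * np.abs(y) + rng.normal(0.0, 0.005, size=y.size)

def fit_rmse(feature, target):
    """Least-squares linear fit on one feature; returns prediction RMSE."""
    X = np.column_stack([feature, np.ones_like(target)])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ coef
    return np.sqrt(np.mean(resid ** 2))

rmse_raw = fit_rmse(y, target)          # raw feature y: slope ~0, poor fit
rmse_syn = fit_rmse(np.abs(y), target)  # synthetic feature |y|: good fit
print(rmse_raw, rmse_syn)
```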

4 DNN Regressor - Results
The next step is to learn the non-linearities in the dataset and try to achieve better performance than the linear regressor results obtained in the previous section. While the LR is a useful first step, the problem is non-linear and a DNN regressor has better chances to enhance the performance, at the price of higher complexity, with more parameters to be determined at the learning stage. The hyperparameters of the deep neural network used in this study are: two hidden layers with ten nodes each, ReLU activation, batch size 100, and 2000 epochs. They are selected to match the nature of the problem, neither over-simplifying nor unnecessarily complicating the architecture.
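The architecture above (inputs → 10 → 10 → 1, ReLU activations) can be sketched as a forward pass. The study uses TensorFlow; for a self-contained illustration the sketch below is plain numpy, and the feature count and initialization are assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(42)

def relu(x):
    return np.maximum(x, 0.0)

def init_layer(n_in, n_out):
    """Small random weights plus zero biases (illustrative initialization)."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

n_features = 8            # illustrative count of kinematic input features
W1, b1 = init_layer(n_features, 10)   # hidden layer 1: 10 nodes
W2, b2 = init_layer(10, 10)           # hidden layer 2: 10 nodes
W3, b3 = init_layer(10, 1)            # output: one A_FB prediction per event

def predict(X):
    """Forward pass of the two-hidden-layer ReLU regressor."""
    h1 = relu(X @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return (h2 @ W3 + b3).ravel()

batch = rng.uniform(-1.0, 1.0, size=(100, n_features))  # normalized inputs, batch size 100
print(predict(batch).shape)   # -> (100,)
```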
Again normalization of the input features is applied. Various alternatives for the optimization algorithm are explored:

• Stochastic Gradient Descent - often used as a starting point
• Adagrad Optimizer - for each model coefficient it adaptively modifies the learning rate, lowering it over the course of the training. This is known to perform well for convex problems, but is not necessarily optimal for the non-convex problems often encountered in neural network training
• Adam Optimizer - a viable alternative for non-convex problems, especially if the hyperparameters are well matched to the problem at hand.
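The key difference between the two adaptive optimizers can be seen from their update rules: Adagrad accumulates squared gradients, so its effective learning rate only ever decreases, while Adam uses exponential moving averages with bias correction, so its effective rate can recover. A toy sketch on a one-dimensional quadratic (learning rates and step counts are illustrative choices, not the study's settings):

```python
import numpy as np

def adagrad_step(w, grad, state, lr=0.1, eps=1e-8):
    """Adagrad: accumulate squared gradients; effective rate only shrinks."""
    state += grad ** 2
    return w - lr * grad / (np.sqrt(state) + eps), state

def adam_step(w, grad, m, v, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: moving averages of gradient and squared gradient, bias-corrected."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Minimize f(w) = (w - 3)^2 with each optimizer (toy convex example).
w_ada, s = 0.0, 0.0
w_adam, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    w_ada, s = adagrad_step(w_ada, 2 * (w_ada - 3), s)
    w_adam, m, v = adam_step(w_adam, 2 * (w_adam - 3), m, v, t)
print(w_ada, w_adam)   # both approach the minimum at w = 3
```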
The performance of the Adagrad and Adam optimizers is illustrated in figure 3.

5 Discussion
To quantify the performance of the regressors, for each mass bin the "measured" forward-backward asymmetry AFB is extracted from the ensemble of all predictions for the events in this mass bin. As a few events can produce outliers (predictions far from the Monte Carlo truth) which can skew the mean value, the median of all predictions for a given mass bin is retained as the result of the training. The root mean squared error between the asymmetries determined in this way and the four true asymmetries (RMSEasym) is used as a measure of the performance of each model training. This is compared to the traditional high energy physics way of doing the analysis, by counting forward and backward events, and summarized in table 2. Already the Linear Regressor is able to provide a decent performance after normalization of the input features. This improves further for the DNN Regressor when the Adam Optimizer is used. In contrast, the Adagrad and Stochastic Gradient Descent optimizers were not able to outperform the Linear Regressor in this feasibility study. The best performance with a neural network regressor starts to approach the traditional way of measuring the forward-backward asymmetry: the precision, as measured by RMSEasym, is only 14% worse, which is an impressive result given the small size of the learning dataset. It is reasonable to expect that by considerably increasing the size of the simulated samples, the machine learning results can improve and be at least on par with the traditional approach.
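The per-bin median and the RMSEasym figure of merit described above can be sketched as follows. The per-event predictions and true asymmetries below are made-up stand-ins, not the study's results:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-ins: true A_FB per mass bin (GeV) and fake per-event
# regressor predictions scattered around them, with a few outliers possible.
true_afb = {70.0: -0.10, 91.0: 0.01, 200.0: 0.12, 500.0: 0.14}
predictions = {m: rng.normal(a, 0.05, size=250) for m, a in true_afb.items()}

# Median, not mean, so a few outlier predictions cannot skew the result.
measured = {m: np.median(p) for m, p in predictions.items()}

# RMSE over the four mass bins ("RMSE_asym" in the text).
diffs = np.array([measured[m] - true_afb[m] for m in true_afb])
rmse_asym = np.sqrt(np.mean(diffs ** 2))
print(rmse_asym)
```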

6 Future Work
The results presented here are just an initial feasibility study. Possible next steps include:

• Substantially increase the size of the Monte Carlo samples
• Tune the hyperparameters of the DNN for optimal performance
• The ultimate goal would be to perform a regression for the electroweak mixing angle directly on the data; this would require analyzing millions or tens of millions of events.

7 Outlook
Machine learning techniques show interesting potential for precision measurements at the LHC. They are able to learn how to extract complex quantities, like the forward-backward asymmetry of lepton pairs in proton-proton collisions, without much knowledge of the quite sophisticated underlying physics. Deep neural network regressors can outperform linear regressors, helped by the introduction of synthetic features, as they are able to learn the non-linearities of the problem. The first results of this feasibility study look promising. It remains to be seen whether ML techniques can outperform the traditional analyses in precision. This will be the focus of future exploration of this and similar topics.