Data Unfolding Methods in High Energy Physics

A selection of unfolding methods commonly used in High Energy Physics is compared. The methods discussed here are: bin-by-bin correction factors, matrix inversion, template fit, Tikhonov regularisation and two examples of iterative methods. Two procedures to choose the strength of the regularisation are tested, namely the L-curve scan and a scan of global correlation coefficients. The advantages and disadvantages of the unfolding methods and choices of the regularisation strength are discussed using a toy example.


Introduction
In high energy physics, typical measurements are based on counting experiments. Events are detected and later classified depending on their properties. Cross sections, for example, are determined from event counts, where the event properties are restricted to certain regions in phase space (bins), divided by the integrated luminosity. The observed event counts are different from the expectation for an ideal detector mainly because of three effects: Detector effects: the event properties such as energy or scattering angle are measured only with finite precision and limited efficiency. Events may be reconstructed in the wrong bin or may get lost.
Statistical fluctuations: the observed number of events is drawn from a Poisson distribution. The measurement provides an estimate of the Poisson parameter µ. Commonly, the square root of the number of event counts is assigned as "statistical uncertainty".
Background: events similar to the signal may also be produced by other processes.
The process of extracting information about the truth content of the measurement bins, given the observed measurements, is referred to as "unfolding". In mathematics, the general problem is formulated as an integral equation of the type k(y, x) f (x)dx = g(y) . (1) Given the observations g(y) and the kernel k(y, x) one seeks to know the function f (x). It is wellknown that for this type of equation small changes of g(y) may result in large changes of f (x). In the following, a simpler version of equation 1, corresponding to a finite number of bins, is studied. The distribution f (x) is replaced by a vector x of dimension M X , where the components x j a e-mail: sschmitt@mail.desy.de correspond to the expected number of events in a bin j at "truth level". Similarly, the function g(y) is replaced by a vector µ of dimension M y and its components µ i correspond to the expected number of events on "detector level". The two vectors are connected by the folding equation where the elements A i j of the response matrix A specify the probability to find an event produced in bin j to be measured in bin i. The expected number of background events is described by the vector b. The matrix A and the vector b are assumed to be known. In a real experiment, these numbers may have limited precision, leading to systematic uncertainties. In many cases the A i j are estimated using Monte Carlo techniques to simulate the signal process and detector effects, is the total number of Monte Carlo events generated in truth bin j, including events which are not reconstructed in any of the bins i. The reconstruction efficiency is given by ǫ j = i A i j . Events which are reconstructed in a bin j but are generated outside any of the M X generator bins are attributed to the background b j .
As the experiment is performed, numbers y i are observed instead of the expectation value µ i . The differences of the vector of observations y and the expectation µ are amplified in the unfolding process. For counting experiments, the integer event counts y i are drawn from a Poisson distribution, P(y i ; µ i ) = exp(−µ i )(µ i ) y i /y i ! . In the large sample limit, the event counts y i are taken to follow multivariate Gaussian distributions, with mean µ i and a fixed covariance matrix V yy . The covariance matrix is diagonal in case of statistically independent bins. The diagonal elements often are approximated using the observations y i as the variances.
The result of the unfolding process is an estimatorx of the truth distribution x and a corresponding covariance matrix V xx . The uncertainties δ j and correlation coefficients ρ jk of two bins j and k are given by The global correlation coefficient of bin j is defined as In this paper, a few selected unfolding algorithms are discussed together with methods to verify the unfolding procedures and to choose parameters of the algorithms. Unfolding algorithms and their application in high-energy physics and elsewhere are also widely discussed in dedicated workshops, e.g. [1] and in literature, e.g. [2,3].

Toy example
A simple toy example is used here to test and compare various unfolding algorithms. It is included in the TUnfold package [4], example number 7. A heavy particle is produced with a given transverse momentum P T distribution and decays into two massless particles. The energy and angles of the decay products in the laboratory frame are smeared by resolution functions. The observed P T is calculated from the vector sum of the two reconstructed particles. Background is also generated and contributes in this example mainly at large P T . For the P T distribution on truth level, Landau distributions are used. The corresponding parameters differ between the "data" and the "simulated" events, where the latter are used to construct the response matrix. The resulting P T distributions are shown in Fig. 1 on truth and detector level. The differences between the two parameterisations are clearly visible, both on truth and on reconstructed level. The response matrix indicates a moderate detector resolution at low P T , where a fine binning is used.

Testing unfolding results
Given a method to estimatex and the covariance V xx for a given vector of observations y, it is desirable to judge on the quality of this estimator. Two classes of tests are defined here, "Data tests" and "Closure tests".

Data tests
The folding equation 3 can be applied to the unfolding result, i.e. one may compare Ax + b with the observation y. The most basic comparison is to verify the normalisation by calculating The expectation is to find Y unf = Y data . Another test is to calculate a χ 2 sum, In the large sample limit, one expects to find χ 2 A distributed with M y − M x degrees of freedom, unless a strong level of regularisation is introduced by the unfolding procedure. In particular, for the case M y = M x , the χ 2 sum is expected to be zero and hence Y unf = Y data . For M y > M x , the quantiles of the χ 2 distribution for M y − M x degrees of freedom can be assessed. Another interesting quantity to study is the average global correlation coefficient,

EPJ Web of Conferences
The average global correlation coefficient can be used to tune regularisation parameters, as discussed below. In general, one expects to find a non-zero global correlation coefficient in the presence of non-negligible migrations.

Closure tests
When using pseudo-data, generated with the help of Monte Carlo simulations, the truth distribution x truth is known, so the unfolding resultx may be directly compared to it. Such comparisons, where pseudo-data are unfolded and compared to the truth are often called closure tests. The most trivial test to think of is to insert µ truth = Ax truth +b for the observations y and perform the unfolding. However, this test is not very meaningful, because basically all commonly used unfolding methods will trivially result inx = x truth in this case.
More interesting closure tests are based on independent Monte Carlo samples. For example, the y i could be drawn from Poisson distributions given the parameters µ truth i . As these Poisson experiments and the subsequent unfolding are repeated many times, one gets an independent determination of the average and of the covariance where the averages are taken over the unfolded, independent Monte Carlo samples. The resulting x avg is expected to agree with x truth and the resulting covariance is expected to agree with the covariance returned by the unfolding algorithm. This type of test, seeded from the same truth distribution as is used to construct the matrix A verifies the statistical properties of the unfolding method. The most interesting type of tests includes independent Monte Carlo samples where the underlying truth distributions are modified. For a good unfolding algorithm, one expects to obtain unbiased results when unfolding observations drawn from the changed truth distribution using the unchanged response matrix. For each of the independent Monte Carlo samples one can define the χ 2 and verify that the unfolding result is not biased. In the large sample limit and for a completely unbiased algorithm, the quantity χ 2 truth is expected to follow a χ 2 distribution with M x degrees of freedom.

Bin-by-bin correction factors
For this method, the unfolded distribution is given bŷ where N rec The bin-by-bin method often is used due to its simplicity; however its results are biased significantly by the underlying Monte Carlo distributions. The results of performing data tests using the toy example with various unfolding methods are summarised in Tab. 1. The use of the bin-by-bin method is clearly disfavoured. It yields the wrong normalisation Y unf and χ 2 A is far from zero.

Matrix inversion and template fit
The unfolded result is given bŷ The matrix inversion returns an unbiased result, because it is a simple linear transformation of the result and no assumptions on the probability distributions of the y i enter the calculation. The result is depicted in Fig. 2. While the folded-back distribution is on spot with the data and all basic tests are fulfilled (Tab. 1), the unfolding result shows large bin-to-bin fluctuations and correspondingly large uncertainties and correlation coefficients close to −1 for neighbouring bins. Such "oscillation patterns" are often observed when solving inverse problems. The solution is statistically correct within the uncertainty envelope given by the covariance matrix V xx . However, it does not correspond to a Simple template fits lead to well-known biases when applied to Poisson-distributed data, such that the overall normalisation is not retained. For this reason, the template fit is repeated, including a constraint on the overall normalisation [4]. The results of the template fits with and without this constraint are included in Tab. 1. As compared to the matrix inversion, the template fits have somewhat reduced uncertainties and correlations, however the large bin-to-bin fluctuations of the result are still present (not shown in this paper). As summarised in Tab. 1, the template fits pass the data tests with the exception of the overall normalisation for the template fit without constraint, which is slightly low.

Template fit with Tikhonov regularisation
For the constrained template fit explained above, the large bin-to-bin fluctuations of the result can be reduced by adding an extra term to the χ 2 A function (Eq. 7), as suggested by Tikhonov [5]. The function which is minimised takes the form The vector x B is the bias vector, often set to zero or to the Monte Carlo truth. The matrix L specifies the regularisation conditions and is here set to the unity matrix. The parameter τ is the regularisation strength. The case τ = 0 corresponds to the template fit without regularisation, whereas for very large τ the result is strongly biased to x B . Eq. 14 is modified slightly [4] to account for the normalisation constraint. Fig. 3 shows the unfolding result obtained for the choice τ = 0.0068. As compared to the matrix inversion, the oscillating behaviour ofx is removed and the uncertainties and correlations are reduced. As compared to Fig. 2, the original of the input distribution is visualized much better.

Choosing regularisation parameters
When using Tikhonov regularisation one has the difficulty to find an appropriate choice of the parameter τ. Similarly, the maximum number of iterations has to be chosen in the case of iterative methods, see section 4.5. Two methods are discussed in the following, the L-curve scan, applicable for the Tikhonov case, and the minimisation of the average global correlation coefficient, which is applicable to a larger class of unfolding methods. The L-curve scan is based on investigating the variables X := log χ 2 A and Y := log χ 2 L . The L-curve is determined by varying τ and minimising χ 2 TUnfold for each choice of τ. The variables X and Y are visualised on a parametric plot, as shown in Fig. 4. There is a characteristic kink, i.e. the curve is shaped similar to the letter "L". The kink corresponds to the point with the largest geometric curvature. The corresponding value of τ is chosen to set the regularisation strength. For a review of the L-curve method, see e.g. [6].
The minimisation of the average global correlation coefficient [7] is also based on repeating the unfolding algorithm for different choices of the regularisation parameter. The average global correlation coefficient (Eq. 8) is recorded and the regularisation parameter is chosen at the minimum ρ avg . The maximisation of the L-curve curvature and the minimisation of ρ avg as a function of τ are compared in Fig. 5. In this example, but also in many other cases, the ρ avg minimisation yields a stronger regularisation than the L-curve method. Both methods pass the test against data, as shown in Tab. 1.

Iterative unfolding methods
Iterative unfolding methods have been proposed since long. Here, two such unfolding methods have been tried: the EM algorithm 1 [8] and the IDS algorithm [12]. The EM algorithm, in the form described in [11] defines an iterative improvement of the unfolding result x (n+1) i , given the result of a previous iteration, x (n) j , In the original works [8][9][10], the efficiency ǫ j is absorbed in a redefinition ǫ j x j →x j and A i j /ǫ j →Ã i j , such thatx (n+1) The EM algorithm has two interesting properties: given that the start values x (−1) j and the measurements y i are all positive, the result is bound to be positive. Furthermore, it converges to a maximum of the likelihood function, the likelihood defined for the case where the measurements y i are independent and follow Poisson distributions [8]. However, the convergence rate can be very slow and the number of iterations is expected to grow with the number of bins squared [10]. While the method is expected to give unbiased results for a sufficiently large number of iterations with Poisson distributed measurements, this is not necessarily true in other cases, e.g. for correlated input data. This is evident from the fact that the covariance of the input data V yy does not enter Eq. 15.
In high-energy physics, the EM method typically is not iterated until it converges. Instead, the number of iterations is fixed, and the result then still depends on the start values x (−1) j . The dependence on the start values, typically taken to be the Monte Carlo truth, provides a regularisation of the result.  study, the denominator is modified as follows to take into account the background, The results obtained with the EM method for n = {0, 20, 1000} iterations are shown in Fig. 6 and the data tests are summarised in Tab. 1 for n = {0, 20, 100, 1000}. The results obtained for n = 0 iterations, corresponding to the non-iterated, so-called Bayesian unfolding [11] are very poor, as visible from both the the data tests and from Fig. 6. All bins have positive correlations, corresponding to a smearing rather than unfolding, and the result is far from the truth distribution. The data tests obtained for a low number of iterations indicate that a certain minimum number of iterations is required to reach a proper normalisation. The shapes of the unfolded results observed for n = 20 and n = 1000 iterations (Fig. 6) are similar to the cases of Tikhonov regularisation and matrix inversion, respectively. Another iterative algorithm tested here is the Iterative Dynamically Stabilised unfolding (IDS) [12]. The algorithm is too complex to be explained here in detail. Briefly, it combines elements of the EM iterative procedure and the bin-bybin unfolding, using non-linear weighting factors in each bin. For each iteration, care is taken to preserve the data normalisation. Because of the bin-by-bin component in the algorithm, its use is restricted to the case M y = M x or to cases where a clear correspondence of bins on detector and truth level exists [12]. The IDS algorithm is expected ultimately to converge to the same value as the EM algorithm, however at improved convergence speed [13]. Results of data comparisons after n = {1, 3, 10, 30} iterations are given in Tab. 1. In contrast to the EM method, in this case the data normalisation is correctly reproduced even after only one iteration. The observed χ 2 A indicates that a sufficiently large number of iterations is required to accurately match the shapes on detector level.

Scan of average global correlations for iterative methods
The dependence of the average global correlation coefficient and of χ 2 truth /N D.F. on the number of iterations is studied in Fig. 7 for the EM and the IDS algorithms. For both algorithms, a characteristic minimum of ρ avg is observed. This is related to the fact that the first iteration produces positively correlated results (smearing), whereas in the limit of many iterations (matrix inversion), negative correlation coefficients ρ i j appear. The minimum ρ avg is interesting to study, because it is largely independent of the start values and hence can be used as an objective to define the number of iterations. The IDS algorithm is observed to converge faster than the EM algorithm. The minimum in ρ avg is reached after 3 (20) iterations for the IDS (EM) method. The minimum value of ρ avg determined for the EM method is similar to the minimum observed in template fits with Tikhonov regularisation, whereas the IDS minimum is significantly lower. Most probably this is related to the fact that the IDS algorithm has a bin-by-bin correction component (bin-by-bin trivially results in ρ avg = 0). To assess the quality of the unfolding in greater detail, closure tests are performed. The unfolded data are compared to the data truth, which is known for the toy example. In addition, fits of the original Landau function are performed to the unfolded distributions. The comparisons to truth are summarised in Tab. 2. Here, the comparisons are based on unfolding the "data" and comparing to the truth. For more detailed tests, toy studies using the "data" truth would have to be performed in order to assess the quality of the expectation values x and the distribution of χ 2 truth . The bin-by-bin method, the Tikhonov method with large τ and the iterative methods with small number of iterations all result in unacceptable biases of the extracted width σ Landau . As expected, the matrix inversion and constrained template fit results are not biased. The Tikhonov method with L-curve scan, resulting τ = 0.0068, gives acceptable results. The EM method works well for a sufficiently large number of iterations, n 20, where n = 20 was determined in the scan of ρ avg . The IDS method also performs well for a sufficiently large number of iterations n; however n = 3 determined in the scan of ρ avg does not give a satisfactory result.

Summary and Conclusions
A selection of methods to unfold binned distributions is studied: bin-by-bin correction factors, matrix inversion, template fits with Tikhonov regularisation, iterative methods. The bin-by-bin methods leads to biased results and should not be used. Matrix inversion and constrained template fits give unbiased results. However, the result typically suffer from large bin-to-bin correlations, large uncertainties and bin-to-bin oscillation patterns.
The oscillations are reduced in the other unfolding methods using regularisation techniques. These damp the fluctuations and reduce bin-to-bin correlations, at the cost of introducing biases. There are free parameters which have to be tuned to obtain a good compromise between bias and damping.
Template fits with Tikhonov regularisation seem to give good results when choosing the regularisation parameter by means of the L-curve method. For the iterative EM method, an interesting choice is the minimisation of the average global correlation coefficients. The IDS iterative method also has been tested but seems to require a different objective to optimise the number of iterations.
In general, methods with Tikhonov regularisation have the advantage, that there is a natural transition to unbiased results, by setting the τ parameter to zero. In contrast, the iterative methods start from a fully biased results and the number of iterations required to reach the unbiased result is a priory CONF12 unknown. Furthermore, in the case of bin-to-bin correlated or non-Poisson distributed measurements, it is not clear whether the iterative methods converge to an unbiased estimator.
In summary, no matter which unfolding method is used, detailed closure tests are required to quantify the level of bias introduced by the unfolding.