The evaluation of the systematic uncertainties for the finite MC samples in the presence of negative weights

The analysis of results from HEP experiments often involves estimating the composition of binned data samples on the basis of Monte Carlo simulations of various sources. Because the MC samples have finite statistics, these estimates are subject to statistical fluctuations. This work proposes a method of incorporating the systematic uncertainties due to the finite statistics of MC samples with negative weights. Possible approximations are discussed and a comparison of different methods is presented.


Introduction
Experimental results in high energy physics are often represented as a binned distribution (histogram) of observed events $X = (X_1, X_2, \ldots)$, where $X_i$ is the number of events in bin $i$. The usual method to estimate physical parameters such as particle masses or cross sections from this distribution is to perform a Bayesian or frequentist analysis based on a likelihood function. The likelihood function $L(X|m)$ connects the data with a theoretical model and represents how well the observations are described by the prediction $m = (m_1, m_2, \ldots)$. This prediction may depend on several different parameters $\pi = (\pi_1, \pi_2, \ldots)$: nuisance parameters and parameters of interest. In addition, if some signal or background processes are known from Monte-Carlo simulations, then the likelihood function depends on template distributions $t = (t^a_1, t^a_2, \ldots)$, where $t^k_i$ is the number of events for process $k$ in bin $i$. It is an important task to define an adequate likelihood function and to take into account all statistical and systematic uncertainties present in the analysis.
Template distributions from Monte-Carlo generators are subject to statistical fluctuations due to the finite number of events in the samples. The influence of these fluctuations can be expected to be significant in regions with few Monte-Carlo events. To incorporate such uncertainties into the likelihood function, Barlow and Beeston proposed a method [1] wherein, for every bin $i$ and every process $k$, one introduces a new parameter $T^k_i$ corresponding to the unknown expected number of events in the infinite statistics limit:

$$L(X|m(\pi, t)) \to L(X|m(\pi, T)) \prod_{k,i} P(t^k_i|T^k_i), \quad (2)$$

where for the constraint $P(t^k_i|T^k_i)$ Barlow and Beeston assumed a Poisson distribution. On the other hand, several modern Monte-Carlo generators [2] produce weighted events with both negative and positive weights. In this case the transformation (2) is not applicable. In this paper we provide a method of incorporating uncertainties due to the finite statistics of Monte-Carlo samples in the presence of negative weights.

e-mail: Petr.Mandrik@ihep.ru

Likelihood functions for Monte-Carlo samples with negative weights
In simplified form, the algorithm of event production in MadGraph5_aMC@NLO, the most important example of a Monte-Carlo generator with negative weights, can be described as follows [3]. The cross section of some process $\sigma_{NLO}$ is calculated by computing the integrals of two functions $F_H(x)$ and $F_S(x)$, each associated with its own set of generated events, where $N_H$ and $N_S$ are the numbers of events in the corresponding sets. In this way, in the infinite statistics limit the prediction of any observable in any interval $[x_i, x_i + \Delta x]$ of a histogram from MadGraph5_aMC@NLO can only be positive, but in the case of finite statistics the prediction can take negative values in some bins. On the other hand, the events with negative and positive weights should be treated in the same way during analyses and pass the same cuts in order to keep the correct cross section value (4). In what follows we assume that the last condition is satisfied, so that for a finite number of generated Monte-Carlo events with only positive or only negative weights the probability of obtaining a given set of bin contents is described by a multinomial distribution, which is usually approximated by a product of independent Poisson distributions (see, for example, [4]).
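The appearance of negative bin predictions at finite statistics can be illustrated with a small sketch. It assumes a toy sample with unit weights of both signs; the function name `split_signed_histogram` and the toy values are hypothetical and are not taken from MadGraph5_aMC@NLO.

```python
def split_signed_histogram(values, weights, edges):
    """Fill per-bin sums of positive and negative weights.

    Returns (t_plus, t_minus), one entry per bin: t_plus[i] is the sum of
    positive weights in bin i and t_minus[i] is the sum of the absolute
    values of the negative weights in bin i.
    """
    n_bins = len(edges) - 1
    t_plus = [0.0] * n_bins
    t_minus = [0.0] * n_bins
    for x, w in zip(values, weights):
        # Linear scan over bins; events outside the histogram range are dropped.
        for i in range(n_bins):
            if edges[i] <= x < edges[i + 1]:
                if w >= 0:
                    t_plus[i] += w
                else:
                    t_minus[i] += -w
                break
    return t_plus, t_minus


# Hypothetical toy sample: six events in two bins with weights +1 or -1.
values = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
weights = [+1, +1, -1, +1, -1, -1]
edges = [0.0, 1.0, 2.0]

t_plus, t_minus = split_signed_histogram(values, weights, edges)
net = [p - m for p, m in zip(t_plus, t_minus)]
print(t_plus, t_minus, net)
```

In this toy sample the second bin has a negative net prediction, although both signed sums are non-negative. Keeping the two sums separately is what allows each of them to be treated as a Poisson-like quantity.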
Let us consider the simple case of a single bin and only one generated process with total number of events $t$. If $t^+$ is the sum of all positively weighted events from the Monte-Carlo sample and $t^-$ is the sum of all negatively weighted ones, then $t = t^+ - t^-$ is the difference of two Poissonian quantities and is described by the Skellam distribution:

$$P(t|T^+, T^-) = \sum_{t^- = \max(0, -t)}^{\infty} P(t + t^-|T^+)\, P(t^-|T^-) = e^{-(T^+ + T^-)} \left(\frac{T^+}{T^-}\right)^{t/2} I_t\!\left(2\sqrt{T^+ T^-}\right), \quad (5)$$

where $I_t$ is the modified Bessel function of the first kind, $P$ is the Poisson distribution, and $T^+$ and $T^-$ are the parameters of the Skellam distribution, corresponding to the unknown "true" predictions for the positively and negatively weighted events from the MC generator.
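The statement that the difference of two Poissonian quantities follows the Skellam distribution can be checked numerically. The sketch below uses only the Python standard library; the function names, the truncation limits `n_max` and `n_terms`, and the test values of $T^+$ and $T^-$ are assumptions chosen for illustration.

```python
import math


def poisson_logpmf(n, mu):
    """Log of the Poisson probability of n counts for a mean mu > 0."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def skellam_pmf_sum(t, T_plus, T_minus, n_max=200):
    """P(t) as an explicit sum over all pairs with t_plus - t_minus = t.

    The sum is truncated at n_max, which is adequate for means far below n_max.
    """
    total = 0.0
    for t_minus in range(max(0, -t), n_max):
        t_plus = t + t_minus
        total += math.exp(poisson_logpmf(t_plus, T_plus)
                          + poisson_logpmf(t_minus, T_minus))
    return total


def bessel_i(order, x, n_terms=200):
    """Modified Bessel function of the first kind for integer order >= 0
    and x > 0, evaluated from its power series."""
    total = 0.0
    for j in range(n_terms):
        log_term = ((2 * j + order) * math.log(x / 2.0)
                    - math.lgamma(j + 1) - math.lgamma(j + order + 1))
        total += math.exp(log_term)
    return total


def skellam_pmf_bessel(t, T_plus, T_minus):
    """Closed form of the Skellam probability; uses I_t = I_{-t} for integer t."""
    x = 2.0 * math.sqrt(T_plus * T_minus)
    return (math.exp(-(T_plus + T_minus))
            * (T_plus / T_minus) ** (t / 2.0)
            * bessel_i(abs(t), x))


T_plus, T_minus = 5.0, 2.0
for t in (-3, 0, 4):
    print(t, skellam_pmf_sum(t, T_plus, T_minus),
          skellam_pmf_bessel(t, T_plus, T_minus))
```

Both evaluations should agree up to rounding, including for negative $t$, which is the case of interest for bins with a negative net prediction.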
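Using equation (5) in (2), we obtain the following transformation rule for taking into account the uncertainties due to the finite statistics of Monte-Carlo samples in the presence of negative weights in the likelihood function:

$$L(X|m(\pi, t)) \to L(X|m(\pi, T)) \prod_{k,i} P(t^k_i|T^{k+}_i, T^{k-}_i), \quad (6)$$

where the new parameters are related by the equation $T^k_i = T^{k+}_i - T^{k-}_i$.

The transformation (6) may be improved by an independent treatment of the values $t^+$ and $t^-$ in analyses. In this case we get:

$$P(t^+, t^-|T^+, T^-) = P(t^+|T^+)\, P(t^-|T^-), \quad (7)$$

and from (2) with (7):

$$L(X|m(\pi, t)) \to L(X|m(\pi, T)) \prod_{k,i} P(t^{k+}_i|T^{k+}_i)\, P(t^{k-}_i|T^{k-}_i). \quad (8)$$

The number of extra parameters in the transformation rule (8) is equal to 2 × number of processes × number of bins. We can decrease the number of parameters by using the maximum likelihood method. Indeed, if $L$ is a likelihood function with the constraint (7), then for each bin $i$ the likelihood can be maximized with respect to the parameters of that bin. The requirement of an extremum, $\partial \ln L / \partial T^{k+}_i = 0$ and $\partial \ln L / \partial T^{k-}_i = 0$, gives the system of equations (10). In some cases this system may be solved analytically for the parameters $T^{k-}_i$, $T^{k+}_i$, or numerically with some fixed values of the remaining parameters.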
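The numerical solution of the extremum conditions can be sketched for the simplest case. The example assumes a single bin, a single process, and the expected yield $m = \pi (T^+ - T^-)$ with independent Poisson constraints on $t^+$ and $t^-$; this model and the function name `profile_nuisance` are assumptions of the illustration. With $a = \pi (X/m - 1)$ the two conditions reduce to $T^+ = t^+/(1 - a)$ and $T^- = t^-/(1 + a)$, so only a one-dimensional root search in $a$ remains.

```python
def profile_nuisance(X, t_plus, t_minus, pi):
    """Maximize ln L over T+ and T- at fixed pi for the assumed model

        ln L = ln Pois(X | pi*(T+ - T-)) + ln Pois(t+ | T+) + ln Pois(t- | T-).

    Requires X > 0, t_plus > 0, t_minus > 0 and pi > 0.
    Returns the pair (T_plus, T_minus).
    """
    def m_of(a):
        # Expected yield once T+ and T- are expressed through a.
        return pi * (t_plus / (1.0 - a) - t_minus / (1.0 + a))

    def g(a):
        # Self-consistency condition a = pi*(X/m(a) - 1), written as g(a) = 0.
        return a - pi * (X / m_of(a) - 1.0)

    # m(a) vanishes at the lower edge, where g -> -infinity; g -> 1 + pi at a -> 1.
    lo = (t_minus - t_plus) / (t_plus + t_minus)
    hi = 1.0
    # g is monotonically increasing on (lo, hi), so plain bisection is sufficient.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    return t_plus / (1.0 - a), t_minus / (1.0 + a)


# Hypothetical inputs: the data exceed the naive prediction pi*(t+ - t-) = 10.
T_plus_hat, T_minus_hat = profile_nuisance(X=20, t_plus=15, t_minus=5, pi=1.0)
print(T_plus_hat, T_minus_hat)
```

When the observed $X$ equals $\pi (t^+ - t^-)$ the solution is $T^+ = t^+$, $T^- = t^-$; otherwise the fit shifts $T^+$ and $T^-$ in opposite directions to reconcile the Monte-Carlo sample with the data.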
Another way to decrease the number of parameters related to the finite statistics of Monte-Carlo samples is known as the Barlow-Beeston "light" transformation [5]. As the statistical uncertainties for each source in each bin are independent, they may be combined and represented approximately by a single effective parameter per bin $M_i$ (transformation (11)). In this approximation a Gaussian constraint $G(m_i|M_i, \sigma_i)$ is usually used for $P(m_i|M_i)$, where the values of $\sigma_i$ are calculated by propagation of the Monte-Carlo statistical uncertainties in bin $i$ with fixed values of the remaining parameters. For histograms with negative weights the transformation (11) takes the form (12), in which the number of extra parameters is equal to 2 × number of bins in the histogram.
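The propagation of the Monte-Carlo statistical uncertainties into a single $\sigma_i$ can be illustrated as follows. This is a minimal sketch assuming that each process contributes a list of event weights to the bin and is multiplied by a fixed scale factor; the function name `effective_bin`, the input layout, and the sum-of-squared-weights variance estimate are assumptions of this illustration and are not taken from [5].

```python
import math


def effective_bin(processes):
    """Combine several processes in one bin.

    processes: list of (scale, weights) pairs, where `weights` are the event
    weights of one process in this bin (possibly negative) and `scale` is a
    fixed normalization factor for that process.
    Returns (m, sigma): the total prediction and its statistical uncertainty,
    with the variance estimated as the scaled sum of squared weights.
    """
    m = sum(scale * sum(ws) for scale, ws in processes)
    variance = sum(scale ** 2 * sum(w * w for w in ws) for scale, ws in processes)
    return m, math.sqrt(variance)


# Hypothetical bin: one process with unit weights of both signs (t+ = 15, t- = 5)
# scaled by 2, and one process with four events of weight 0.5.
processes = [
    (2.0, [+1.0] * 15 + [-1.0] * 5),
    (1.0, [0.5] * 4),
]
m, sigma = effective_bin(processes)
print(m, sigma)
```

For a single process with weights $\pm 1$ the variance estimate reduces to $t^+ + t^-$, which corresponds to the variance $T^+ + T^-$ of the Skellam distribution.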
For the likelihood function with transformation (12) a system of equations similar to (10) can be obtained, which takes an explicit form when the Gaussian constraint is used.
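As an illustration of how a Gaussian constraint makes the extremum condition solvable in closed form, consider the simpler case of one effective parameter $M$ per bin with a Poisson term for the data. This is not the two-parameter form (12); the model and the function name `profile_effective_parameter` are assumptions of this sketch. Setting the derivative of $X \ln M - M - (m - M)^2 / (2\sigma^2)$ to zero gives the quadratic equation $M^2 + (\sigma^2 - m) M - X \sigma^2 = 0$, whose positive root is returned.

```python
import math


def profile_effective_parameter(X, m, sigma):
    """Value of M that maximizes ln Pois(X | M) + ln G(m | M, sigma).

    X: observed number of events in the bin (X >= 0),
    m: Monte-Carlo prediction for the bin,
    sigma: statistical uncertainty of the prediction (sigma > 0).
    """
    b = sigma ** 2 - m
    # Positive root of M^2 + b*M - X*sigma^2 = 0.
    return 0.5 * (-b + math.sqrt(b * b + 4.0 * X * sigma ** 2))


# Hypothetical bin: prediction 22 +- 9 from simulation, 30 events observed.
M_hat = profile_effective_parameter(X=30, m=22.0, sigma=9.0)
print(M_hat)
```

The solution lies between the Monte-Carlo prediction and the observed count, and it coincides with the prediction when $X = m$.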

The performance of methods
In this section a few results of a study of the proposed transformations for taking into account uncertainties due to the finite statistics of Monte-Carlo samples are given. The source code was implemented with the statistical package SHTA [6]. From a simple single-bin single-channel model, a set of events $(X, t^+, t^-)$ may be generated for fixed values of the parameters $\pi$, $T^+$, $T^-$. The model includes a constant $C$, which is introduced in order to avoid a long tail in the posterior distribution of $\pi$ arising from $T^+ - T^- \sim 0$.
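The generation of pseudo-data for such a study can be sketched as follows. The sketch assumes, purely for illustration, that the observed count is Poisson-distributed with mean $\pi (T^+ - T^-) + C$ and that $t^+$ and $t^-$ are Poisson-distributed with means $T^+$ and $T^-$; this form of the model, the function names, and the parameter values are assumptions and do not reproduce the SHTA implementation.

```python
import math
import random


def sample_poisson(rng, mu):
    """Draw a Poisson-distributed integer with mean mu (Knuth's method,
    suitable for moderate mu)."""
    limit = math.exp(-mu)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def generate_toy(rng, pi, T_plus, T_minus, C):
    """Generate one set (X, t_plus, t_minus) for the assumed model
    X ~ Pois(pi*(T_plus - T_minus) + C), t_plus ~ Pois(T_plus), t_minus ~ Pois(T_minus)."""
    t_plus = sample_poisson(rng, T_plus)
    t_minus = sample_poisson(rng, T_minus)
    X = sample_poisson(rng, pi * (T_plus - T_minus) + C)
    return X, t_plus, t_minus


rng = random.Random(12345)  # fixed seed for reproducibility
toys = [generate_toy(rng, pi=1.0, T_plus=15.0, T_minus=5.0, C=1.0) for _ in range(5)]
for toy in toys:
    print(toy)
```

Each generated triple can then be passed to the likelihood functions under comparison in order to study the distribution of the estimates of $\pi$.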
To estimate the parameter of interest $\pi$ we use three different likelihood functions: first, a naive approach without incorporating uncertainties due to the finite statistics of Monte-Carlo samples; second, a likelihood function with transformation (8); and third, a similar one but with a Gaussian approximation for the product of two Poisson distributions, in which $H$ denotes the Heaviside function.
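The three approaches can be compared in a short sketch of the corresponding log-likelihoods. All explicit forms below are assumptions made for illustration: the expected yield is taken as $\pi (T^+ - T^-) + C$, the Gaussian variant constrains $t^+ - t^-$ around a single parameter $T$ with width $\sqrt{t^+ + t^-}$, and the Heaviside function is taken to restrict $T$ to non-negative values. The function names are hypothetical.

```python
import math


def log_poisson(n, mu):
    """Log of the Poisson probability; minus infinity for a non-positive mean."""
    if mu <= 0.0:
        return float("-inf")
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def log_gauss(x, mean, sigma):
    """Log of the normal density with the given mean and width sigma > 0."""
    return -0.5 * ((x - mean) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def lnL_naive(pi, X, t_plus, t_minus, C):
    """No Monte-Carlo statistical uncertainty: the template is taken at face value."""
    return log_poisson(X, pi * (t_plus - t_minus) + C)


def lnL_poisson(pi, T_plus, T_minus, X, t_plus, t_minus, C):
    """Independent Poisson constraints on t_plus and t_minus (assumed form)."""
    return (log_poisson(X, pi * (T_plus - T_minus) + C)
            + log_poisson(t_plus, T_plus)
            + log_poisson(t_minus, T_minus))


def lnL_gauss(pi, T, X, t_plus, t_minus, C):
    """Gaussian constraint on t_plus - t_minus (assumed form, needs t_plus + t_minus > 0).

    The Heaviside factor H(T) is implemented by rejecting T < 0.
    """
    if T < 0.0:
        return float("-inf")
    sigma = math.sqrt(t_plus + t_minus)
    return log_poisson(X, pi * T + C) + log_gauss(t_plus - t_minus, T, sigma)


# Hypothetical pseudo-data.
X, t_plus, t_minus, C = 12, 15, 5, 1.0
print(lnL_naive(1.0, X, t_plus, t_minus, C))
print(lnL_poisson(1.0, 15.0, 5.0, X, t_plus, t_minus, C))
print(lnL_gauss(1.0, 10.0, X, t_plus, t_minus, C))
```

In a Bayesian or frequentist analysis the additional parameters $T^+$, $T^-$ or $T$ would be marginalized or profiled, so that only the dependence on $\pi$ remains.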

Conclusion
In this work a method of incorporating the systematic uncertainties due to the finite statistics of MC samples with negative weights is presented. The influence of this statistical uncertainty can be expected to be high in regions with few Monte Carlo events, and it must be included in the fit. The proposed transformation (8) and its simplified version (12) can be used to construct the correct likelihood function. Using the Gaussian approximation of the product of two Poisson distributions in (8) or (12) leads to the known expressions used, in different forms, in various statistical packages [8] [9]. The choice of the specific form of the likelihood function depends on the analysis, and in some cases the more accurate proposed methods can improve the results.