Combination of measurements and the BLUE method

The most accurate way to combine measurements from different experiments is to build a combined likelihood function and use it to perform the desired inference. This is not always possible, for various reasons, so approximate methods are often convenient. Among those, the best linear unbiased estimator (BLUE) is the most popular, as it takes into account individual uncertainties and their correlations. The method is unbiased by construction if the true uncertainties and their correlations are known, but it may exhibit a bias if uncertainty estimates are used in place of the true ones, in particular if those estimated uncertainties depend on the measured values. In those cases, an iterative application of the BLUE method may reduce the bias of the combined measurement.


Measurement combination
The most rigorous and accurate method to combine measurements from different experiments relies on the combination of the individual likelihood functions that have been used for each measurement. Imagine that an experiment A provides a measured data sample $x_A$, whose likelihood function, characterized by a set of parameters $\theta_A$, is
$$L_A(x_A;\, \theta_A),$$
and an experiment B provides a sample $x_B$, whose likelihood function, characterized by a set of parameters $\theta_B$, is
$$L_B(x_B;\, \theta_B).$$
The parameter sets $\theta_A$ and $\theta_B$ contain some experiment-specific parameters and some parameters common to both experiments. Among the latter, there are physical parameters of interest and nuisance parameters related to common sources of systematic uncertainty, such as theory uncertainties, the accelerator-specific component of the luminosity uncertainty, etc. The global likelihood function, if A and B are independent experiments, is:
$$L(x_A, x_B;\, \theta) = L_A(x_A;\, \theta_A)\, L_B(x_B;\, \theta_B),$$
where $\theta$ is the set of all common and non-common parameters. Any statistical method can be applied at this level to determine a combined measurement, upper limit and/or significance: a Bayesian or frequentist inference, profile likelihood, modified frequentist upper limit, etc.
a e-mail: luca.lista@na.infn.it
An example of such an approach is provided by the combination of measurements of single top-quark production in the s-channel at the Tevatron performed by the CDF and D0 experiments [1][2][3].
CDF and D0 measurements of single-top production cross section were combined using as input the binned distributions of multivariate discriminator outputs for each individual measurement. Each bin in each data sample was used in a Bayesian analysis, assuming a Poisson distribution, to extract a central value of the cross-section estimate. A likelihood-ratio analysis using asymptotic formulae [4] was instead used to determine the combined significance of the observed s-channel signal.
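The combined-likelihood idea can be illustrated with a deliberately simple sketch. This is not the CDF and D0 procedure: it assumes two hypothetical counting experiments that observe Poisson counts with expectations `scale * s`, where `s` is the common parameter of interest and `scale` is a known acceptance-times-luminosity factor. All names and numbers are illustrative assumptions.

```python
import math

def nll_single(n_obs, scale, s):
    """Negative log-likelihood of one Poisson counting experiment.

    The expected count is scale * s; the constant ln(n_obs!) term is dropped.
    """
    mu = scale * s
    return mu - n_obs * math.log(mu)

def nll_combined(s, experiments):
    """Independent experiments: likelihoods multiply, so the NLLs add."""
    return sum(nll_single(n_obs, scale, s) for n_obs, scale in experiments)

def minimize_scan(f, lo, hi, steps=100000):
    """Crude grid scan, adequate for a one-parameter illustration."""
    best_x, best_f = lo, f(lo)
    for k in range(1, steps + 1):
        x = lo + (hi - lo) * k / steps
        fx = f(x)
        if fx < best_f:
            best_x, best_f = x, fx
    return best_x

# Hypothetical inputs: (observed count, scale factor) for experiments A and B
experiments = [(52, 10.0), (118, 25.0)]

s_hat = minimize_scan(lambda s: nll_combined(s, experiments), 0.1, 20.0)

# For this simple model the minimum is known in closed form: sum(n) / sum(scale)
s_closed = sum(n for n, _ in experiments) / sum(a for _, a in experiments)
print(s_hat, s_closed)
```

In a realistic combination each experiment's likelihood would also carry nuisance parameters, some of them shared, and the minimization would run over all of them.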

Approximate approaches: the BLUE method
Individual likelihood functions are often unavailable, or are available only under different software frameworks, unless experiments, or even analysis groups within the same experiment, agree in advance. Approximate methods can then be used to combine the individual results, which are usually provided in terms of a central value and an uncertainty:
$$\hat\theta = \hat\theta_1 \pm \sigma_1, \tag{4}$$
$$\hat\theta = \hat\theta_2 \pm \sigma_2. \tag{5}$$
Correlation between uncertainties, which is related to the non-diagonal elements of the covariance matrix
$$V = \begin{pmatrix} \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\ \rho\,\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix},$$
where $\rho$ is the correlation coefficient, must be properly taken into account. The most popular method to combine correlated measurements is the best linear unbiased estimate (BLUE). The method was first formulated in the 1930s [5] and was proposed in high-energy physics in the 1980s [6]. By definition, it is the unbiased linear estimator that provides the smallest possible variance, assuming the true uncertainties and their correlation are known. The estimator is equivalent to a $\chi^2$ minimization which, for Gaussian distributions, is also equivalent to a maximum-likelihood estimate.
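Given two measurements, as in eq. (4) and (5), the BLUE estimate is given by the following linear combination of the individual measurements:
$$\hat\theta = \frac{\hat\theta_1\,(\sigma_2^2 - \rho\,\sigma_1\sigma_2) + \hat\theta_2\,(\sigma_1^2 - \rho\,\sigma_1\sigma_2)}{\sigma_1^2 - 2\rho\,\sigma_1\sigma_2 + \sigma_2^2}, \tag{7}$$
with variance:
$$\sigma_{\hat\theta}^2 = \frac{\sigma_1^2\,\sigma_2^2\,(1 - \rho^2)}{\sigma_1^2 - 2\rho\,\sigma_1\sigma_2 + \sigma_2^2}. \tag{8}$$
Unlike the usual weighted average, the coefficients of the linear combination in eq. (7) may be negative for some values of the correlation coefficient $\rho$. More generally, for $n$ measurements $\hat\theta_1 \pm \sigma_1, \cdots, \hat\theta_n \pm \sigma_n$, with a covariance matrix $V$, the BLUE combination in eq. (7) generalizes to:
$$\hat\theta = \sum_{i=1}^{n} w_i\,\hat\theta_i, \tag{9}$$
with variance:
$$\sigma_{\hat\theta}^2 = w^T V\, w.$$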
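The two-measurement combination can be sketched in a few lines. The function name `blue_two` and the numerical inputs below are illustrative assumptions; the formulas are the standard two-measurement BLUE estimate and variance referred to in the text as eq. (7) and (8).

```python
import math

def blue_two(theta1, sigma1, theta2, sigma2, rho):
    """BLUE combination of two correlated measurements.

    Returns the combined value, its uncertainty and the two weights.
    The denominator vanishes only for sigma1 == sigma2 with rho == 1.
    """
    cov = rho * sigma1 * sigma2
    denom = sigma1**2 - 2.0 * cov + sigma2**2
    w1 = (sigma2**2 - cov) / denom
    w2 = (sigma1**2 - cov) / denom
    theta = w1 * theta1 + w2 * theta2
    variance = sigma1**2 * sigma2**2 * (1.0 - rho**2) / denom
    return theta, math.sqrt(variance), (w1, w2)

# Hypothetical inputs: 10.0 +/- 1.0 and 11.0 +/- 2.0
for rho in (0.0, 0.5, 0.8):
    theta, sigma, (w1, w2) = blue_two(10.0, 1.0, 11.0, 2.0, rho)
    print(f"rho={rho}: {theta:.3f} +/- {sigma:.3f}, w1={w1:.3f}, w2={w2:.3f}")
```

With these inputs the weight of the less precise measurement decreases as the correlation grows, vanishes at rho = 0.5 (the ratio of the two uncertainties) and is negative at rho = 0.8.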

CONF12
The weights in eq. (9) can be computed as:
$$w = \frac{V^{-1} u}{u^T V^{-1} u},$$
where $u = (1, \cdots, 1)$ is the vector with all elements equal to unity. The normalization condition for the weights $w$ holds:
$$\sum_{i=1}^{n} w_i = 1.$$
If the weights $w_i$ are positive, the BLUE combination in eq. (7) can be interpreted as a weighted average by introducing the "common error" [7], defined as:
$$\sigma_C^2 = \rho\,\sigma_1\sigma_2.$$
The two measurements in eq. (4) and (5) can be rewritten with uncertainties given by the sum in quadrature of fully uncorrelated contributions $\sigma_1'$ and $\sigma_2'$ and a 100% correlated contribution $\sigma_C$:
$$\hat\theta = \hat\theta_1 \pm \sigma_1' \pm \sigma_C,$$
$$\hat\theta = \hat\theta_2 \pm \sigma_2' \pm \sigma_C,$$
where the uncorrelated uncertainty contributions are defined by:
$$\sigma_1'^2 = \sigma_1^2 - \sigma_C^2,$$
$$\sigma_2'^2 = \sigma_2^2 - \sigma_C^2.$$
The BLUE combination in eq. (7) then takes a form similar to a regular weighted average:
$$\hat\theta = \frac{\hat\theta_1/\sigma_1'^2 + \hat\theta_2/\sigma_2'^2}{1/\sigma_1'^2 + 1/\sigma_2'^2},$$
with weights that only take into account the uncorrelated uncertainty contributions. In this case, however, the uncertainty receives an additional contribution due to the correlated uncertainty, with respect to the uncertainty of the usual weighted average, and is given by:
$$\sigma_{\hat\theta}^2 = \frac{1}{1/\sigma_1'^2 + 1/\sigma_2'^2} + \sigma_C^2. \tag{19}$$
Cases with negative weights have a less intuitive interpretation than eq. (19), as will be more evident in section 4.
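As a rough sketch of the general case, the weights can be obtained by solving $V y = u$ and normalizing $y$, which avoids an explicit matrix inverse. The helper names `solve` and `blue` and the numerical inputs are assumptions made for illustration; only the Python standard library is used. The last lines check numerically that the "common error" weighted average reproduces the BLUE result for two measurements with positive weights.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def blue(values, cov):
    """BLUE estimate, variance and weights for n correlated measurements."""
    n = len(values)
    v_inv_u = solve(cov, [1.0] * n)          # V^-1 u
    norm = sum(v_inv_u)                      # u^T V^-1 u
    weights = [x / norm for x in v_inv_u]
    estimate = sum(w * t for w, t in zip(weights, values))
    variance = sum(weights[i] * cov[i][j] * weights[j]
                   for i in range(n) for j in range(n))   # w^T V w
    return estimate, variance, weights

# Hypothetical inputs: 10.0 +/- 1.0 and 11.0 +/- 2.0 with rho = 0.3
sigma1, sigma2, rho = 1.0, 2.0, 0.3
cov = [[sigma1**2, rho * sigma1 * sigma2],
       [rho * sigma1 * sigma2, sigma2**2]]
estimate, variance, weights = blue([10.0, 11.0], cov)

# Same combination written as a weighted average with a common error
sigma_c2 = rho * sigma1 * sigma2
inv1 = 1.0 / (sigma1**2 - sigma_c2)          # 1 / sigma_1'^2
inv2 = 1.0 / (sigma2**2 - sigma_c2)          # 1 / sigma_2'^2
average = (10.0 * inv1 + 11.0 * inv2) / (inv1 + inv2)
average_variance = 1.0 / (inv1 + inv2) + sigma_c2

print(estimate, average)
print(variance, average_variance)
```

The two printed pairs should agree to rounding precision. The rewriting only makes sense when both uncorrelated variances are positive, which is the positive-weight case discussed in the text.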

Quantifying the importance of individual measurements
In order to quantify the "importance" of each individual measurement used in a combination, the first approach adopted in the literature was to quote the so-called "relative importance" (RI) of each individual measurement, proportional to the absolute value of the corresponding weight $w_i$, and defined as:
$$\mathrm{RI}_i = \frac{|w_i|}{\sum_{j=1}^{n} |w_j|}.$$
The definition is chosen in order to have the normalization condition $\sum_i \mathrm{RI}_i = 1$. This approach was, for instance, used in combinations of top-quark mass measurements at the Tevatron and at the LHC [8,9]. This choice is questionable, as observed in ref. [10]. In fact, imagine we have three measurements, say A, B$_1$ and B$_2$. The RI of measurement A differs depending on whether the three measurements are combined all together, or B$_1$ and B$_2$ are first combined into B and then A is combined with the partial combination B.
EPJ Web of Conferences
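Ref. [10] proposes alternatives to RI based on the Fisher information, which is defined as:
$$J_{ij} = \left\langle \frac{\partial \ln L}{\partial \theta_i}\, \frac{\partial \ln L}{\partial \theta_j} \right\rangle,$$
the average being performed over all possible measurements; hence $J$ does not depend on a specific measurement, but only on the form of the likelihood function and on the choice of parameters. The Fisher information sets a lower bound on the variance of an unbiased estimator [11,12]:
$$V[\hat\theta] \ge J^{-1},$$
and for a single parameter, the Fisher information is given by:
$$J = \frac{1}{\sigma_{\hat\theta}^2} = u^T V^{-1} u.$$
The alternative quantities to RI proposed in ref. [10] are the intrinsic information weight (IIW), defined as:
$$\mathrm{IIW}_i = \frac{1/\sigma_i^2}{1/\sigma_{\hat\theta}^2},$$
and the marginal information weight (MIW), defined as follows:
$$\mathrm{MIW}_i = \frac{1/\sigma_{\hat\theta}^2 - 1/\sigma_{\hat\theta\,(\text{excl. } i)}^2}{1/\sigma_{\hat\theta}^2},$$
i.e., the relative difference between the Fisher information of the combination and the Fisher information of the combination excluding the $i$-th measurement. Neither IIW nor MIW obeys a normalization condition. For IIW, the quantity $\mathrm{IIW}_{\mathrm{corr}}$ can be defined such that:
$$\sum_{i=1}^{n} \mathrm{IIW}_i + \mathrm{IIW}_{\mathrm{corr}} = 1.$$
$\mathrm{IIW}_{\mathrm{corr}}$ represents the weight assigned to the correlation interplay, not assignable to individual measurements, and is given by:
$$\mathrm{IIW}_{\mathrm{corr}} = \frac{1/\sigma_{\hat\theta}^2 - \sum_{i=1}^{n} 1/\sigma_i^2}{1/\sigma_{\hat\theta}^2}.$$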
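For two measurements these quantities have simple closed forms, which the following sketch evaluates. It assumes that the Fisher information of a Gaussian measurement is the inverse of its variance, so that removing one of two measurements leaves an information of 1/sigma^2 for the remaining one. The function name `importance_two` and the inputs are illustrative.

```python
def importance_two(sigma1, sigma2, rho):
    """RI, IIW, MIW and IIW_corr for a BLUE combination of two measurements."""
    cov = rho * sigma1 * sigma2
    denom = sigma1**2 - 2.0 * cov + sigma2**2
    weights = [(sigma2**2 - cov) / denom, (sigma1**2 - cov) / denom]
    variance = sigma1**2 * sigma2**2 * (1.0 - rho**2) / denom
    info = 1.0 / variance                    # information of the combination
    single = [1.0 / sigma1**2, 1.0 / sigma2**2]

    abs_sum = sum(abs(w) for w in weights)
    ri = [abs(w) / abs_sum for w in weights]
    iiw = [s / info for s in single]
    # Excluding measurement i leaves only the other measurement
    miw = [(info - single[1 - i]) / info for i in range(2)]
    iiw_corr = (info - sum(single)) / info
    return {"w": weights, "RI": ri, "IIW": iiw, "MIW": miw, "IIW_corr": iiw_corr}

# Hypothetical inputs: sigma1 = 1, sigma2 = 2, for two values of rho
for rho in (0.5, 0.9):
    result = importance_two(1.0, 2.0, rho)
    print(rho, result)
```

At rho = 0.5, equal to the ratio of the two uncertainties, both the weight and the MIW of the second measurement vanish, while its IIW does not, and IIW_corr is negative.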

Negative weights
Negative weights are always a sign of a high-correlation regime. Taking $\sigma_1 \le \sigma_2$, the maximum value of the ratio $\sigma_{\hat\theta}^2/\sigma_1^2$ (eq. (8)) as a function of $\rho$ is obtained for $\rho = \sigma_1/\sigma_2$. For $\rho > \sigma_1/\sigma_2$, an increase in correlation implies a decrease of the uncertainty and a negative weight, and the uncertainty becomes strongly dependent on $\rho$. For $\rho = \sigma_1/\sigma_2$, in particular, the weight $w_2$ becomes equal to zero, and so does $\mathrm{MIW}_2$. But this does not imply that the measurement $\hat\theta_2$ is not used in the combination. Figure 1, from ref. [10], shows how the BLUE coefficient $w_2$ and the ratio of uncertainties $\sigma_{\hat\theta}^2/\sigma_1^2$ vary as a function of the correlation $\rho$ for different fixed values of the ratio $\sigma_2/\sigma_1$ (note that the figure uses a different notation with respect to this text, as specified in the caption).
Figure 1. BLUE coefficient for the second measurement $w_2$ (left; $\lambda_B$ in the original figure notation) and combined BLUE variance $\sigma_{\hat\theta}^2$ (right; $\sigma_Y^2$ in the original figure notation) as a function of the correlation $\rho$ between two measurements 1 and 2 for various fixed values of the ratio $\sigma_2/\sigma_1$ ($\sigma_B/\sigma_A$ in the original figure notation). The figure is from ref. [10].
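The dependence on the correlation can be checked numerically with a short scan. The sketch below works in units of the smaller uncertainty, so only the ratio of the two uncertainties enters; the function names and the chosen ratio are illustrative assumptions.

```python
def variance_ratio(rho, ratio):
    """Combined BLUE variance divided by sigma_1^2, with ratio = sigma_2 / sigma_1."""
    return ratio**2 * (1.0 - rho**2) / (1.0 - 2.0 * rho * ratio + ratio**2)

def weight2(rho, ratio):
    """BLUE weight of the second (less precise) measurement."""
    return (1.0 - rho * ratio) / (1.0 - 2.0 * rho * ratio + ratio**2)

ratio = 2.0                                   # hypothetical sigma_2 / sigma_1
grid = [k / 1000.0 for k in range(-999, 1000)]

# Correlation at which the combined variance is largest
rho_max = max(grid, key=lambda rho: variance_ratio(rho, ratio))
print("variance peaks at rho =", rho_max, "expected", 1.0 / ratio)

for rho in (0.3, 0.5, 0.7, 0.9):
    print(rho, round(weight2(rho, ratio), 4), round(variance_ratio(rho, ratio), 4))
```

At the peak the combined variance equals that of the more precise measurement alone; beyond it the second weight is negative and the variance falls quickly as the correlation increases.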
When the correlation coefficient $\rho$ is not well known, assuming $\rho = 1$ is not always a conservative choice. The assumption $\rho = 1$ yields the largest possible uncertainty only if the uncorrelated contributions to the total uncertainty dominate. In case of negative weights, $\rho$ should be accurately determined in order to avoid the risk of underestimating uncertainties. Assume that the two measurements A and B have total uncertainties $\sigma_A$ and $\sigma_B$ given by the sum in quadrature of uncorrelated contributions, $\sigma_A(\mathrm{unc})$ and $\sigma_B(\mathrm{unc})$ respectively, and correlated contributions, $\sigma_A(\mathrm{cor})$ and $\sigma_B(\mathrm{cor})$ respectively, whose correlation coefficient is $\rho(\mathrm{cor})$. Figure 2 shows the most "conservative" value of the correlation coefficient $\rho(\mathrm{cor})$, which is equal to 1 only for $\sigma_B(\mathrm{cor})/\sigma_A(\mathrm{cor}) < \left(\sigma_A/\sigma_A(\mathrm{cor})\right)^2$.
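The condition above can be explored with a small sketch. It relies on the property that the two-measurement BLUE variance, seen as a function of the covariance between the measurements, grows until the covariance reaches the smaller of the two total variances and decreases afterwards. The function names and inputs are hypothetical, and the closed form is a derivation made here under that property, not a formula quoted from the references.

```python
def blue_variance(unc_a, cor_a, unc_b, cor_b, rho_cor):
    """BLUE variance when only the correlated parts are correlated, with rho_cor."""
    var_a = unc_a**2 + cor_a**2
    var_b = unc_b**2 + cor_b**2
    cov = rho_cor * cor_a * cor_b
    return (var_a * var_b - cov**2) / (var_a + var_b - 2.0 * cov)

def most_conservative_rho_cor(unc_a, cor_a, unc_b, cor_b):
    """Value of rho_cor in [0, 1] that gives the largest combined variance."""
    var_a = unc_a**2 + cor_a**2
    var_b = unc_b**2 + cor_b**2
    cov_peak = min(var_a, var_b)             # covariance at which the variance peaks
    return min(1.0, cov_peak / (cor_a * cor_b))

# Hypothetical case 1: correlated contributions dominate for measurement B
case1 = (0.5, 1.0, 0.5, 3.0)
# Hypothetical case 2: uncorrelated contributions dominate
case2 = (1.0, 0.5, 1.0, 0.5)

for case in (case1, case2):
    rho_star = most_conservative_rho_cor(*case)
    print(case, rho_star,
          blue_variance(*case, rho_star), blue_variance(*case, 1.0))
```

In the first case the largest variance is reached well below full correlation, so assuming 100% correlation would understate the combined uncertainty; in the second case full correlation is indeed the conservative choice.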

Bias with the BLUE method
The BLUE method provides an unbiased estimate only if uncertainties and their correlation are known exactly. This is not always a realistic scenario, since in most cases only estimates of the uncertainties and their correlation are available, not their true values. Moreover, the uncertainty estimates may depend on the assumed central value. One example in which the BLUE method is biased is the combination of two Poissonian estimates, each of which has an uncertainty estimate that depends on the central value through a square-root relation:
$$\hat n_1 = n_1 \pm \sqrt{n_1}, \qquad \hat n_2 = n_2 \pm \sqrt{n_2}.$$
A maximum-likelihood estimate, using Poissonian distributions, would produce the following unbiased combined estimate:
$$\hat n = \frac{n_1 + n_2}{2} \pm \frac{\sqrt{n_1 + n_2}}{2}. \tag{31}$$
The BLUE combination, instead, gives weights proportional to $1/n_i$, which correspond to a harmonic average:
$$\hat n = \frac{2 n_1 n_2}{n_1 + n_2} \pm \sqrt{\frac{n_1 n_2}{n_1 + n_2}}. \tag{32}$$
Equation (32), compared to eq. (31), exhibits a bias because, due to the dependence of the uncertainties on the measured values, downward fluctuations of the measurements receive larger weights and pull the combination down, while upward fluctuations produce a smaller opposite effect. In many cases, relative uncertainty estimates are available, as for uncertainty contributions due to luminosity, efficiencies, etc. Since the combination provides a better estimate of the central value than the individual measurements, one may argue that the assumed uncertainties should change accordingly, and re-evaluate them using the central-value estimate from the BLUE combination [16,17]. The method can be applied iteratively until the combination converges to a stable value. Namely:
$$\hat\theta^{(m+1)} = \sum_{i=1}^{n} w_i\big(\hat\theta^{(m)}\big)\, \hat\theta_i,$$
where the weights $w_i(\hat\theta^{(m)})$ are computed with the uncertainties re-evaluated at the combined value $\hat\theta^{(m)}$ of the previous iteration. In practice, convergence needs only a few iterations in most cases.
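The Poissonian example can be reproduced with a few lines of code. The sketch below compares the maximum-likelihood combination, the standard BLUE combination with variances estimated as the observed counts, and an iterative BLUE in which, as an assumption of this illustration, both variances are re-evaluated as the current combined value. Function names and the counts are hypothetical.

```python
import math

def ml_poisson(n1, n2):
    """Maximum-likelihood combination of two Poisson counts with a common mean."""
    return 0.5 * (n1 + n2), 0.5 * math.sqrt(n1 + n2)

def blue_poisson(n1, n2):
    """Standard BLUE with estimated variances n1 and n2 (weights ~ 1/n_i)."""
    return 2.0 * n1 * n2 / (n1 + n2), math.sqrt(n1 * n2 / (n1 + n2))

def iterative_blue_poisson(n1, n2, tol=1e-12, max_iter=100):
    """Iterative BLUE: variances re-evaluated at the combined value each time."""
    estimate, sigma = blue_poisson(n1, n2)
    for _ in range(max_iter):
        var1 = var2 = estimate               # both variances set to the combination
        w1 = (1.0 / var1) / (1.0 / var1 + 1.0 / var2)
        new_estimate = w1 * n1 + (1.0 - w1) * n2
        sigma = math.sqrt(1.0 / (1.0 / var1 + 1.0 / var2))
        if abs(new_estimate - estimate) < tol:
            return new_estimate, sigma
        estimate = new_estimate
    return estimate, sigma

n1, n2 = 90, 110                              # hypothetical observed counts
print("ML:            ", ml_poisson(n1, n2))
print("BLUE:          ", blue_poisson(n1, n2))
print("iterative BLUE:", iterative_blue_poisson(n1, n2))
```

Because both variances become equal once they are evaluated at the common combined value, the iterative procedure gives equal weights here and lands on the arithmetic mean, while the standard BLUE result is the smaller harmonic mean.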
Let us assume that a contribution to the total uncertainty is known as a relative uncertainty. In this case, uncertainties can be written as the sum in quadrature of a contribution that does not depend on the central value and another contribution that is proportional to the central value:
$$\sigma_{i,\mathrm{tot}}^2 = \sigma_i^2 + r_i^2\, \hat\theta^2, \qquad i = 1, 2.$$
The covariance matrix estimate can be written as follows:
$$V = \begin{pmatrix} \sigma_1^2 + r_1^2\,\hat\theta^2 & \rho_0\,\sigma_1\sigma_2 + \rho_r\, r_1 r_2\,\hat\theta^2 \\ \rho_0\,\sigma_1\sigma_2 + \rho_r\, r_1 r_2\,\hat\theta^2 & \sigma_2^2 + r_2^2\,\hat\theta^2 \end{pmatrix},$$
where $r_1$ and $r_2$ are the relative uncertainties, and $\rho_0$ and $\rho_r$ represent the correlation coefficients of the uncertainty contributions that do not depend on the central value and of the ones that are proportional to the central value, respectively. A special case is when uncertainties are fully proportional to the central values, i.e. $\sigma_1 = 0$ and $\sigma_2 = 0$. In that case, the iterative BLUE method converges in two iterations to the following central value:
$$\hat\theta = \frac{(r_2^2 - \rho_r\, r_1 r_2)\,\hat\theta_1 + (r_1^2 - \rho_r\, r_1 r_2)\,\hat\theta_2}{r_1^2 - 2\rho_r\, r_1 r_2 + r_2^2},$$
which is similar to eq. (7), but with relative uncertainties used in place of the absolute ones. A numerical Monte Carlo study [17] shows that in most cases the bias in the combination can be mitigated by applying the iterative procedure. Assuming for simplicity, and without loss of generality, a true value $\theta = 1$, uncertainties and their correlations are chosen randomly, spanning a wide range of values within the boundaries $\sigma_1, \sigma_2, r_1, r_2 < 1$ and $-1 < \rho_0, \rho_r < 1$. For each set of randomly extracted uncertainties and correlation values, 500 000 random values of $\hat\theta_1$ and $\hat\theta_2$ are generated using the proper two-dimensional correlated Gaussian distribution. The BLUE method is then applied both in its standard formulation and iteratively. Pull distributions allow one to study the bias and the correctness of the uncertainty estimates. The bias is in general mitigated with the iterative BLUE method, while the standard BLUE method tends to underestimate the central value, as shown in Fig. 3. The iterative BLUE application may provide overestimates in a few of the cases with large uncertainties.
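A possible implementation of the iterative procedure for two measurements is sketched below. Two choices are assumptions of this sketch rather than statements from the references: the starting point is a standard BLUE in which each relative contribution is evaluated at the measurement's own central value, and each subsequent iteration evaluates all relative contributions at the current combined value. Function names, parameter names and inputs are hypothetical.

```python
def blue_from_cov(theta1, theta2, v11, v22, v12):
    """Two-measurement BLUE from the covariance matrix elements."""
    denom = v11 - 2.0 * v12 + v22
    w1 = (v22 - v12) / denom
    w2 = (v11 - v12) / denom
    return w1 * theta1 + w2 * theta2, (v11 * v22 - v12**2) / denom

def iterative_blue(theta1, theta2, s1, s2, r1, r2, rho0, rhor,
                   tol=1e-10, max_iter=100):
    """Return (standard BLUE, iterative BLUE, number of iterations used)."""
    # Starting point (assumption): relative terms at each measurement's own value
    v11 = s1**2 + (r1 * theta1)**2
    v22 = s2**2 + (r2 * theta2)**2
    v12 = rho0 * s1 * s2 + rhor * (r1 * theta1) * (r2 * theta2)
    estimate, variance = blue_from_cov(theta1, theta2, v11, v22, v12)
    standard = (estimate, variance)

    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Relative terms rescaled to the current combined value
        v11 = s1**2 + (r1 * estimate)**2
        v22 = s2**2 + (r2 * estimate)**2
        v12 = rho0 * s1 * s2 + rhor * r1 * r2 * estimate**2
        new_estimate, variance = blue_from_cov(theta1, theta2, v11, v22, v12)
        converged = abs(new_estimate - estimate) < tol
        estimate = new_estimate
        if converged:
            break
    return standard, (estimate, variance), iteration

# Hypothetical inputs with purely relative uncertainties (s1 = s2 = 0)
standard, iterated, n_iter = iterative_blue(0.9, 1.2, 0.0, 0.0,
                                            0.1, 0.2, 0.0, 0.2)
print("standard BLUE: ", standard)
print("iterative BLUE:", iterated, "after", n_iter, "iterations")
```

With purely relative uncertainties the weights no longer depend on the combined value after the first rescaling, so the loop stops at the second pass, and the result can be compared with the closed-form central value for that special case.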
More detailed comparisons of the two methods are available in ref. [17]. In particular, the combination uncertainty may be overestimated (and in fewer cases underestimated) in both standard and iterative BLUE, but the iterative BLUE estimates tend to be more conservative, as visible in Fig. 4. A dedicated toy Monte Carlo may be useful in case of large individual uncertainties in order to achieve a proper combined uncertainty estimate.
Figure 3. Distribution of the average value of the standard (top and middle, left) and iterative (top and middle, right) BLUE estimates for different limits on $r_{1,2}$ and for $\rho_0 \ge 0$ and $\rho_r \ge 0$ (top) and for $\rho_0 < 0$ and $\rho_r < 0$ (middle). Bottom plots show the difference of the measured absolute value of the bias for the standard and iterative BLUE estimates for different limits on $r_{1,2}$ and for $\rho_0 \ge 0$ and $\rho_r \ge 0$ (left) and for $\rho_0 < 0$ and $\rho_r < 0$ (right). Positive values indicate a smaller bias in the iterative method compared to the standard method. Underestimates of the uncertainty (negative values) occur in fewer cases with the iterative BLUE method than with the standard BLUE method. The figure is from ref. [17].

Conclusions
Combining measurements is a crucial task for improving the precision with which important parameters are known. Combining individual likelihood functions before applying any statistical method is the most rigorous and precise approach, but often the full likelihood function of individual measurements is not available. When uncertainties and their correlations are available, the BLUE method is a simple and powerful tool to combine measurements. But BLUE may show counterintuitive behavior, such as negative weights or uncertainties that decrease for increasing correlation. Correlation needs to be accurately evaluated in those cases, and assuming a 100% correlation is not necessarily a conservative choice. When relative uncertainty contributions are known, or more generally when uncertainty estimates depend on the central values, the BLUE method may exhibit a bias that can be mitigated, in most cases, by an iterative application of the method in which uncertainties are rescaled at each iteration to the combined central value.