Peelle's Pertinent Puzzle and its Solution

Peelle's Pertinent Puzzle is a long-standing problem of nuclear data evaluation. It denotes the occurrence of unexpected mean values when quantities are estimated from experimental data affected by statistical and systematic errors. This occurs for non-linear functions of statistical quantities, e.g. for a product, but not for a sum. In the literature on nuclear data, this phenomenon has been attributed to the underlying non-linearity of the relation between the data. Here, we show in terms of Bayesian statistics that Peelle's Pertinent Puzzle is primarily caused by improper estimates of the covariance matrices of the experiments and not exclusively by non-linearities. Applying the correct covariance matrix leads to the exact posterior expectation value and variance for an arbitrary number of uncorrelated measurement points which are normalized by the same quantity. It is also shown that the mean value converges in probability to zero with an increasing number of observations if the improper covariance matrix is applied.


Introduction
Peelle's Pertinent Puzzle (PPP) [1,2] denotes the occurrence of unexpected values of quantities that are estimated from experimental data which are affected by statistical and systematic errors. More specifically, the weighted mean is outside the range of the corresponding observations. For highly correlated data, the occurrence of PPP might be reasonable. However, in some (non-linear) cases, PPP is caused by an improper construction of the experimental covariance matrix [3]. We show how to avoid PPP in this circumstance, in spite of the non-linear dependence of the estimated quantity on the observed data. Originally, PPP was formulated for the case of only two uncorrelated observations scaled by the same quantity. We extend the scope of our investigation to an arbitrary number of observations of the same physical variable scaled by the same scaling factor.
The puzzle is best illustrated by the example of two uncorrelated measurements q_1, q_2 of the same unknown quantity α, with standard errors σ_1, σ_2. Both measurements are normalized by a factor N which is a stochastic quantity with expectation η and standard deviation σ_N. This results in a pair of correlated observations r_i = N q_i, i = 1, 2, from which we wish to estimate the quantity ρ = ηα.

Least-squares Estimation
Obviously, r_1 and r_2 are correlated because of the scaling by the common factor N. Usually the covariance matrix C of r_1 and r_2 is approximated by linear error propagation, see [4,5]:

    C_ij = δ_ij η² σ_i² + σ_N² q_i q_j ,   i, j = 1, 2 .    (1)

The least-squares estimator ρ̂ of ρ is then given by [6]:

    ρ̂ = (uᵀ C⁻¹ r) / (uᵀ C⁻¹ u) ,   u = (1, 1)ᵀ ,   r = (r_1, r_2)ᵀ .

In the Bayesian paradigm [7] the posterior of ρ is a Gaussian density, according to the principle of maximum entropy [8]:

    P(ρ) ∝ exp( −(ρ − ρ̂)² / (2 var(ρ̂)) ) ,   var(ρ̂) = (uᵀ C⁻¹ u)⁻¹ .    (2)

In our example we obtain

    C⁻¹ = (  36.550  −39.474
            −39.474   52.632 ) .

The estimate ρ̂ is outside the range of the observations. The reason for this strange result is the negative factor in front of r_1, so that ρ̂ = −0.286 r_1 + 1.286 r_2 is not a convex combination of r_1 and r_2.

Fig. 1. The Gaussian posterior in Eq. (2) and the exact posterior in Eq. (3).
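The effect is easy to reproduce numerically. The sketch below uses the well-known illustrative numbers of the original puzzle (two observations 1.5 and 1.0 with 10 % statistical errors and a common 20 % normalization error); these values are assumptions for illustration, not the example of this paper. The covariance matrix is built by naive linear error propagation:

```python
import numpy as np

# Assumed illustrative numbers (the classic form of the puzzle), not the
# example of this paper: two reduced observations with 10 % statistical
# errors and a common 20 % normalization error.
r = np.array([1.5, 1.0])
stat = 0.10 * r                       # absolute statistical errors
norm = 0.20 * r                       # absolute normalization errors

# Covariance matrix from naive linear error propagation:
# C_ij = delta_ij * stat_i^2 + norm_i * norm_j
C = np.diag(stat**2) + np.outer(norm, norm)

# Generalized least-squares weighted mean: rho = (u' C^-1 r) / (u' C^-1 u)
u = np.ones_like(r)
w = np.linalg.solve(C, u)             # w = C^-1 u
rho = (w @ r) / (w @ u)

print(f"estimate: {rho:.4f}")         # -> estimate: 0.8824
print(f"observations: {r}")           # the estimate is below both of them
```

The estimate, about 0.882, falls below both observations, which is exactly the puzzle.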

Exact Posterior
If we use the full information (q_1, q_2, N) and assume Gaussian distributions for α and η according to the principle of maximum entropy, we can construct the exact posterior density of ρ:

    P(ρ) ∝ ∫ dη |η|⁻¹ exp( −(ρ/η − α̂)² / (2σ_α²) ) exp( −(η − η̂)² / (2σ_η²) ) ,    (3)

with α̂, σ_α² the Gaussian posterior mean and variance of α obtained from q_1 and q_2, and η̂ = N, σ_η = σ_N. The posterior mean and the posterior variance of this density are

    E(ρ) = α̂ η̂ ,   var(ρ) = α̂² σ_η² + η̂² σ_α² + σ_α² σ_η² .

In our example this evaluates to E(ρ) = 2.1639, var(ρ) = 0.1114.
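Since α and η carry independent Gaussian posteriors, ρ = ηα is distributed as a product of two independent normal variables, and its exact mean and variance follow from the standard product-moment formulas. A minimal numerical sketch; the posterior means and standard deviations below are assumed placeholder values, not the paper's example:

```python
import numpy as np

# rho = eta * alpha is a product of two independent Gaussian posteriors:
# alpha | q_1, q_2 ~ N(a_m, a_s^2) and eta | N ~ N(n_m, n_s^2).
# The values below are assumed placeholders, not the paper's example.
a_m, a_s = 2.16, 0.30                 # posterior mean / std of alpha
n_m, n_s = 1.00, 0.15                 # posterior mean / std of eta

# Exact moments of a product of independent normal variables:
mean = a_m * n_m
var = (a_m * n_s)**2 + (n_m * a_s)**2 + (a_s * n_s)**2

# Monte Carlo cross-check of the closed-form moments
rng = np.random.default_rng(0)
samples = rng.normal(a_m, a_s, 1_000_000) * rng.normal(n_m, n_s, 1_000_000)
print(f"exact:       mean = {mean:.4f}, var = {var:.4f}")
print(f"Monte Carlo: mean = {samples.mean():.4f}, var = {samples.var():.4f}")
```

The Monte Carlo moments agree with the closed-form expressions to sampling accuracy.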
The mean E(ρ) now lies between r_1 and r_2, conforming to our expectations. The posterior is not Gaussian, but its skewness is very small in our example. Fig. 1 shows that the true posterior is very close to a Gaussian; therefore, the non-linearity of the function ρ = ηα cannot be the only source of the pathologically low estimate ρ̂, contrary to some of the pertinent literature [4,5].

The Proper Covariance Matrix
The linear error propagation in Eq. (1) requires the partial derivative ∂ρ/∂η = α. As α is unknown, it has to be approximated by a value a computed from the data. Before we specify a, we note that using the same a for all matrix elements gives the following covariance matrix [3]:

    C_a,ij = δ_ij η² σ_i² + σ_N² a² .

Inserting this covariance matrix into the Gaussian density postulated by the principle of maximum entropy results in a posterior with mean ρ_a. Remarkably, the mean value ρ_a does not depend on the choice of a in our setting. The obvious candidate for a is the experimental mean value q̄. Using this value gives the covariance matrix

    C̄_ij = δ_ij η² σ_i² + σ_N² q̄²

and the posterior

    P̄(ρ) ∝ exp( −(ρ − ρ̄)² / (2 var(ρ̄)) ) ,    (4)

with

    ρ̄ = (r_1/σ_1² + r_2/σ_2²) / (1/σ_1² + 1/σ_2²) ,   var(ρ̄) = η² (1/σ_1² + 1/σ_2²)⁻¹ + σ_N² q̄² .

In our example the estimate ρ̄ is now a convex combination of r_1 and r_2 and therefore lies inside the range of the observations. The posterior mean of the density in Eq. (4) coincides with the true one, and the posterior variance differs only by a higher-order term. The exact posterior variance can be obtained by choosing a such that a² = q̄² + σ_α². Fig. 2 shows that the posterior P̄ is nearly identical to the true one.
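Both properties, the estimate lying inside the range of the observations and its independence of the choice of a, can be verified numerically. The sketch below reuses the assumed illustrative numbers from above (r = (1.5, 1.0), 10 % statistical, 20 % normalization errors):

```python
import numpy as np

# Same assumed observations as before: r = (1.5, 1.0), 10 % statistical
# errors, 20 % normalization error.  The proper covariance matrix puts the
# SAME value a into every element of the normalization block.
r = np.array([1.5, 1.0])
stat2 = (0.10 * r)**2

def gls(C, r):
    """Generalized least-squares weighted mean with covariance C."""
    w = np.linalg.solve(C, np.ones_like(r))
    return (w @ r) / w.sum()

for a in (r.mean(), 1.0, 2.0):            # the estimate does not depend on a
    C = np.diag(stat2) + (0.20 * a)**2    # constant normalization block
    rho = gls(C, r)
    print(f"a = {a:.2f}: estimate = {rho:.4f}")   # -> 1.1538 in every case
    assert r.min() <= rho <= r.max()      # inside the range: no PPP
```

With a constant normalization block, the GLS weights reduce to the purely statistical weights, which is why the estimate is the same for every a.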

Comparison of the Estimates
It is possible to rewrite ρ̂ in Eq. (2) as

    ρ̂ = ρ̄ / (1 + σ_N² ξ / η²) ,   ξ = s_2 − s_1²/s_0 ≥ 0 ,

with s_0 = Σ_i 1/σ_i², s_1 = Σ_i q_i/σ_i² and s_2 = Σ_i q_i²/σ_i². Thus we always have

    ρ̂ ≤ ρ̄ .

Equality holds if and only if q_1 = q_2. The various estimates are summarized in Table 1 and Fig. 3.
2nd Workshop on Neutron Cross Section Covariances

Fig. 2. The Gaussian posterior in Eq. (4) and the exact posterior in Eq. (3).
Fig. 3. Graphical representation of the observations r_1, r_2, their standard errors, and the estimates ρ̂ and ρ̄.

The Geometry of PPP
According to a result by Sivia [9], PPP occurs if and only if the covariance between r_1 and r_2 is larger than the smaller of the two variances:

    cov(r_1, r_2) > min( var(r_1), var(r_2) ) .
The underlying reason is that in this case the least-squares estimate is not a convex combination of r_1 and r_2, and because of this the estimate lies outside the range of the observations. The covariance matrix C̄ in our example does not fulfill this condition. As a consequence, the estimate ρ̄ is a convex combination of r_1 and r_2 and always lies between the two. We now ask how frequently PPP occurs for a given experimental setup when the improperly constructed matrix C of Eq. (1) is used in the estimator. To answer this question, we have performed a simulation experiment with fixed values of α, η and the σ_i. The results from 5000 pairs of observations are shown in Fig. 4 for different values of σ_N. If the ratio of σ_N to the σ_i rises, the correlation between r_1 and r_2 increases, and along with it the frequency of PPP.
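The frequency experiment can be sketched as follows. Since the exact parameter values are not reproduced here, the sketch borrows α = 2, η = 1 and σ_i ∼ Un(0.1, 0.12) from the simulation described later in the text, and varies σ_N:

```python
import numpy as np

# Sketch of the frequency experiment.  The parameter values are assumptions
# borrowed from the later simulation (alpha = 2, eta = 1,
# sigma_i ~ Un(0.1, 0.12)); an improper covariance matrix of the form
# C_ij = d_ij eta^2 sig_i^2 + sigma_N^2 q_i q_j is used in the estimator.
rng = np.random.default_rng(1)
alpha, eta, m, n_rep = 2.0, 1.0, 2, 5000

def ppp_frequency(sigma_N):
    count = 0
    for _ in range(n_rep):
        sig = rng.uniform(0.10, 0.12, m)      # statistical errors
        q = rng.normal(alpha, sig)            # uncorrelated measurements
        N = rng.normal(eta, sigma_N)          # common normalization factor
        r = N * q
        C = np.diag(eta**2 * sig**2) + sigma_N**2 * np.outer(q, q)
        w = np.linalg.solve(C, np.ones(m))
        rho = (w @ r) / w.sum()
        count += rho < r.min() or rho > r.max()   # estimate outside range
    return count / n_rep

for s in (0.05, 0.10, 0.15):
    print(f"sigma_N = {s:.2f}: PPP frequency = {ppp_frequency(s):.4f}")
```

As stated in the text, the PPP frequency grows as σ_N rises relative to the σ_i.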
Linear error propagation as applied in [4,5] gives the following joint covariance matrix of r = (r_1, …, r_m)ᵀ:

    C_ij = δ_ij η² σ_i² + σ_N² q_i q_j .

Using the Sherman–Morrison identity to invert C, we can give an explicit expression for the least-squares estimator:

    ρ̂ = n/d ,   n = N s_1/s_0 ,   d = 1 + σ_N² ξ/η² ,    (5)

with

    s_0 = Σ_i 1/σ_i² ,   s_1 = Σ_i q_i/σ_i² ,   s_2 = Σ_i q_i²/σ_i² ,   ξ = s_2 − s_1²/s_0 .

Alternatively, linear error propagation using q̄ yields:

    C̄ = D + σ_N² w wᵀ ,   D = diag(η² σ_1², …, η² σ_m²) ,   w = (q̄, …, q̄)ᵀ .

The corresponding estimator reads:

    ρ̄ = N s_1 / s_0 .

The Cauchy–Schwarz inequality implies s_1² ≤ s_0 s_2, i.e. ξ ≥ 0, so that we have proved for all m:

    ρ̂ ≤ ρ̄ .

Equality holds only if all q_i are the same, as already shown in [3].
We now proceed to give an estimate of the bias of ρ̂. According to Eq. (5), we can write ρ̂ as N s_1/s_0 times (1 + σ_N² ξ/η²)⁻¹, with ξ = s_2 − s_1²/s_0. It is easy to show that ξ is invariant with respect to α. Without loss of generality we can therefore assume that α = 0. In this case s_2 is χ² distributed with m degrees of freedom, and s_1²/s_0 is χ² distributed with one degree of freedom. It follows from Cochran's theorem that ξ is χ² distributed with m − 1 degrees of freedom, and that ξ and q̄ are independent. We therefore have:

    E(ρ̂) = ηα E[ (1 + σ_N² ξ/η²)⁻¹ ] .

Linear error propagation gives the following approximate expressions for the mean and the bias of ρ̂ (the variance is approximated analogously):

    E(ρ̂) ≈ ηα / (1 + σ_N² (m−1)/η²) ,   b_1 ≈ E(ρ̂) − ηα .

We have checked the validity of the approximation by a simulation experiment with the following assumptions: α = 2, η = 1, σ_i ∼ Un(0.1, 0.12), σ_N = 0.15.
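The distributional claims can be checked numerically. The sketch below draws q_i with α = 0 and fixed assumed σ_i, and confirms that ξ = s_2 − s_1²/s_0 follows a χ² distribution with m − 1 degrees of freedom:

```python
import numpy as np

# Numerical check of the Cochran-theorem claim: for alpha = 0 and
# q_i ~ N(0, sigma_i^2), xi = s_2 - s_1^2/s_0 is chi-square distributed
# with m - 1 degrees of freedom.  The sigma_i below are assumed values.
rng = np.random.default_rng(2)
m, n_rep = 5, 100_000
sig = rng.uniform(0.10, 0.12, m)      # statistical errors, held fixed

q = rng.normal(0.0, sig, (n_rep, m))  # alpha = 0 without loss of generality
wts = 1.0 / sig**2
s0 = wts.sum()
s1 = q @ wts
s2 = (q**2) @ wts
xi = s2 - s1**2 / s0

# chi^2 with m - 1 = 4 degrees of freedom: mean 4, variance 8
print(f"mean(xi) = {xi.mean():.3f}  (expected 4)")
print(f"var(xi)  = {xi.var():.3f}  (expected 8)")
```

The empirical mean and variance of ξ match the χ²_{m−1} values m − 1 and 2(m − 1) to sampling accuracy.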
We have simulated 5000 data sets for m = 2, 5, 10, 20. Table 2 shows a comparison of the bias b = ⟨ρ̂⟩ − ηα with the average approximate bias ⟨b_1⟩, where the angular brackets denote the average over the simulated samples, together with the analogous comparison of the variances. There is fairly good agreement, an indication that the approximation is adequate.
The asymptotic distribution of ρ̂ for m → ∞ is given by the following theorem.
Theorem 1. The random variable ρ̂ converges in probability to 0 as m → ∞, provided that the σ_i stay bounded.
Proof. Let σ denote an upper bound of all σ_i. The variance of q̄ is not larger than σ²/m for all m and converges to 0. The mean of the numerator n in Eq. (5) does not depend on m, and its variance stays bounded as m → ∞. Hence, for every ε > 0 we can find a number M > 0 so that for all m the absolute value |n| is smaller than M with probability close to 1:

    P(|n| < M) ≥ 1 − ε/2 .

Now consider the denominator d in Eq. (5). By choosing m sufficiently large, we can ensure for every D > 0 that d exceeds D with probability at least 1 − ε/2. To this end, we increase m until the ε/2-quantile of the distribution of d exceeds D. This is possible because for large m the denominator is dominated by ξ, and because the ε/2-quantile of the χ² distribution with m − 1 degrees of freedom exceeds any bound as m → ∞.
If |n| < M and d > D, then certainly |ρ̂| = |n|/d < M/D. From this follows:

    P(|ρ̂| < M/D) ≥ 1 − ε .

Now, for every δ > 0 we can choose D such that M/D < δ. Thus, for every δ > 0 and every ε > 0 we can find an m such that

    P(|ρ̂| < δ) ≥ 1 − ε ,

which proves the theorem.

It is natural to ask whether for m > 2 there is a simple criterion for the occurrence of PPP, similar to the one found by Sivia for m = 2 [9]. If ρ̂ is a convex combination of r_1, …, r_m, then PPP is excluded. The converse, however, is no longer true in general if m > 2: if PPP does not occur, it does not follow that ρ̂ is necessarily a convex combination of r_1, …, r_m, as can be ascertained by counterexample. ρ̂ is always an affine combination of r_1, …, r_m, i.e., the coefficients sum to 1, so a general criterion would be required to determine whether an affine combination lies in the interval [r_min, r_max]. We are not aware of such a criterion, and it is doubtful whether one exists.
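Theorem 1 can be illustrated numerically. The sketch below uses the simulation parameters quoted in the text (α = 2, η = 1, σ_i ∼ Un(0.1, 0.12), σ_N = 0.15); the precise construction of the improper covariance matrix from the observed q_i is an assumption of the sketch:

```python
import numpy as np

# Illustration of Theorem 1 with the simulation parameters quoted in the
# text (alpha = 2, eta = 1, sigma_i ~ Un(0.1, 0.12), sigma_N = 0.15): the
# estimate built from an improper covariance matrix shrinks toward 0 as
# the number m of observations grows.
rng = np.random.default_rng(3)
alpha, eta, sigma_N, n_rep = 2.0, 1.0, 0.15, 1000

def median_estimate(m):
    est = np.empty(n_rep)
    for k in range(n_rep):
        sig = rng.uniform(0.10, 0.12, m)
        q = rng.normal(alpha, sig)
        N = rng.normal(eta, sigma_N)
        C = np.diag(eta**2 * sig**2) + sigma_N**2 * np.outer(q, q)
        w = np.linalg.solve(C, np.ones(m))
        est[k] = (w @ (N * q)) / w.sum()
    return float(np.median(est))

for m in (2, 20, 200):
    print(f"m = {m:3d}: median estimate = {median_estimate(m):.3f}")
```

The median estimate decreases steadily with m, even though the true value ηα = 2 stays fixed.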
We now investigate the probability of PPP as a function of m, with the experimental conditions staying the same. As we have not been able to find an explicit expression for the joint distribution of ρ̂ and r_min, we have resorted to the simulation experiment described above. Fig. 5 shows the empirical distribution of ρ̂ vs. r_min for various values of m. PPP occurs whenever ρ̂ < r_min. Clearly the frequency of PPP rises with m. For m = 20 only a single sample out of 5000 does not show PPP, so the probability of the latter must already be very close to 1.
We can explain this effect by the following considerations. For small m the distribution of r_min is centered somewhat below ηα. With rising m the distribution is shifted toward 0. If ηα is sufficiently large, this shift is slower than the shrinking of ρ̂ toward 0. For larger m, ρ̂ is therefore virtually always smaller than r_min. It has to be noted, however, that for extremely large values of m, many orders of magnitude beyond the range of practical relevance, negative values of r_min are possible, in which case PPP does not occur.
The situation is different if ηα is small compared to σ_N and the σ_i. In the extreme case of α = 0, r_min is virtually always negative, while ρ̂ shrinks to zero, so that PPP is increasingly unlikely. As an illustration, Fig. 6 shows the distribution of ρ̂ vs. r_min for α = 0.2 and η = 1, with the same σ_N and σ_i as in Fig. 5. The mean value of r_min shifts at the same rate as before, but the shrinking of ρ̂ towards 0 is now slower, because the bias of ρ̂ is proportional to ηα. Thus PPP does not occur at all.

Conclusions
We have shown that the occurrence of unexpectedly small mean values, termed Peelle's Pertinent Puzzle, is caused by an improper construction of the covariance matrix of the reduced observations r_i = N q_i, and not, as widely believed, by the non-linearity of the underlying functional relationship ρ = ηα; see also [3]. If the proper value of the derivative ∂ρ/∂η is used in the linear error propagation, the estimate is always in the range of the observations, and the Gaussian posterior is nearly indistinguishable from the exact posterior. As the frequency of PPP rises with the number of observations for sufficiently large α, the proper construction of the joint covariance matrix is absolutely essential.
In the case of more complex non-linear relationships, one should check whether the exact posterior density can be well approximated by a Gaussian density based on the reduced observations. Otherwise, more moments or even the entire posterior density have to be put at the disposal of the subsequent analysis.

Fig. 4. Frequency of PPP for various values of σ_N.

Table 1. Summary of the estimates ρ̂, ρ̄, their variances, and the exact posterior moments.
C̄_12 = 0.1054 < 0.1154 = min(C̄_11, C̄_22) ⇒ no PPP! In general, PPP cannot occur if we use C̄, because the covariance never exceeds the smaller variance:

    C̄_12 = σ_N² q̄² ≤ η² σ_i² + σ_N² q̄² = C̄_ii .

Table 2. Comparison of the bias b of ρ̂ obtained from the simulation experiment with the approximate bias ⟨b_1⟩, and of the variance σ² of ρ̂ obtained from the simulation experiment with the approximate variance σ_1², for m = 2, 5, 10, 20.