Pseudo-measurement simulation and bootstrap for experimental cross-section covariance estimation with quality quantification

The classical use of a generalized χ²-distance to determine the evaluated cross-section uncertainty requires the values of the experimental cross-section covariance matrix. The usual error propagation method for estimating the covariances is hardly usable, and the lack of data prevents the use of the direct empirical estimator. We therefore present an alternative which exploits a regression model of the experimental cross section to generate pseudo-measurements and thereby allows an estimation of the experimental covariances. The problem of assessing the quality of the estimate still remains. In our approach, we propose to determine the estimation quality by means of the bootstrap method. We show on numerical examples that the bootstrap provides an order of magnitude of the estimation quality through a matrix norm. All the results are illustrated with a toy model (where all quantities are known) and also with real cross-section measurements.


Introduction
The generalized χ²-distance used for the determination of the evaluated cross-section uncertainty needs to take into account the correlations and the uncertainties of the experimental measurements. As the covariances are not provided, an estimation of the experimental cross-section covariance matrix Σ_F and of its inverse is necessary. Since in many cases only one measurement per energy is available, the standard empirical estimator is not available, and the classical method of error propagation [1] is rarely usable, due to the lack of information on the experimental parameters and to the linearity assumption that is not fulfilled. Thus we propose in this article a statistical approach to the covariance evaluation, based on the generation of Gaussian pseudo-measurements around a regression model which plays the role of the mean. The main advantage of this approach is that we do not impose any structure on the matrix. It remains to quantify the quality of the obtained matrix. We then propose to quantify the estimator quality via a bootstrap approach. In order to test our alternative and the bootstrap approach, we use a toy model for which we know all the quantities we try to estimate. We present our alternative for the covariance evaluation in the next section and give a way to evaluate the estimator quality in the third section. The entire approach is tested with the toy model and also illustrated in the case of the $^{55}_{25}$Mn nucleus in the corresponding sections.
Notations
Let F = (F_1, ..., F_N)^t be the random vector of the measured cross sections¹ at N distinct energies E_j, j = 1, ..., N. We assume that F is Gaussian. The covariance matrix of F is denoted by Σ_F. We assume that we can observe n realisations of the vector F, written F^(1), ..., F^(n) (we recall that, most of the time, for cross-section experiments, n = 1).

ᵃ e-mail: suzanne.varet@cea.fr
¹ (·)^t denotes the transpose of a vector.
Toy model description
For the toy model construction we simulate n realisations of a Gaussian vector, which can be interpreted as the experimental cross sections, with a fixed mean m_M and a fixed covariance matrix Σ_M. In the numerical examples of this article we have taken N = 5 and n = 1.
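The toy-model construction can be sketched as follows. This is an illustration only, not the authors' code: the mean value, uncertainty level and correlation structure chosen below are arbitrary assumptions standing in for the unspecified m_M and Σ_M.

```python
# Toy model sketch: simulate n realisations of a Gaussian vector with a fixed
# mean m_M and a fixed covariance Sigma_M, standing in for measured cross sections.
import numpy as np

rng = np.random.default_rng(0)

N = 5                                   # number of energies
n = 1                                   # one measurement per energy
m_M = np.full(N, 2.0)                   # assumed flat mean (arbitrary choice)

# Assumed covariance: exponentially decaying correlation between energies
sigma = 0.1 * np.ones(N)
corr = np.exp(-np.abs(np.subtract.outer(np.arange(N), np.arange(N))) / 2.0)
Sigma_M = np.outer(sigma, sigma) * corr

F = rng.multivariate_normal(m_M, Sigma_M, size=n)   # shape (n, N)
print(F.shape)
```

With n = 1, as in the paper's examples, a single N-vector of pseudo-experimental values is produced, and the true Σ_M remains available for comparison with any estimate.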

Experimental covariance matrix evaluation
Our alternative for the estimation of Σ_F first consists in summarising the experimental measurements with a regression model. We have chosen a non-linear SVM (Support Vector Machine) model [2]. We assume that in the energy range of interest (> 1 MeV) the response is a smooth and flat function of the energy. We denote by h(E) the SVM regression model evaluated at the energy E. Assuming that the measurements are Gaussian (section 1), we generate r vectors S^(1), ..., S^(r) such that S^(i) is a Gaussian random vector of N independent random variables whose expectations are the corresponding values of the regression model. For the noise variances, we use the squares of the experimental uncertainties σ_i, i = 1, ..., N. The final sample (F^(1), ..., F^(n), S^(1), ..., S^(r)) is denoted by X = (X^(1), ..., X^(n+r)). With the pseudo-measurements, we compute the empirical estimator of Σ_F in which the empirical mean is replaced by the regression model value. Finally, taking the diagonal of the empirical estimator as (σ_1², ..., σ_N²), we obtain the estimator $\hat\Sigma_F$. The term of $\hat\Sigma_F$ at the i-th line and j-th column, C_ij, defined as the estimate of cov(F_i, F_j), is given by C_ii = σ_i² and, for i ≠ j, by the following equation:

$$C_{ij} = \frac{1}{n+r} \sum_{k=1}^{n+r} \left( X_i^{(k)} - h(E_i) \right) \left( X_j^{(k)} - h(E_j) \right). \quad (1)$$

Since, for each k = 1, ..., r and i, j = 1, ..., N with i ≠ j, the two realisations S_i^(k) and S_j^(k) are independent by construction, if r is too large compared with n in eq. (1), then $\hat\Sigma_F$ is close to a diagonal matrix. Moreover, in many applications the inverse covariance matrix is required, so r must be large enough to ensure the invertibility of $\hat\Sigma_F$. As a consequence, to determine the number r of generated vectors, we distinguish two cases: with r = 0, $\hat\Sigma_F$ is either invertible (case 1) or not (case 2). In the first case, the direct empirical estimator of the covariance matrix of F = (F_1, ..., F_N) is invertible. In the second case, we generate as many vectors as needed to ensure the invertibility of the covariance matrix.
Application to the toy model
Using one pseudo-measurement, we can see in figure 1 that the covariance matrix estimate (eq. (1)) is close to the real one.
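The estimation procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: a low-degree polynomial fit stands in for the paper's SVM regression, and the energies, measured values and uncertainties are made up for the example.

```python
# Sketch of the pseudo-measurement covariance estimator of eq. (1).
import numpy as np

rng = np.random.default_rng(1)

N, n, r = 5, 1, 20
E = np.linspace(12.0, 20.0, N)                       # energies (MeV), assumed
sigma = 0.05 * np.ones(N)                            # experimental uncertainties
F = 1.0 + 0.02 * (E - 16.0) + rng.normal(0.0, sigma, (n, N))  # fake measurements

# 1) Regression model h(E) summarising the measurements (stand-in for the SVM).
coeffs = np.polyfit(E, F.mean(axis=0), deg=1)
hE = np.polyval(coeffs, E)

# 2) r Gaussian pseudo-measurements S_i^(k) ~ N(h(E_i), sigma_i^2),
#    independent across energies.
S = hE + rng.normal(0.0, sigma, (r, N))

# 3) Empirical estimator with h(E) in place of the empirical mean (eq. (1));
#    the diagonal is then replaced by the experimental variances sigma_i^2.
X = np.vstack([F, S])                                # sample of size n + r
D = X - hE
Sigma_hat = D.T @ D / (n + r)
Sigma_hat[np.diag_indices(N)] = sigma ** 2

print(np.linalg.cond(Sigma_hat))                     # finite => invertible
```

With r large enough the estimate is invertible, as required; at the same time, since the pseudo-measurements are independent across energies, the off-diagonal terms shrink as r grows, which is the trade-off discussed above.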

EPJ Web of Conferences
Application to the $^{55}_{25}$Mn nucleus
The $^{55}_{25}$Mn (n,2n) cross section is measured at N = 11 energies in the range 12 MeV to 20 MeV, with n = 1. The experimental data, extracted from [3], are plotted with their uncertainties in figure 1(a) (green). We build an SVM regression model on the experimental data (red curve, figure 1(a)). We generate r = 1 pseudo-measurement (blue circles, figure 1(a)) per energy with a Gaussian noise N(h(E_i), σ_i²) for i = 1, ..., N. The N = 400 experimental data (with n = 1) (green circles) of the total $^{55}_{25}$Mn cross section, extracted from the EXFOR database³, the SVM regression model (red curve) and the r = 1 pseudo-measurement (blue circles) are plotted in figure 2(a).

Quality assessment
Given an estimation of the experimental covariance matrix, how far is the estimation from the real matrix? To answer this question we have chosen the following scalar criterion:

$$\left\| \mathbb{E}(\hat\Sigma_F) - \Sigma_F \right\|_F, \quad (2)$$

³ http://www.oecd-nea.org/dbdata/x4/03002-p.3
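As a concrete illustration of this criterion, the Frobenius deviation between two covariance matrices is straightforward to compute; the two 2×2 matrices below are arbitrary made-up values, not the paper's data.

```python
# Frobenius-norm deviation between a "true" and an "estimated" covariance matrix.
import numpy as np

Sigma_true = np.array([[1.0, 0.3],
                       [0.3, 2.0]])
Sigma_est = np.array([[1.1, 0.2],
                      [0.2, 1.8]])

# ||A||_F = sqrt(sum of squared elements of A), applied to the difference.
dev = np.linalg.norm(Sigma_est - Sigma_true, "fro")
print(dev)  # sqrt(0.1**2 + 0.1**2 + 0.1**2 + 0.2**2)
```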

WONDER-2012
where ‖·‖_F is the Frobenius norm (for any q × m matrix A with elements a_ij, $\|A\|_F = \sqrt{\sum_{i=1}^{q}\sum_{j=1}^{m} a_{ij}^2}$), which is easily computable and interpretable. Even if (2) can be estimated by $\|\mathbb{E}(\hat\Sigma_F) - \Sigma_F\|_F$, we propose the overestimate $\mathbb{E}(\|\hat\Sigma_F - \Sigma_F\|_F)$, knowing that $\|\mathbb{E}(\hat\Sigma_F) - \Sigma_F\|_F \leq \mathbb{E}(\|\hat\Sigma_F - \Sigma_F\|_F)$ due to the norm convexity. However, given that our interest is the estimation of $\Sigma_F^{-1}$, we also introduce the Kullback-Leibler measure of $\hat\Sigma_F$, which is defined by

$$KL(\hat\Sigma_F, \Sigma_F) = \frac{1}{2} \left[ \mathrm{tr}\!\left(\hat\Sigma_F^{-1} \Sigma_F\right) - \ln \det\!\left(\hat\Sigma_F^{-1} \Sigma_F\right) - N \right]. \quad (3)$$

To estimate the norm of the difference between the two matrices and the Kullback-Leibler measure, we propose to use a parametric bootstrap methodology. We thus distinguish again the two cases with r = 0: either $\hat\Sigma_F$ is invertible or not. In the first case, we create B bootstrap samples (F^(1), ..., F^(n))_1, ..., (F^(1), ..., F^(n))_B following a Gaussian law N(H, $\hat\Sigma_F$), where H = (h(E_1), ..., h(E_N))^t. In the second case, we create, in addition to (F^(1), ..., F^(n))_1, ..., (F^(1), ..., F^(n))_B, B bootstrap samples (S^(1), ..., S^(r))_1, ..., (S^(1), ..., S^(r))_B whose components follow the Gaussian laws N(h(E_i), σ_i²). Sampling (S^(1), ..., S^(r))_1, ..., (S^(1), ..., S^(r))_B with a diagonal covariance matrix reproduces the initial generation of (S^(1), ..., S^(r)). Then, in both cases, we compute the B empirical covariance estimators $\hat\Sigma_F^b$, b = 1, ..., B, where the term of $\hat\Sigma_F^b$ at the i-th line and j-th column, C_ij^b, is defined by C_ii^b = σ_i² and, for i ≠ j, by an expression similar to eq. (1). We estimate the Frobenius deviation by

$$\frac{1}{B} \sum_{b=1}^{B} \left\| \hat\Sigma_F^b - \hat\Sigma_F \right\|_F \quad (4)$$

and the Kullback-Leibler measure by

$$\frac{1}{B} \sum_{b=1}^{B} KL\!\left(\hat\Sigma_F^b, \hat\Sigma_F\right). \quad (5)$$

Application to the toy model
The quality estimators (eqs. (4) and (5)) can now be tested with the toy model (section 1). We take case 2 with N = 5 and n = 1. We compute the SVM regression model of the vector F^(1) and generate r = 1 pseudo-vector S^(1) of N = 5 independent random variables. In order to average the results, we repeat this operation fifty times; that is, we generate 50·r vectors, noted S^(1)1, ..., S^(1)50. For each sample (F^(1), S^(1)k), k = 1, ..., 50, we compute the norm of the difference between the empirical estimator and the real matrix, $\|\hat\Sigma_F - \Sigma_M\|_F$, and the Kullback-Leibler measure of $\hat\Sigma_F$ (eq. (3)). We compare the mean and the variance of these two values with the mean and the variance of their bootstrap estimates (with B = 30) (eqs. (4) and (5) respectively). The mean and the variance of the bootstrap estimates are obtained by repeating the bootstrap estimation about fifty times. We perform these computations for ten 1-samples of measurements.
We can see (figure 3) that the bootstrap provides a fine estimate, both in terms of mean and variance, of the deviation from the real matrix.

We see (figure 4) that the bootstrap does not estimate the Kullback-Leibler measure as well as the Frobenius norm. However, it can be sufficient to obtain an order of magnitude of the matrix estimation quality.
Application to the $^{55}_{25}$Mn nucleus
We have estimated the deviation of the (n,2n) cross-section covariance matrix (figure 1(b)) from the real matrix. The bootstrap gives $\|\hat\Sigma_F - \Sigma_F\|_F \approx 0.019$ with a variance equal to $1.2 \cdot 10^{-5}$, and $KL(\hat\Sigma_F, \Sigma_F) \approx 6.834$ with a variance equal to 2.351. As this case is, in dimension, close to the toy model, we can assume, from the estimate of $\|\hat\Sigma_F - \Sigma_F\|_F$, that our estimation of the covariance matrix is good enough and, from the estimate of $KL(\hat\Sigma_F, \Sigma_F)$, that the inverse is not too far from the real inverse.
For the total cross section (figure 2(b)), the bootstrap gives $\|\hat\Sigma_F - \Sigma_F\|_F \approx 2.385$ with a variance equal to $4.3 \cdot 10^{-5}$, and $KL(\hat\Sigma_F, \Sigma_F) \approx 386.36$ with a variance equal to 6.19. We can conclude for this case that, even if $\hat\Sigma_F$ seems to be close to the true matrix (due to a low norm value), the estimate of $\Sigma_F^{-1}$ is probably far from the true inverse (a high Kullback-Leibler measure).

Conclusion
In this article a new protocol for experimental covariance matrix estimation has been presented. We focus on the situation where the lack of measurements prevents the use of the direct empirical estimator and where information on the experimental protocol is not available. The approach is based on the construction of a regression model to replace the empirical mean, and on the generation of pseudo-measurements as noise around this regression model. The main advantage of this approach is that we can quantify the estimator quality thanks to a bootstrap sampling. Moreover, as it is a statistical approach, only measurements are needed, and the only assumption is the Gaussianity of the measurements, which seems reasonable for experimental data. The numerical validation of our approach has been carried out with a Gaussian toy model. Indeed, the numerical tests have pointed out that we can easily supply an estimation of the covariance matrix. Moreover, the toy model has shown that the bootstrap can be a good way to quantify the matrix estimation quality.
Fig. 1. Estimation of the experimental covariances.

Fig. 2. Total $^{55}_{25}$Mn cross section measurements and color map of the $\hat\Sigma_F$ estimation.