Monte Carlo integral adjustment of nuclear data libraries – experimental covariances and inconsistent data

Integral experiments can be used to adjust nuclear data libraries. Here a Bayesian Monte Carlo method based on assigning weights to the different random files is used. It is shown that if the experiments are inconsistent among themselves or with the nuclear data, the adjustment procedure can lead to undesirable results. Therefore, a technique to treat inconsistent data is presented. The technique is based on the optimization of the marginal likelihood, which is approximated by a sample of model calculations. The sources of the inconsistencies are discussed, and the importance of considering correlations between the different experiments is emphasized. It is found that the technique can address inconsistencies in a desirable way.


Introduction
Integral experiments (benchmarks) can be used to adjust nuclear data (ND) libraries. The process is referred to as integral adjustment (IA). The motivation is to use the full breadth of the available experimental data and hence increase the predictive power of the ND library. IA is usually done implicitly, when models, parameters, and experiments are chosen to ensure proper performance of the ND library with respect to the integral data. Alternatively, the IA can be performed explicitly, where the ND library is tuned to the integral data using fitting routines and mathematical models. This paper relates to such explicit IA, and from here on IA refers only to explicit IA.
IA is a well-established field [1,2] and can be performed by deterministic adjustment techniques [1] or by Monte Carlo IA (MC-IA) [3,4]; this paper concerns the latter, which is sometimes referred to as Bayesian Monte Carlo (BMC) [5]. However, since BMC can be used for both differential and integral data and this particular paper addresses the use of integral data, we will use the term MC-IA. MC-IA has been explored by many authors, both for improving the best estimate (central value) of the ND file [6] and for reducing the ND uncertainty [2-4]. In this paper, the MC-IA method is extended by including a marginal likelihood optimization (MLO) technique to address inconsistencies between the experimental and calculated integral parameter (in this case keff). The paper also discusses the sources of these inconsistencies. Methods for treating inconsistent integral data have been proposed earlier, such as the adjustment margin (AM) method [1] and the Δχ² filtering [7]. The AM method does not consider correlations and is binary. The Δχ² filtering is also binary, and the choice of the rejection criterion appears to be rather arbitrary. In a binary method, an experiment is either kept or rejected. I.e., it is assumed that we have either full or no confidence in the experiment. We believe that this is too coarse a selection mechanism and that a continuous scale representing our trust in an experiment is more appropriate. The proposed method is statistically well-founded and addresses the mentioned issues of the previous methods.
The treatment of inconsistent data and the handling of outliers is a wide research field, and the above-mentioned methods are examples of techniques that have been used in the IA community. Many other methods have been suggested for the treatment of inconsistent data, and Ref. [8] compares the performance of some of them. In [8], the Bayesian procedure to adjust uncertainties performed best. That Bayesian procedure determines a single factor to rescale the uncertainties of all data points. In contrast, our approach allows uncertainties to be adjusted individually. In the Bayesian setting, our MLO approach corresponds to the maximum a posteriori probability estimate of the uncertainties.
This paper is organized as follows. Chapter 2 explains the basics of MC-IA as well as the underlying motivation for adding a calculation uncertainty when using IA. An example of an MC-IA with inconsistent data is also provided. Chapter 3 describes the MLO technique. Chapter 4 presents the results when using the MLO technique in combination with the MC-IA. Chapter 5 contains the conclusions.

Monte Carlo Integral adjustment
In IA, the ND is updated to reduce the deviation between the calculated and experimental value of the integral experiment. There are many different types of integral experiments, and in this paper criticality experiments from the ICSBEP [9] are used. More specifically, the ICSBEP MCNP benchmarks of the integral experiments are used to calculate the keff of the benchmarks. This is subsequently compared to the reported benchmark values. A reported benchmark value is based on the experimental result of the underlying integral experiment and is hence referred to as the experimental value.
MC-IA uses ND in the form of random files. Each ND random file is a full ENDF file (but without a covariance file), which can be a possible realization of the underlying ND given the PDF of the ND. I.e., the set (e.g., 1000) of ND random files is a way to provide the PDF of the ND. ND random files can be obtained through the so-called Total Monte Carlo (TMC) method [10,11]. These TMC random files are provided on the TENDL webpage or can be produced using the so-called T6 code package [10]. Alternatively, ND random files can be generated by sampling the covariance matrix of an ENDF file (e.g., JEFF3.3). This sampling can be achieved by using tools like SANDY [12,13], NUSS [14], or NUDUNA [15]. One advantage of using TMC random files is that these can contain higher-order moments of the ND PDF as well as more cross-correlations between different parts of the ND file.
Given a reasonably large set of random files (at least 1000, but in many cases more files are needed for convergence), an MC-IA can be performed. Each random file is used in the benchmark calculations, e.g., calculating the keff for a set of benchmarks. The MC-IA is performed by assigning weights, w, to the different random files, reflecting the agreement between the calculated and the experimental value. The weight is set to (see, e.g., [16-18])

$$ w_i = \delta\, e^{-\chi_i^2/2}, \tag{1} $$

where wi is the weight for random file i, δ is an arbitrary normalization constant, and

$$ \chi_i^2 = \sum_b \frac{\left(C_{b,i} - E_b\right)^2}{\sigma_b^2}, \tag{2} $$

where b is the benchmark number, Cb,i is the calculation result of benchmark b for random file i, Eb is the experimental value of benchmark b, and σb is the benchmark uncertainty. If correlations are also considered, we instead use the generalized χ²,

$$ \chi_i^2 = \left(C_i - E\right)^{T} B^{-1} \left(C_i - E\right), \tag{3} $$

where Ci and E are the vectors with the calculation results for random file i and the experimental results for all benchmarks, and B is the covariance matrix for the benchmarks. B or σb should contain the reported experimental uncertainty σreported, the statistical uncertainty of the MCNP calculations, σstat, and possibly other calculational uncertainties [4] (see Eq. 4 for details). In Refs. [19,20] the same approach is used, but in [20] with a different definition for the weights.
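The weighting step can be sketched in a few lines of Python. This is a minimal illustration and not the code used in this work: the function names, the toy numbers, and the choice of normalizing the weights to sum to one (one possible choice of the arbitrary normalization constant) are assumptions made for the example.

```python
import numpy as np

def chi2_per_file(C, E, B):
    """Generalized chi-square of every random file.

    C: (n_files, n_bench) calculated values, E: (n_bench,) experimental
    values, B: (n_bench, n_bench) benchmark covariance matrix.
    """
    r = C - E                               # residuals, one row per random file
    x = np.linalg.solve(B, r.T)             # B^{-1} r without forming the inverse
    return np.einsum("ij,ji->i", r, x)      # r_i^T B^{-1} r_i for each file i

def file_weights(chi2):
    """Weights proportional to exp(-chi2/2), normalized to sum to one."""
    w = np.exp(-0.5 * (chi2 - chi2.min()))  # the shift avoids underflow and cancels on normalization
    return w / w.sum()

# Hypothetical toy data: keff deviations in pcm relative to the benchmark values.
rng = np.random.default_rng(1)
E = np.zeros(3)
sigma_b = np.array([100.0, 150.0, 170.0])
B = np.diag(sigma_b**2)                     # no correlations: reduces to the plain chi-square
C = rng.normal(50.0, 300.0, size=(1000, 3))

chi2 = chi2_per_file(C, E, B)
w = file_weights(chi2)
posterior_mean = w @ C                      # weighted keff deviation per benchmark
```

With a diagonal B the generalized χ² reduces to the uncorrelated sum, which gives a simple consistency check for the implementation.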
When the weights for the random files have been calculated, a new PDF for the underlying ND can be determined [3]. In addition, a new weighted PDF for the calculated value of the benchmark, or for any other macroscopic parameter of any system under investigation, can be inferred. I.e., using the weights, a posterior best estimate and ND uncertainty can be deduced for, e.g., the benchmarks used in the IA. This is also how the weights are used in this work. An example of this is seen in Fig. 1, where 1000 TENDL 2014 random files for 235U and 238U are used to calculate keff for five benchmarks. These benchmarks and random files are used for all the results in the paper. As can be seen in Fig. 1, all benchmarks except IMF3 obtain better agreement (in absolute values) after the IA, and the ND uncertainty is reduced for all the benchmarks. In fact, the ND uncertainty is reduced to such an extent that the calculation results are hardly consistent with the experimental results. A p-value of 6% is obtained if both the experimental and the ND uncertainty are included.
The p-value is defined as the probability that the difference between the experiments and the calculation would be at least as extreme as the obtained result, given the null hypothesis. In our case, the null hypothesis is that the difference between the experimental data and the model prediction is governed by a multivariate normal distribution with a zero mean vector and a covariance matrix composed as the sum of the experimental covariance matrix and the model covariance matrix. The test statistic is the generalized chi-square value, and "at least as extreme" has to be understood with respect to this test statistic.
IMF3 cannot be adjusted to obtain a good agreement with the experimental value at the same time as the other values are adjusted, because of the strong calculation correlations between the benchmarks (see Fig. 2 left). E.g., if the fission cross-section in the fast energy range is increased for 235U, keff is increased for all the benchmarks.
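The weighted posterior and the consistency test described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names and toy data are hypothetical, and the p-value uses the fact that, under the stated null hypothesis, the generalized chi-square statistic follows a chi-square distribution with as many degrees of freedom as there are benchmarks.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist

def weighted_mean_cov(C, w):
    """Weighted mean and covariance of the calculated values over the random files."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mean = w @ C
    d = C - mean
    cov = (w[:, None] * d).T @ d            # ND (model) covariance between benchmarks
    return mean, cov

def consistency_p_value(mean, model_cov, E, B):
    """P-value of the generalized chi-square test with covariance B + model_cov."""
    r = mean - E
    stat = r @ np.linalg.solve(B + model_cov, r)
    return chi2_dist.sf(stat, df=len(E))    # survival function = P(chi2 >= stat)

# Hypothetical toy data in pcm; uniform weights correspond to the prior.
rng = np.random.default_rng(2)
C = rng.normal(0.0, 300.0, size=(1000, 2))
E = np.array([100.0, -50.0])
B = np.diag([100.0**2, 150.0**2])
mean, cov = weighted_mean_cov(C, np.ones(len(C)))
p_prior = consistency_p_value(mean, cov, E, B)
```

Passing the MC-IA weights instead of uniform weights gives the posterior mean, the posterior ND covariance, and the corresponding posterior p-value.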
In fact, by eye one can easily believe that the prior data (data before the adjustment) are in good agreement with the experimental data. However, if the p-value is calculated for the prior data including the full ND covariance, a p-value of 11% is obtained. The reason for this low p-value is the strong correlations between the benchmarks. What we witness is that the calculated data disagree with our set of experimental data. Expressed differently, given the calculated data there is a disagreement between the experiments. This disagreement can be due to:
• Model defects. I.e., the true ND is not within the PDF of the reported ND (in our case reported using random files).
• Errors in the adjusted ND (in this case 235U and 238U) that are not reported in the covariance file or represented by the random files. E.g., in many libraries, covariance information for angular distributions is missing.
• Unaccounted experimental uncertainties or covariances.
• Isotopes not considered. In this case, only 235U and 238U are adjusted. However, the actual experiments contain other isotopes that could be responsible for the disagreement. These isotopes can either be included in the adjustment procedure or should be marginalized, i.e., added to the benchmark uncertainty [4].
Whatever the reason, it is clear that the posterior uncertainties are too small given our set of experiments. This disagreement needs to be addressed in some way.

Addressing inconsistent experiments
As mentioned previously, there are many reasons for the data to be inconsistent, both among the different experiments and between the experiments and the calculation. Many of these underlying causes can be treated by different approaches. E.g., model defects can be considered in new ND evaluations [21,22] or better models can be used; ND evaluations can include more uncertainties (an example of this is the TENDL evaluations [10], which report uncertainties on most ND quantities); experiments can be re-evaluated to find unaccounted uncertainties; and more isotopes can be considered in the IA [4]. We encourage and are part of these efforts, but also acknowledge that whenever an IA is performed, remaining inconsistencies need to be treated. In this section, we present the MLO technique for the treatment of such inconsistencies. In [23], a different version of the MLO approach was suggested, based on a sensitivity matrix obtained by linearizing the model. Here we do not linearize the model but build the MLO approach on top of a sample of model calculations.
We add individual extra uncertainties, σextra,b, to each of the benchmarks to account for the inconsistencies. In addition, we add a fully correlated common uncertainty, σextra,c, to all the benchmarks. I.e., for each benchmark the new benchmark uncertainty is calculated as

$$ \sigma_b^2 = \sigma_{\mathrm{reported},b}^2 + \sigma_{\mathrm{stat},b}^2 + \sigma_{\mathrm{extra},b}^2 + \sigma_{\mathrm{extra},c}^2. \tag{4} $$

In this way, we try to model that there can be effects that are common to all the experiments and that there can be additional individual components in each experiment. The magnitudes of the σextra are found by using MLO. Details on the technique can be found in [23]. The likelihood for a σextra is described by

$$ L(\sigma_{\mathrm{extra}}) = \frac{1}{N} \sum_{i=1}^{N} \frac{e^{-\chi_i^2/2}}{\sqrt{(2\pi)^n \, |B|}}, \tag{5} $$

where n is the number of experiments, |B| is the determinant of the benchmark covariance matrix with the components defined in Eq. 4, and the sum runs over the N random files. With increased σextra, χ² decreases and the numerator term increases. However, with increased σextra, the denominator increases as well.
A derivation of Eq. 5 can be found in [24]. An example of the likelihood function in one dimension is shown in Fig. 2 (right), where σextra for HMF1 is varied and kept fixed for the other experiments. As can be seen, the most likely σextra for HMF1 is close to 250 pcm in this example; this value should be used for σextra. In the full exercise, the six-dimensional space (with five individual σextra,b and one σextra,c) was explored, and L was optimized (maximized). For the optimization, the minimum of the negative logarithm of L was found using the Python function scipy.optimize.minimize with the 'SLSQP' method. For the calculation of the determinant, a Cholesky decomposition of B was performed using the Python function numpy.linalg.cholesky.
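The optimization can be sketched as below. This is a minimal sketch and not the implementation used in this work. It assumes that the extra components add in quadrature to a reported covariance matrix, that the common component enters every element of B, and that the likelihood is averaged over the random files; the extra uncertainties are restricted to be non-negative for simplicity. All names and toy numbers (in pcm) are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def build_B(B_rep, sig_ind, sig_com):
    """Reported covariance plus individual (diagonal) and common (fully correlated) parts."""
    n = B_rep.shape[0]
    return B_rep + np.diag(sig_ind**2) + sig_com**2 * np.ones((n, n))

def log_det_from_chol(Lc):
    """log|B| from the Cholesky factor Lc, where B = Lc Lc^T."""
    return 2.0 * np.sum(np.log(np.diag(Lc)))

def neg_log_L(x, C, E, B_rep):
    """Negative log of the sample-averaged likelihood; x = (sig_ind_1..n, sig_com)."""
    n = len(E)
    Lc = np.linalg.cholesky(build_B(B_rep, x[:n], x[n]))
    z = np.linalg.solve(Lc, (C - E).T)       # whitened residuals, one column per file
    a = -0.5 * np.sum(z**2, axis=0)          # -chi2_i / 2 for every random file
    log_mean = a.max() + np.log(np.mean(np.exp(a - a.max())))  # stable log-mean-exp
    log_norm = 0.5 * (n * np.log(2.0 * np.pi) + log_det_from_chol(Lc))
    return -(log_mean - log_norm)

# Hypothetical toy problem: strongly correlated calculations, one discrepant benchmark.
rng = np.random.default_rng(3)
n_bench, n_files = 5, 1000
common = rng.normal(0.0, 400.0, size=(n_files, 1))
C = common + rng.normal(0.0, 100.0, size=(n_files, n_bench))
E = np.array([50.0, -80.0, 30.0, 100.0, 700.0])
B_rep = np.diag(np.array([100.0, 100.0, 150.0, 120.0, 170.0])**2)

x0 = np.full(n_bench + 1, 100.0)             # start away from zero, where the gradient vanishes
bounds = [(0.0, 2000.0)] * (n_bench + 1)
res = minimize(neg_log_L, x0, args=(C, E, B_rep), method="SLSQP", bounds=bounds)
sig_extra_ind, sig_extra_com = res.x[:n_bench], res.x[n_bench]
```

Working with the logarithm and a log-mean-exp reduction avoids underflow when χ² is large, and the Cholesky factor is reused for both the determinant and the χ² evaluation.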
Given no information from the data, we would generally assume that the σextra should be small, since considerable work has been done in both evaluating the experiments and creating the ND files. Eq. 5 does not consider this prior belief. To promote small σextra, a new term is added to the likelihood (Eq. 6), where the summation is over both σextra,b and σextra,c, and β is a constant that allows us to shape our prior belief on the σextra. Here, β is set so that a σextra of 500 pcm has a prior probability (before inspecting the data) of 50% compared to zero σextra. In the choice of β, expert judgment is introduced in the methodology. We use a Gaussian function in the added term in Eq. 6, but other functions have also been suggested [23]. It is not absolutely necessary to include the term. However, without it σb becomes independent of σreported. I.e., the information from the evaluation of the benchmark uncertainties does not affect the final results. This would be undesirable, and hence we recommend using a prior term. In addition, β allows us to encode expert knowledge into the likelihood. In our case β was set to the same value for all the experiments; however, β can be chosen to have individual values for different experiments to encode our belief in a particular experiment.
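The choice of β can be illustrated with a short sketch. The exact form of the added term is not reproduced here; the sketch assumes a Gaussian prior factor of the form exp(−β Σ σextra²), which is only one way to satisfy the stated 50% condition at 500 pcm. The names below are hypothetical.

```python
import numpy as np

SIGMA_HALF_PCM = 500.0                       # sigma_extra at which the prior factor drops to 50 %
beta = np.log(2.0) / SIGMA_HALF_PCM**2       # follows from exp(-beta * 500^2) = 0.5 (assumed form)

def log_prior(sig_extra):
    """Log of the assumed Gaussian prior factor, summed over all sigma_extra (pcm)."""
    sig_extra = np.asarray(sig_extra, dtype=float)
    return -beta * np.sum(sig_extra**2)

def neg_log_posterior(neg_log_L_value, sig_extra):
    """Objective to minimize: negative log-likelihood minus the log-prior."""
    return neg_log_L_value - log_prior(sig_extra)

# Prior probability of 500 pcm relative to zero extra uncertainty.
ratio = np.exp(log_prior([500.0]) - log_prior([0.0]))
```

Individual values of β per experiment would amount to replacing the scalar `beta` by an array of the same length as `sig_extra`.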

Results
The adjustment was tested including σextra,c; without σextra,c; using Eq. 5; and using Eq. 6. Fig. 3 illustrates the results using Eq. 6 and including σextra,c. Fig. 3 now presents a consistent data set. Compared to the results without the MLO (Fig. 2), the posterior ND uncertainty is dramatically increased (e.g., from 69 pcm to 253 pcm for HMF1). In this case, the total benchmark uncertainty is augmented for all benchmarks, mostly because a 209 pcm common uncertainty component is added to all the experiments. In fact, only IMF3 obtains a significant individual extra uncertainty, and σb,IMF3 increases from 170 pcm to 468 pcm. The results become quite different if σextra,c is assumed to be zero, as can be seen in Table 1. In the MLO, the σextra,b were allowed to take negative values as long as the sum of σextra,b and σextra,c was larger than zero. The motivation for this was that part of the reported experimental uncertainties could stem from common causes in the different benchmarks. To ensure that a minimum individual uncertainty is retained, the criterion σ²extra,b > −0.5 σ²reported,b was also included.
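The two restrictions on the extra uncertainties can be expressed as inequality constraints in the format accepted by scipy.optimize.minimize with the 'SLSQP' method. The sketch below is an assumption about how this could be set up, not the code used here: it treats the optimization variables as variance contributions (so that they can be negative) and interprets the first restriction as applying to the sum of the variances. SLSQP enforces non-strict inequalities, so the strict criteria are only approximated.

```python
import numpy as np

def mlo_constraints(var_reported):
    """SLSQP-style inequality constraints; each 'fun' must be >= 0.

    x = (v_1, ..., v_n, v_c) holds the extra variance contributions:
    individual v_b (may be negative) and common v_c.
    """
    var_reported = np.asarray(var_reported, dtype=float)
    n = len(var_reported)
    return [
        # the sum of the individual and common extra variance must not be negative
        {"type": "ineq", "fun": lambda x: x[:n] + x[n]},
        # keep at least half of the reported variance as individual uncertainty
        {"type": "ineq", "fun": lambda x: x[:n] + 0.5 * var_reported},
    ]

def is_feasible(x, constraints):
    """True if x satisfies every constraint."""
    x = np.asarray(x, dtype=float)
    return all(np.all(c["fun"](x) >= 0.0) for c in constraints)

# Hypothetical example with two benchmarks (reported uncertainties 100 and 170 pcm).
var_rep = np.array([100.0, 170.0])**2
cons = mlo_constraints(var_rep)
ok = is_feasible([-3000.0, 0.0, 4000.0], cons)            # within both limits
too_negative = is_feasible([-6000.0, 0.0, 7000.0], cons)  # violates the -0.5 limit
```

The list returned by `mlo_constraints` can be passed through the `constraints` argument of scipy.optimize.minimize.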

Conclusions and outlook
A new technique to address inconsistent integral experiments using an MC-MLO technique is proposed and applied to a set of benchmarks. The method can resolve the inconsistencies in the data, and the outcome is desirable. To our knowledge, the MC version of the technique has not been proposed before in the context of ND evaluation, and we believe it is the first time an MLO technique is proposed for IA.
The technique has many desirable features: it is statistically well-founded; it can be integrated with expert judgment in a transparent way; it is not binary; it takes into account both experimental and calculational correlations between the benchmarks; and it results in reasonable posterior ND uncertainties. MLO is hence a promising technique for integral adjustment in future library releases.
We suggest that the technique should be tested with a larger set of benchmarks, in which validation benchmarks are also used to infer the predictive power of the method. In addition, the method should be compared quantitatively with other techniques for discrepant data.

Fig. 1. (Color online) The figure shows the results for five criticality benchmarks. On the y-axis is the difference between the mean of Cb,i (over i) and the experimental value in pcm. The red crosses are the results before the weighting of the random files. The blue dots are the results after the weighting of the random files using Eq. 1. The error bars connected to the crosses and dots show the ND uncertainty of the calculated keff. The bars without markers are the benchmark uncertainties.

Fig. 3. Same as Fig. 2, but using the MLO technique to include σextra. The green bars without markers show the benchmark uncertainty including σextra.

Table 1. Resulting σb. The second row contains σreported (no MLO performed). In the third row, σextra,c is assumed to be zero. The third and fourth rows use Eq. 5. The last row uses Eq. 6.