On the use of Bayesian Monte-Carlo in evaluation of nuclear data

As model parameters, necessary ingredients of theoretical models, are not always predicted by theory, a formal mathematical framework associated to the evaluation work is needed to obtain the best set of parameters (resonance parameters, optical models, fission barrier, average width, multigroup cross sections) with Bayesian statistical inference by comparing theory to experiment. The formal rule related to this methodology is to estimate the posterior density probability function of a set of parameters by solving an equation of the following type: pdf(posterior) ∼ pdf(prior) × a likelihood function. A fitting procedure can be seen as an estimation of the posterior density probability of a set of parameters (referred as x) knowing a prior information on these parameters and a likelihood which gives the probability density function of observing a data set knowing x . To solve this problem, two major paths could be taken: add approximations and hypothesis and obtain an equation to be solved numerically (minimum of a cost function or Generalized least Square method, referred as GLS) or use Monte-Carlo sampling of all prior distributions and estimate the final posterior distribution. Monte Carlo methods are natural solution for Bayesian inference problems. They avoid approximations (existing in traditional adjustment procedure based on chi-square minimization) and propose alternative in the choice of probability density distribution for priors and likelihoods. This paper will propose the use of what we are calling Bayesian Monte Carlo (referred as BMC in the rest of the manuscript) in the whole energy range from thermal, resonance and continuum range for all nuclear reaction models at these energies. Algorithms will be presented based on Monte-Carlo sampling and Markov chain. The objectives of BMC are to propose a reference calculation for validating the GLS calculations and approximations, to test probability density distributions effects and to provide the framework of finding global minimum if several local minimums exist. Application to resolved resonance, unresolved resonance and continuum evaluation as well as multigroup cross section data assimilation will be presented.


Introduction
Evaluation of nuclear data is based on nuclear reaction models and experiments.Various data assimilation procedures exist mainly based on the concept of Bayesian inference which consists in assessing the knowledge of nuclear data by updating a prior knowledge.This latter is in these energy ranges related to the estimation of models parameters, crucial ingredients of nuclear reaction models, we will first present a general mathematical framework related to parameters estimation (resonance parameters, optical models, fission barrier, average width, multigroup cross sections. . .).The Generalized least square algorithm as well as Bayesian Monte-Carlo solutions will be first described.Some illustrative examples on resolved resonance range, unresolved resonance range and high energy range (with nuclear reactions model parameters as well as directly multigroup cross sections) will be given.

Bayesian inference
Let y = − → y i (i = 1 . . .N y ) denote some experimentally measured variables, and let x denote the parameters a e-mail: cyrille.de-saint-jean@cea.frdefining the model used to simulate theoretically these variables and t the associated calculated values to be compared with y.Using Bayes' theorem [1] and especially its generalization to continuous variables, one can obtain the following relation between conditional probability density functions (written p(.)) when the analysis of a new data set y is performed: where U represents the "background" or "prior" information from which the prior knowledge of x is assumed.U is supposed independent of y.In this framework, the denominator is just a normalization constant.The formal rule [2] used to take into account information coming from new analyzed observations is: A fitting procedure can be seen as an estimation of at least the first two moments of the posterior density probability of a set of parameters x knowing an a-priori information on these parameters and a likelihood which gives the probability density function of observing a data set knowing x.
To solve this problem, one can add approximations and hypothesis and obtain an equation to be solved numerically (find the minimum of a cost function) or use Monte-Carlo sampling of all a priori distribution and estimate the final posterior distribution.
All algorithms described in this section as well as in following ones are implemented in the CONRAD code [3].

Deterministic theory
To obtain an equation to be solved, one has to make some assumptions on the prior probability distribution involved.

Basic assumptions
Given a covariance matrix and mean values, the choice of a multivariate joint normal for the probability density p( x|U ) and for the likelihood is maximizing the entropy [4].
Thus, here, a fitting procedure consists of finding the expectation and the covariance matrix of the following probability density function: where x m (expectation) and M x (covariance matrix) are prior information on x parameters and for the y experimental set, M y is the experimental covariance matrix.Here t represents the theoretical model predictions which are functions of the x to be compared to y.An additional approximation (Laplace approximation) is made considering that the posterior distribution is well approximated by a multivariate normal distribution with the same maximum and curvature at the maximum of the right side of Eq. ( 2).Then, one can demonstrate that the posterior expectation and covariances are calculated by finding the minimum of the following cost function (a Generalized Least Square):

Numerical issues and approximations for deterministic theory
To take into account non-linear effects and ensure a proper convergence of the algorithm, a Gauss-Newton iterative solution can be used.This is particularly important because the relationship between nuclear model parameters and the calculated cross sections may be highly non-linear.
The convergence estimator could be chosen related to the convergence of (χ 2 G L S ) (n) .For that purpose, we focus on the relative variation of the (χ 2 G L S ) (n) between two iterative steps: If ε (n) χ2 ≤ ε user (relative error defined by the user), we assume that the convergence is reached.
Thus, from a mathematical point of view, the evaluation of parameters through a GLS procedure suffers from the distribution choice as Gaussian for both prior and likelihood, the use of Laplace approximation (a posterior distribution is a multi-variate Gaussian as well), the linearization around the prior for Gauss-Newton algorithm and at last the fact of neglecting 2 nd order terms in the Gauss-Newton iterative procedure.

Bayesian Monte-Carlo
Bayesian Monte Carlo methods are natural solutions for Bayesian inference problems.They avoid approximations (exposed in previous section) and propose alternatives in probability density distribution choice for priors and likelihoods.The use of this kind of methods in the framework of nuclear data was first proposed by D. Smith and R. Capote for the evaluation of cross section in the continuum energy range [5].This document will expose and propose the use of Bayesian Monte Carlo (referred as BMC in the rest of the paper) in the whole energy range from thermal, resonance to continuum range.
BMC will be proposed as reference calculation to validate the GLS calculations and approximations.In addition, it allows to test probability density distributions effects and to can provide a framework for finding global minimum if several local minimums exist.

Classical Bayesian Monte Carlo
The main idea of "classical" Bayesian Monte-Carlo is the use of Monte-Carlo techniques to calculate integrals.For any function f of random variable set x, we can calculate by Monte-Carlo sampling any integral of the form: where p ( x) is a probability density function continuous over R.
You can thus estimate any moments of the probability function p ( x).For example: Application of these simple features to Bayesian inference analysis gives for posterior distribution's expectation values: The proposed algorithm to calculate the first two moments of the posterior distribution is to sample the prior probability distribution function p( x|U ) N x times and for each x k realization evaluate: With all the statistics (N x realizations), calculate the expectation and the covariances: The choice of prior distribution depends on what type of analysis is done.If no prior information is given, a non informative prior could be chosen (uniform distribution).On the contrary, for an update logic, the prior is related to a previous analysis, with a known probability distribution function.
The use of L k weights indicates clearly the major drawbacks of this classical BMC method: if the prior is far from high likelihood values and/or by nature L k values are small (because of the number of experimental points) the algorithm will have difficulties to converge.
Thus, the main problem is that the covered phase space of sampling is not favorable to convergence.In principle, and idealistically one should sample with a trial function close to the posterior distribution.In addition, one can notice that this algorithm will provide an expectation of the first two moments of the posterior probability density function but not the distribution itself (which can be anyway rebuilt by adding extra hypothesis with the previous moments).

Importance sampling
As previously exposed, the estimation of the integral of f ( x) times a probability density function p( x) is not straightforward.Especially in our case of Bayesian inference, sampling the prior p( x|U ) distribution, when it is far away from the posterior distribution or when the likelihood weights are difficult to evaluate properly, could be very expensive and time consuming without any valuable estimation of posterior.The idea is then to sample in a different region of phase space by respecting anyway the statistics.
This trial probability density function p trial ( x) can be introduced by a trick as follows: Thus, putting this expression in Eq. ( 5), one obtains: Then, sampling is done on the trial probability density function p trial ( x) getting a new set of { x k } and for each realization x k , evaluation of additional terms is necessary p( x) p bias ( x) .
As a result, expectation and covariances are defined by: The choice of trial functions is crucial and the closer to the true solution p trial ( x) is, the quicker the algorithm will be.In this paper, we used as trial function, the result of the generalized least-square (with additional standard deviation enhancement).Many other solutions can be used depending on the problem.

Prospects for Bayesian inference: Markov chain Monte-Carlo
For Bayesian inference, a tremendous activity is underway in the statistician community for using of Markov chains [6].
Metropolis-Hastings algorithm is an example of a wider range of algorithm called Markov chain Monte-Carlo.The proposed algorithm applied for Bayesian inference is based on the following paper [7].The basis of the algorithm is to use a random walk heading towards region of higher probabilities.The adaptation of this algorithm for Bayesian inference is proposed here.
At stage i, is the result of the i th iteration.At stage i + 1, sample x prior i+1 following prior density probability p( x|U ) and a number α i+1 from an uniform distribution.Thus the rejection law is the following: .
At the end of the simulation, the distribution related to the statistics { x keep } is related to the posterior density probability p( x| y, U ).Even though similarities exist between the Metropolis algorithm and the classical BMC, the major interests are to obtain the desired posterior distribution on the fly.In addition, numerical issues in classical BMC related to likelihood L k estimations are less problematic as for Metropolis what matters is to estimate ratios of L k 's.
For low statistics, a dependence to the initial point in phase space (need of multipath) could be seen and as a result a burning period is needed to get rid of unimportant statistics.In addition, the choice of trial function is still arbitrary.

155 Gd in the resolved resonance range
In the resolved resonance range, cross sections are characterized by resonance structures arising from the existence of long-lived nuclear states.This complex behavior exhibits structures which are related to resonance parameters and a model calculation based on the R-Matrix theory [8].For non-fissile target nuclei, the main parameters are the resonance energy E r , the neutron width n and the radiation width γ .
To challenge the Bayesian inference, a "fake" experimental data set was used with a fixed 1% statistical uncertainties (no systematic uncertainties) and around 2000 experimental points (see Fig. 1).It means that a potential uncertainty reduction factor could be around √ 2000.
The analysis consists in adjusting only the γ of the two resonances.Results are compiled in Table 1.
All Bayesian Monte-Carlo algorithms give both proper central values as well as covariances, but as can be seen on Fig. 2, importance algorithm is by definition converging faster than both classical Bayesian Monte-Carlo as well as Metropolis algorithm.Different behaviors are seen between latter algorithms.For the classical Monte-Carlo, steps are due to the fact that very few histories in this case are participating to the sum of Eq. ( 8), as L k is very often extremely low (almost zero).For Metropolis a similar behavior is seen but smoothed by the algorithm and instead of steps, one can see variations decreasing in amplitude towards converged values.
To emphasize on the challenging aspect of doing Bayesian Monte-Carlo with resonances, one can see Fig. 3 where the prior and posterior density probability function of the the γ amplitude is given.

238 U in the unresolved resonance range
In this chapter, we focus on a problem related to the analysis of the unresolved resonance range. 238U total cross section is studied in the energy range of [25 keV, 750 keV].The experimental data set used comes from the EXFOR database [9].A typical 1% arbitrary statistical uncertainty is chosen.This case corresponds to a classical analysis a nuclear data physicist might encounter during his work when using an average R-Matrix model.Table 2 gives the  prior parameters governing the total cross section (strength functions) as well as their prior uncertainties.
Classical, Importance and Metropolis algorithms were used for Monte-Carlo Bayesian inference.All algorithm are converging to the generalized least square solutions after around 1000 or 10000 histories for central values and uncertainties.
Figure 4 shows a 2D plot of prior and posterior statistics of two parameters when adjusting on an experiment with the Metropolis algorithm.This algorithm allows the estimation of posterior distributions on the fly.
One important aspect illustrated in Fig. 5 is the behavior of the correlation coefficient between the two evaluated parameters as a function of number of histories.This figure shows that as expected Importance sampling is converging faster and Metropolis algorithm is working better than the classical one.One major point for the "classical" algorithm is fluctuations on the correlation coefficient even at high number of histories (around several thousands) even though the central values are converged.It means that the evaluation of correlations is far more difficult than the central values and a limited set of histories could give bad results (even change of signs and magnitudes).

239 Pu in the fast energy range
In a previous paper [10], the data assimilation of JEZEBEL experiment was discussed in the framework of both nuclear reaction models (at high energies) and multigroup cross sections.This analysis is now proposed using Bayesian Monte-Carlo.The challenging issues are both to test the capability of Monte-Carlo to treat a large number of parameters (between 10 and 100) and also to investigate the feasibility of this kind of algorithm with integral experiments data assimilation (here a JEZEBEL k e f f measurement).

Multigroup cross sections adjustment
Only 239 Pu cross sections were used as parameters in this study.74 cross sections parameters (fission, capture, inelastic, elastic and nxn in a multigroup format) were sampled and re-evaluated through a Bayesian Monte-Carlo analysis.A-priori covariances matrices shows correlations between reaction channels (fission, capture, inelastic... etc.) and energy domains.Figure 6 shows an example of prior and posterior distributions obtained for 239 Pu fission cross section in two energy region.We obtain equivalent uncertainty reduction than what was presented in paper [10].

Nuclear reaction model parameters adjustment
An equivalent analysis was performed with prior parameters having uncertainties consistent with prior cross sections covariances matrices from the previous subsection.The set of nuclear reaction model parameters is related to optical potential parameters, fission barrier heights and widths.Around 19 parameters were used in the analysis.Obviously, this integral experiment is somehow an integral measurement of 239 Pu νσ f and as a result fission parameters are highly sensitive to this measure.Thus, these sensitive parameters will allow a challenging test for any Bayesian algorithm.
Figure 7 shows the posterior density probability of first chance fission barrier height for n+ 239 Pu after this Bayesian inference compared with prior distribution and GLS results.One can see the good behavior of the Metropolis algorithm in finding the posterior value of this parameter even though the change is very low: around 0.5%.

Conclusions
The use of Bayesian Monte-Carlo methods was exposed in this paper and three illustrative analysis were detailed.
One important point is that these methods could be used for resonance range analysis (both resolved and unresolved resonance) as well as higher energy models.In addition, both microscopic integral data assimilation could be achieved.Nevertheless, the major issue is related to the convergence estimator: depending on which parameters are investigated (central values or correlation between them), the number of histories (sampling) could be very different.Indeed, special care should be taken for the calculation of correlations because an additional order of magnitude of sampling histories could be necessary.Furthermore, it was shown that sampling priors is not the major problem.It is more efficient to properly sample in the phase space region where the likelihood is large.In this aspect, Importance and Metropolis algorithms are working better than brute force ("classical") Monte-Carlo.It also highlights the fact that pre-sampling prior with a limited number of realizations could be inadequate for further inference analysis.
Concerning Bayesian Monte Carlo inference methods, in the future, other Markov chain algorithms will be developed in CONRAD code and efficient convergence estimators will be proposed as well.The choice of Gaussian probability functions for both prior and likelihood will be challenged and discussed.
More generally, an open range of scientific activities will be investigated.In particular, one major issue is related to a change in paradigm : to go beyond covariance matrices and deal with parameters knowledge taking into account full probability density distributions.In addition, for endup users, it will be necessary to investigate the feasibility of a full Monte Carlo approach, from nuclear reaction models to nuclear reactors or integral experiments (or any other applications) without format/files/processing issues which are most of the time bottlenecks.
The use of Monte-Carlo could solve a generic issue in nuclear data evaluation related to difference of information given in evaluated files: in the resonance range where cross section uncertainties and/or nuclear model parameters uncertainties are compiled and in the higher energy range where only cross section uncertainties are formatted.This could simplify the evaluation of full covariance over the whole energy range.

Figure 2 .
Figure 2. Convergence of the expectation value of the first amplitude of the γ channel.

Figure 3 .
Figure 3. Probability density function of the first amplitude of the γ channel obtained with Metropolis algorithm.

Figure 4 .
Figure 4. Probability density (histograms) for prior and posterior after a Bayesian analysis with a Metropolis algorithm.

Figure 5 .
Figure 5. Evolution of correlation coefficient between two strength function parameters as a function of number of histories.

Figure 6 .
Figure 6.Probability density function (histograms) for prior and posterior 239 Pu fission cross section in two energy regions having important sensitivities to JEZEBEL k ef f measurement.Results are presented relatively to mean values.

Figure 7 .
Figure 7. Prior and posterior probability density function of 239Pu first barrier height.

Table 1 .
Comparison of bayesian inference methods (1 millions of histories) for 155 Gd γ 's adjustment.