Regression analysis of experimental reaction cross-section data of

Pre-processing of neutron reaction cross-section data is essential in nuclear data evaluation. This work pre-processes the experimental cross-section data of the 241Am(n,2n)240Am neutron reaction. Pre-processing of the experimental data includes re-normalization, removal of outliers, merging multiple cross-section values at a single energy into a single cross-section value, and regression on the cleaned experimental data. To remove outliers from the data, the standardized residual and the studentized residual have been used. To merge multiple cross-section values into a single value, the weighted average method has been used. Regression on the cleaned experimental data has been performed using Gaussian Process Regression (GPR) and Polynomial Regression (PR), and the performance of both regression methods has been studied using statistical indices such as the coefficient of determination (R^2) and the sum of squared residuals (SSres).


Introduction
Cross-section, in nuclear physics, describes the probability that a given atomic nucleus will undergo a specific reaction. Neutron cross-section data are crucial inputs to several nuclear applications; therefore, accurate cross-section data are required. To generate accurate neutron cross-sections, pre-processing of the experimental data is mandatory in the evaluation process, since evaluation combines experimental cross-section data with nuclear model predictions. Pre-processing of experimental data includes removal of outliers, re-normalization, merging multiple cross-section values at a single energy into a single cross-section value, and regression on the experimental data.
In this paper, regression is applied to the cleaned experimental data to predict cross-section values at energies that are not available in the EXFOR database. Regression is widely used in the nuclear field. In [1], multivariate regression analysis was developed to improve a set of parameters in the design of a nuclear reactor. In Ref. [2], regression was used to calculate the HPGe detector efficiency at any other γ-ray energy of interest. In [3], two regression methods, logistic regression and random forest regression, were used to highlight measurement features that are common among many of the outlier data points. In [4], experimental cross-section data of 100Mo(n,2n)99Mo were regressed using the Ridge and Lasso regression methods. The present work focuses on the pre-processing of experimental data. The 241Am(n,2n)240Am neutron reaction is considered over the energy range 7 MeV to 20 MeV, since americium (Am) has various uses: for example, it is used to measure the thickness of glass in the glass industry and in smoke detectors. This paper is organized as follows. The pre-processing steps and a detailed explanation of each regression method (polynomial regression and Gaussian process regression) are given in Section II. The results and a comparison of the regression methods are given in Section III. Section IV concludes the paper.

Pre-processing on the Experimental Data
In this study, the experimental cross-section data for the 241Am(n,2n)240Am neutron reaction were obtained from EXFOR [5-10] and are shown in Figure 1.

Re-normalization of experimental data
Figure 1: Experimental cross-section data for the 241Am(n,2n)240Am neutron reaction from EXFOR before pre-processing

The re-normalization process updates the nuclear data used in the original calculation with the current nuclear data available in libraries and databases such as IRDFF-v1.05 [11] and Nuclear Wallet Cards [12]. The required reference cross-section value of the monitor reaction and the half-life are taken from IRDFF-v1.05 and the NuDat 3 database, respectively. In the re-normalization step, the cross-section value (σ_renorm) is determined from the current values of the attributes (A_new) [13] using Eq. (1), in which σ_old is the reaction cross-section reported by the experimenter, based on the old values of the attributes (A_old).
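As a rough illustration of the re-normalization idea, the sketch below rescales a reported cross-section by the ratio of a current attribute value to the value used by the experimenter. Eq. (1) itself is not reproduced in this text, so the simple proportional form is an assumption: it is the usual form for a monitor reference cross-section, while a half-life correction may take a different functional form. The function name `renormalize` and the numbers are hypothetical.

```python
def renormalize(sigma_old, a_old, a_new):
    """Rescale a reported cross-section to a current reference value.

    ASSUMPTION: simple proportional scaling, sigma_old * (a_new / a_old).
    This is an illustrative form, not necessarily the paper's Eq. (1).
    """
    if a_old == 0:
        raise ValueError("old attribute value must be non-zero")
    return sigma_old * (a_new / a_old)


# Hypothetical numbers: a reported value of 200 mb that was derived with a
# monitor cross-section of 100 mb, updated to a current monitor value of 110 mb.
sigma_renorm = renormalize(200.0, a_old=100.0, a_new=110.0)
print(sigma_renorm)
```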

Removal of an outlier
In experimental data, the presence of outliers degrades the accuracy of the estimation. In this study, the studentized residual (r_i) and the standardized residual (d_i) are used to detect outliers, since both quantify the size of the error in the predicted value of each observation.
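A minimal sketch of the two residuals for an ordinary least-squares fit is given here, using the textbook definitions d_i = e_i / sqrt(MS_res) and r_i = e_i / sqrt(MS_res (1 - h_ii)), where h_ii is the i-th diagonal element of the hat matrix. The underlying model fitted in this study, the cutoff value, and the function names are not stated in the text; the straight-line model and the threshold of 3 below are assumptions for illustration only.

```python
import numpy as np


def residual_diagnostics(X, y):
    """Return standardized (d) and internally studentized (r) residuals.

    X is the design matrix (one row per observation), y the observations.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares coefficients
    e = y - X @ beta                              # raw residuals
    h = np.diag(X @ np.linalg.pinv(X))            # leverages (hat-matrix diagonal)
    ms_res = (e @ e) / (n - p)                    # residual mean square
    d = e / np.sqrt(ms_res)
    r = e / np.sqrt(ms_res * (1.0 - h))
    return d, r


def flag_outliers(d, r, threshold=3.0):
    """Flag points whose |d_i| or |r_i| exceeds the (assumed) threshold."""
    return (np.abs(d) > threshold) | (np.abs(r) > threshold)


# Synthetic example: a straight line with one corrupted point.
x = np.arange(20.0)
y = 2.0 * x + 1.0
y[10] += 50.0
X = np.column_stack([np.ones_like(x), x])
d, r = residual_diagnostics(X, y)
print(np.nonzero(flag_outliers(d, r))[0])
```

Note that with very few points the residuals are bounded (|r_i| cannot exceed sqrt(n - p)), so a fixed cutoff of 3 can never trigger on small data sets; the cutoff should be chosen with the sample size in mind.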

Merging multiple cross-section values at single energy to single cross-section value
In EXFOR data, a single neutron energy may have multiple cross-section values, which makes it ambiguous for users which value to select. Therefore, multiple cross-section values at the same energy are merged into a single cross-section value using the weighted average method [14].
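The merging step can be sketched as follows, assuming the common inverse-variance weighting w_i = 1/Δσ_i^2, so that the merged value is Σ w_i σ_i / Σ w_i with uncertainty 1/sqrt(Σ w_i). The exact weighting defined in [14] is not reproduced here, so this choice, the function name, and the numbers are assumptions.

```python
import numpy as np


def weighted_average(values, uncertainties):
    """Merge repeated measurements at one energy (inverse-variance weights).

    ASSUMPTION: weights w_i = 1 / uncertainty_i**2 and uncorrelated errors.
    Returns the merged value and its uncertainty.
    """
    values = np.asarray(values, dtype=float)
    unc = np.asarray(uncertainties, dtype=float)
    w = 1.0 / unc**2
    merged = np.sum(w * values) / np.sum(w)
    merged_unc = np.sqrt(1.0 / np.sum(w))
    return merged, merged_unc


# Hypothetical cross-sections (mb) reported at the same neutron energy.
sigma, d_sigma = weighted_average([100.0, 200.0], [10.0, 10.0])
print(sigma, d_sigma)
```

With equal uncertainties the result reduces to the plain mean; a more precise measurement pulls the merged value towards itself.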

Regression on cleaned experimental data
To predict the cross-section values that are not available in the EXFOR database, polynomial regression (PR) [15,16] and Gaussian process regression (GPR) [17,18] have been used. In some cases the regression may predict negative cross-section values, which are non-physical. To avoid this, a logarithmic transformation based on the Box-Cox method given in [15] and [17] is applied to both sides of the model.
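The effect of the transformation can be sketched with the standard Box-Cox family, in which the power parameter equal to zero gives the logarithm. Fitting in the transformed space and mapping predictions back through the exponential guarantees positive values. The function names are hypothetical, and the power parameter is called `lam` here to keep it distinct from the GPR length-scale λ.

```python
import numpy as np


def box_cox(y, lam=0.0):
    """Box-Cox transform; lam = 0 is the logarithmic case."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0.0:
        return np.log(y)
    return (y**lam - 1.0) / lam


def inverse_box_cox(z, lam=0.0):
    """Map values from the transformed space back to the original scale."""
    z = np.asarray(z, dtype=float)
    if lam == 0.0:
        return np.exp(z)  # always positive, whatever the fitted value is
    return (lam * z + 1.0) ** (1.0 / lam)


# Any prediction made in log space maps back to a positive cross-section.
predicted_log = np.array([-5.0, 0.0, 3.0])
print(inverse_box_cox(predicted_log))
```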

Polynomial Regression
To select the degree of the PR model, the sum of squared residuals (SSres) has been used as the statistical index. The better a regression model fits the data, the smaller its SSres value. The SSres value calculated for each degree is given in
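A minimal sketch of comparing SSres across polynomial degrees is shown below, using `numpy.polyfit` on synthetic data. The data, the candidate degrees, and the function name are assumptions; the values obtained in this study are not reproduced.

```python
import numpy as np


def ss_res_by_degree(x, y, degrees):
    """Fit one polynomial per degree and return {degree: SSres}."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    results = {}
    for deg in degrees:
        coeffs = np.polyfit(x, y, deg)          # least-squares polynomial fit
        resid = y - np.polyval(coeffs, x)       # residuals at the data points
        results[deg] = float(resid @ resid)     # sum of squared residuals
    return results


# Synthetic quadratic data: SSres drops sharply once degree 2 is reached.
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x + 3.0 * x**2
for deg, value in ss_res_by_degree(x, y, [1, 2, 3]).items():
    print(deg, value)
```

Because SSres evaluated on the fitted points cannot increase as the degree grows, it is usually read together with the point at which the improvement levels off, rather than by taking the largest degree.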

Gaussian Process Regression
GPR is a regression technique that takes a Bayesian approach to fitting non-linear data. In GPR, the output Y of a function f at a point X is written in terms of f(X), a function that follows the Gaussian process distribution given in Equation 3, where m(x) is the mean and k(x,x') is the kernel. The kernel, also known as the covariance function, gives the dependency between two different points x and x'. The kernel function given in Equation 4 has been used in this work.
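The display equations are not reproduced in this text. For orientation, the standard forms that match the symbols named in this section are sketched below; the additive-noise observation model and the squared-exponential kernel with a Kronecker-delta noise term are assumptions based on the usual GPR formulation, not a transcription of the original equations.

```latex
% Assumed standard forms, consistent with the symbols m, k, \sigma_f^2, \lambda, \sigma_n^2, \delta
\begin{align}
  Y &= f(X) + \epsilon, \\
  f(x) &\sim \mathcal{GP}\bigl(m(x),\, k(x, x')\bigr), \\
  k(x, x') &= \sigma_f^{2} \exp\!\left(-\frac{(x - x')^{2}}{2\lambda^{2}}\right)
              + \sigma_n^{2}\, \delta(x, x').
\end{align}
```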
Here, σ_f^2, λ, and σ_n^2 are the hyper-parameters: σ_f^2 is the variance, λ is the length-scale, σ_n^2 is the noise variance, and δ is the Kronecker delta. In this paper, the value of σ_n is based on the uncertainty associated with energy given in the EXFOR database. Initially, the hyper-parameters σ_f^2 and λ were calculated by maximizing the log marginal likelihood [17,18], which resulted in overfitting. To avoid this, λ was instead selected using the chi-square test (χ^2) [4]. χ^2 is minimum for λ ranging from 0.27 to 0.3, whereas it is maximum at λ = 0.19001 (the value obtained by maximizing the log marginal likelihood). Therefore, for this type of data distribution, σ_f^2 and λ are taken to be 2.506774 and 0.25, respectively. The predictions for new inputs x⋆ and the associated uncertainty are obtained using Equations 5 and 6, respectively.
where I is the identity matrix and σ_ϵ^2 is the observation noise. The regressed curves obtained by applying PR and GPR to the experimental data are shown in Figure 2.
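The prediction step can be sketched with the standard GPR predictive equations, mean = K_*^T (K + σ_ϵ^2 I)^(-1) y and covariance = K_** - K_*^T (K + σ_ϵ^2 I)^(-1) K_*, assuming a zero prior mean and a squared-exponential kernel. These forms, the function names, and the synthetic data are assumptions, since Equations 4 to 6 are not reproduced in this text; the scaling of the inputs used in this study is also unknown, so the hyper-parameter values below are illustrative.

```python
import numpy as np


def se_kernel(a, b, sigma_f2, length):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return sigma_f2 * np.exp(-0.5 * sq_dist / length**2)


def gpr_predict(x_train, y_train, x_new, sigma_f2, length, noise_var):
    """Predictive mean and standard deviation of a zero-mean GP.

    noise_var may be a scalar or one variance per training point.
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    noise = np.broadcast_to(np.asarray(noise_var, dtype=float), x_train.shape)
    K = se_kernel(x_train, x_train, sigma_f2, length) + np.diag(noise)
    K_s = se_kernel(x_train, x_new, sigma_f2, length)
    K_ss = se_kernel(x_new, x_new, sigma_f2, length)
    L = np.linalg.cholesky(K)                     # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # guard against round-off
    return mean, std


# Synthetic example with hypothetical hyper-parameters.
x = np.linspace(0.0, 5.0, 12)
y = np.sin(x)
grid = np.linspace(0.0, 5.0, 50)
mean, std = gpr_predict(x, y, grid, sigma_f2=1.0, length=1.0, noise_var=1e-4)
print(mean[:3], std[:3])
```

The returned standard deviation is what allows an uncertainty band to be drawn around the regressed curve; it grows towards the prior value sqrt(σ_f^2) far from the training points.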

Result and Discussion
Figure 2 shows that the PR prediction at lower energies is inferior to that of GPR. This is because the PR model has been trained on cross-section values above 7.59 MeV, the lowest energy available in the EXFOR database, whereas prediction starts from the Q energy (6.647 MeV); the PR prediction is therefore poor for energies between 6.647 MeV and 7 MeV. The GPR performs better in this low-energy region because its prediction is based on a probability distribution over functions.
At higher energies (around 20 MeV), the PR prediction of the cross-section increases. This is because the cross-section value at 19.95 MeV is higher and has a smaller uncertainty than the values at 19.36 MeV and 20.61 MeV, so the regression curve shows an increasing trend at higher energies.
The performance of both regression methods has been evaluated using two statistical indices: the coefficient of determination (R^2) and SSres.

Table 2: Values of R^2 and SSres for GPR and PR methods

The major advantage of the GPR method over the PR method is that it provides an estimate of the uncertainty of the predicted value. The uncertainty in the GPR prediction is shown in Figure 2 as the shaded blue band.
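The two indices can be computed as in the sketch below, using the usual definitions SSres = Σ (y_i - ŷ_i)^2 and R^2 = 1 - SSres / SStot, with SStot = Σ (y_i - ȳ)^2. The function names and numbers are illustrative and do not reproduce the values of Table 2.

```python
import numpy as np


def ss_res(y, y_pred):
    """Sum of squared residuals between observations and predictions."""
    y = np.asarray(y, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sum((y - y_pred) ** 2))


def r_squared(y, y_pred):
    """Coefficient of determination, 1 - SSres / SStot."""
    y = np.asarray(y, dtype=float)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res(y, y_pred) / ss_tot


# Hypothetical observations and predictions from one regression model.
y_obs = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.8])
print(ss_res(y_obs, y_hat), r_squared(y_obs, y_hat))
```

A model with a smaller SSres and an R^2 closer to 1 reproduces the data more closely.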

Conclusion
In this work, the neutron cross-section data of the 241Am(n,2n)240Am reaction have been pre-processed. Pre-processing includes re-normalization, removal of outliers, merging of multiple data points, and regression. For outlier detection, the studentized residual and the standardized residual have been used; multiple data points at a single energy have been merged into a single point using the weighted average method; and the GPR and PR methods have been used for regression of the EXFOR data. Based on R^2 and SSres, GPR was found to perform better than PR.