FEW-GROUP CROSS SECTIONS LIBRARY BY ACTIVE LEARNING WITH SPLINE KERNELS

This work deals with the representation of homogenized few-group cross section libraries by machine learning. A Reproducing Kernel Hilbert Space (RKHS) is used with different Pool Active Learning strategies to obtain an optimal support. Specifically, a spline kernel is used and results are compared to the multi-linear interpolation employed in industry, discussing the reduction in library size and the overall performance. A standard PWR fuel assembly provides the use case (OECD-NEA Burn-up Credit Criticality Benchmark [1]).


INTRODUCTION
Few-group cross sections are obtained through a homogenization process from transport calculations, which compute the neutron flux with a detailed discretization in energy and space. Cross sections are considered real-valued scalar functions on a d-dimensional rectangular domain, which without loss of generality is normalized to X = [0, 1]^d. For every lattice calculation point, the cross section set Y = {σ : X → R} of size |Y| = i × r × g is obtained for every reaction type r, group g and isotope i. These are generally smooth functions with possibly strong variations in localized regions. Discrete samples X_S = {x_i ∈ X} define the support S = {σ(x_i), x_i ∈ X_S}, which is the available information to build the model; these are also called "data sites" or the "learning space". The modeling effort consists in finding the approximations σ̂(x|S) ≈ σ, ∀σ ∈ Y.
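For illustration, the normalization onto the unit hypercube amounts to a simple min-max scaling of the lattice-state parameters; the sketch below is only indicative, and the parameter names and ranges are hypothetical rather than taken from the benchmark.

import numpy as np

# Hypothetical lattice-state parameters and ranges (illustrative only, not benchmark data).
bounds = {"burnup": (0.0, 60.0),               # GWd/t
          "fuel_temperature": (286.0, 1200.0), # degrees C
          "boron_concentration": (0.0, 2000.0)}  # ppm

def normalize(points, bounds):
    """Map raw parameter values onto X = [0, 1]^d by min-max scaling."""
    lo = np.array([b[0] for b in bounds.values()])
    hi = np.array([b[1] for b in bounds.values()])
    return (np.asarray(points, dtype=float) - lo) / (hi - lo)

x = normalize([[30.0, 600.0, 500.0]], bounds)  # one lattice calculation point in [0, 1]^3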
Macroscopic cross sections (Σ) are the input data of core calculations and allow the infinite multiplication factor (k_∞) to be calculated. They are determined by a weighted sum of microscopic cross sections, Σ_{r,g} = ∑_{i=1}^{N_I} C_i σ_{i,r,g}, where C_i is the concentration of the i-th of the N_I specialized isotopes. Ordinary reactor studies for fuel cycle optimization, transient simulations and core design need to resolve the multi-physics coupling with other computer codes, where cross section models have to deal with ever-growing volumes of data.
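As a small illustration of the weighted sum above, a hedged sketch with illustrative array names (not taken from the reference implementation):

import numpy as np

def macroscopic_xs(concentrations, micro_xs):
    """Sigma_{r,g} = sum_i C_i * sigma_{i,r,g} over the N_I specialized isotopes.

    concentrations: shape (N_I,)       isotopic concentrations C_i
    micro_xs:       shape (N_I, R, G)  microscopic cross sections sigma_{i,r,g}
    returns:        shape (R, G)       macroscopic cross sections Sigma_{r,g}
    """
    return np.einsum("i,irg->rg", concentrations, micro_xs)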

MODELING CHALLENGES AND STATE OF THE ART
In many industrial implementations cross sections are modeled by a first-order piecewise polynomial interpolation, here called Multi-Linear (ML) [2]. A Cartesian sampling rule for X_S is normally used, with support points shared by all cross sections. ML does not require an actual modeling phase: data are simply stored and can be interpolated quickly on demand (a minimal sketch is given after the list of key ideas below). However, this advantage comes at a cost: the library of size |X_S| × |Y| grows exponentially with increasing dimensions (|X_S| ∝ N^d), which is known as the "Curse of Dimensionality" [3]. Higher-order approximation spaces, projection into sub-libraries and regression with global polynomials [3] mitigate this to some extent while still requiring a very dense support, with the possibility of either localized errors or high polynomial degrees for some cross sections [4]. Sparse Grid sampling rules, especially when anisotropic, have successfully reduced the library size, though the design is left to the user [5]. Nested regular grids were needed, being in fact a subset of a full grid, and Chebyshev sampling rules were used to deal with the Runge phenomenon resulting from global polynomials. Another example of the use of anisotropic sampling can be found in [6], where the model is optimized to increase the accuracy of k_∞, though it is limited to a Cartesian grid. In [7] Artificial Neural Networks were employed with high accuracy but at a significant computational cost. In [8] the "Empirical Interpolation Method" was used to obtain the support by reducing the model's basis errors. Yet, the supervised learning implementation is severely intertwined with the expansion space: support point candidates result from a compositional procedure based on the error of the eigenfunctions. The sampling is restricted to a "Tucker grid" used to solve the integral quadrature and subjected to a second selection process to lower the number of additional lattice calculations. A posteriori shrinking is then performed to reject unimportant terms, as in many other techniques [3], [5]. The key ideas addressed in this work are:
• A full grid support suffers not only from the "Curse of Dimensionality", but also includes a significant amount of unnecessary data, as evidenced by the low errors attained with sparse grids [5], by the variability in the number of regression coefficients in [4] and by the use of a posteriori shrinking [8].
• Benefits of using an unstructured support with respect to full grids have been suggested but not examined exhaustively, certainly not for a traditional function space that can be compared to multi-linear interpolation [8].
• Though some effort has been made in studying strategies to optimize the support's anisotropy, no comparative analysis is to be found on selection criteria based on σ, Σ and k_∞ and their resulting errors, nor on the interplay between cross sections sharing a support and the supervised learning procedure.
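For reference, a minimal sketch of the multi-linear (ML) baseline on a full Cartesian grid, here written with SciPy's RegularGridInterpolator; the industrial implementations cited in [2] differ in detail, and the test function is only a stand-in for an actual cross section.

import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Full Cartesian grid: N points per axis -> N**d support points (Curse of Dimensionality).
axes = [np.linspace(0.0, 1.0, 9) for _ in range(3)]                 # d = 3, N = 9
grid = np.meshgrid(*axes, indexing="ij")
values = np.exp(-3.0 * grid[0]) * (1.0 + 0.1 * grid[1] * grid[2])   # surrogate for sigma(x)

ml = RegularGridInterpolator(axes, values, method="linear")         # data are simply stored
sigma_hat = ml(np.array([[0.37, 0.52, 0.18]]))                      # interpolation on demand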

KERNEL METHODS
Provided the support S of size N = |S|, our task is to find the approximation σ̂ to σ in some function space F, posed as the regularized regression problem (Eq. 2) σ̂ = argmin_{ϕ∈F} ∑_{x_i∈X_S} (ϕ(x_i) − σ(x_i))² + λ‖ϕ‖². Kernel Methods [9] are a very general framework that allows a vast number of machine learning problems to be posed in a vector space equipped with a norm. They are based on a symmetric positive-definite scalar function k(x, y), called the kernel, that generates a Reproducing Kernel Hilbert Space (RKHS) H_k. Indeed, with ϕ ∈ H_k the solution of Eq. 2 is found by solving the linear system (K + λI)ᾱ = σ̄, with σ̄_i = σ(x_i) and the kernel matrix K_{ij} = k(x_i, x_j). The regression problem in Eq. 2 may be greatly simplified when expressed in terms of the kernel, and this is known as the kernel trick. The approximation is then the linear combination of partially evaluated kernels (Eq. 3), σ̂(x) = ∑_{i=1}^{N} ᾱ_i k(x, x_i), as stated in the Representer Theorem [9]. Although H_k ⊂ F, quite exotic H_k can be obtained, enabling highly customized function spaces as exploited in [8]. Here we choose a first-order piecewise polynomial space (with the regularization term λ = 0) reproduced by the spline kernel [10]. The two main drawbacks of Kernel Methods are the condition number of the matrix K and the possibly large number of terms in Eq. 3, since |ᾱ| = |X_S|. A first-order spline kernel reduces the conditioning problem. Though perhaps suboptimal, this function space facilitates the comparison to multi-linear interpolation on a full grid and provides a bounded interpolation error. Higher-order function spaces will be analyzed in a future paper.
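A minimal sketch of the fit and evaluation steps follows, under the assumption that the first-order spline kernel takes the common tensor-product form k(x, y) = prod_j (1 + min(x_j, y_j)) on [0, 1]^d; the exact expression should be taken from [10].

import numpy as np

def spline_kernel(X, Y):
    """Assumed tensor-product first-order spline kernel on [0, 1]^d.

    X: (n, d), Y: (m, d)  ->  kernel matrix of shape (n, m).
    """
    mins = np.minimum(X[:, None, :], Y[None, :, :])   # pairwise min(x_j, y_j)
    return np.prod(1.0 + mins, axis=2)

def fit(X_s, sigma_s, lam=0.0):
    """Solve (K + lam*I) alpha = sigma_bar on the support (lam = 0 gives pure interpolation)."""
    K = spline_kernel(X_s, X_s)
    return np.linalg.solve(K + lam * np.eye(len(X_s)), sigma_s)

def predict(X_eval, X_s, alpha):
    """Representer expansion of Eq. 3: sigma_hat(x) = sum_i alpha_i k(x, x_i)."""
    return spline_kernel(X_eval, X_s) @ alpha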

POOL ACTIVE LEARNING
Kernel Methods impose no condition on the sampling rules for the domain, thus enabling the use of Pool Active Learning (AL). Lattice calculations provide a pool S_P = {σ(x_i), x_i ∈ X_P} from which an optimal sampling X^† ⊂ X_P can be extracted, allowing an optimal support S^† to be defined. This is done by computing the extrema of a loss function L, which lies at the heart of the AL process and has been treated only implicitly in other works [11]. Two approximation scenarios are considered, with and without a shared support for the cross sections, as presented in pseudo-code in Algorithms 1 and 2, respectively. In both cases the model starts with an initial support S_0 ⊂ S_P chosen randomly, and loss function values are computed within the loop to find new optimal points x^† ∈ X_P. In the former case a maximum size |X^†| is selected, whereas a target error δ_σ is specified for each cross section in the latter. The loss functions studied in this work, each generating a different model, are presented in Fig. 2d. We define the isotope's "importance" in view of its contribution to Σ as I_σ = I_{i,r,g} = C_i σ_{i,r,g} / Σ_{r,g}.
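A minimal sketch of the shared-support loop of Algorithm 1 is given below, reusing the spline_kernel helper from the previous sketch; the loss is illustrated with an RXS-like criterion (worst relative error over the whole cross section set), and the other choices of L slot into the same place. Function and variable names are illustrative.

import numpy as np

def active_learning_shared(X_pool, sigma_pool, budget, n_init=10, seed=None):
    """Greedy pool AL: grow one support X_dagger shared by all cross sections.

    X_pool:     (P, d)  candidate points (lattice calculations already available)
    sigma_pool: (P, m)  the |Y| cross sections evaluated on the pool
    budget:     target support size |X_dagger|
    """
    rng = np.random.default_rng(seed)
    support = list(rng.choice(len(X_pool), size=n_init, replace=False))   # S_0 chosen randomly
    while len(support) < budget:
        K = spline_kernel(X_pool[support], X_pool[support])
        alpha = np.linalg.solve(K, sigma_pool[support])                   # fit on current support
        pred = spline_kernel(X_pool, X_pool[support]) @ alpha             # predict on the pool
        loss = np.max(np.abs(pred / sigma_pool - 1.0), axis=1)            # RXS-like loss per pool point
        loss[support] = -np.inf                                           # never re-select support points
        support.append(int(np.argmax(loss)))                              # x_dagger = argmax L
    return np.array(support)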

RESULTS
The transport code APOLLO2.8 was used to generate cross section data for the OECD-NEA Burnup Credit Criticality Benchmark, whose material and geometrical specifications are fully available [1].
[Algorithm 1: Cross sections share support; y can be any σ, Σ or k_∞. Result: an optimal X^† with respect to L of size b.]
[Algorithm 2: Independent support optimization for every σ. Result: an optimal set X_σ^†.]
The cross section set includes, for every isotope, the reactions νσ_{f,1}, νσ_{f,2}, σ_{a,1}, σ_{a,2}, σ_{1→2} and σ_{2→1}, resulting in |Y| = 128 (fission products have no fission reaction). Lattice calculation data are divided into two disjoint sets, the pool S_P and the test set T (S_P ∩ T = ∅). Test points properly cover the domain and are fixed independently of the cross sections' support, resulting in a stable error evaluation. The microscopic cross section relative error is defined as ∆σ = 100(σ̂/σ − 1). The arithmetic average of the error over the test set is ε̄_σ = ∑_{i=1}^{|T|} |∆σ_i| / |T|, and its average over all cross sections is ε_σ = ∑_{σ∈Y} ε̄_σ / |Y|. Modern cross section target errors are between 10^−2 and 10^−1 in % [5]. Similar error definitions are used for Σ and k_∞. With any representation of the form of Eq. 3, the library size amounts to the number of coefficients to store. For a cross section approximated by Kernel Methods with AL, |ᾱ| = |X^†|, and if the cross sections share a support, ∑_{σ∈Y} |ᾱ_σ| = |X^†| × |Y|.
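A small sketch of these error figures on the fixed test set T (array names are illustrative):

import numpy as np

def test_errors(sigma_hat, sigma_ref):
    """Relative error statistics over the test set T.

    sigma_hat, sigma_ref: shape (|T|, |Y|)  predicted / reference cross sections.
    Returns the per-cross-section mean |Delta sigma| in % and its average over all |Y|.
    """
    delta = 100.0 * (sigma_hat / sigma_ref - 1.0)   # Delta sigma in %
    eps_bar = np.mean(np.abs(delta), axis=0)        # average over the test set, per cross section
    eps_global = np.mean(eps_bar)                   # average over all cross sections
    return eps_bar, eps_global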

Active learning with shared support
The trend of ε_σ with increasing library size is presented in Fig. 2a. ML always uses a full grid, and S marks a special discretization based on "industry's expert knowledge" for this type of fuel assembly. This defines a comparison line of constant error ε_σ, while the other points show the global tendencies. Modern error limits are delimited in gray [5]. The errors decrease monotonically (the interpolation error being bounded) and, as X_S → X_P, they converge, since the same first-order approximation space is used for all the models. It can be noticed that the error does not reach zero, the useful information of S_P being limited and T fixed. For comparing different L, a library size of 2.2 × 10^4 coefficients is large enough to appreciate the effect of the AL, with errors not yet leveled off by including the majority of X_P. We start by noting that a random selection (RAND) already produces a significant improvement, as expected: collinearity undermines the informativeness of an approximation's support [4]. In a way this isolates the effect of using an unstructured support, which reduces the library size by half. All AL strategies decrease it further, as presented in Fig. 3b.
[Fig. 2: (a) average σ error with shared support; (c) cross sections participating in the AL; (d) L(y, ŷ) with σ, Σ or k_∞ as y.]
The lowest ε_σ error is obtained when considering the relative errors for the whole set (RXS). Additionally weighting with the importance (RXSI) does not improve it. Its histogram, shown in Fig. 2b, is characterized by centered means, a normally distributed shape, small standard deviations and no error tails, unlike ML for that library size. Similar histograms are observed for the other AL strategies. If absolute errors are considered, even when weighted by the importance (IXS), we find that the only cross section participating in the AL is 135Xe σ_{a,2}, since its absolute values are significantly bigger than the others'. In this case the AL effort is entirely wasted, with σ̂ computed |Y| times. Still, the error profile differs little from the other methods, in the same way as when optimizing for a single cross section (U). This shows that the active learning process is quite robust with respect to the loss function selection. To better understand this behavior, loss function values are plotted in Fig. 2c, showing the cross section participating in the selection of each x^† for RXS and RXSI. The use of I_σ significantly reduces the number of participating cross sections: while RXS considers about 50 (marked in gray, about 1/3 of |Y|), RXSI considers only ∼10. L values are bigger for RXS than for RXSI (since I_σ ≤ 1 and I_σ ∼ 0 for a large number of cross sections), but the overall profile and the error stagnation (|X_S| > 1000) remain the same, suggesting that the useful information is being extracted at a similar rate from X_P. The profiles present clear breaks in the derivative (at supports of 200 and 1000 points) that can be understood as a change in the "relevance" of the points being added to the model, and could even be used as a stopping criterion for Algorithm 1. We further notice a clustering effect where the cross sections participating (independently) in the AL end up segregated in a few blocks, mainly dominated by 235U σ_{f,1} and RES σ_{2→1}. These exhibit jumps in L due to the ongoing AL process. The support being shared, x^† is added to all cross sections, inducing plateaus in their loss function profiles (not shown here). The optimization effort is thus subordinated to a very small cross section set, while the model is forced to incorporate large amounts of unnecessary data, as explained in Section 2. Finally, the importance can be used to preselect a small cross section subset (RIXS) on which to perform AL, obtaining similar results at a lower computational cost.
An L based on the macroscopic cross sections (M) exhibits a particularly good performance for Σ, and one based on the multiplication factor (MF) for k_∞, as can be seen in Fig. 3b. Though AL could be considered an "off-line" task, performed only once during cross section preparation, we note that IXS, RXS, RXSI, M and MF require the computation of the whole cross section set, unlike RIXS or U, which achieve a similar error.

Active learning without shared support
Since only the coefficients need to be stored, we consider Algorithm 2, enabling an independent AL process for each cross section. We consider two "No Shared Support" methods, defined by δ_σ = δ (NoS) and by weighting with the importances, δ_σ = δ I_σ (ImpNoS), with δ = 0.01, 0.001, 0.0001, whose Σ errors are presented in Fig. 3a. For ε_Σ, the library size is further reduced by up to another order of magnitude, arriving at 1% of ML on a Cartesian grid. However, for ImpNoS we notice a high ε_σ, since the AL only focuses on cross sections relevant to Σ. This is explained by Fig. 4b, where the number of coefficients relative to the library size is marked by the area for every cross section. Only 235U σ_{a,1}, 235U σ_{f,1}, RES σ_{1→2} and RES σ_{2→1} surpass 300 coefficients, thus achieving low errors, while this is the norm for the majority of cross sections in NoS, presented in Fig. 4a.
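A brief sketch of the stopping rule that distinguishes NoS from ImpNoS, reusing the spline_kernel helper from the kernel sketch; names are illustrative and the loop follows the spirit of Algorithm 2 rather than its exact pseudo-code.

import numpy as np

def al_single_xs(X_pool, sigma, tol, n_init=5, seed=None):
    """Greedy AL for one cross section, stopped at the target relative error tol."""
    rng = np.random.default_rng(seed)
    support = list(rng.choice(len(X_pool), size=n_init, replace=False))
    while True:
        K = spline_kernel(X_pool[support], X_pool[support])
        alpha = np.linalg.solve(K, sigma[support])
        err = np.abs(spline_kernel(X_pool, X_pool[support]) @ alpha / sigma - 1.0)
        err[support] = 0.0
        if err.max() <= tol or len(support) == len(X_pool):
            return np.array(support)
        support.append(int(np.argmax(err)))

def independent_supports(X_pool, sigma_pool, importances, delta, weighted):
    """delta_sigma = delta (NoS) or delta * I_sigma (ImpNoS); one support per cross section."""
    return [al_single_xs(X_pool, sigma_pool[:, j],
                         delta * importances[j] if weighted else delta)
            for j in range(sigma_pool.shape[1])]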

RECONSTRUCTION TIME
With the implementation developed (Python 2.7-F2PY-Fortran), for |Y| = 128 cross sections sharing support we obtained, with |X_S| = 2500 and |X_S| = 500, evaluation times of T_2500 = 3 × 10^−5 and T_500 = 2 × 10^−5 seconds per cross section in this range of support. Though this is a rough, machine-dependent estimation, it shows that Kernel Methods can be competitive performance-wise even if the number of terms requiring evaluation is overall greater than the ∼30 obtained, for example, by Quasi-Regression [3]. Indeed, if 10^8 cross section reconstructions per core calculation point are typically required, this would imply a total reconstruction time of ∼2 minutes. Sharing the evaluation K matrix among the cross sections and profiting from vectorization by using FORTRAN routines wrapped in Python were crucial to achieving these evaluation times. In a non-shared support scheme, the only caveat is that, for both the coefficient and the evaluation matrix, the non-zero part is delimited by the same monotonically diminishing profile. Ultimately, whether cross sections share a support may depend on the core code's architecture and optimization routines.
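A minimal sketch of the evaluation-time optimization described above: when cross sections share a support, a single kernel evaluation matrix is built per batch of state points and reused for the whole library. It reuses the spline_kernel helper from the kernel sketch; the reference implementation wraps equivalent Fortran routines through F2PY, which this Python-only sketch does not reproduce.

import numpy as np

def reconstruct_all(x_eval, X_support, alpha_matrix):
    """Evaluate all cross sections sharing X_support at the state points x_eval.

    x_eval:       (q, d)    core-calculation state points
    X_support:    (N, d)    shared support X_dagger
    alpha_matrix: (N, |Y|)  one coefficient column per cross section
    returns:      (q, |Y|)  all reconstructed cross sections
    """
    K_eval = spline_kernel(x_eval, X_support)   # built once, shared by all cross sections
    return K_eval @ alpha_matrix                # vectorized over the whole library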

CONCLUSION
Kernel Methods with a first-order piecewise polynomial space reproduced by the spline kernel, combined with Pool Active Learning, were used to model homogenized cross sections. Results were compared to multi-linear interpolation in view of σ, Σ and k_∞. For a representative case as used in industry, a significant gain was achieved by using an unstructured support, even if randomly chosen. All AL procedures further improved the model and were quite robust to the choice of loss function. In a shared support scheme, optimizing the support with a single cross section already produces nearly optimal results. This is somewhat improved by considering all the cross sections, though by virtue of the importance a small subset (RIXS) suffices to reproduce the expected AL behavior at a smaller computational cost. Indeed, by using unstructured supports the library is no longer bound to |X_S| ∝ N^d, effectively dealing with the "Curse of Dimensionality". We point out the need of considering relative errors to obtain meaningful comparisons. In the current Python-F2PY-Fortran implementation, with vectorized kernel evaluations and a shared kernel evaluation matrix, an evaluation time of T_σ ∼ 10^−5 seconds per cross section was observed; further code optimization may achieve higher computational performance. A shared support penalizes the AL, as evidenced by the low participation rate (clustering) of cross sections, where many are forced to incorporate large volumes of unnecessary data. By dropping this constraint, the library size becomes ∑_σ |ᾱ_σ| instead of |X^†| × |Y|, and the AL process is able to exploit the different cross section modeling requirements. Weighting with I_σ resulted in a satisfying library composition, where the coefficients' cardinality intuitively follows the important cross sections, though high errors were noted in many cross sections negligible to Σ. Our methodology for cross section representation without a shared support allowed a reduction of up to two orders of magnitude, i.e. down to 1% of the original library size, for target accuracies as used in industry.