SURROGATE MODEL OPTIMIZATION OF A ‘MICRO CORE’ PWR FUEL ASSEMBLY ARRANGEMENT USING DEEP LEARNING MODELS

This paper investigates the applicability of surrogate model optimization (SMO) using deep learning regression models to automatically embed knowledge about the objective function into the optimization process. Two deep learning SMO methods are demonstrated for calculating simple neutronics parameters. Using these models, SMO returns results comparable with those from the early stages of direct iterative optimization. However, for this study, the cost of creating the training set outweighs the benefits of the surrogate models.


INTRODUCTION
This paper explores two deep learning regression models used with iterative optimization to create an SMO process. The surrogate models are evaluated in the task of optimizing the design of a 'micro core' simulation, which is constructed from 36 'standard' PWR fuel assemblies. The design is considered with order-four rotational symmetry, reducing the problem to nine assemblies (Fig. 1a). Although this core design has legitimate limitations for real-world applications, it is ideal for investigation and for evaluating optimization strategies. The designs are relatively easy to simulate, the design space is bounded, and the design can also be optimized by human experts.
Deep Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) are used as surrogate models that predict the core parameters. The parameters to be optimized were the power peaking factor (PPF) and the position of the hottest pin at the beginning of cycle (BOC); these are direct outputs of the MLP model and are derived from pin power predictions in the CNN case. The networks were trained on a set of up to 3200 randomly generated core designs, and predictive performance was evaluated using a further 400 similarly generated random designs. The software design philosophy adopted was to use standardised state-of-the-art implementations of recognised techniques and open source libraries where possible. In particular, recent years have seen the release of the advanced machine learning libraries TensorFlow [2] and Keras [3] and the optimization library pygmo2 [4].
Figure 1: The 'micro core' used in this study [1]. (a) Layout of fuel assemblies in the optimization problem. (b) Flux distribution from an example core simulation.
The SMO results are compared with an optimization process using NSGA2 in which the designs are evaluated by direct simulation (DSO). Although the speed of the deep learning models enables them to evaluate many thousands of designs per second, in this study the same settings are used for the NSGA2 algorithm in order to fairly compare the performance of the SMO and DSO processes.

SURROGATE MODEL OPTIMIZATION
SMO is recommended for problems where the objective function is computationally expensive [6][7]. It works by applying a standard optimization algorithm to a 'surrogate function', which is usually regressed from data points obtained by sampling the actual objective function. Interest in SMO from the field of nuclear engineering has been increasing; see, for example, Wu et al. [8].
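As an illustration, the following minimal Python sketch shows the basic SMO loop under simplifying assumptions: a cheap stand-in objective replaces the Monte Carlo simulation, a random forest stands in for the deep learning regressors used in this paper, and the surrogate is optimized by best-of-many random search rather than NSGA2. All names are hypothetical.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    def expensive_objective(x):
        # Stand-in for a full core simulation (hypothetical).
        return np.sum((x - 3.0) ** 2)

    # 1. Sample the true objective to build a training set.
    rng = np.random.default_rng()
    X = rng.uniform(0.8, 5.0, size=(200, 9))
    y = np.array([expensive_objective(x) for x in X])

    # 2. Regress a surrogate function from the sampled data points.
    surrogate = RandomForestRegressor().fit(X, y)

    # 3. Apply a standard optimizer to the surrogate instead of the true
    #    objective (here: best-of-many random search, for brevity).
    candidates = rng.uniform(0.8, 5.0, size=(100_000, 9))
    best = candidates[np.argmin(surrogate.predict(candidates))]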

Deep Learning Models
The first type of neural network model used in this paper is an MLP, a simple feedforward neural network commonly used for regression and classification tasks. As shown in Fig. 2a: an array of processing nodes connects inputs, x, to outputs, y; each node sums its inputs and applies a transform to the data; and each connection between nodes applies a weighting factor w_ij. The weights, w, define an approximation of the required outputs, y. The back-propagation algorithm, made popular by Rumelhart et al. [9], is used to optimize the weights of the network.
The second type of surrogate model is a CNN [10], represented diagrammatically in Fig. 2b. This configuration of network is well suited to spatial data, such as images [11]. In this architecture, a population of filter kernels performs 'convolutions' on the input image. The resulting feature maps are sensitive to the pattern of the kernel that gave rise to them. These feature maps are usually passed through a 'pooling layer' (scaled down), and the convolution-pooling stages are repeated a number of times. The network terminates in a deep feedforward network similar to an MLP. The approach is successful because it does not require the designer to specify the base 'kernels' that perform low-level image processing; they arise naturally from training the convolution layers of the CNN. The CNN directly predicts the per-pin powers, and the objective functions are then calculated from the surrogate outputs by the usual means. This type of network has proven extremely effective at recognising images, due in part to its translational invariance. Although translational invariance does not automatically occur in nuclear core design, by selecting inputs to represent the water gap and control/instrumentation pins, the effects of geometric variance are encoded into the inputs of the networks. The CNN represents an example of a typical advanced deep learning tool.

MICRO CORE OPTIMIZATION
Experiments are carried out on a small 'toy' example of a fuel arrangement problem, as described above, to explore the use of an SMO approach to fuel management problems. The 'micro core' arrangement of 36 fuel assemblies is to be optimized. For simulation, nine assemblies are arranged into a square lattice, quadrant rotational symmetry is applied on two sides, and the lattice is surrounded by water of a thickness equal to that of the assemblies, followed by non-reflective (black) boundaries (Fig. 1a). The assemblies, labelled 1-9, are standard PWR-type assemblies separated by a small water gap. Each assembly has an enrichment value that can be varied independently, quantised at increments of 0.2 between 0.8 and 5.0 w/o U235. Monte Carlo simulations were run with uniformly random uranium enrichments in each of the nine assemblies; these form the initial set of simulations used to train the surrogate models. In this study, all 'direct' Monte Carlo neutronics simulations were carried out using the Serpent software [12].
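A minimal sketch of the training set generation described above, assuming NumPy and the enrichment grid stated in the text (the sample count of 3200 matches the maximum training set size used in this study):

    import numpy as np

    rng = np.random.default_rng()

    # Enrichment grid from the text: 0.8 to 5.0 w/o U235 in 0.2 increments.
    grid = np.round(np.arange(0.8, 5.2, 0.2), 1)

    # One row per training design: a uniformly random enrichment for each
    # of the nine quarter-core assemblies.
    training_designs = rng.choice(grid, size=(3200, 9))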
Two outputs are optimized: the PPF and the position of the hottest pin. The PPF is the ratio of the peak pin power to the core-average pin power, and so measures the uniformity of the power generated [13, p73]. The system operates more efficiently if the PPF is lower [14, p374]. By simultaneously moving the hottest pin to the outside of the core and reducing the PPF, a flatter power profile is achieved, which results in a hotter overall coolant outlet temperature.
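For concreteness, a minimal sketch of how the two objectives could be computed from a 2D pin-power map. The PPF definition (peak over mean pin power) is standard; the hot-pin objective is reduced here to its coordinates, since the exact distance measure used in this study is not reproduced in the text.

    import numpy as np

    def objectives(pin_powers):
        # PPF: peak pin power divided by the core-average pin power.
        ppf = pin_powers.max() / pin_powers.mean()
        # Hottest pin position: the coordinates of the peak-power pin.
        row, col = np.unravel_index(pin_powers.argmax(), pin_powers.shape)
        return ppf, (row, col)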

Optimization Algorithm
NSGA2 is considered a 'solid multiobjective algorithm'; it is widely used in many real-world applications and is easily parallelised [15]. The parameter values used in this study are shown in Table 1. Note that the number of generations, g, and the population size, P, are smaller than those used in other studies, such as [5] (P = 100), [16] (g = 51, P = 204) and [17] (P = 300). This was considered necessary to ensure that the DSO could actually be carried out despite its very high computational cost. Although NSGA2 is no longer considered cutting edge, it is still widely used as a benchmark algorithm when comparing novel algorithms or techniques. The implementation used in these experiments is from the software library pygmo [4].
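A minimal pygmo sketch of how NSGA2 can be run against a surrogate evaluator. The problem wrapper, the surrogate stand-in, and the generation and population values are illustrative assumptions, as Table 1 is not reproduced here.

    import pygmo as pg

    def surrogate_objectives(x):
        # Stand-in for a trained surrogate model (hypothetical).
        return max(x) / (sum(x) / len(x)), sum(x)

    class MicroCore:
        """User-defined pygmo problem wrapping the surrogate evaluator."""
        def fitness(self, x):
            ppf, hot_pin_metric = surrogate_objectives(x)
            return [ppf, hot_pin_metric]  # both objectives minimised
        def get_nobj(self):
            return 2
        def get_bounds(self):
            # Continuous relaxation of the quantised enrichment range.
            return ([0.8] * 9, [5.0] * 9)

    prob = pg.problem(MicroCore())
    algo = pg.algorithm(pg.nsga2(gen=12))   # generation count illustrative
    pop = pg.population(prob, size=24)      # NSGA2 needs a multiple of 4
    pop = algo.evolve(pop)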

MLP
A simple deep MLP is used to predict the PPF and the horizontal and vertical positions of the hottest pin. Using the per-assembly w/o U235 as the inputs, the input dimension for a quarter core is 9, while the output dimension is 3 (the hottest pin coordinates and the PPF). To select an appropriate MLP topology, a short study was carried out in which the numbers of hidden layers and neurons per hidden layer were varied; Fig. 3 shows the results. The number of neurons per layer was kept constant across layers to reduce complexity and keep the design tractable on a 2D heatmap. The performance seen in Fig. 3 confirms that MLP networks are robust to a wide variety of topological values, as has been discussed by many authors (see [18, p130]; more recent deep learning discussion starts with [19]). A fairly large network with 9 layers of 80 neurons was chosen for these experiments, to minimise proximity to noisy areas of the map. The weights are modified using back-propagation with the 'Adam' algorithm [20], and the loss function used was mean absolute error (MAE). The network was trained for 500 epochs; in each epoch, the system trained on a set of 50 randomly selected samples from the training set, N. Training on a random subset in each epoch limits over-fitting to the training set samples, compared with learning on the whole training set at once.
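A minimal Keras sketch of the MLP described above (9 hidden layers of 80 neurons, Adam, MAE loss, 500 epochs of 50 random samples). The ReLU activation and the x_train/y_train arrays are assumptions not stated in the text.

    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    # 9 hidden layers of 80 neurons each; activation assumed to be ReLU.
    model = keras.Sequential(
        [layers.Input(shape=(9,))]
        + [layers.Dense(80, activation="relu") for _ in range(9)]
        + [layers.Dense(3)]  # PPF plus hot-pin x and y
    )
    model.compile(optimizer="adam", loss="mae")

    # 500 epochs, each training on 50 randomly selected samples.
    rng = np.random.default_rng()
    for _ in range(500):
        idx = rng.choice(len(x_train), size=50, replace=False)
        model.fit(x_train[idx], y_train[idx], epochs=1, verbose=0)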

CNN
The topology of the CNN used is shown in Table 3 in APPENDIX A. Three convolution and pooling layers are followed by six fully connected feedforward layers. These values were chosen based on comparison with existing example networks from the literature [21][22][23] and on experimentation to fit the input image size and output data. As with the MLP, 'Adam' is used to modify the weights based on an MAE loss function. The training was carried out for 100 epochs with a sample size of 50. Unlike the MLP surrogate model, the CNN is used to predict the pin-by-pin power of the system.
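A minimal Keras sketch consistent with the description above (three convolution and pooling stages followed by six fully connected layers). The input image size, channel encoding, filter counts, and layer widths are assumptions, since Table 3 is not reproduced here.

    from tensorflow import keras
    from tensorflow.keras import layers

    # Input: an image of the quarter core; size and channel encoding
    # (enrichment, water gap, control/instrumentation pins) are assumed.
    inp = layers.Input(shape=(51, 51, 3))
    x = inp
    for filters in (16, 32, 64):          # three convolution + pooling stages
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    for _ in range(6):                    # six fully connected layers
        x = layers.Dense(256, activation="relu")(x)
    out = layers.Dense(51 * 51)(x)        # per-pin power prediction
    model = keras.Model(inp, out)
    model.compile(optimizer="adam", loss="mae")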

RESULTS AND DISCUSSION
Simulations show that the surrogate models achieve moderate errors for the objective functions on a test set of random data while generating results using hundreds of thousands of times less computational resource. The CNN predicted pin powers with an MAE of around 1%; however, this translated to MAE values of 3.805% and 2.421% for the actual objective functions over the test set. Execution time measurements in Table 2 are shown relative to the MLP training time in order to compensate for effects specific to the hardware and software platform. For reference, the mean MLP training time was 172.4 seconds per CPU over 30 iterations on the low-end hardware used.
This study uses a slightly larger training set for the SMO methods than the number of simulations required by the DSO (as shown in Table 2). This was chosen to enable the best possible models to be created. The most significant value reported here is the number of generations for which the SMO methods remain competitive with the DSO, as this will guide researchers generating training data for future work.
NSGA2 was used with the MLP and CNN surrogate models in SMO processes and in a DSO process using Serpent for solution evaluation. The Pareto fronts (showing the trade-off between the PPF and hottest pin position objectives) found by the SMO processes are then re-evaluated using Serpent (Fig. 4). The SMO outputs are compared with the Pareto fronts at different generations in the DSO process. The MLP SMO process yields a Pareto front that has similar performance to the Pareto front of the DSO at generation 5, while the CNN SMO process has a Pareto front that has similar performance to that at DSO generation 12. When NSGA2 runs using a surrogate model evaluator, the computational resource usage is less than one millionth that of the DSO process.
The MLP outperforms the CNN at predicting PPF but underperforms at predicting the hot pin location. The CNN performs better when predicting non-random (optimization) input data. This is believed to be because the CNN is predicting a more fundamental system parameter and is more robust to the extrapolation that occurs when the optimization search moves the input space away from the random training set.
The surrogate models investigated here have the potential to reduce computational resource usage by up to 25%, assuming that the training set already exists. If data created during a design study already exists, such as in previous work by the authors [25], or if an optimization is to be carried out on a regular basis, then the creation of a surrogate can be justified. Random training data is equivalent to the random initial populations used by many iterative optimization algorithms (including NSGA2) and may be reused to create a surrogate model. The CNN surrogate model significantly outperforms the MLP model for SMO in this case study. If the optimization algorithm used a population of 300, as per [17], and the CNN surrogate still performed comparably for 12 generations, then a net computational saving would be seen. This is not inconceivable, since population size primarily discourages premature convergence to local optima rather than changing the rate of Pareto front progression [26].
Subsequent work should establish the investment versus benefit of SMO relative to DSO where the population size is similar to that in other studies using NSGA2 (see the Optimization Algorithm section), or consider the trade-off between smaller training sets and SMO efficacy. In this study, the training set was chosen to ensure good error performance of the surrogate models on the test set, rather than prioritising the performance of the SMO against the DSO. As the computational cost of objective function evaluation increases, the justification for SMO increases, so larger problems will be more easily justified. Another possibility is to consider intelligent sampling methods for training data generation (e.g. Latin hypercube [27] or Sobol sampling [28]); by training on a more space-filling training set, the surrogate model might compete more effectively during optimization.
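A minimal sketch of such space-filling sampling using SciPy's qmc module, snapped to the enrichment quantisation used in this study (the sample size is illustrative):

    import numpy as np
    from scipy.stats import qmc

    sampler = qmc.LatinHypercube(d=9)    # or qmc.Sobol(d=9, scramble=True)
    unit = sampler.random(n=1024)
    enrichments = qmc.scale(unit, [0.8] * 9, [5.0] * 9)

    # Snap to the 0.2 w/o U235 quantisation used in this study.
    enrichments = np.round((enrichments - 0.8) / 0.2) * 0.2 + 0.8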

CONCLUSIONS
In this study, two deep learning surrogate models have been applied to a simple BOC loading pattern optimization problem. The results show that surrogate models can accelerate optimization at the start of the process. This has immediate applications where a corpus of training data has already been produced: for example, in work derived from this study, data created during a design study or as a by-product of traditional iterative optimization could be repurposed. The requirement for a large training set offsets the gains of the surrogate models in this study; however, a number of interesting avenues for further research exist which might tip the computational efficiency balance.