SYSTEMATIC ANALYSIS OF MULTIVARIATE SCENARIOS USING ADVANCED CLUSTERING METHODS

The continuous improvement of fuel cycle simulators, in conjunction with the increase in computing capacity, has led to a new scale of scenario studies. Taking into consideration multiple variable parameters and observing their effect on multiple evaluation criteria, these scenario studies gather several thousand trajectories covering the possible values of multiple operational parameters. Although global methods such as sensitivity analysis extract useful information from these groups of trajectories, they only provide average and global values. In this work we present a new method to analyze these groups of trajectories while keeping some localization in the information. Based on principal component analysis, a clustering method has been implemented in order to mathematically extract, from the ensemble of trajectories simulated for a scenario study, subgroups of trajectories that have similar behaviors. Typical trajectories, representative of these subgroups, are then determined. The application of this new method to a sample scenario for two different outputs, the total amount of transuranic elements within the fuel cycle and the number of times the MOX fuel could not be built during the simulated time, is presented. The comparison of the results of the two analyses shows that the method allows good clustering for continuous and regular outputs but struggles with discrete, highly non-linear ones.


INTRODUCTION
Although fuel cycle simulators have been used for several decades, recent years have seen renewed interest and many new developments, with new codes as well as major updates to older codes. In this context, the capabilities of fuel cycle simulators have improved and several simulators have been adapted to parallel calculation [1][2][3]. For example, in CLASS [4], thanks to parallel calculation and simplified modeling methods such as the macro reactor [5], up to 800 trajectories can be simulated in less than one hour, depending on the complexity of the simulations. The FIT Benchmark [6], in which 6 different fuel cycle simulators simulate 2000 trajectories of a simple scenario, illustrates that this capacity to run large numbers of simulations has spread. Analyses in scenario studies will thus involve several hundreds or thousands of trajectories.
To understand which variables have an impact on an observable and to analyze sensitivities, a method taking advantage of this newly possible large number of calculations has been developed within the CLASS team: the Wide Parametric Sweeping method [6]. This method is a direct extension of global sensitivity analysis for fuel cycle studies [3]. In this method, a design of experiment is built by setting out a broad scenario and a list of inputs sampled within a predefined range of variation. After simulation, because numerous trajectories have been simulated, the analyst can easily apply a filter on the inputs and see the impact on the solution space without any new calculation by the fuel cycle simulator.
In the past, the resulting trajectories have been analyzed through global sensitivity indices such as Sobol indices [3,7]. However, for more in-depth analysis, these global indices are not enough. Reference trajectories are thus chosen within the sampling space and analyzed individually [7].
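The filtering step described above can be illustrated with a minimal sketch. The record layout, the helper name `apply_filter` and all numerical values are hypothetical and only serve the illustration; they are not taken from the study.

```python
# Hypothetical records: one dict per already-simulated trajectory
# (values are made up for illustration).
trajectories = [
    {"P_f": 120.0, "FrMOX_f": 0.10, "ML": 0},
    {"P_f": 200.0, "FrMOX_f": 0.25, "ML": 7},
    {"P_f": 90.0, "FrMOX_f": 0.20, "ML": 0},
    {"P_f": 180.0, "FrMOX_f": 0.12, "ML": 2},
]


def apply_filter(records, **limits):
    """Keep the records whose inputs fall inside the given (low, high) intervals."""
    return [
        record
        for record in records
        if all(low <= record[name] <= high for name, (low, high) in limits.items())
    ]


# Restrict the final power without running any new simulation.
subset = apply_filter(trajectories, P_f=(0.0, 150.0))
print([record["ML"] for record in subset])
```

Because the filter only reads stored results, narrowing or widening an interval costs no simulator time.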

Scenario Definition And Sampling Space
To test our new method for clustering trajectories and selecting Representative Reference Trajectories, the Wide Parametric Sweeping method [6] has been applied to generate numerous trajectories. This method requires the design of a scenario in broad terms and the choice of several controlled variables for which constraints are released. Because the objective of this work is more focused on method testing than on specific fuel cycle analysis, the scenario has been chosen to use only reactors for which the physics and the fuel cycle behavior are well understood: PWRs, with UOX and MOX fuels. The MOX fuel uses plutonium from spent UOX fuel and no multi-recycling of the plutonium is considered.
The scenario is built in 3 phases: an initial stable phase, a transition phase and a final stable phase. The first phase, identical in all the trajectories of the studied scenario, is a simplified representation of the current French fleet: a total thermal power of 188 GW, of which 7.6% is produced by MOX fuel [7]. The simulation starts in year 2015. The final phase has its power (P f) and MOX fraction (FrMOX f), as well as the maximum Burn-Up for both fuels (BU UOX and BU MOX) and the minimum cooling time before UOX spent fuel reprocessing (TC UOX), as variable parameters used in the Wide Parametric Sweep. This phase is simulated until the year 2180. The starting time (T Start) and ending time (T End) of the transition phase are also variable parameters used in the Wide Parametric Sweep. During this phase, the MOX fraction and the total power change linearly. Burn-ups and cooling time change abruptly at the beginning of this phase. The selection of the UOX spent fuel to be reprocessed first for MOX fuel fabrication is also a variable parameter with two possible choices: First In / First Out (FiFo) and Last In / First Out (LiFo). It stays the same throughout the trajectory. For each of the parameters used in the Wide Parametric Sweep, we set out the wide ranges presented in table 1.
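The linear change of total power and MOX fraction during the transition phase can be sketched as follows. This is only an illustration of the three-phase profile, not the CLASS implementation: the function name `transition_value` and the sample values of T Start, T End, P f and FrMOX f are assumptions; only the initial values (188 GW, 7.6%) come from the text.

```python
def transition_value(year, t_start, t_end, initial, final):
    """Quantity that is constant before t_start, changes linearly during the
    transition phase, and is constant again after t_end."""
    if year <= t_start:
        return initial
    if year >= t_end:
        return final
    fraction = (year - t_start) / (t_end - t_start)
    return initial + fraction * (final - initial)


# Initial phase values from the text: 188 GW thermal, 7.6 % produced by MOX.
# The four values below are hypothetical samples of the swept parameters.
T_START, T_END = 2040.0, 2080.0
P_F, FRMOX_F = 150.0, 0.20

for year in (2015, 2040, 2060, 2080, 2180):
    power = transition_value(year, T_START, T_END, 188.0, P_F)
    mox_fraction = transition_value(year, T_START, T_END, 0.076, FRMOX_F)
    print(year, round(power, 1), round(mox_fraction, 3))
```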

Generation Of Trajectories Bundle
Once the input space has been set out, we used Latin Hypercube Sampling (LHS) [8] to generate the input for the fuel cycle simulator, creating a set of input parameters representing 20000 possible trajectories. A cut in the LHS has been applied, keeping only points for which T Start < T End − 10 and reducing the number of trajectories to be simulated to 13098. Each of these 13089 trajectories has then been simulated with CLASS [4]. The total amount of data created is 1.1 TB. To facilitate the analysis and the storage of the data, the individual output files are aggregated into a single file keeping only the data of interest, reducing the size to 15 GB.
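A minimal sketch of the sampling and of the cut on the transition dates is given below. It uses a hand-written Latin Hypercube on two parameters only; the bounds are hypothetical placeholders (the ranges actually used are those of table 1), and the helper name `latin_hypercube` is an assumption. A library implementation such as SciPy's `scipy.stats.qmc.LatinHypercube` could be used instead.

```python
import random


def latin_hypercube(n_samples, bounds, seed=0):
    """Draw n_samples points: the range of each variable is split into
    n_samples equal strata and every stratum is used exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds.values():
        # One random point inside each stratum, then shuffle the strata order.
        column = [
            low + (high - low) * (k + rng.random()) / n_samples
            for k in range(n_samples)
        ]
        rng.shuffle(column)
        columns.append(column)
    names = list(bounds)
    return [dict(zip(names, row)) for row in zip(*columns)]


# Hypothetical ranges in years, for illustration only.
BOUNDS = {"T_Start": (2015.0, 2170.0), "T_End": (2025.0, 2180.0)}

samples = latin_hypercube(20000, BOUNDS)
# Cut described in the text: keep only points with T_Start < T_End - 10.
kept = [s for s in samples if s["T_Start"] < s["T_End"] - 10.0]
print(len(samples), len(kept))
```

The number of points surviving the cut depends on the chosen ranges, so the count printed here is not expected to match the one of the study.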

Outputs Considered
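Two outputs are considered: the total amount of transuranic elements within the fuel cycle (INC TRU) and the number of times the MOX fuel could not be built during the simulated time, i.e. the number of missed loads (ML). These two outputs are interesting for evaluating a TRU management strategy that may want to reduce INC TRU while keeping ML equal to 0. They are also interesting for this work as they are very different by nature: INC TRU is a continuous variable that reacts smoothly and mostly linearly to most changes in strategy, while ML is a discrete variable sensitive to many threshold effects leading to strong non-linearity.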

Data Clustering From Principal Component Analysis
Using the Wide Parametric Sweeping method, numerous trajectories, representative of our whole input space, have been obtained. To get a better grasp of the behaviors existing in these trajectories, the unsupervised model-based clustering developed within the EMCluster package [9] in R has been used. It projects the data matrix (inputs and outputs) into the principal component space and then creates clusters using the Hartigan-Wong k-means algorithm [10]. The results are then presented in the initial input-output space for better readability.
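The chain "projection in the principal component space, then k-means" can be sketched in a few lines. This is a simplified stand-in, not the EMCluster workflow used in the study: it is written in pure Python, it assumes that the columns are standardized before the projection, it uses plain Lloyd iterations with a farthest-point initialization rather than the Hartigan-Wong variant, and the toy data (two inputs and one made-up output) are purely illustrative.

```python
import math
import random


def standardize(rows):
    """Center each column and scale it to unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def principal_axes(rows, n_axes, n_iter=200):
    """Leading eigenvectors of the covariance matrix of centered rows,
    found by power iteration with deflation."""
    dim = len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / len(rows) for j in range(dim)]
           for i in range(dim)]
    axes = []
    for a in range(n_axes):
        vec = [1.0 / (i + a + 1) for i in range(dim)]  # arbitrary non-zero start
        for _ in range(n_iter):
            vec = [sum(cov[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
            norm = math.sqrt(sum(v * v for v in vec))
            vec = [v / norm for v in vec]
        eigenvalue = sum(vec[i] * cov[i][j] * vec[j]
                         for i in range(dim) for j in range(dim))
        # Deflation: remove the axis just found from the covariance matrix.
        cov = [[cov[i][j] - eigenvalue * vec[i] * vec[j] for j in range(dim)]
               for i in range(dim)]
        axes.append(vec)
    return axes


def project(rows, axes):
    """Coordinates of every row along the given axes."""
    return [[sum(x * a for x, a in zip(row, axis)) for axis in axes] for row in rows]


def kmeans(points, k, n_iter=50):
    """Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy data matrix: two inputs and one made-up output, two separated groups.
rng = random.Random(1)
data = []
for center_p, center_mox in ((55.0, 0.06), (185.0, 0.26)):
    for _ in range(100):
        p = center_p + rng.uniform(-5.0, 5.0)
        mox = center_mox + rng.uniform(-0.02, 0.02)
        data.append([p, mox, 2.0 * p + rng.gauss(0.0, 1.0)])

scaled = standardize(data)
scores = project(scaled, principal_axes(scaled, n_axes=2))
labels = kmeans(scores, k=2)
print(labels[:3], labels[-3:])
```

The labels are computed in the principal component space but index the original rows, so the clusters can be displayed directly in the initial input-output space.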

Selection Of Representative Reference Trajectories
Once the clustering has been carried out, groups of trajectories that tile the entirety of our ensemble of trajectories are obtained, and each cluster gathers trajectories that behave similarly. However, the number of trajectories is still the same; to allow more focused and in-depth analysis, the extraction of representative trajectories is needed. These trajectories can be re-simulated with fewer approximations to verify that the conclusions of the study reflect the physics and are not numerical artifacts created by simplifications [7].
For each cluster, a Representative Reference Trajectory is built as the trajectory generated by the inputs that are the barycenter of the trajectories of the cluster. The input parameter i of the Representative Reference Trajectory of the cluster j is the average of all the values of the input parameter i of all trajectories from cluster j. The clustering method used creates compact clusters, so the Representative Reference Trajectory is always part of the cluster it represents.
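The barycenter rule above amounts to a per-cluster average of each input column. The sketch below covers numerical inputs only; the function name `cluster_barycenters`, the two-column layout (P f in GW, FrMOX f) and the values are hypothetical.

```python
from collections import defaultdict


def cluster_barycenters(inputs, labels):
    """Average every input parameter over the trajectories of each cluster."""
    groups = defaultdict(list)
    for row, label in zip(inputs, labels):
        groups[label].append(row)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in groups.items()
    }


# Hypothetical inputs (P_f in GW, FrMOX_f) and cluster labels, for illustration.
inputs = [[100.0, 0.10], [120.0, 0.14], [180.0, 0.30], [200.0, 0.26]]
labels = [0, 0, 1, 1]
print(cluster_barycenters(inputs, labels))
```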
The calculation cost of simulating a trajectory is low, so an additional simulation of the Representative Reference Trajectory can be performed. However, if the initial sampling is dense enough, the closest simulated trajectory can serve a similar purpose.
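Finding the closest already-simulated trajectory can be sketched as a nearest-neighbor search in the input space. Rescaling each input by its sampling range before measuring the distance is an assumption of this sketch, as are the function name `closest_trajectory` and all values.

```python
import math


def closest_trajectory(barycenter, inputs, ranges):
    """Index of the simulated trajectory nearest to the barycenter, each input
    being rescaled by its range so that no parameter dominates the distance."""
    def scaled(row):
        return [(value - low) / (high - low) for value, (low, high) in zip(row, ranges)]

    target = scaled(barycenter)
    return min(range(len(inputs)), key=lambda i: math.dist(scaled(inputs[i]), target))


# Hypothetical sampling ranges and simulated inputs (P_f in GW, FrMOX_f).
ranges = [(50.0, 250.0), (0.0, 0.5)]
inputs = [[100.0, 0.10], [120.0, 0.14], [112.0, 0.11]]
barycenter = [110.0, 0.12]

best = closest_trajectory(barycenter, inputs, ranges)
print(best, inputs[best])
```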

CLUSTERING OF SCENARIO AND ANALYSIS
Because the number of Missed Loads is one of our outputs of interest, trajectories with missed loads are kept in all following analyses.

Clustering
The first test of our clustering analysis method was performed considering only the in-cycle transuranic elements, INC TRU. To be sure that the number of missed loads has no influence on the clustering carried out in this part, the corresponding column has been deleted from the data-set. The clustering has thus been performed on a data-set assembling 8 input variables and 1 output variable. The ensemble of trajectories has been distributed between 4 clusters.
The pair plot presented in figure 1 gives a good visualization of which subsection of the input space is associated with each cluster.

Representative Reference Trajectories
Using these clusters and the method presented in 3.2, 4 Representative Reference Trajectories have been generated. The input values of these trajectories are given in table 2. The representative trajectories (represented as white dots in figure 1) are well centered in their respective clusters and clearly carry the main characteristics of each cluster (from low to high P f and FrMOX f).
The second application of the method considers only the missed loads, ML, as output of interest. As before, the other output, here INC TRU, has been completely removed from the data-set before clustering. The clustering has thus been performed on a data-set assembling 8 input variables and 1 output variable. The ensemble of trajectories has been distributed between 4 clusters. The three clusters with non-zero ML are completely interlocked with each other, to the point where they are almost indistinguishable. The cuts between clusters are clear in the output space but do not translate well into a tiling of the input space. This is most probably because principal component analysis, which plays a key intermediate role in our clustering process, is not well suited to dealing with either discrete values or highly non-linear variables, and ML falls into both of these unwanted categories.
From the first cluster, which groups the trajectories with zero ML, the links between power, MOX fraction and plutonium shortage for MOX fuels shown in several previous works [3,7] can be seen: if P f is high, the reserves of plutonium built up during the initial phase are not enough to supply the increase in MOX fuel, proportional to the power, during the transition. Furthermore, the more MOX fuel has to be fabricated, the higher the risk of plutonium shortage. This creates a cluster containing all trajectories with FrMOX f < 0.15 as well as some with a higher MOX fraction when P f is lower than 100 GWth.
As with the previous clustering, the clusters are uniformly distributed over the inputs that have little effect on the output.
Furthermore, because the value of the output is of primary importance in the clustering process, the method is also unable to create multiple clusters within the zone without missed loads.

Representative Reference Trajectories
The 4 Representative Reference Trajectories generated from the clusters have the input values presented in table 3. The representative trajectories (represented as white dots in figure 2) are by design at the center of their respective clusters. In these cases of almost superposed clusters, the positions of the centers make the difference between the clusters more readable. These centers lie along a diagonal line in the panel of the pair plot showing the link between P f and FrMOX f, and they helped characterize the clusters presented in the previous section.

CONCLUSIONS
A first application of a new method to analyze large numbers of trajectories has been presented. This method gives, with an additional calculation cost of only several seconds, an automatic and deterministic way of grouping trajectories that helps to grasp the variability of the analyzed output. It also provides a representative reference trajectory for each cluster, which can serve as a starting point for more focused studies while ensuring a good representation of the phase space.
An ensemble of trajectories exploring possibilities within a fleet of PWRs using UOX and MOX fuels has been analyzed with this new method considering two very different outputs of interest: the total amount of transuranic elements within the fuel cycle (INC TRU) and the number of times the MOX fuel could not be built during the simulated time (ML). For INC TRU, which behaves linearly, the method creates 4 meaningful clusters that represent 4 different behaviors of the fuel cycle, and the centers are good representatives of these clusters. For ML, a discrete and highly non-linear output, the method easily identifies one meaningful cluster, the points without missed loads, but struggles with the rest, creating clusters that are difficult to discriminate. Because the value of the output is of primary importance in the clustering process, the method is also unable to create multiple clusters within the zone without missed loads, which decreases its interest for this kind of output.
There is in theory no obstacle to applying this method to multiple outputs at the same time. However, early attempts have not been fruitful because of scale problems between outputs, which lead the clustering to practically ignore some of the considered outputs. More work can thus be carried out in this direction, using scaling methods to overcome this limitation.
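The scale problem can be illustrated by comparing the share of the total variance carried by each output before and after a rescaling. The sketch below uses z-score standardization as one possible scaling choice, not necessarily the one that future work would retain; the output values are hypothetical and in arbitrary units.

```python
def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def zscore(values):
    """Center a column and divide it by its standard deviation."""
    mean = sum(values) / len(values)
    std = variance(values) ** 0.5
    if std == 0.0:
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


def variance_share(columns):
    """Fraction of the total variance carried by each column."""
    total = sum(variance(c) for c in columns.values())
    return {name: variance(c) / total for name, c in columns.items()}


# Hypothetical output columns with very different scales (arbitrary units).
outputs = {"INC_TRU": [400.0, 550.0, 700.0, 850.0], "ML": [0.0, 0.0, 3.0, 12.0]}

raw_share = variance_share(outputs)
scaled_share = variance_share({name: zscore(c) for name, c in outputs.items()})
print(raw_share)     # one output carries almost all of the variance
print(scaled_share)  # both outputs weigh the same after scaling
```

A variance-based projection applied to the raw columns would be driven almost entirely by the larger-scale output, which is consistent with the behavior reported for the early multi-output attempts.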