Validation of nuclear data using historical critical assembly measurements

Predictive neutronics simulations depend on accurate nuclear data. We use integral experiments on near-critical assemblies as one of our key validation tools. The accuracy of these measurements is such that they are more constraining on a nuclear data evaluation than the underlying differential data. But the solutions found by calibrating to these benchmarks are degenerate, and other data are necessary to discern which solution is correct. Recently, we have begun to use activation data for this purpose. This paper describes many of the early Los Alamos experiments used in this process, with particular attention given to some of the systematic errors that affect these measurements.


Introduction
It is essential to start with accurate nuclear data to make valid predictions for nuclear energy, astrophysics, nuclear medicine and nuclear security. Many of us are hesitant to state something that seems so obvious. But it is important to remind ourselves and those who use these data that the calculations performed are only as good as the data used. The perception is often that we have measured all the relevant data and they are good enough. Part of this perception is our good fortune that many users see us as a black box because the answers are, for many things, good enough. But we know, as our users inevitably discover, there are many applications for which our predictions are not adequate.
Evaluation of nuclear data libraries is a Darwinian process where the fundamental rule is survival of the best-fit¹. The users of our data will not adopt a new data library unless it can be shown that it provides better results than previous versions. So we prove that new data are better by demonstrating that the results have improved for a suite of relevant benchmark measurements. But this process is not independent, because we use those very observations to calibrate the data. This process is controversial. Many in our community regularly point out the flagrant inconsistency. We continue to risk disaster because we do not understand where this process fails. But we choose to ignore these risks because it is the calibration process that continues to provide us ever more predictive power.
Our failure to predict where the data are wrong, and thus where we are making poor predictions, almost certainly derives from the way we treat incomplete information in this system. We ignore it. We assume unstated systematic errors are zero. We assume unstated correlations are zero. These statements hold equally true for both differential and integral measurements. Imagine a density or a mass measurement: however precise, any bias in it is fully correlated across every use of that value, leading to a systematic error in density that remains fixed and a mass error that is additive. Our current treatment, when these errors are treated at all, is to treat them as independent. This is a great disservice to our users and ourselves and leads to much confusion and misinterpretation. But it is a monumental challenge to solve, and we will not do so here.

ᵃ e-mail: morgan@lanl.gov

¹ Paul Velleman, in his article "Truth, Damn Truth, and Statistics" [1], discusses how the best analysis often arises from Darwinian competition among alternative models, and credits John Walker with coining the phrase "survival of the best-fit." The article is an enlightening look at the pitfalls of statistics and captures the essence of many of the issues we face in nuclear data evaluation.
If we are going to use these experiments to calibrate our data, and we do, we must continue to provide critical review of them. The US Cross Section Evaluation Working Group (CSEWG) that maintains the ENDF/B nuclear data library places a high confidence on the critical assembly experiments performed at Los Alamos in the 1950s. These early experiments were performed for exactly this goal, to calibrate predictions of neutron reactivity and transmutation. But how they went about this and the tools they had to do so are not the same as modern needs nor tools.
This paper explores the history of the early critical assembly experiments at Los Alamos. We attempt to give a sense of the state-of-the-art, the capabilities and the motivation at that time and how it differs from modern usage of these same data. Particular attention is paid to trying to discuss the systematic errors that may be present in these data.

Early history of Los Alamos critical assembly experiments
There are several histories of the Los Alamos critical assembly experiments, including the Los Alamos reports LA-9685-H [2], LAMS-8762 [3] and LA-14306-H [4], but perhaps a more enlightening discussion of how and why these experiments are done is found in the TOPSY design summary [5]. TOPSY was the second remotely operated critical assembly machine and the first dedicated to understanding neutronics. After two criticality accidents in which there was the tragic loss of life [6,7], it was clear that for personnel safety these operations must be done remotely. In designing these machines, there were two overarching requirements: personnel safety and reproducibility. That there have been no further accidents in which serious harm to people has occurred is a testament to their safety consciousness [8].
Having found an answer to safety, they turned to the question of what measurements to make, and how.
While the computational resources at that time were limited, then as now simulations require accurate fundamental data for the equations to be solved. For neutronics, the key equations are the Boltzmann equation, which describes neutron reactivity and transport, and the Bateman equations, which describe the transmutation of the isotopes. Given the complexity and expense of these experiments, getting the most bang for the buck just made good sense. A basic program of measurements would include, at a minimum, the time behavior (i.e., neutron reactivity or multiplication) around criticality, measurements of the neutron distributions, and material replacement studies [5, p. 28].
In the time-dependent form of the Boltzmann equation, the neutron population shrinks or grows as the exponential of alpha (the reactivity) times time. It is vital to understand alpha in order to accurately simulate, among other things, accident scenarios. Nature is beautiful to behold in its complexity, most often when that complexity manifests itself in simplicity. A critical system is one in which the number of neutrons is constant over time, and this is one of the simplest measurements to make. But operating a reactor at the critical condition would not be possible without the emission of neutrons on multiple timescales. We need both prompt (nanoseconds or shorter) and delayed (milliseconds and longer) neutron emissions to control these reactions. Having a small fraction of the emissions be delayed allows the time necessary for mechanical systems to respond in order to adequately control the overall neutron population.
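The time behavior described above can be sketched in a few lines; the numbers below are purely illustrative and do not correspond to any specific assembly:

```python
import math

def population(n0, alpha_per_s, t_s):
    """Neutron population N(t) = N0 * exp(alpha * t) for constant alpha."""
    return n0 * math.exp(alpha_per_s * t_s)

def doubling_time_s(alpha_per_s):
    """Time for the population to double at a given positive alpha."""
    return math.log(2) / alpha_per_s

# Illustrative only: a prompt-supercritical fast system with alpha of order
# 1e6 /s doubles its neutron population in well under a microsecond, far too
# fast for any mechanical control system to follow. Near delayed critical,
# the small delayed-neutron fraction stretches the effective time scale to
# seconds or longer, which is what makes mechanical control possible.
```

This is the simplest point-kinetics picture; a real analysis folds in the delayed-neutron groups discussed below.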
Understanding the conditions at which a system is going to be sub-critical, delayed critical and prompt critical is necessary for the safe conduct of nuclear operations. A well-designed benchmark experiment should seek to measure this entire range of neutron reactivity. Delayed critical is the point at which a system is critical with the contribution of both the prompt and delayed neutrons. Prompt critical is the point at which it is critical with the contribution from prompt neutrons alone. The reactivity unit dollar was coined by Louis Slotin as a unit of measure to describe the reactivity due to the delayed neutrons [9, p. 595].
The Inhour equation [10-12] was developed during this time and has become one of the foundations of reactor physics [13, p. 247]. It describes the relationship of the period (the e-folding time of the neutron population) to the neutron reactivity (alpha or k) and the delayed neutron parameters. The term Inhour was first used to define a unit of measure: the change in control rod position of a reactor at delayed critical resulting in a period of one hour [14, Appendix I]. This relationship was a vital component of their routine analysis procedures. It depends on knowing the prompt neutron lifetime of the system and the delayed neutron parameters. For this reason, the delayed neutron emission parameters were the focus of much early work, as summarized by Keepin [15].
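In modern notation (ours, not that of the original reports), the relation between the stable period $T$ and the reactivity $\rho$ is commonly written as

```latex
% Standard modern form of the Inhour equation (not a quotation from
% Refs. [10-14]): \Lambda is the prompt-neutron generation time, and
% \beta_i, \lambda_i are the delayed-neutron group fractions and decay
% constants, conventionally summed over six groups.
\rho = \frac{\Lambda}{T} + \sum_{i=1}^{6} \frac{\beta_i}{1 + \lambda_i T}
```

Delayed critical corresponds to $\rho = 0$, and prompt critical to $\rho = \beta_{\mathrm{eff}} = \sum_i \beta_i$, i.e., one dollar in Slotin's unit.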
Many of the early measurements of material worth from which effective cross sections were derived are based on understanding the Inhour relationship [e.g., 12,14]. In order to measure these values, it was necessary to design in the ability to significantly change the overall reactivity. Adding or removing mass buttons or other replaceable parts allowed for larger shifts in reactivity. One or more control rods, with mechanical drive systems that could be controlled within 0.001-inch tolerance, were used for fine adjustments. The use of interchangeable blocks in assemblies like TOPSY went even further, facilitating examination of actinide enrichments, void and material worth within the core, reflector worth, and many other parameters of interest. This flexibility came at the cost of additional complexity. Gaps due to tolerances and other issues of uniformity and homogeneity of manufacture must be addressed when considering these measurements.
In measuring the distribution of neutrons, the experimentalists were interested in both spatial and energy distributions. To accomplish these measurements, a glory hole was often part of the design and facilitated irradiations from the center of the core to the outer edge of the reflector. In particular, this allowed the use of fission chambers within the assembly, including taking a continuous traverse. There were many special parts, mostly at interfaces, that were made to accommodate small samples: thin wires, foils or even whole sheets could also be placed between parts [e.g. 16]. The emplacement of samples is not typically described in the articles, reports and even logbooks associated with these measurements; unfortunately, these details can matter due to scattering from structural components. In addition to measurements inside the assembly, many measurements were conducted externally to help establish health physics limitations [e.g. 17,18]. Almost all of these measurements required establishing a measurement scale to make possible quantitative comparisons of reaction rates within an assembly, but even more so between assemblies.
During the 1940s and 1950s, Los Alamos developed an accurate measurement of uranium-235 fissions that has been the basis for almost our entire measurement history [19,20]. Most reactions were measured in ratio to uranium-235 fissions by way of molybdenum-99 beta counting. The state-of-the-art at that time was quantitative radiochemistry [21] coupled with alpha, beta or mass spectrometry counting; gamma counting was difficult as high-resolution detectors would not be generally available for another two decades.

Then and now, computational physics
Los Alamos started in 1943 with computers (better known as your friends and neighbours, that is, people not machines) that could process about three operations per second; by 1955, this had increased to only a few thousand operations per second [22], with very limited memory, of order kilobytes. The calculations had to be programmed by hand, as Fortran would not come along until 1957. While Monte Carlo and deterministic neutron transport methods had been developed, the data available to them were limited, with the first Barn Book [BNL-325] not available until 1955 and ENDF/B-I a decade later in 1968. One had to think first, and compute carefully. These experiments were used to derive few-group, often one-group, cross sections for computations that were used to help develop empirical relationships. Given these severe limitations, there was no motivating reason to characterize these experiments with the kind of scrutiny we desire today. This can be seen in Table 1 of Ref. [5], which is probably the very first criticality experiments benchmark handbook (a precursor to the well-known ICSBEP [24]).
Empirical relationships ruled the early days. Figure 8 in Ref. [25], and the equation on the preceding page, describe the critical mass of high-enriched uranium as a function of the reflector thickness for various materials. Table IV and Fig. 9 describe the relationship of critical mass in spherical versus varying height-to-diameter cylindrical geometry. Reference [26] extends these relationships to the concentration and density of uranium. Curves like these could then be used to interpolate between known critical systems with a reasonable, maybe 0.5% (500 pcm), uncertainty; this was considered good. In addition to helping guide the understanding of critical systems, these measurements and others also helped set criticality safety limits for actinide processing and manufacturing [27].
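The kind of interpolation these curves supported can be sketched as follows. The tabulated points here are invented for illustration; they are not the measured Los Alamos values from Refs. [25, 26]:

```python
# Hypothetical (reflector thickness [cm], critical mass [kg]) pairs for a
# single reflector material -- illustrative numbers only.
CURVE = [(0.0, 52.0), (2.0, 40.0), (5.0, 30.0), (10.0, 24.0)]

def critical_mass(thickness_cm, curve=CURVE):
    """Geometric (log-linear) interpolation between tabulated points, in the
    spirit of the empirical curves described above."""
    pts = sorted(curve)
    if not pts[0][0] <= thickness_cm <= pts[-1][0]:
        raise ValueError("outside tabulated range; do not extrapolate")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= thickness_cm <= x1:
            f = (thickness_cm - x0) / (x1 - x0)
            return y0 * (y1 / y0) ** f  # interpolate linearly in log(mass)
```

An interpolation like this between known critical systems was trusted to roughly the 0.5% level noted above; the choice of interpolating variable (linear, logarithmic, etc.) was itself part of the empirical art.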
Computing power grew by more than a factor of 1000 between the 1950s and 1970s, and with it came renewed interest in computational benchmarks. The 1969 Re-evaluated Critical Specifications [28] discussed many of the assumptions that go into making a simplified benchmark model and updated values for many systems. This led to a new compilation of benchmark handbooks, LA-3067-MS [29] (which contains many entries that have never made it into the modern ICSBEP [24]), that was combined with similar works from across the national laboratories to create the US Cross Section Evaluation Working Group Benchmark Specifications, ENDF-202 [30]. ENDF-202 is noteworthy as the first serious attempt to tabulate data beyond the delayed critical configuration. Specifically, it included material worth, activation and fission reaction rate, neutron spectra and Rossi alpha measurements. However, these values all come from other summary documents, not original source documents, thus leaving some important details of their origins murky.
Preparations for the ENDF/B-VII.0 library [31] began in the early 2000s, and by this time the computing power was sufficient (we now had teraflops supercomputers) to examine nuclear data changes on large suites of benchmark data [32]. It is informative to read through the proceedings of the CSEWG minutes [33] from that time, as they illuminate the back and forth between data evaluation and validation that typically occurs during this process. Major changes were undertaken for the uranium data evaluations. But before these changes would be accepted, no matter how well motivated by new differential data analysis, they had to preserve the good application performance seen in the previous data library. It was also desired to fix a major bias seen in low-enriched uranium thermal solutions. In the minutes, one can see the tweaks in the uranium data evaluations that led to a new library that worked for fast critical assemblies, intermediate spectra with many reflectors and moderators, and both high- and low-enriched thermal assemblies over a wide range of above-thermal leakage fractions. This result was documented in Ref. [31], pages 3009-3012, and the trends addressed are shown in Figs. 93-98. This is a difficult process, and the data library almost certainly works because of compensating errors that have been deliberately introduced. The difficulty in doing this can be seen in the results for the equivalent plutonium data, discussed on pages 3012-3013 and seen in Figs. 99-101, where a combination that fit the various criticality data could not be found.
In addition to the delayed critical data, we use feedback from several other integral experiments, including the time behaviour of experiments around critical, measurements using fixed-source irradiations like the pulsed spheres, and activation and fission rate measurements in a wide variety of these assemblies [e.g., 31-36]. Great care has been taken, in past and current ICSBEP efforts, to ensure that simplified models of the delayed critical experiments accurately reproduce the detailed experiments; see, for example, LA-4208 and ICSBEP PU-MET-FAST-001, revision 4. Simplified modelling of these other experiments is fraught with difficulty. Detector response functions, the influence of the experimental environment outside the area of interest (e.g., room return), and other issues greatly complicate these efforts. The devil is in these details. We want to use these data, but worries remain about the adequacy of the simplified models often used.
Similar to the relationships developed for criticality, the original use of the reaction rate measurements was to develop empirical relationships that could be used to measure the neutron energy distribution in those assemblies. Examples of these relationships can be seen in Figs. 105-109 of Ref. [31] and Figs. 48, 52, 56-58 and 93 of Ref. [34]. More recently, we would like to make use of the multiple thresholds of these activations to inform our understanding of the original fission neutron emission spectrum and the subsequent scattering in these systems. Figures 5 and 7 of Ref. [36] show activations at the center of bare HEU and plutonium spheres, or spheres with thick natural uranium reflectors. One of the most fundamental requirements when computing these values is to understand exactly what was measured. For example, the iridium-191 (n,2n) reaction produces the (11)− isomer of iridium-190, which has a 3.087-hour half-life and decays by electron capture 91.4% of the time [38]. It is essential to account for this branching ratio when calculating the detection efficiency [37, p. 2725]. Each measurement must be examined to understand how it was made, when it was made and, potentially, whether decay values must be updated.
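A minimal sketch of the kind of correction involved is shown below. The half-life and branching fraction are those quoted above for the iridium-190 isomer; the counting window, count total and detection efficiency are invented for illustration:

```python
import math

def atoms_at_end_of_irradiation(net_counts, eff, branch, half_life_h,
                                t1_h, t2_h):
    """Infer the number of product atoms present at the end of irradiation
    (t = 0) from net counts observed in a window [t1, t2] afterwards.

    eff    -- absolute detection efficiency for the counted radiation
    branch -- fraction of decays proceeding through the counted branch
    The fraction of atoms decaying in the window is exp(-lam*t1) - exp(-lam*t2).
    """
    lam = math.log(2) / half_life_h
    decayed_fraction = math.exp(-lam * t1_h) - math.exp(-lam * t2_h)
    return net_counts / (eff * branch * decayed_fraction)

# Illustrative numbers only: 10,000 net counts in a 1-hour window starting
# 0.5 h after irradiation, 5% detection efficiency, and the 91.4% branch of
# the 3.087-hour iridium-190 isomer quoted in the text.
n0 = atoms_at_end_of_irradiation(10_000, eff=0.05, branch=0.914,
                                 half_life_h=3.087, t1_h=0.5, t2_h=1.5)
```

Omitting the branching factor here would bias the inferred atom count, and hence the reaction rate, by nearly 10%, which is exactly the kind of detail that must be checked measurement by measurement.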
Most of the early Los Alamos activation measurements were made by radiochemistry techniques, specifically radiochemical separations followed by alpha, beta or mass spectrometry counting. These techniques are particularly susceptible to systematic errors introduced during the separations. This fact was not lost on the chemists: great care was taken to establish the procedures [21] and to verify their expected yields. Strict analytical chemistry with tracers was used to verify the systematic error expected in this step over years and even decades [39]. To date, however, these systematic errors have not been reported alongside the measured values. Even with them in hand, care must be used in their interpretation. This is not a Gaussian error; it will have a long tail on the side of mass loss, while the other side of the distribution is unphysical and must be truncated.
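One way to model such a one-sided yield error is as a truncated, skewed distribution, as in the sketch below. The mode and width are invented for illustration, not measured values, and the lognormal form is one plausible choice, not the documented error model:

```python
import random
import math

def sample_yields(mode=0.98, sigma=0.05, n=10_000, seed=1):
    """Sample chemical-recovery yields from a lognormal centered near the
    nominal recovery: separation losses produce a long tail below 1.0,
    while yields above 1.0 are unphysical and are rejected (truncated)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        y = mode * math.exp(rng.gauss(0.0, sigma))
        if y <= 1.0:  # truncate the unphysical side of the distribution
            out.append(y)
    return out
```

The truncation makes the resulting error asymmetric about the nominal yield, which is why propagating it as a symmetric Gaussian would misstate the uncertainty.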
The most disturbing aspect in the use of these measurements is the scattered documentation. Thankfully, much of it still exists, but in the form of a giant jigsaw puzzle whose pieces must be found through a process of scientific archaeology. We have found memos and logbooks [e.g. 40,41] that give many of the details necessary to have high confidence in their use. But a great deal of persistence, and often luck, is necessary to find these documents.

Summary
Great work has been done by a great number of people to dig the information out of our archives to make use of many of the historic Los Alamos critical assembly measurements. These experiments were often done by the very researchers who established the techniques that we now take for granted. But much remains to be done to ensure that we fully understand what they did, and to be able to take the new measurements that are needed.