Characterization of Neutron Detection Systems with Confidence via Statistical Parameter Correction

Abstract-Neutron detection systems utilize statistical alarm techniques where a measured false alarm rate (FAR) can vary drastically from the FAR predicted by a theoretical model. The ability to set an alarm threshold that results in a practically controlled FAR is crucial to characterize detector sensitivity with both accuracy and precision. A generalized and automated method is presented to statistically evaluate FAR performance by assuming that the FAR itself is not deterministic, but a normal stochastic process over a specific parameter to be corrected that will hereafter be referred to as the correction. In this manner, a specific correction results in not only a point estimate of FAR, but also a confidence interval. The central objective is focused exclusively on characterization assuming that experiments are executed in a tightly controlled environment so that an accurate comparison is enabled across detectors. Once a correction is calculated, the estimated FAR is only assumed accurate in a similar environment for sensitivity evaluation. Initially, the calculated correction factor was used to compare FARs across various distributions including normal, corrected normal, Poisson, and a simplified normal distribution. Later verification data sets were used to empirically demonstrate the rate of containment of measured confidence coefficients using two detectors of different technology. A second application uses the correction method to improve the signal-to-noise ratio metric to agree more with dynamic sensitivity results. Finally, a third application studies the effect of altering the duration of background acquisition on FAR performance.

I. INTRODUCTION
By partitioning a notably large set of background data into potentially thousands of mutually exclusive periods, a distinct alarm threshold can be calculated using a specific correction and statistics on an individual background period. By applying this distinct threshold, a single alarm rate sample can be calculated over the remaining background data. Each subsequent background period is used likewise to create additional alarm rate samples for the entire data set. A maximum likelihood problem is formulated in section II-D1 assuming that a detector's false alarm rate (FAR) is a normal random process of the correction. Using numerical optimization in section II-E, a first-order search is utilized to determine a correction that meets a target FAR.
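The partitioning procedure described above can be sketched in Python. This is an illustrative sketch only; the function and variable names are hypothetical, and the paper's actual implementation is not shown.

```python
import math
import statistics

def far_samples(data, n_b, n_i, z_alpha):
    """For each background period, set an alarm threshold and measure the
    alarm rate over the remaining data (hypothetical helper, not the
    paper's code)."""
    n_pb = len(data) // n_b                  # number of background periods
    rates = []
    for p in range(n_pb):
        bg = data[p * n_b:(p + 1) * n_b]
        mu = statistics.fmean(bg)
        sigma = statistics.stdev(bg)         # unbiased sample std dev
        thresh = mu + z_alpha * sigma / math.sqrt(n_i)   # critical value
        rest = data[:p * n_b] + data[(p + 1) * n_b:]
        # integration statistic: mean over non-overlapping windows of n_i
        integ = [statistics.fmean(rest[k:k + n_i])
                 for k in range(0, len(rest) - n_i + 1, n_i)]
        rates.append(sum(m > thresh for m in integ) / len(integ))
    return rates
```

Statistics computed over the resulting list of alarm-rate samples then characterize FAR across all background periods.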
False alarm performance of the corrected model is compared to three statistical models in section III-C. In section IV, additional background data is gathered to verify that the estimated correction results in the desired FAR with statistical confidence.
In section V, three neutron detection systems demonstrate that the signal-to-noise ratio (SNR) metric may result in a theoretical sensitivity that contradicts sensitivity measured with a more rigorous dynamic test [1] specified by the American National Standards Institute (ANSI). Section V-A2 introduces a correction to SNR, and section V-C demonstrates that the correction results in a sensitivity that agrees more closely with the rigorous ANSI test.
Section VI studies the effect of reducing the duration of background acquisition from 5 minutes down to 1 minute. As a result, system sensitivity decreased via the increase in the critical value required to meet a target FAR. In addition, the overall precision of FAR degraded drastically, with the estimated standard deviation of FAR increasing by a factor of 28.
In the context of the experiments performed, FAR is simply a scaled version of significance level, critical value indicates a specific alarm threshold, and standard critical value represents the number of standard deviations above the mean for an alarm threshold. FAR is used in place of significance level for convenience when presenting measured false alarm rates.

II. STATISTICAL MODEL
A. Data Collection Periods

1) Large Experimental Set of Data: Assume that count rate samples for a neutron radiation detector under natural ambient background conditions follow a normal distribution with random variable D ∼ N(µ_D, σ²_D) and a realization consisting of N_D samples collected in the vector D.

2) Background Partitions of Data: These samples are partitioned into N_PB non-overlapping background periods of length N_B, with the i-th period defined as B_i. For each background period i, the statistic that is a minimum variance unbiased estimator (MVUE) of the mean µ_D is the sample mean

µ̂_B,i = (1/N_B) Σ_{n=1}^{N_B} B_i[n].

For each background period i, the statistic that is a MVUE of the variance σ²_D is the sample variance

σ̂²_i = (1/(N_B − 1)) Σ_{n=1}^{N_B} (B_i[n] − µ̂_B,i)².

The number of samples N_B should be chosen as large as possible within system initialization in a known environment without artificial sources.
3) Integration Partitions of Data: The samples in vector D are partitioned again into N_PI ≪ N_PB non-overlapping integration periods of length N_I, with the i-th period defined as I_i. For each period i, the statistic that is a MVUE of the mean µ_D is the sample mean

µ̂_I,i = (1/N_I) Σ_{n=1}^{N_I} I_i[n].

The number of samples N_I is typically chosen to optimize system detection sensitivity for a given source strength and speed.
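The per-period MVUE statistics above amount to the sample mean and unbiased sample variance of each partition. A minimal illustrative sketch (the paper's notation, not its code):

```python
import statistics

def partition_stats(d, n_b):
    """Split the sample vector into non-overlapping background periods of
    length N_B and return the MVUE mean and variance of each period."""
    n_pb = len(d) // n_b                       # N_PB background periods
    mu_hat, var_hat = [], []
    for i in range(n_pb):
        period = d[i * n_b:(i + 1) * n_b]
        mu_hat.append(statistics.fmean(period))      # sample mean
        var_hat.append(statistics.variance(period))  # unbiased (N_B - 1)
    return mu_hat, var_hat
```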

B. Determine the Theoretical Parameters of a Hypothesis Test
Since it can be shown that the test statistic, a linear combination of normal samples, also follows a normal distribution according to µ̂_I,i ∼ N(µ_D, σ²_D/N_I), a one-sided hypothesis test can be posed with the following hypotheses:

1) Null Hypothesis H_0 (radiation is background): µ̂_I,i = µ_D
2) Alternate Hypothesis H_1 (radiation is significantly higher than background): µ̂_I,i > µ_D

The critical region of statistic µ̂_I,i is the region where background radiation is erroneously determined to contain an artificial source (H_0 is rejected) and is defined for some constant Z̄_α > 0 such that

C_i = { µ̂_I,i : µ̂_I,i > µ_D + Z̄_α σ_D/√N_I }.

A desired significance level α ∈ [0, 1] together with the probability P_{H_0} of the error region of H_0 constrain a specific solution for the constant Z̄_α according to the equation

P_{H_0}(µ̂_I,i ∈ C_i) = α.     (2)

1) Determining a Value for the Standard Critical Value Constant:
The standardized test statistic for the i-th period of an integration partition is

Z_i = (µ̂_I,i − µ_D) / (σ_D/√N_I),

and the critical region C_i can be rewritten in terms of the probability of a standard normal distribution (or the cumulative distribution function Φ(Z)) as

P(Z_i > Z̄_α) = 1 − Φ(Z̄_α) = α,

where the threshold Z̄_α, referred to as the standard critical value of the hypothesis test, and the corresponding significance level α can be looked up in a standard normal Z table.
2) Critical Value of the Hypothesis Test: Once the standard critical value Z̄_α of the standard normal distribution is found, the corresponding critical value µ̄_I for the statistic µ̂_I,i is

µ̄_I = µ_D + Z̄_α σ_D/√N_I.     (3)

3) Verification of Significance Level: In order to facilitate calculating the value of the error probability in (2), an indicator function 1_µ̄(µ) : ℝ → {0, 1} is defined as

1_µ̄(µ) = 1 if µ > µ̄_I, and 0 otherwise,

where the expected value can be estimated using a sample mean of 1_µ̄(µ̂_I,i):

α̂ = (1/N_PI) Σ_{i=1}^{N_PI} 1_µ̄(µ̂_I,i).
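The critical value and the indicator-based significance estimate can be expressed compactly. An illustrative sketch with hypothetical names:

```python
import math

def critical_value(mu_d, sigma_d, n_i, z_bar):
    """Critical value of the integration statistic, as in (3)."""
    return mu_d + z_bar * sigma_d / math.sqrt(n_i)

def estimated_alpha(integ_means, mu_bar):
    """Sample mean of the indicator 1(mu > mu_bar): the estimated
    significance level over the integration partition."""
    return sum(m > mu_bar for m in integ_means) / len(integ_means)
```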

C. Determine the Practical Significance Level of a Hypothesis Test
Note that in section II-B the population mean µ_D and variance σ²_D of a detector are not known exactly. Therefore, assume that a background period of sufficient length N_B is used so that µ̂_B,i ≈ µ_D and σ̂²_i ≈ σ²_D can be substituted in the formulas of the previous section.

1) Execution of a Very Large Number of Experiments via Background Partitions:
Typically, when a user performs a false alarm test with a radiation detector, the user programs a standard critical value, acquires a single background period, computes statistics to obtain an alarm threshold, and applies this threshold to a live system for a long duration in order to verify a specified FAR.
Instead, if a large number N_D of data samples are recorded and these samples are partitioned as in II-A, then thresholds can be calculated and a false alarm experiment can be performed in post-processing for every possible background period in the entire data set. Also, if the theoretical standard critical value Z̄_α results in a measured significance level that does not agree with the theoretical significance level α within acceptable tolerance, then adjusted standard critical values can be used to iterate through the whole process again.
Furthermore, significance level estimates α̂ can be computed for each background period, and statistics can be calculated on the resulting set to obtain realistic false alarm performance metrics across backgrounds.
2) Verification of Significance Level using Empirical Data: The indicator function defined in II-B3 is redefined using the j-th actual background parameter estimates as 1_{µ̄,j}(µ) : ℝ → {0, 1} according to

1_{µ̄,j}(µ) = 1 if µ > µ̄_I,j, and 0 otherwise,

where the corresponding estimated significance level for the j-th background period is

α̂_j = (1/N_PI) Σ_{i=1}^{N_PI} 1_{µ̄,j}(µ̂_I,i).     (5)

Theorem 1 (Unique Significance): For a one-sided upper hypothesis test, if the significance level α is written as a function of the standard critical value Z_α in the form α(Z_α) : ℝ → [0, 1], then the function is strictly decreasing over its domain, as illustrated in Fig. 2. Therefore, given a unique significance level α_0 ∈ [0, 1], there exists a corresponding unique standard critical value Z_{α_0}.
3) Experimental Error: Assume that the theoretical standard critical value Z̄_α does not result exactly in the desired significance level α due to experimental data not fitting the model exactly, but instead Z̄_α results in the significance value α̃. The significance level experimental error is then defined as

e_α = α̃ − α.

Assume that if experimental data approximately retain the property of Theorem 1, then there exists a unique experimental standard critical value Ẑ_α that results in the theoretical significance level α within tolerable precision. The standard critical value experimental error is then defined as

e_Z = Ẑ_α − Z̄_α.

5) Practical Significance Level Estimate: Therefore, the sample mean of α̂_j is the MVUE of the significance level:

µ̂_α = (1/N_PB) Σ_{j=1}^{N_PB} α̂_j.

The sample variance of α̂_j is the MVUE of the significance level variance:

σ̂²_α = (1/(N_PB − 1)) Σ_{j=1}^{N_PB} (α̂_j − µ̂_α)².

D. Determine the Practical Standard Critical Value of a Hypothesis Test
Since samples of α̂_j are assumed to approximately follow a normal distribution with mean α and variance σ²_α, the probability density function (PDF) is

f(α̂_j) = (1/√(2π σ²_α)) exp(−(α̂_j − α)²/(2σ²_α)).     (9)

1) Maximum Likelihood Estimate of Standard Critical Value:
The only quantity in (9) that is a function of the standard critical value Z_α is α̂_j, which can be thought of as either a continuous-time stochastic process [3] over the parameter Z_α or a random variable over a family of PDFs [2] indexed on the parameter Z_α, written as α̂_j(Z_α) when used in the likelihood functions.
The joint probability of N_PB random samples taken from the PDF in (9) results in the likelihood function

L(Z_α) = Π_{j=1}^{N_PB} f(α̂_j(Z_α)),

and the corresponding log-likelihood function is

ln L(Z_α) = −(N_PB/2) ln(2π σ²_α) − (1/(2σ²_α)) Σ_{j=1}^{N_PB} (α̂_j(Z_α) − α)².

Therefore, the maximum likelihood estimate of the standard critical value is

Ẑ_α = argmax_{Z_α} ln L(Z_α).     (10)

E. Optimization Problem Formulation

1) Cost Function: In order to facilitate computation of the likelihood function in (10), all constants independent of Z_α are eliminated and the negative sign is removed, resulting in the following minimization cost function:

J(Z_α) = Σ_{j=1}^{N_PB} (α̂_j(Z_α) − α)².     (11)

Therefore, the maximum likelihood estimate of (10) is equivalent to minimizing the mean squared error of the significance level α over the standard critical value.
2) Optimization Problem: Using the cost function from (11) in a minimization problem results in the following optimization problem formulation:

Ẑ_α = argmin_{Z_α} J(Z_α).     (12)

3) Optimization Problem Solver: The optimization problem formulated by (12) can be solved using various first-order numerical methods, but care must be taken to use a method that is tolerant of noise, as illustrated by the solution in Fig. 3. In this case, the problem was solved via simulated annealing [4].
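A minimal simulated-annealing sketch for the noisy one-dimensional cost in (11) follows. The cooling schedule, step size, and bounds are illustrative assumptions, not the configuration used in this work:

```python
import math
import random

def anneal(cost, z0, lo, hi, iters=2000, t0=1.0, seed=1):
    """Minimize a noisy 1-D cost over [lo, hi] via simulated annealing."""
    rng = random.Random(seed)
    z = best_z = z0
    c = best_c = cost(z0)
    for k in range(1, iters + 1):
        t = t0 / k                                   # cooling schedule
        cand = min(hi, max(lo, z + rng.gauss(0, 0.1)))
        cc = cost(cand)
        # accept improvements always; accept uphill moves with
        # probability exp(-delta / t) (Metropolis criterion)
        if cc < c or rng.random() < math.exp(-(cc - c) / max(t, 1e-12)):
            z, c = cand, cc
        if c < best_c:                               # track best solution
            best_z, best_c = z, c
    return best_z
```

In practice the cost evaluation would recompute the measured significance levels α̂_j(Z_α) for each candidate Z_α, which is why tolerance to evaluation noise matters.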

F. Generalization of Parameter Correction
The upper hypothesis test utilized normal samples {µ̂_I,i : i ∈ {1, . . . , N_PI}} while adjusting the parameter Z_α to yield a target theoretical significance α. However, the model can be made even more flexible by slightly altering definitions, including the indicator function in (5). Instead, let µ ∈ ℝ be a measurable random variable that follows all axioms of probability, with an indicator function

1_{θ,j}(µ) = 1 if µ exceeds a threshold determined by Θ_j and θ_α, and 0 otherwise,

where Θ_j is a set of statistics on background period j and the parameter θ_α ∈ ℝ is the target parameter to correct. The indicator function must be a simple inequality for an upper hypothesis test, α̂_j must be approximately normal according to (6), and experimental samples must approximately adhere to Theorem 1. In this case, a much more flexible framework results, allowing modification of other system factors such as the distribution, the correction parameter θ_α, sampling techniques as in section III-B, or another system parameter as in section VI. As factors are adjusted, the impact on the measured FAR performance values µ̂_α and σ̂²_α can be evaluated.

A. Theoretical Probability Distribution vs. Detector Probability Histogram
In order to qualitatively illustrate the disparity between model and detector data, a histogram was created in Fig. 4 for neutron data, and scaled theoretical PDFs were overlaid for the normal models. The Curie criteria was used, where the variance is assumed equal to the sample mean of the data [5][pp.75]. MATLAB's histogram function was used with the "Normalization" parameter set to "probability" in order to create a probability mass function (PMF). Both PDFs were scaled to have the same maximum value as the PMF. Not only is the histogram data highly discretized, but the overall shape does not match either normal model at lower count rates.

B. Correlated Sampling
During problem formulation in section II-A, the term "random samples" was used to describe the N_D data samples acquired. This is a fundamental assumption often referred to as a set of independent, identically distributed samples of data. The integration samples as defined in section II-A3 follow this convention of sampling. However, the neutron detector data acquired are not sampled independently with non-overlapping segments, but in a temporally correlated and overlapping manner using a sliding window: at each sample period, the oldest sample is discarded from the beginning of the integration window and a new sample is inserted at the end. Using this method of sampling is likely a significant contributing factor to the discrepancy between the theoretical standard critical value Z̄_α and the practical value Ẑ_α.
If independent sampling were performed as originally defined, the sample rate would be decimated by a factor of N_I. This would result in undersampling, where a neutron detector could miss a source moving at sufficient speed that would otherwise have been detected using correlated sampling. Correlated sampling allows the increased sensitivity of averaging periods while retaining the sample rate. Consequently, when counting alarms, all alarms that occur within a distance of N_I − 1 samples of another alarm are counted as a single alarm event. A simple heuristic is employed: all correlated samples are first compared as usual to the threshold µ̄_I,j for the j-th background period, and each alarm index is appended to a list. Alarms are then processed in sequential order. Assuming that the i-th sample is an alarm, if any alarm exists after the i-th sample within the interval of sample indices [i + 1, i + N_I − 1], then the i-th alarm is removed from the alarm set. The same heuristic continues at the subsequent alarm. The whole process is repeated for all background periods.
The alarm heuristic described is incorporated into a class that is used for all presented false alarm measurements of correlated samples.
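The alarm-merging heuristic can be sketched as follows. This is a minimal sketch assuming a sorted list of alarm sample indices, not the class used for the presented measurements:

```python
def merge_alarms(alarm_idx, n_i):
    """Collapse correlated alarms: drop the i-th alarm whenever another
    alarm follows within N_I - 1 samples, so each cluster of correlated
    alarms is counted as a single alarm event."""
    alarm_idx = sorted(alarm_idx)
    kept = []
    for i, a in enumerate(alarm_idx):
        nxt = alarm_idx[i + 1] if i + 1 < len(alarm_idx) else None
        # since the list is sorted, checking the immediate next alarm
        # suffices to detect any alarm inside the correlation window
        if nxt is None or nxt - a > n_i - 1:
            kept.append(a)
    return kept
```

For example, with N_I = 5, the alarm indices [10, 12, 13, 40] collapse to two alarm events at 13 and 40.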

C. Theoretical Model Significance vs. Measured False Alarm Rates
In order to demonstrate the discrepancy between theoretical and measured significance levels, several theoretical models were evaluated. For convenience, estimated significance levels were scaled to false alarm rates measured per hour. An 8-segment neutron detector was used to collect 68 hours of data with a sample period of 0.2 seconds, an integration period of 4 seconds, and a background period consisting of the first 5 minutes of data. The requisite statistics for each model were computed on a single background period. The resulting critical values were employed on integrated samples using a hypothesis test similar to that of section II-B and compensated according to section III-B1. Each distribution was configured with the theoretical critical value to meet a target FAR of 0.5 alarms per hour.
Refer to the results compiled in Table I according to distribution. Since a Poisson model is a discrete distribution over the natural numbers, the fractional count rates produced by the correlated sampling of III-B are not possible; in this case, the unprocessed integer counts were used at the 0.2 second sampling interval. Again, results include a normal model assuming the Curie criteria [5][pp.75]. The last entry, labeled "Corrected Normal," uses a corrected standard critical value that was estimated using the theory developed in this paper. The value Z̄_α = 4.0309 that results in a false alarm every 2 hours was corrected to Ẑ_α = 5.7235.
The results in Table I indicate that the statistical model with the closest FAR to the desired rate is actually a Poisson distribution. However, justification for correlated sampling becomes apparent when the resulting thresholds are all placed on the same scale in the critical value column. The Poisson threshold is approximately six times that of the corrected normal rendering the Poisson model relatively insensitive.

IV. STATISTICAL VERIFICATION OF FALSE ALARM RATE USING EMPIRICAL DATA
In this section the statistical model of section II is applied to hundreds of background periods using a single data set referred to as the training data. A statistical inference is made during training concerning FAR performance, and the training data is then set aside. Verification of FAR utilizes a large number of individual false alarm tests [1][pp.17], each of duration 10 hours, where the verification library consists of over 1000 hours of background data per scintillating detector for both a ⁶Li optical fiber detector and a ⁶Li coated waveguide detector. Using the statistics estimated during training, confidence intervals were constructed at various coefficients and applied across all verification false alarm tests. (Table footnote: the critical value was actually 2 counts but was scaled up to a count rate per second for comparison purposes.)
In this section several common experimental procedures are followed. Data is acquired with a sample period of 0.2 seconds, an integration period of 4 seconds, and a background period of 5 minutes. Again, the standard critical value is tuned to meet a target FAR of 0.5 alarms per hour.

A. Confidence Interval using Training Data
Each training data set is acquired such that the number of samples N_D results in about 70 hours of data to estimate a corrected Ẑ_α in tandem with the FAR performance statistics µ̂_α and σ̂_α of section II-C5.
Instead of using the training statistic µ̂_α over N_PB background periods, the target statistic α̂_j over a single background period is used in verification to estimate a confidence interval that models an individual ANSI false alarm experiment [1]. For a confidence coefficient 1 − α ∈ [0, 1], the probability of containment is

P(µ̂_α − Z̄_{α/2} σ̂_α ≤ α̂_j ≤ µ̂_α + Z̄_{α/2} σ̂_α) = 1 − α.

Note that the critical value Z̄_{α/2} can be looked up in a standard normal Z table accounting for a two-sided test, while Z̄_α in (2) uses a one-sided hypothesis test. A realization of a confidence interval [2][pp.216] of FAR for a specific training set is

[α̂_l, α̂_u] = [µ̂_α − Z̄_{α/2} σ̂_α, µ̂_α + Z̄_{α/2} σ̂_α],     (14)

and is tested for accuracy via the verification data set described next.
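The interval construction reduces to a few lines. A sketch with a hypothetical helper name, where z_half is the two-sided standard normal critical value (about 1.96 for a 95% confidence coefficient):

```python
def far_confidence_interval(mu_alpha, sigma_alpha, z_half):
    """Realization of the FAR confidence interval from the training
    statistics: mean plus/minus z_half standard deviations."""
    return (mu_alpha - z_half * sigma_alpha,
            mu_alpha + z_half * sigma_alpha)
```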

B. Containment of Verification Data
Verification data is segmented into background periods and processed like training data as in section II, except that Ẑ_α, µ̂_α, and σ̂_α were already calculated via training data. Each verification data set is acquired such that the number of samples N_D results in exactly 10 hours of data. For a given verification data set and j-th background period, α̂_j is calculated according to (5) using Ẑ_α from training, while the selected verification data set is used to calculate both background statistics µ̂_B,j and σ̂_j. The value α̂_j is tested to determine whether it is contained within the interval in (14). All verification data sets are processed similarly for the j-th background period. In this manner, a measured confidence coefficient is calculated as a simple decimal fraction of successful containment. In order to account for background variation, a confidence coefficient is calculated for every possible background period in addition to the j-th, and the results are averaged. µ̂_α is the sample mean and σ̂²_α is the sample variance of α̂_j over every background period and every verification data set.
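The measured confidence coefficient then follows as a simple containment fraction (an illustrative sketch, not the verification code used in this work):

```python
def measured_confidence(alpha_hats, lo, hi):
    """Fraction of verification significance estimates contained in the
    interval [lo, hi]: the measured confidence coefficient."""
    return sum(lo <= a <= hi for a in alpha_hats) / len(alpha_hats)
```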

C. Confidence Testing Results
A total of 8 training files were used per detector, with all results listed in each column of Tables II and V. µ̂_D is the sample mean of all samples, measured in counts per second. σ̂_D is the sample standard deviation of all samples, measured in counts per second. Ẑ_α is an estimate of the standard critical value that will achieve the desired false alarm rate, measured in the number of standard deviations above the mean. µ̂_α is the estimated significance level of threshold Ẑ_α scaled to FAR, measured in alarms per hour. σ̂_α is the estimated standard deviation of µ̂_α. α̂_l and α̂_u are the lower and upper bounds of FAR, respectively, for a 95% confidence interval.
The results of testing each training file with verification data are summarized in Tables III, IV, and VI. Measured confidence is in percent and is a function of not only α̂_l and α̂_u, but also Ẑ_α.

V. CORRECTED SIGNAL-TO-NOISE RATIO
A. Corrected Signal-to-Noise Ratio Definition

1) Signal-to-Noise Ratio Definition: With a neutron detector set up with ANSI moderation [1], a sample mean of count rate measurements denoted µ is acquired using a background period of 5 minutes. Using a ²⁵²Cf source with an emission rate of 20,000 n/s placed at 1.5 m from the detector [1], a sample mean of count rate measurements denoted ρ is acquired using a period of 5 minutes. It is assumed in [5][pp.94] that data follow a normal distribution where σ² ≈ µ.
SNR is then defined [5][pp.94] as

SNR ≜ (ρ − µ)/σ.

2) Sigma Correction Factor: When utilizing a neutron detector to alarm using a specific threshold, the threshold is set using a critical value that assumes the detector will result in a specific FAR. Section III-C demonstrates how a theoretical standard critical value may result in a measured error rate that is drastically different from the rate that the model predicts.
When comparing two detectors, a simple metric often used is SNR. The noise value encompassed by the denominator parameter σ represents a certain spread of data from the mean that occurs for a specific percentage of samples. In other terms, this spread in data can be viewed as occurring at a specific rate for a real-time system. Define a correction factor γ² to adjust the measured variance σ² to account for the disparity between theoretical and measured rates. Using the alarm threshold developed in (3) with the additional correction factor and the notation of this section results in a new alarm threshold

µ̄ = µ + Z̄_α γ σ.     (16)

Using the notation of this section and the practical standard critical value of section II-D, the practical threshold results in

µ̄ = µ + Ẑ_α σ.     (17)

To correct the theoretical critical value according to (16), this threshold should be equal to the practical or measured threshold in (17), resulting in

Z̄_α γ σ = Ẑ_α σ.

Therefore, if using the theoretical threshold Z̄_α, then σ can be corrected using the factor

γ = Ẑ_α / Z̄_α.     (18)

3) Corrected Signal-to-Noise Ratio: Using the sigma correction factor in (18) and the approximation σ² ≈ µ [5][pp.94], the SNR of section V-A1 is adjusted to

SNR_γ ≜ (ρ − µ)/(γ√µ).

B. Corrected Signal-to-Noise Ratio Application

1) Backpack Test Overview: With respect to false alarms, the system should tolerate no more than five alarms due to natural background in a 10 hour period [1]. In addition, once the system is configured with an alarm threshold to meet the mentioned error rate, a source that is moving at a specific rate is required to be detected successfully in 96 out of 100 trials [1]. Note that an alarm threshold equal to or greater than the threshold required to meet the specified FAR must be used in the moving source test to meet all system requirements.
In the subsequent sections several detector technologies are evaluated using the same parameters. The sampling periods described in section III were used so that the valueZ ↵ = 4.0309 results in a false alarm every 2 hours.
2) Fiber Detector Version 1 Test Results: The first fiber detector tested, labeled D1, required modification due to an inherent flaw. A standard critical value of Ẑ_α = 5.87 resulted in a measured FAR of 0.5 alarms per hour. Using the corresponding threshold in a moving source test yielded a detection rate of 90%. Detector D1 measured a background mean of µ = 4.94 according to section V-A1. When measuring the ²⁵²Cf source, a flaw with D1 occurred where the measurement slowly increased in time. Consequently, the initial value of the count rate, ρ = 20, was used instead of a 5 minute average. A sigma correction factor of γ = 1.46 resulted.

The ⁶Li coated waveguide was labeled D3. A standard critical value of Ẑ_α = 5.61 resulted in a measured FAR of 0.5 alarms per hour. Using the corresponding threshold in a moving source test yielded a detection rate of 97%. The detector measured a background mean of µ = 0.26 according to section V-A1. When measuring the ²⁵²Cf source, the detector recorded a count rate of ρ = 3.63 after a 5 minute average. A sigma correction factor of γ = 1.39 resulted.
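The sigma correction factor in (18) and the corrected SNR of section V-A3 can be computed directly. The sketch below reproduces D1's γ = Ẑ_α/Z̄_α = 5.87/4.0309 ≈ 1.46; the SNR_γ form assumes the σ² ≈ µ approximation of section V-A1 (illustrative code, not the paper's implementation):

```python
import math

def sigma_correction(z_hat, z_bar):
    """Sigma correction factor, eq. (18): ratio of the practical to the
    theoretical standard critical value."""
    return z_hat / z_bar

def snr_gamma(rho, mu, z_hat, z_bar):
    """Corrected SNR: signal over gamma-scaled noise, with sigma^2 ~ mu."""
    return (rho - mu) / (sigma_correction(z_hat, z_bar) * math.sqrt(mu))
```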

C. Signal-to-Noise Ratio Test Results
The conventional method resulted in an SNR value for D1 about twice that of the value for D2, despite sensitivity testing [1] indicating D2 is marginally more sensitive than D1. Likewise, the conventional method resulted in an SNR value for D1 that was higher than the value for D3, despite sensitivity testing [1] indicating the contrary. The conventional method also indicated an SNR value for D3 that was substantially higher than that of D2 despite similar detection sensitivity.
Notice that in Table VIII, in both cases, the relative spread of SNR_γ values more accurately reflected the corresponding spread in measured detection percent compared to the conventional SNR metric. However, the value of SNR_γ for detector D1 remained higher than the value for detector D2, despite D1 proving less sensitive in a dynamic test.

VI. IMPLICATIONS OF SYSTEM PARAMETER MODIFICATION
Utilizing the method developed to process many background periods has additional advantages with respect to the analysis of practical system parameters. Parameters such as the sample period, background duration, or even the alarm methods discussed in section II-F could be adjusted in order to measure the effects on system response, such as Ẑ_α in (10) or the observed FAR in (6).
For instance, the background period was altered from 1 minute to 5 minutes in 1 minute increments using the same detector, configuration, and data set as in section III-C. The resulting maximum likelihood estimate Ẑ_α, mean estimate µ̂_α, and variance estimate σ̂²_α are listed in Table IX, where significance levels are scaled to FAR per hour for convenience.
With FAR approximately constant, notice that as the background period decreases, system sensitivity decreases via the increase in the estimated standard critical value Ẑ_α. Each estimated FAR data point remains approximately equal to the target rate, falling slightly below it when the background period drops below 3 minutes. However, note the drastic increase in the standard deviation of FAR when the background period falls below 3 minutes.

VII. CONCLUSION
When utilizing all practical aspects of a radiation detection system to recognize neutron sources including the background period, integration period, correlated sampling, and the sparse nature of acquired data, the measured noise performs at a FAR significantly different than the FAR that the theoretical model predicts. This practical error performance of a neutron detection system must be normalized to some known reference in order to properly characterize practical detection sensitivity.
Modifications to detection algorithms, manufacturing process, or physical composition of neutron detectors could result in a deviation in the necessaryẐ ↵ to meet a required FAR. A method was introduced to adjust theoretical alarm parameters to more accurately quantify and meet a desired FAR using a static measurement and an automated algorithm.
Performing an additional static measurement with a source while applying the correction factor γ results in a SNR metric that better agrees with more involved dynamic detection tests.
In conclusion, the estimate of standard critical value in (10) and the sigma correction factor in (18) should be considered as methods to characterize how the measured error rate and detection sensitivity are affected by differences in neutron detection systems.