An Error Analysis Toolkit for Binned Counting Experiments

We introduce the MINERvA Analysis Toolkit (MAT), a utility for centralizing the handling of systematic uncertainties in HEP analyses. The fundamental utilities of the toolkit are the MnvHnD, a powerful histogram container class, and the systematic Universe classes, which provide a modular implementation of the many universe error analysis approach. These products can be used stand-alone or as part of a complete error analysis prescription. They support the propagation of systematic uncertainty through all stages of analysis, and provide flexibility for an arbitrary level of user customization. This extensible solution to error analysis enables the standardization of systematic uncertainty definitions across an experiment and a transparent user interface to lower the barrier to entry for new analyzers.


I. INTRODUCTION
Despite the importance and complexity of systematic uncertainty evaluation in HEP, there is no communitywide standard for the storage and propagation of uncertainty through analysis infrastructures. At the experiment level, an error analysis system must be a central consideration during the design of analysis infrastructure. In the absence of such a system, consideration of systematic uncertainty is incumbent on individual analyzers and can be relegated to the final stages of analysis after substantial design choices have already been made. This impedes the effectual integration of error analysis and represents a substantial barrier to entry for new analyzers. To this point, analyzers within an experiment are often left to implement solutions to this independently of one another, creating an inefficiency of effort and increased opportunity for inconsistency and mistakes.
We present a solution to this problem, the MINERvA Analysis Toolkit (MAT), which can be adopted to streamline an experiment's approach to evaluating systematic uncertainty. The MAT (1) implements the manyuniverse error method, (2) is built on a unique and powerful histogram object (MnvHnD), and (3) offers experimentwide standardization of systematics and simultaneous customizability and extensibility via Universe classes. This toolkit was developed within the MINERvA analysis environment where it has facilitated dozens of neutrino cross section measurements over the last decade  and forms the basis for MINERvA's data preservation effort [35]. The MAT was designed to be adopted directly, in part or in whole, by any HEP experiment. It is directly applicable to any ROOT-based [36] analysis environment in which simulated physics interactions are evaluated on an event-by-event basis. We also believe that the underlying approach and some of the classes which underpin the MAT's functionality could be implemented in any software environment for a binned counting experiment in any field of study.
In this paper, we introduce in detail the design of the MAT, provide examples for its current and future use within MINERvA, and describe how it could be adopted more widely. Section II introduces the "many universe" approach to evaluating systematic uncertainty, which is a prerequisite for this toolkit. Sections III and IV describe the technical implementations of the MnvHnD and Universe classes, and describe how MAT analyses are structured, including pseudocode examples. Finally, Section V discusses MAT integration into different experimental analysis environments and benefits of adopting the MAT wholesale.

II. "MANY UNIVERSE" ERROR ANALYSIS
Modern experimental HEP is inextricably coupled to complex simulations that are designed to replicate the physics we measure. These simulations are parameterized by variables that correspond to the design of an experiment and the related physics processes. In general, the output of these simulations is a collection of individual interactions (events), which are subject to the same reconstruction and analysis used for data. The observables of an analysis are binned as functions of kinematic variables and stored in histograms. When analyzed in aggregate over many events, the distribution of simulated event kinematics can be compared to data to judge the efficacy of the simulation. In this manner data is used to fine-tune the simulation and ultimately to improve the underlying physics models. In the "many universe" approach to evaluating systematic uncertainty, aspects of the simulation and reconstruction are shifted to reflect distinct sources of uncertainty. An analysis is repeated many times, and in each instance the effect of a particular shift is propagated through subsequent stages of the simulation-reconstructionanalysis chain. The particular configuration of the simulation and reconstruction in any individual instance defines a "universe". Two universes may be distinct from one another, for example, because of the choice of values for parameters within the beamline simulation or within a track reconstruction algorithm. The choice of nominal values (or "best guesses") for all parameters in the simulation and reconstruction defines the "Central Value" (CV) universe and forms the basis against which data measurements are compared, and against which uncertainties are evaluated. Each systematic universe is constructed to isolate the impact of shifting a particular aspect of the CV universe in a measured way, and the uncertainty on a measurement corresponding to each individual source is determined based on the spread of outcomes across two or more universes.
It can be constructive to distinguish between two conceptually different classes of shifts away from the CV. The first, vertical shifts, arise from sources of systematic uncertainty which do not directly affect any kinematic variable, and their effect is only to modify the weight a particular event is given in an analysis. Vertical shifts cause the contents of a histogram bin to be increased or decreased but do not lead to event migration between bins, and therefore will never lead to events migrating into or out of a selected sample. Because of this, the effect of a vertical shift of a far-upstream aspect of the simulation can be characterized downstream in the analysis by changing the relative weighting of events based on their kinematics. This "reweighting" enables evaluation of vertical systematic shifts without the need to repeat computationally intensive stages of the simulation. The second, lateral shifts, arise from sources of systematic uncertainty that do directly affect a kinematic variable and therefore can cause events to migrate between bins and into and out of a selected sample. A lateral shift may cause an event which does not pass a selection cut in the CV universe to instead pass in the shifted universe, or vice versa. The impact of a lateral shift on the final result must be determined by repeating the portion of the simulation-reconstruction-analysis chain downstream of the shift, and cannot be evaluated via reweighting.
In practice, the value of a systematic shift is determined a priori, and often corresponds to the standard deviation of a Gaussian distribution. For example, a dedicated study of muon momenta might inform a ±2% shift applied to muons on an event-by-event basis to represent the uncertainty associated with their reconstruction. In this case, the impact of this uncertainty on an observable is calculated using the variance between the +2%-and −2%-shifted universes in each bin of the analysis. Many uncertainties are adequately represented by two universes because the pair of universes reflects a simple underlying distribution (usually assumed to be Gaussian). Others motivate a large number of universes to encapsulate the effect of simultaneously varying multiple parameters within a more complex model. For example, neutrino beamline fluxes are predicted using notoriously complex simulations, in which the correlations among internal parameters are nontrivial. In this case, an ensemble of universes is used, and in each multiple parameters are varied according to their individual statistical distributions. The impact of this uncertainty on an observable is then calculated using the variance between all of the universes in each bin of the analysis.
Within the MAT, individual systematic universes are represented by Universe classes, and universes corresponding to the same source of uncertainty are grouped into "error bands". The MnvHnD class, described in detail in the following section, provides the interface to propagate an analysis histogram in many systematic universes simultaneously. The uncertainty on an observable stored in an MnvHnD can be calculated on-demand at any stage of the analysis by using the spread across universes to construct a covariance matrix corresponding to any individual systematic uncertainty. The covariance matrix corresponding to any individual systematic uncertainty, or to the total uncertainty, can be reported, or the diagonal entries of the covariance matrix can be used to produce vertical error bars on a histogram. This approach to reporting uncertainty is common in HEP, but the many universe technique does not exclude alternate error treatments.

III. THE MNVHND CLASS
A successful implementation of the many universe approach requires that an analysis be repeated many times. In the MAT, this is achieved using an MnvHnD object, which stores the central value and systematic universe histograms for an analysis observable. With all histograms stored in a single object, analysis calculations can proceed simultaneously in all universes. The interface of the MnvHnD is implemented by broadcasting the THnD interface from ROOT to each systematic universe, so uncertainties are automatically propagated correctly when MnvHnDs are Add()ed, Divide()d, or Scale()d. An MnvHnD can Write() itself to a ROOT file to persist an analysis result with everything needed to produce a covariance matrix, defer further processing to a separate program, or separate plot formatting from computationally intensive analysis routines.
An MnvHnD encapsulates a parallel histogram for each systematic universe as shown in Figure 1. To populate a histogram in a particular universe, an analyzer need only tell an MnvHnD which Universe a Fill() value belongs to (as in line 12 of Code Example 1). The covariance matrix for any group of Universes can then be calculated on demand. An uncertainty is usually evaluated using two or more Universes grouped into an error band (implemented in the MAT as the MnvErrorBand class). Covariance matrices are calculated for each error band individually, and the total covariance matrix on a result is equal to the sum of all of the individual covariance matrices. In this way, the variance of each bin is added in quadrature as expected for uncorrelated uncertainties.
The MAT provides tools to visualize the data and uncertainties stored in an MnvHnD. Because each error band is propagated independently, it is straightforward to see how each source of uncertainty contributes to the total uncertainty of a distribution. Figure 2 shows an example of an uncertainty breakdown as a function of an arbitrary kinematic variable, and the associated total correlation matrix. One of the key advantages to the MAT is that, through the bookkeeping provided by the MnvHnD, a user retains access to an observable in all systematic universes throughout every stage of their analysis. This enables, for example, detailed studies comparing the relative impact of uncertainties on different measurements.

IV. SYSTEMATIC UNIVERSE CLASSES
The MAT implements the many universe approach to error analysis, described in Section II, using the MnvHnD to store histograms and a Universe interface to abstract the concept of systematic universes. In a typical analysis, the user defines a CV Universe which includes methods to calculate all required event kinematics and weights (using information from the event record), and systematic Universe classes model alternatives to the default analysis by overriding the behavior of the CV Universe. The methods of the CV Universe serve as a base interface for all other Universes, with each systematic Universe overriding one or more of the CV methods. In the case of vertical systematic shifts, one or more methods which calculate an event's weights are overridden, and in the case of lateral systematic shifts, one or more methods which calculate an event's kinematics are overridden. Code Example 1. An example of an analysis event loop in the MAT. Within the loop over independent events is a loop over systematic Universes. In each Universe, cuts are applied and event kinematics and weights are calculated. The PassesCuts() function needs no special logic to handle systematic uncertainties that may affect it. Rather than fill a TH1D, the user instead fills an MnvH1D.
Code Example 2 shows an example of a Universe. This class serves both as the interface that all systematics implement and as the CV implementation itself. The MAT is written in C++, so weights and observables are virtual functions that can be overridden by derived classes. BaseUniverse is the parent class to all Universes, and can provide experiment management and optimization tools.  Code Example 2. Example of a Universe, including methods to calculate event kinematics and weights. This class serves both as the Universe interface for an analysis and as the CV Universe implementation. Every observable that could be affected by a systematic uncertainty is implemented as a virtual function. An observable can be shifted by overriding its virtual function as done in Code Example 4.
Systematic Universes are implemented as mixins, as in Code Example 3, so that uncertainties central to an experiment are implemented the same way by all analyzers. In this example, ResonantPionUniverse<> is a template for a class that derives from a CV Universe with a GetWeight() function. This class template only needs to know about the weight it shifts. When we construct the list of Universe pointers to loop over, the base Universe class carries the rest of the CV interface into ResonantPionUniverse<Universe>. Similarly, in Code Example 4 the GetLeptonEnergy() observable is shifted while an arbitrary Universe interface is left intact. If these Universes were plugged into the systematics container in Code Example 1, CalibrationLeptonEnergyUniverse (Code Example 4) would change which events make it past line 6, and ResonantPionUniverse (Code Example 3) would change the shape of the histogram stored in the TH1D of the corresponding error band in energyHistogram. energyHistogram itself is an MnvH1D which contains the CV histogram and two MnvErrorBands.
The separation of concerns between physics calculations and the evaluation of systematic uncertainties brings some broad advantages to MAT-based analyses. An analysis routine that fills MnvHnDs is modeled by the same code for each Universe. Analyzers only need to study one path of control to understand what quantity each histogram contains. A developer focused on a particular systematic uncertainty only needs to read a Universe's implementation rather than hunt for blocks of code scattered throughout the analysis routine. Version control systems automatically provide a concise summary of how a systematic uncertainty algorithm has developed over time. Furthermore, any MAT analysis program is modular and does not need to be changed to introduce a new systematic uncertainty.
1 template < class UNIVERSE > 2 class R e s o n a n t P i o n U n i v e r s e : public UNIVERSE 3 {  Code Example 3. The event weight is scaled up 10% for events which are truly resonant pion interactions in this Universe. This is an example of a "vertical" systematic shift. Code Example 4. The event's lepton energy is increased by 50 MeV in this Universe. This is an example of a "lateral" systematic shift.

V. MAT ADOPTION
The MAT's flagship tools, the MnvHnD and Universe classes, are not reliant on particular technologies, but are merely powerful aids to many-universe binned analyses. They work well together, but are also standalone classes and can each be adopted individually to support an experiment's systematics infrastructure. The MnvHnD on its own can simplify analysis calculations in many universes, provide on-demand covariance matrices, enable visualization of individual error bands (as shown in Figure 2), and serve as a standard histogram container for experiments. The Universe classes can standardize systematics treatment across an experiment and isolate systematic impacts to the single simulation parameter or analysis quantity that they affect.
Beyond these tools, the MAT has an intended approach to analysis, which, when adopted, unlocks the full potential of the toolkit. This approach, demonstrated in Code Example 1, uses a single event-universe loop to measure an analysis quantity and collect into an MnvHnD all information needed to calculate its systematic errors. This approach simplifies analysis flow, gives a central role to systematic uncertainties, and makes transparent the particular many-universe strategy employed. This approach, however, is best-used in concert with certain experiment-wide choices regarding data organization and analysis infrastructure design.
It would be difficult to capture the full range of possible conditions in an experiment's analysis environment and their consequences on MAT implementation, so we consider only MINERvA here as an illustrative case. On MINERvA, the common reconstruction pass over raw data is relatively minimal. Each analyzer thus has access to low-level variables, and runs their own reconstruction (for example, specialized tracking) on top of the general reconstruction to produce a custom tuple with new branches tailored to their analysis needs. This design choice is well-suited to the MAT approach because access to low-level objects allows users to dynamically study new sources of uncertainty that affect low-or high-level variables alike. In general, sources of systematic uncertainty can shift quantities throughout the simulation-reconstruction-analysis chain. Some are most conveniently expressed as a shift in a high-level, downstream analysis variable, which can be directly encoded in a Universe class (see, for example, Code Example 4). Others may most conveniently shift a low-level simulation variable with consequences downstream on the analysis that are too-complicated to summarize in terms of high-level variables (see Code Example 5). After quantifying the new source of uncertainty and the point in the simulation-reconstruction-analysis chain where it should be introduced, MINERvA analyzers can then share across the experiment a corresponding Universe class and (if needed) a reconstruction prescription to produce systematically shifted higher-level branches. Code Example 5. The uncertainty in target mass has a complex impact on a track's particle identification score. The shift is evaluated during reconstruction and its effect on particle identification is stored as a tuple branch, track_PID_score_target_mass_shift.
MINERvA's infrastructure choices, individual reconstruction and access to low-level objects, were feasible within MINERvA's processing and data storage constraints, and they enable MAT's event-universe loop to be a practical one-stop-shop for filling the CV and systematically-altered histograms of the MnvHnD. These choices further enable agile development of experiment-wide systematic errors in which users are able to express systematics in their most convenient form (as shifts in either high-or low-level variables).
It is not practical in every case to provide the user access to the lowest-level parameters which may be shifted by a source of systematic uncertainty. Interaction event generators, for example, provide event weights which are relatively high-level expressions of an underlying shift on a model parameter. MINERvA has found a good compromise between providing users direct access to some low-level variables for studying new systematics and offering higher-level aggregated shifts as weights to avoid computationally expensive re-simulation. For any compromise, in order to make use of MAT's single-loop analysis approach, it is critical that experiments make available to the user a representation of a systematic shift that they can propagate through their analysis.

VI. CONCLUSION
In section I, the need for common error analysis tools and a top-down approach to error analysis was motivated and the MAT was introduced. Section II introduced the abstract concept of universes and their role in the many universe error analysis method. Sections III and IV described MAT's foundational tools, the MnvHnD and Universe classes, and it showed an example of typical MAT analysis flow. Finally, section V discussed the adoption of all or part of the MAT, and the important experiment-wide choices that impact its potential benefits.
There are extensions to the MAT which build on the functionality we have presented. In addition to providing the analysis foundation for dozens of MINERvA cross section measurements, the MAT is now serving MIN-ERvA's data preservation effort. For eventual analysis of preserved MINERvA data, measurements in terms of new observables is challenging because the expertise needed to assign systematic uncertainties to those observables may no longer exist. A MAT analysis solves this problem automatically for any combination of preserved observables. MINERvA has also built on a central principle of the MAT and had some success recasting new models as reweights of existing models [26]. MAT analyses can introduce reweights of the underlying model the same way a vertical systematic universe is implemented.
As experiments grow, accessible and flexible analysis systems will become increasingly important for maximizing physics output. The MAT lowers the barrier to entry for analyzers by providing them a working error analysis system, thereby allowing users to spend less time writing code and more time thinking about physics. MAT's systematics standardization further reduces code duplication and potential for mistakes.
The MINERvA collaboration is preparing to publicly release the MAT this year, and continues active development to expand its capabilities. The MAT has not yet been used to perform a neutrino oscillation analysis or to apply a wide variety of machine learning methods, but we see no obstacles to those applications. Users of the MAT are currently exploring the idea of sharing systematics inter-experimentally. For example, we believe that broad adoption of the MAT could facilitate shared implementation of GENIE [37] model uncertainties in the neutrino interactions community. Outside of HEP, MAT's ideas could be implemented in new technologies and in any field which performs a many-universe binned analysis.