Report on Tests and Measurements of Hadronic Interaction Properties with Air Showers

We present a summary of recent tests and measurements of hadronic interaction properties with air showers. This report has a special focus on muon density measurements. Several experiments reported deviations between simulated and recorded muon densities in extensive air showers, while others reported no discrepancies. We combine data from eight leading air shower experiments to cover shower energies from PeV to tens of EeV. Data are combined using the z-scale, a unified reference scale based on simulated air showers. Energy-scales of experiments are cross-calibrated. Above 10 PeV, we find a muon deficit in simulated air showers for each of the six considered hadronic interaction models. The deficit is increasing with shower energy. For the models EPOS-LHC and QGSJet-II.04, the slope is found significant at 8 sigma.


Introduction
Cosmic rays with energies larger than 10 15 eV can only be indirectly observed via extensive air showers. A detailed understanding of the hadronic physics in an air shower is needed to infer the energy and mass of the cosmic ray from air shower measurements. The fluorescence technique has reduced the model-dependence for the energy measurement by tracking the longitudinal shower development, but inferring the mass accurately is still a challenge. * e-mail: hdembins@mpi-hd.mpg.de The energy-dependent mass composition of cosmic rays carries a unique imprint from the origin and propagation of cosmic rays. Attempts to measure the mass composition are complementary to direct searches for cosmic-ray sources via coincident observation in a multi-messenger approach or statistical correlation with potential sources. In Fig. 1, predictions (lines and markers) are shown for the mean-logarithmic-mass lnA from different theories. A measurement to an accuracy of 10 % of the protoniron difference is technically possible, but uncertainties in Figure 1. Mass composition of cosmic rays quantified by lnA as a function of cosmic-ray energy E. Model predictions (markers and lines) are compared to data bands, all taken from the review by Kampert and Unger [1]. Vertical arrows at the sides indicate the instrumental error achieved by leading experiments at low and high energies. The figure is discussed in the text.
hadronic interactions prevent the full exploitation of existing measurements. The two leading observables to infer lnA are the depth X max of the shower maximum in the atmosphere (yellow band in Fig. 1), and the number N µ of muons produced in the shower (green band in Fig. 1). The width of those bands has two main contributions: the experimental uncertainties, and the uncertainties inherent in converting the air shower observables into lnA , which requires air shower simulations with hadronic interaction models. Leading models are EPOS-LHC [2], QGSJet-II.04 [3], and SIBYLL-2.3c [4]. These models are called post-LHC models, due to their tuning to LHC data. Their variation is dominating the uncertainty. Moreover, the mass composition obtained with X max and N µ is not consistent, which implies that the models are not correctly describing all aspects of hadronic physics in air showers.
Tests and measurements of hadronic interaction properties with air showers address these issues. They can reduce uncertainties in modeling the air shower development, so that accurate estimates of the cosmic-ray mass can be obtained. They also offer opportunities to test the standard model of particle physics under extreme conditions. The cms-energy in the first interaction of an air shower initiated by a 10 20 eV cosmic ray is 432 TeV in the nucleon-nucleon system, 33 times higher than what is currently accessibly at the LHC.
In this report, we will review recent tests and measurements of hadronic interaction properties, starting with the measurements of the electromagnetic shower component, but then focusing on muon density measurements. There is a long-standing problem with the correct simulation of muons in air showers. The HiRes/MIA collaboration already reported a discrepancy in simulated and measured air showers between 10 17 to 10 18 eV in the year 2000 [5]. The NEVOD-DECOR experiment reported an increase of muon density relative to simulation from 10 15 to 10 18 eV in 2010 [6]. Above 10 17 eV, an excess of multi-muon events at energies around 10 18 eV over the expectation was observed, and a muon deficit in simulations was suggested as an explanation. The experiments KASCADE-Grande [7] and EAS-MSU [8] reported no muon discrepancy in this energy range when the latest hadronic interaction models tuned to LHC-data are used to simulate air showers, while the SUGAR array [9] reported a muon deficit even for these models. The Pierre Auger Observatory [10,11] and Telescope Array [12] also observed a muon deficit in 10 19 eV showers simulated with the latest models. These measurements, preliminary data from IceCube [13] and AMIGA [14], and unpublished data from Yaktusk [15] are systematically compared in this report.

Measurements of electromagnetic shower component
Most measurements of the electromagnetic component of air showers show good or acceptable agreement with simulations, especially when the post-LHC generation of hadronic interaction models is used. We list recent measurements briefly and focus on deviations.

Proton-air cross-section
The proton-air cross-section has been measured based on the slope of the tail of the X maxdistribution [16,17] by the Pierre Auger Observatory and Telescope Array. Conceptually, this is the most direct measurement of a hadronic interaction property with air showers, the dependence of the analysis on uncertainties in the mass-composition and the shower development is reduced. The measurements start to constrain hadronic interaction models.
Moments of X max -distribution The first two moments of the X max distribution measured by the Pierre Auger Observatory have been mapped to the first two moments of the lnA-distribution with parameters from simulations [18]. Some unphysical second moments were found for QGSJet-II.04.
Longitudinal shape The average longitudinal shapes of air showers recorded by the Pierre Auger Observatory have been compared to simulations, more specifically the width of the profile around the maximum and the asymmetry of the rising and falling edge [19,20]. The measurements are compatible with simulations using post-LHC models.
Lateral density profile The slope of the lateral density profile of electrons and photons is sensitive to the cosmicray mass. At the IceCube Neutrino Observatory, measurements of the slope with the surface detector array in 2.8 km altitude a.s.l. were compared to measurements of muon bundles below a kilometer of ice, which are also sensitive to the mass [21]. SIBYLL-2.1 [22] and EPOS-LHC are inconsistent with the combined measurements.
Attenuation with zenith angle The attenuation of the signals measured at ground with increasing zenith angle have been compared by Telescope Array up to 45 • [23], and up to 40 • by KASCADE-Grande [7]. No deviations to simulations were found.

Measurements of muonic shower component
Most measurements of the muonic component of air showers show disagreement to simulations. The deviations are difficult to reduce to a single cause, further research on muons in air showers is needed.
Lateral density A large body of data on lateral density measurements is available and has been converted into a comparable format. The comparison of lateral density measurements will be discussed in the next section.
Production depth or height The muon production height has been inferred from signal arrival times in ground detectors of the Pierre Auger Observatory [24], and via a muon tracking detector within KASCADE-Grande [25]. The Pierre Auger Observatory converts the measured heights into equivalent slant depths, which is recommended, since hadronic cascades develop along slant depth. Disagreement to simulations are found for QGSJet-II.02 [26] at PeV energies, and for QGSJet-II.04 and EPOS-LHC at EeV energies.
Attenuation with zenith angle The attenuation of the muon lateral profile with growing zenith angle has been measured by KASCADE-Grande up to 40 • [7]. Simulations show larger attenuation factors than data for the models QGSJet-II.02, QGSJet-II.04, SIBYLL-2.1 and EPOS-LHC. The attenuation has a dependence on the lateral distance to the shower axis, which is not reproduced by simulations.

Multiplicity in muon bundles
The ALICE experiment has recently measured the multiplicity in muon bundles [27]. The relative abundance of high and low multiplicity events is well reproduced by QGSJet-II.04.
Atmospheric flux of TeV muons The IceCube Neutrino Observatory has measured the atmospheric flux of TeV muons [28,29]. The total flux is well reproduced by SIBYLL-2.1. Some tension is found in the zenith-angle dependent flux, which could potentially be fixed by adding charmed mesons and their decays to the simulations. Contributions of charmed mesons are negligible for most air shower measurements, but contribute significantly to multi-TeV muons and neutrinos.
Lateral separation of TeV muons The IceCube Neutrino Observatory has observed events with laterally separated muons from muon bundles [30]. The events are interpreted as single muons with high transverse momentum, which dominantly originate from the first interaction in an air shower. The statistical distribution of the lateral separation is related to the transverse momentum distribution of hadrons produced in the first interaction. Partial agreement is found for SIBYLL-2.1 and 2.3, disagreement for EPOS-LHC and QGSJet-II.04.
Rise-time of shower front The Pierre Auger Observatory has published mass-composition measurements based on the normalized rise-time [31] and the rise-time asymmetry [32]. The rise-time is the time interval in which the collected surface detector charge rises from 10 % to 50 % of the final value. Muons in an air shower tend to arrive before electrons and photons, so the rise-time is a tracer of the muon content of the shower. Both analyses show disagreement between data and simulations for QGSJet-II.04 and EPOS-LHC. At lateral distances larger than 1000 m from the shower axis, however, the rise-time asymmetry is compatible with QGSJet-II.04 predictions.
We further show HiRes-MIA data [5] for comparison in several plots, but exclude them from the final results. The HiRes-MIA result systematically differs from all more recent measurements and we did not succeed in contacting one of the authors to better understand the differences. A direct comparison of muon measurements is not possible, since the muon measurements are performed under very different conditions and using different techniques. The muon density at the ground depends on many parameters which differ from experiment to experiment: • Cosmic-ray energy E, • Zenith angle θ, • Shower age (depends on altitude of the experiment, local atmosphere, and zenith angle of the shower), • Lateral distance r from shower axis, • Energy threshold E µ,min of the detectors for muons.
Since direct comparisons of the measured muon density are unfeasible, each experiment usually compares to air shower simulations. The data/MC ratio is comparable between different analyses and different experiments. In a way, air shower simulations provide a universal reference. The caveat of this approach is that two measurements are only comparable, if simulations with the same hadronic interaction model are available for both. How the measurements cover the space of parameters is shown in Fig. 2. The measurements as a whole cover most of the parameter space. This is very valuable, since it allows us to detect a possible dependence of the data/MC ratio along all dimensions of the parameter space. Comparing data/MC ratios instead of just the data introduces a complication. The value of the ratio depends on how the corresponding air shower simulations are selected. While most of the shower parameters can be easily matched in simulation and experiment, the cosmic-ray energy E is difficult to match. This has a large impact. According to the Matthews-Heitler model of air showers [34], the muon number depends on the energy E and mass A of the cosmic ray in the following way with power-law index β 0.9 and energy constant C. The muon number scales almost linearly with the cosmic-ray energy, and with a small power of the mass. For an easier discussion, we take the logarithm on both sides of Eq. 1 and compute the mean, which gives a linear equation Air shower experiments usually have independently calibrated energy scales with systematic uncertainties of 10 % to 20 %. Two otherwise identical experiments with an energy-scale offset of 20 % would find a 18 % offset in the data/MC ratios, simply because equivalent measurements are compared to air showers simulated at different (apparent) energies. Cross-calibrating the energy-scales removes these offsets. When lnN µ is directly compared to simulations, the value of lnA matters. The situation is better when comparing data from different experiments. The value of lnA is a function of the energy E only, so the effect on two experiments is the same, if both experiments compute the muon density from an unbiased sample of air showers, meaning that cosmic rays are included in the sample with a probability independent of their mass A [35]. If the sample has a different mass composition lnA , an offset (1 − β)( lnA − lnA ) is introduced in a comparison to an unbiased sample. A poor detector resolution or using wide energy bins can also introduce biases [36].
Based on this discussion, we can classify the experiments by the measured observables into three groups.
Shower energy and muon density The Pierre Auger Observatory & AMIGA, Telescope Array, and the Yakutsk experiment are all capable of measuring the shower energy E cal using Cherenkov or fluorescence light, which can be converted into the primary energy E with a low modeldependence. These experiments come close to measuring the primary cosmic-ray energy E independently of the muon density at the ground, and can compute the data/MC ratio for showers with the same energy.
The IceCube Neutrino Observatory also falls into this category, although it does not observe the air shower optically. It can measure the shower energy with low modeldependence thanks to its high altitude, which places the detector close to the average depth of the shower maximum [37].
Muon and electron density KASCADE-Grande and EAS-MSU measure signals from electrons and muons separately. These experiments compute the data/MC ratio for showers in the same electron-density interval, not for showers in the same cosmic-ray energy interval. The electron density is correlated to the cosmic-ray energy, but also to the muon density [1]. Computing the data/MC ratio for showers in the same electron density interval is conceptually different from the previous case, since Eq. 1 and 2 do not apply. The data/MC ratios computed by the first and second class are not directly comparable.
Muon density only NEVOD-DECOR and SUGAR are pure muon detectors, without a separate energy estimator. The data/MC ratios are again computed differently. The flux of showers is measured in intervals of an eventwise muon density estimate. The measured flux is then compared with a simulated flux in the event-wise muon density estimate, computed from an external model of the cosmic-ray all-particle flux using a mass-composition assumption (proton or iron) and air shower simulations. If  the fluxes differ, one can infer the data/MC ratio R based on this equation: (3) NEVOD-DECOR uses an average cosmic-ray flux computed from multiple experiments, while SUGAR uses the flux measured by the Pierre Auger Observatory. An energy is assigned to R based on the muon density and E ∝ N µ 1/β .

Energy scale offsets and cross-calibration
If an experiment uses the same energy proxy for measurements of the cosmic-ray flux and the muon density, relative offsets in data/MC ratios can be corrected that arise purely from the different energy scales. The cross-calibration uses that the cosmic-ray flux is very isotropic up to 10 19.2 eV and can serve as a universal reference. If we assume that all deviations in measured fluxes between different experiments arise from energyscale offsets, then a relative energy-scale ratio E data /E can be found for each experiment so that the all-particle fluxes overlap, based on an equation analog to Eq. 3. This approach is well-known and has been used successfully in other works [38,39].
The cross-calibration factors used in this report are given in Table 1. The Spectrum Working Group formed  The factor for Yakutsk is obtained by matching a preliminary update of their all-particle flux against the GSF model, see Fig. 5, to obtain a ratio E Yakutsk /E ref,GSF × E ref,GSF /E ref = 1.15×1.08 ≈ 1.25. The factor for NEVOD-DECOR is obtained by comparing their custom all-particle flux parameterization with GSF (see same figure). The two models differ locally, which could be translated into energy-scale offsets within ±2 %, but no global offset is apparent. The energy scale of NEVOD-DECOR is therefore taken to be the same as GSF, No cross-calibration factor can be given for KASCADE-Grande, since the KASCADE-Grande flux is computed using a different energy estimator. For EAS-MSU, no all-particle flux is available for crosscalibration. SUGAR uses the flux from the Pierre Auger Observatory in its computation of the data/MC ratio and therefore has the same energy-scale adjustment factor.
We emphasize that the cross-calibration cannot eliminate a global offset of all experiments to the true energy scale, with corresponding shifts in the data/MC ratios. The energy scales of leading experiments have uncertainties in the order of 10 to 20 %, we assume that the reference energy-scale has an uncertainty of at least 10 %.

Combined measurements
Eq. 2 displays a simple relationship between the measured muon density, lnA and logarithmic shower energy. To compare all the measurements, we introduce the z-scale, which is inspired by Eq. 2, where N µ det is the muon density estimate as seen in the detector, while N µ det p and N µ det Fe are the simulated muon density estimates for proton and iron showers after full detector simulation. The z-scale, while being rather abstract, has advantages over other choices that were proposed. The energy-dependence of N µ is removed and the expected range is from 0 (pure proton showers) and 1 (pure iron showers), if there is no discrepancy between real and simulated air showers. This is convenient. Furthermore, biases of the form ln N µ det = A + B ln N µ in the measured muon density estimate N µ det with respect to the true muon density N µ cancel in z.
Shown in Fig. 6 are the converted measurements. The z-values are computed relative to simulations and therefore a different result is obtained for each hadronic interaction model although the same data are used. The conversion to z is only possible when N µ det p and N µ det Fe are available for that model. Therefore not all data points can be shown for all models. Overall, the data suggest an energy-dependent trend, but with a large scatter.
The scatter is drastically reduced after the crosscalibration, as shown in Fig. 7. The cross-calibration causes a shift in the simulated values N µ p and N µ Fe , which were computed for the energy E data , but are needed for E ref . Based on Eq. 2, we get ln N µ ref = ln N µ data − β ln(E data /E ref ). The shift is the same for proton and iron showers. It cancels in the denominator of Eq. 4, but enters with the opposite sign in the numerator. We get The values of N µ Fe and N µ p are taken for each model from CORSIKA simulations. The points also move horizontally by the relative amount (E data /E ref ) −1 , a minor effect. As expected, the cross-calibration improves the agreement of data from different experiments. Before and after the cross-calibration, the z-values rise above the iron line beyond 10 19 eV. The interpretation at lower energies changes, however. In case of IceCube, the originally negative z-values suggested that the muon density in proton showers simulated with EPOS-LHC for shower energies below 10 16 eV was too high. After the correction, the zvalues fall between proton and iron. In case of Yakutsk, the original data suggested very low muon densities with partly negative z-values. After the correction, the Yakutsk data is consistent with others within uncertainties. We emphasize again that the reference energy-scale after crosscalibration has a remaining uncertainty of at least 10 %. This means that z-values in all plots can be collectively varied by about ±0. 25.
To further refine the conclusions, we consider the effect of an energy-dependent mass composition. With Eq. 2 and Eq. 4 the expected value z mass for a given meanlogarithmic-mass lnA is computed as z mass = lnA ln 56 . As mentioned in the introduction, the experimental value of lnA is uncertain. Shown in Fig. 7 is a band, an envelope over optical measurements of the depth X max of shower maximum from several experiments, and converted to lnA based on air shower simulations with EPOS-LHC. We will use this as a rough estimate of the mass composition. The band is independent of the muon measurements here, and therefore can be used as a reference. The z mass value computed from the GSF model is also shown, which is based on optical and muon measurements and averages over experiments and model interpretations of air shower data. The line mostly falls inside the envelope.
If the measured z values follow z mass , the model describes the muon density at the ground consistently. This is overall not the case. The pre-LHC generation of hadronic interaction models, SIBYLL-2.1, QGSJet-II.03, and QGSJet01 [41], show larger muon deficits than the models tuned to LHC data, EPOS-LHC, QGSJet-II.04, and SIBYLL-2.3. EPOS-LHC, QGSJet-II.04, SIBYLL-2.3, and QGSJet01 give a reasonable description of data up to a few 10 16 eV. At higher shower energies, a muon deficit in simulations is observed (z > z mass ) in all models. Shown in Fig. 8 are zoomed plots for EPOS-LHC and QGSJet-II.04, the two latest-generation models with most data points. Shown in Fig. 9 is the difference ∆z = z−z mass . Subtracting z mass is expected to remove the effect of the changing mass composition. An energy-dependent trend in ∆z remains.

Energy-dependent trend
To quantify the observed trend in ∆z as a function of energy, a line-model is fitted to the data shown in Fig. 9,      The value of the slope b and its deviation from zero in standard deviations are of interest. The uncertainty of b scales with the uncertainties of the data. The error bars of most data points are dominated by systematic uncertainties, which are correlated for data points from a single set. Correlated uncertainties are the reason why the points in Fig. 9 do not scatter randomly as much as the error bars suggest. The exact amount of correlation is not known. We work around this problem by repeatedly fitting the data under different correlation assumptions.
We use the least-squares method for correlated uncertainties in the data. The score Q is minimized, where C −1 is the inverse of the covariance matrix C of the measurements. The matrix C is constructed as follows where α is the assumed correlation coefficient for the error bars within a data set. The matrix C has a block-diagonal form.
The fit is performed with MINUIT [42] and repeated for values of α from 0 to 0.95. The HESSE algorithm is used to compute the standard deviation σ b of b. To adjust for over-or underestimated uncertainties in the input, the raw result from MINUIT is corrected with the χ 2 value and the degrees of freedom n dof of the fit, σ b = σ raw b χ 2 /n dof [43]. The slope b and its deviation from zero in standard deviations as a function of the assumed correlation are shown in Fig. 10 for EPOS-LHC and QGSJet-II.04. The deviation from zero is always larger than 8 standard deviations, making the slope highly significant. The result is insensitive to the assumed correlation coefficient.

Summary and outlook
We presented a summary of recent tests and measurements of hadronic interaction properties in air showers with energies from PeV up to tens of EeV. Better agreement between simulation and experiment is found for the electromagnetic than for the muonic shower component.
We put a special focus on muon density measurements in this report. A comprehensive collection of muon measurements is presented. We developed the z-scale as a comparable measure of the muon density between different experiments and analyses. The z-scale uses air shower simulations as a reference to compare muon density measurements taken under different conditions. We demonstrated the importance of cross-calibrating energyscales of experiments and apply it were possible, using the isotropic all-particle flux of cosmic rays as a reference.
After applying the cross-calibration, a remarkably consistent picture is obtained. Muon measurements seem to be consistent with simulations based on the latest generation of hadronic interaction models, EPOS-LHC, QGSJet-II.04, and SIBYLL-2.3, up to about 10 16 eV. At higher energies, a growing muon deficit in the simulations is observed, visible as an increase in z over the expectation. An analog trend is observed in older hadronic interaction models, with a more severe muon deficit. The slope of this increase in z per decade in energy is 0.22 to 0.35 for EPOS-LHC and QGSJet-II.04, with 8 σ significance.
We plan to further study the collected data, looking for other trends in the deviation between simulations and data. This will provide hints which aspects of hadronic interaction properties are the likely cause for these deviations.