Measurements of UHECR Mass Composition by Telescope Array

Telescope Array (TA) has recently published results of nearly nine years of $X_{\mathrm{max}}$ observations providing its highest statistics measurement of ultra high energy cosmic ray (UHECR) mass composition to date for energies exceeding $10^{18.2}$ eV. This analysis measured agreement of observed data with results expected for four different single elements. Instead of relying only on the first and second moments of $X_{\mathrm{max}}$ distributions, we employ a morphological test of agreement between data and Monte Carlo to allow for systematic uncertainties in data and in current UHECR hadronic models. Results of this latest analysis and implications of UHECR composition observed by TA are presented. TA can utilize different analysis methods to understand composition as both a crosscheck on results and as a tool to understand systematics affecting $X_{\mathrm{max}}$ measurements. The different analysis efforts underway at TA to understand composition are also discussed.


Introduction
Telescope Array (TA) is a large hybrid cosmic ray observatory located in Millard County, Utah (39.3 • N, 112.9 • W, 1400 m asl) designed to observe ultra high energy cosmic rays with energies in excess of 10 18 eV. The addition of the TA Low Energy Extension (TALE) has extended the minimum observable energy down to 10 15.3 eV.
TA utilizes 507 plastic scintillation counters, also referred to as surface detectors (SDs), placed over 700 km 2 and 36 fluorescence detector (FD) telescopes to measure the energy, depth of air shower maximum (X max ), and arrival direction of UHECRs. Three communications towers are also deployed to allow wireless communications with the SDs to facilitate readout of SD data for cosmic ray events and SD system monitoring. Each SD consists of two layers of plastic scintillator, each with area of 3 m 2 and 1.2 cm thick. Each layer has 104 wavelength shifting fiber optic cables embedded in them which are optically coupled to a photomultiplier tube (PMT). Plastic scintillators are sensitive to charged particles that arrive at ground level as well as γs, which is important in the case of UHECR air showers since ≥ 90 % of the primary particle energy is stored in the electromagnetic component of the shower. Each SD is equipped with an electronics box, GPS antenna, solar panel, battery, and a wireless local area network (WLAN) antenna. SDs are arranged in a grid-like fashion with 1.2 km spacing between adjacent units. SDs digitize signals from the PMTs via 12 bit flash analog-to-digital (FADC) electronics with a 50 MHz sampling rate (20 ns sampling interval). A single waveform is 2.56 µs (128 FADC bins) long and when event readout is ordered by one of the communication towers up to * e-mail: whanlon@cosmic.utah.edu 10 waveforms can be sent. Signals greater than 0.3 minimum ionizing particles (MIP) generate a level-0 trigger, and signals greater than 3.0 MIP generate a level-1 trigger. Communications towers query all SDs each second for detection of level-1 triggers. When three adjacent SDs record a level-1 trigger within 8 µs, a level-2 event trigger is generated. When a level-2 trigger is generated, the communication towers collect all level-0 triggers recorded by the SDs within ±32µs of the event trigger time. The event data is regularly transferred by wireless communications to a central data facility located several kilometers away for offline analysis. Refer to [1] for further details of TA's SD operations.
There are three FD stations at TA located on the periphery of the SD array. Each station points towards the center of the SD array at a central laser facility which is located 21 km away. The northern site, Middle Drum (MD) FD station, employs 14 FD telescopes repurposed from the HiRes experiment, each consisting of a 5.1 m 2 spherical mirror, 16 × 16 PMT cluster, and electronics rack for event readout, trigger logic, and communications with a central timing computer and data acquisition (DAQ) unit to store data [2]. The 14 telescopes are arranged in a two ring configuration, with seven telescopes viewing from 3 • to 17 • in elevation and seven telescopes viewing from 17 • to 31 • . Total azimuthal coverage is 112 • . Each PMT observes 1 millisteradian solid angle of the sky. Also located at the Middle Drum site are 10 FD telescopes used for the TALE detector. At the southwest and southeast corners of the SD array are the Long Ridge (LR) and Black Rock Mesa (BR) FD stations. These stations are comprised of newly built FD telescopes and electronics for the TA experiment. Each consists of 12 telescopes, each utilizing a 6.8 m 2 spherical mirror, a 16×16 PMT cluster camera, and associated electronics and power rack. The twelve mirrors at each of these stations are also arranged in a two ring configuration, with six mirrors in ring 1 observing between 3 • -18 • in elevation, and six mirrors in ring 2 observing between 18 • -33 • . The total azimuthal coverage of each of these stations is 108 • [3,4]. Middle Drum uses sample and hold electronics readout, while Long Ridge and Black Rock Mesa use FADC electronics providing effectively 14 bit FADC samples at a 10 MHz rate. The details of operation and triggering differ between Middle Drum and the southern stations, but in general terms the telescopes look for fluorescence light generated by a UHECR generated air shower. Time coincidence and pattern recognition algorithms are employed by each telescope's electronics to generate mirror triggers, which when directed by a stationwide central computer, are collected to form event triggers which may involve multiple mirrors. Event data is read out from each mirror's electronics rack and stored for later offline analysis. Because the PMTs are operated at very high bias, FDs only operate on moonless nights under favorable weather conditions, thereby limiting their duty cycle to 10%. SDs, on the other hand, operate continuously with 100% duty cycle.
Cosmic ray energy, arrival time, and direction can be measured independently by either the SD array or FDs. FD reconstruction requires measuring a shower-detector plane which is determined by fitting the tube pointing directions and trigger times. This fit provides the distance to the shower for any given pixel which observes its crossing. An inverse Monte Carlo procedure is performed with the fitted geometry to find the air shower profile that results in the best agreement between simulated and observed tube signals. The energy of these air showers can be very well measured because most of an air shower's energy is transferred to an electromagnetic component (e ± , γ), the shower evolution is observed along many degrees of the sky, and calorimetry of electromagnetic air showers is well understood. The largest systematic uncertainty is the atmosphere, which is considered to consist of molecular scattering and aerosol scattering components. For the atmosphere to act as an ideal calorimeter, the temperature and pressure profiles as a function of height would be well known and would be free of aerosols. SD event reconstruction proceeds by using the timing information of FADC waveforms, the known geometry of each SD's placement, and assumptions about the lateral evolution of an UHECR induced air shower to determine the spatialtemporal aspects of the primary particle. Monte Carlo simulations are used to relate the observed signal and reconstructed zenith angle to primary particle energy. This relationship is dependent upon the hadronic model used. Finally an energy correction is made based upon the observed relationship of E FD and E SD , which is determined by measuring the reconstructed energy of the same events observed by FD reconstruction and SD reconstruction respectively. They are related by E FD = E SD /1.27 [5].
To improve reconstructed parameters of the primary cosmic ray, hybrid measurements can be performed, which combines the information of SD and FD measurements for events that are simultaneously viewed by both types of de-tectors. Stereo FD measurements which utilize event data from two or more FD stations to improve UHECR measurements are also possible. For purposes of performing a UHECR composition measurement, which relates the primary particle energy and atmospheric depth of shower maximum (X max ), these improvements to reconstruction are desirable to make precision measurements because the combined power of multiple observations of a shower track greatly improves the resolution in X max . Monocular FD measurements of X max can typically have resolution of 70 g/cm 2 or worse, while hybrid and stereo measurements improve resolution to 20 g/cm 2 or better depending on energy. Invoking the superposition principle, for a given energy bin, X max of an ensemble of air showers generated by a single chemical species with mass number A is related to primary particle mass as X max ∝ D ln(E/A), where E is the primary particle energy and D ≡ d X max /d ln E is the elongation rate. For a fixed energy, shower-to-shower fluctuations in X max are large for a single element. σ(X max ) is roughly proportional to σ p / √ N, where σ p is the standard deviation of the X max distribution of protons, and N is the number of nucleons in the primary particle. Effects such as multiplicity make this estimate more of a lower bound. Mixtures of primary elements exhibit similar relationships based upon ln A . Utilizing these relationships among the first and second moments of X max distributions, observed X max can be used to measure UHECR composition. The relationships between UHECR mass, X max , and σ(X max ) can be summarized as X max and σ(X max ) are larger for light primaries.
The issue of model dependence must also be recognized as a key ingredient in inferring composition from data when compared to simulations. X max and σ(X max ) of hadronic showers are dependent upon parameters such as multiplicity, inelasticity, and cross section, all of which cannot be measured in the lab at the UHECR energy scale (E lab ≥ 10 18 eV). Model uncertainties are a major source of systematic uncertainty in measuring UHECR composition. See [6][7][8] for detailed expositions of theory of UHECR composition measured through observations of air showers. Telescope Array has performed four different measurements of composition: two hybrid measurements, one stereo measurement, and one utilizing only SDs. Section 2 will discuss hybrid and FD measurements, section 3 will discuss the SD measurement of composition, and section 4 will summarize TA's composition results.  and 4 km, respectively). Middle Drum also uses smaller mirrors. These differences reduce the acceptance of Middle Drum compared to Black Rock Mesa and Long Ridge, and the data analysis for the two data sets are carried out independently.

FD and Hybrid Composition Measurements
Hybrid reconstruction is done by searching for coincident events in the SD and FD data streams that occur within a small time window. The timing and geometry of the event from the SD event data is used to constrain the location of the shower core on the ground, which greatly improves the determination of the shower track in the FD shower-detector plane. Using the improved geometry fit, the light profile of the shower is fit using the FD information, providing accurate measurements of energy and X max that are better than monocular FD reconstruction alone. Uncertainties in angular quantities important to reconstruction of the shower track improve to less than a degree, and relative uncertainties in distances improve to less than 1% when performing hybrid reconstruction.
Results of five years of Middle Drum hybrid data have been published in 2015 [9] and extended to seven years of analysis in 2016 [10]. Results and plots presented here are from the seven year analysis covering the period from May 31, 2008 to April 24, 2015, which resulted in 613 events collected. Using the hybrid reconstructed events observed by Middle Drum a pattern recognition algorithm is applied to ensure the shower profile is well behaved and the rise and fall of the shower is in the FD field of view. Events that pass the pattern recognition step have further quality cuts applied: good weather cuts to remove clouds in FD field of view, E > 10 18.4 eV, zenith angle < 58 • , shower core not further than 500 m outside the SD array boundary, SD/FD core difference < 1600 m, geometry χ 2 /DOF < 5. Figure 1a shows observed X max of the seven year Middle Drum analysis. Red, magenta, and blue points show the predictions of QGSJET II-03 proton, nitrogen, and iron. The green box in the figure indicates the system-atic uncertainty on X max for this analysis, calculated to be 16 g/cm 2 . The bias in X max measurements are generally characterized in three parts, total bias, acceptance bias, and reconstruction bias, and are measured using Monte Carlo. Total bias is the difference between reconstructed X max and thrown (true, simulated) X max : β T = X rec max − X thrown max . Reconstruction bias is the mean difference between the reconstructed value of X max and the thrown value of X max for each event that passes all reconstruction cuts: β R = X rec max − X thrown max . Acceptance bias, β A , is the difference between β T and β R . Acceptance bias is typically of order 10 g/cm 2 and represents the shift away from true X max an experiment measures due to events that fail to fully reconstruct either due to the cuts imposed on the analysis or perhaps detector design. Reconstruction bias is typically of order of a few g/cm 2 . Total bias is the sum of β A and β R . These biases are usually measured in the same energy bins in which X max moments are reported, but may also be measured for all energies observed or smaller energy ranges. Unless explicitly stated, in this paper discussion of reconstruction bias, or simply X max bias, refers to β R as described above and does not include contributions from β A . For the Middle Drum data shown in figure 1 the X max reconstruction bias for E > 10 18.4 eV is < 2 g/cm 2 , and resolution is 22 g/cm 2 for this analysis.
TA Middle Drum X max appears to be consistent with a predominantly light composition over the entire energy range shown in the figure. Current generation UHECR observatories have sufficient exposure to collect large event samples. Because X max distributions are naturally skewed, especially in the case of light primaries, the first and second moments of these distributions may obscure some valuable information related to composition, namely if a deep X max tail is present. If there is a significant contribution of light primaries their deeply penetrating nature will be revealed by the presence of a tail in the X max distribution. A better way to test agreement between data and models is to use the full X max distributions instead of simply X max and σ(X max ). Some statistical tests which may be useful are distribution-free, two sample tests such Anderson-Darling, Kolmogorov-Smirnov, and Cramér-von Mises. Traditional χ 2 and maximum likelihood hypothesis testing can also be used to calculate pvalues to test the assumption that observed data is compatible with models. Because of the skewed nature of X max distributions, the Anderson-Darling or Cramér-von Mises test, which are quadratic empirical distribution function (EDF) tests that measure the integrated squared difference between two EDFs, are better suited for tests of compatibility between observed and simulated X max distributions. The Kolmogorov-Smirnov test calculates the supremum of the EDFs under comparison and is less sensitive to differences in the tails of the distributions [11].
To test the compatibility of observed Middle Drum X max with single species simulated using the QGSJET II-03 hadronic model, the Cramér-von Mises test was used. The data distributions were uniformly shifted to find the ∆X max shift which provided the best test statistic, and the p-value was calculated for the statistic. Because one of the distributions was allowed to shift when calculating the test statistic, it is referred to as a shape value, or s-value. The interpretation of the s-value will be the same though, if the s-value is large, then the probability that data and the simulated X max distributions are drawn from the same parent distribution is more likely. Figure 1b shows the X max shift of data required to maximize agreement of the Cramér-von Mises test statistic with Monte Carlo predictions for QGSJET II-03 proton, nitrogen, and iron. Color of the data points indicate the s-value of the test. Colored bands show the range of shifts obtained for the same test for several different models. Large s-values with relatively small shifts are observed for QGSJET II-03 protons indicating Middle Drum X max data is more likely compatible with protons than either nitrogen or iron.
Stereo FD reconstruction is similar to hybrid except SD information is not used. Instead, events that are observed simultaneously by multiple FD stations are utilized. By using the intersecting shower-detector planes to constrain the parameters of the shower track observed by two or possibly three FD stations, the shower geometry can be very well measured. Because of the relatively large separation between FD stations at TA, 30 − 40 km, the stereo aperture is smaller than hybrid at low energies. FD reconstruction allows for larger zenith angle acceptance though, because SD reconstruction is limited to θ < 60 • due to large uncertainties in the shower footprint at such large zenith angles. TA's FD analysis can reconstruct showers with zenith angles up to 80 • . Therefore at high energies the stereo aperture continues to grow because higher energy showers must be inclined sufficiently to ensure X max occurs in the air and not in the ground. These steep zenith angle events are not reconstructed in hybrid analyses. For the stereo analysis results presented here, the minimum accepted energy is 10 18.4 eV. Figure 2 shows the observed stereo X max collected using nine years of data resulting in 1458 events collected, as well as the predictions for QGSJET II-03 protons and iron. The systematic uncertainty of the data is 15 g/cm 2 . X max bias and resolution is 0.9 and 19.2 g/cm 2 respectively for QGSJET II-03 protons, and energy resolution is 5.1% [12]. The observed data is consistent with a predominantly light composition.
TA's highest statistics measure of composition is done using Black Rock Mesa and Long Ridge (BR/LR) hybrid. Events that trigger the BR or LR FD stations are time matched to events that also trigger the SD array. If an event is observed by both FD stations, the shower parameters from the site with the better hybrid shower profile are chosen. Figure 3 shows the observed X max along with predictions of QGSJET II-04 proton, helium, nitrogen, and iron for nearly 9 years of data. Data and Monte Carlo are processed via the same analysis software and the same quality cuts are applied: the event core must be greater than 100 m from the SD array boundary, FD track length 10 • or greater, 11 or more good tubes recorded by FDs, shower-detector plane angle less than 130 • , time extent of the FD track greater than 7 µs, zenith angle less than 55 • , X max must be observed, and weather cuts to ensure atmospheric quality is good. 3330 data events were reconstructed after application of these cuts. The systematic uncertainty on the data is 17 g/cm 2 (black band in the figure). X max bias and resolution are -1.1 and 17.2 g/cm 2 respectively, and energy resolution is 5.7% [13]. Figures 9  and 10 show the observed and simulated X max distributions for each energy bin.
In addition to systematic uncertainty in event reconstruction and detector acceptance, systematic uncertainty in hadronic models is a major source of uncertainty when trying to answer the question of how well data compares to them. This is because hadronic models require knowledge of energy dependent quantities such as multiplicity, inelasticity, and cross section to properly describe air shower evolution. These parameters must be extrapolated from values which are measured at relatively low energy and small pseudorapidity in collider experiments. The Large Hadron Collider (LHC) currently reaches √ s = 13 TeV which is equivalent to 10 17 eV in the lab frame, one to  three orders of magnitude below the typical event energy analyzed in the results presented here. Additionally, collider experiments typically observe processes with high transverse momentum and small pseudorapidity, whereas air shower development is driven by very high energy forward scattering processes. Current experiments at LHC such as LHCf [14] and TOTEM [15] are designed to probe physics in the high pseudorapidity regime. Ulrich, Engel, and Unger examined the effect on various air shower observables such as N max , RMS(X max ), and EM fraction by varying multiplicity, inelasticity, and cross section [8]. They found that RMS(X max ) is mainly dependent on tuning of the cross section parameter and weakly dependent on elasticity. Abbasi and Thomson extended this work to examine the systematic uncertainty of X max for several different hadronic models over a wide range of energies and estimated the uncertainty in X max of QGSJET II-04 to range from σ( X max ) = ±3 g/cm 2 at 10 17 eV to ±18 g/cm 2 at 10 19.5 eV [16]). Figure 4 shows the expected band of systematic uncertainty of TA BR/LR hybrid data, as well as QGSJET II-04 proton and helium predictions. The wide range of systematic uncertainty in the models provides justification for not merely relying upon X max and σ(X max ) to interpret UHECR data, but also testing the entire X max distributions with X max shifting. The BR/LR hybrid analysis also implemented a test which compares the entire X max distribution shapes, as was done for the Middle Drum hybrid analysis. Here the maximum likelihood of the data X max distribution given the expected X max distribution of proton, helium, nitrogen, or iron in each energy bin was computed while allowing the data distribution to shift systematically. The shift which provided the best likelihood value was recorded as well as the p-value of observing a likelihood at least as extreme. Figure 5 shows the results of these tests. The ordinate of the plot indicates the probability after systematic shifting of observing a likelihood at least as large as that observed in the data given that the distribution was generated from one of the pure chemical elements under test here. Small p-values indicate disagreement with data and the model and is deemed incompatible with the data. Large p-values QGSJet II-04 proton sys.
QGSJet II-04 helium sys. Figure 4: TA BR/LR hybrid X max compared to systematic uncertainty of the QGSJET II-04 model predictions for proton and helium primary sources. Model systematics are estimated from [16].
indicate that the model cannot be rejected as being compatible with the data. The color of each point indicates the amount of systematic shifting that was required to measure the observed p-value. In the last energy bin for example, iron is shown to be compatible with data because of the large p-value obtained, but a 60 g/cm 2 shift is required. This shift is much larger than the systematic uncertainty of this analysis, and therefore is should be considered skeptically. As figures 9 and 10 show the X max distributions obtained in the last few high energy bins show a marked suppression in the deep X max tail. This lack of the tail feature, which can act as a discriminator of light elements in the X max distribution, allows for agreement of the data with elements that lack this signature feature. If the deep X max tail is missing in the data distributions one must carefully assess that it is missing due to an astrophysical feature in the spectrum, or it is missing due to acceptance of the detector. As energy grows X max increases and the possibility increases that X max can occur outside the field of view of the detector, perhaps even in the ground for near vertical showers. Given the relatively small exposure TA has for E > 10 19 eV, we cannot reliably say either way yet. Given that starting at 10 19 eV our maximum likelihood tests result in simultaneous agreement with multiple single species composition models, some with very large differences in mass, we believe that we lack the statistical power to draw firm conclusions about composition in this energy range. The conclusions we draw from figures 3 and 5 is that BR/LR hybrid data is consistent with a predominantly light composition below 10 19 eV at the 95% confidence level, and more events must be collected above that energy to draw further conclusions. These results do not imply that UHECR composition is monospecific. Our tests were designed to be simple with few free parameters. Figure 6 shows the BR/LR hybrid data and Monte Carlo X max distributions after the maximum likelihood procedure has been performed in one energy bin. Even though figure 3 indicates X max of data and QGSJET II-04 protons differ, this figure does not account for the systematic uncertainties in the data or in the model. After shift- ing, figure 6 shows that the shapes of the data and proton distributions agree quite well, especially in the tail of the distributions. The same figure shows that even when shifting the same data distribution to maximize the likelihood function for QGSJET II-04 helium, the tails of the distribution don't agree as well as seen for protons. This disagreement leads to a smaller probability (the p-value) of observing the same likelihood recorded for the data given the parent distribution is actually helium. The shapes of QGSJET II-04 nitrogen and iron have even worse agreement with the shape of the data X max distribution even after relatively large X max shifts are applied.
If we compare the data of all TA FD-based analyses, they look consistent within systematic uncertainties. Figure 7a shows the observed data of Middle Drum hybrid, TA FD stereo, and BR/LR hybrid, along with predictions of BR/LR hybrid for four chemical elements. Figure 7b shows σ(X max ) for the same measurements as well as the BR/LR hybrid predictions of QGSJET II-04 proton and helium. We conclude that all three of these analyses are consistent with a predominantly light composition.

SD Composition Measurement
The strength of hybrid analysis is that it actually observes the depth of X max , along with recording the development of the shower before and after, resulting in a very reliable measurement of X max if the shower geometry is sufficiently constrained. Methods that measure X max via air fluorescence have a low duty because of the requirement to run during clear, moonless nights. SDs do not suffer from this requirement, providing 100% on time. If a method can be developed to measure UHECR mass by SD array, then the gain in statistical power will be enormous. TA has undertaken such an analysis which attempts to classify the mass of the primary on an event-by-event basis using a multivariate boosted decision tree technique. This analysis uses 14 composition sensitive variables collected by SD event analysis to assign a continuous value, ξ ∈ [−1 : 1], which classifies an event as pure signal, ξ = 1, pure background, ξ = −1, or some value in between the two. In this analysis, background is based upon pure QGSJET II-03 proton composition, and signal is pure QGSJET II-03 iron composition. The variables used in the multivariate analysis (MVA), such as area-over-peak of the SD waveforms at 1200 m, shower front curvature parameter, signal asymmetry in SD upper and lower layers, are used to train a boosted decision tree (BDT) classifier using proton and iron Monte Carlo. The classifier is then applied to the data to calculate ξ for each event. Using this information ln A is calculated for the data. Figure 8 shows the observed ln A using the MVA technique, as well as the predicted ln A using the BR/LR hybrid data. Using nine years of SD data, this analysis collects 18007 events for E ≥ 10 18 eV, about six times the statistical power of the BR/LR hybrid analysis. The figure indicates that the SD analysis finds TA data compatible with a predominantly light composition for all energies above 10 18 eV. Under the assumption of a flat prior probability of an equally weighted four component mixture of QGSJET II-03 proton, helium, nitrogen, and iron (fraction = 0.25 for each element), the calculated systematic uncertainty of this analysis is δ ln A = 0.36. See [17] for further details about this analysis.

Summary
Telescope Array has been operating for ten years and has completed composition analyses using four quasiindependent techniques: two hybrid analyses using the SD array in conjunction with the Middle Drum FD station and the combined Black Rock Mesa and Long Ridge FD stations, stereo FD analysis using events observed only by two or more FD stations, and a SD-only analysis using machine learning algorithms. All four analyses observe UHECR composition consistent with a predominantly light elements for E 10 18 eV. Our highest statistics measurement of X max , BR/LR hybrid, has limited statistical power above 10 19 eV, and we cannot draw reliable conclusions about composition in this energy range. Comparison of X max and σ(X max ) of the three hybrid analyses show very consistent results over all energies.   Because TA has sufficient exposure below 10 19 eV to make meaningful measurements of the shapes of the X max distributions we observe, and model dependence and systematic uncertainty of hadronic models is an issue for UHECR composition, we have developed methods to test our data distributions against simulations, not just relying on the first and second moments. These methods leverage the additional information that comes from using the entire set of data in a given energy bin, by calculating the probability of observing our data given that it was generated according to some model. These are non-parametric, distribution free tests which test the shapes of distributions in question and allow for systematic uncertainty in the data or models. These tests allow us to empirically measure the agreement of data with simulations at whatever confidence level we choose.
The SD-only analysis using MVA is a promising measurement which does not use the traditional method of observing X max to infer composition, but has vastly greater statistical power. It also measures light composition and compares well with the predicted ln A of the BR/LR hybrid analysis.
TA will continue collect data with the planned TAx4 upgrade already underway. This will improve TA's exposure above 10 19 eV, providing needed statistics to allow us to firm up our measurements in this energy range [18].