Ultra-high-energy cosmic rays mass composition studies with the Telescope Array Surface Detector data

The results on ultra-high energy cosmic rays’ chemical composition based on the data from the Telescope Array surface detector are reported. The analysis is based the boosted decision tree (BDT) multivariate analysis built upon 14 observables related to both the properties of the shower front and the lateral distribution function. The multivariate classifier is trained with MonteCarlo sets: proton-induced, which is considered as background events, and ironinduced, considered as signal events. The classifier results in a single variable ξ for data and Monte-Carlo sets, available for one-dimensional analysis. The data to Monte-Carlo comparison results in an average atomic mass of UHECR for energy range 1018.0 − 1020.0 eV. The average atomic mass of primary particles corresponds to 〈ln A〉 = 1.52±0.08(stat.)±0.1(syst.). The comparison with TA hybrid composition results and the other experiments is presented.


Introduction
Ultra-high-energy cosmic rays (UHECRs) are particles and nuclei of energies more than 10 18 eV entering the Earth atmosphere.While UHECRs are registered for many years, their origin remains a puzzle for physicists.Knowledge of the UHECR mass composition is crucial for understanding the source mechanism and propagation of cosmic rays.
UHECRs can't be observed directly due to it's low flux.Particles with corresponding energies interact with the atmosphere, causing an extensive cascade of the secondary particles -so-called extensive air shower (EAS).Two large-scale EAS facilities: Pierre Auger Observatory [1] at Southern hemisphere and Telescope Array [2] at Northern hemisphere operate in the hybrid mode, measuring both the particle flux on the ground with the surface detectors and the emitted fluorescence light with the fluorescence telescopes.
The Telescope Array is an experiment designed for observation of extensive air showers from high energy cosmic rays, located in Utah, USA.The Telescope Array surface detector (SD) array consists of 507 detector units, placed in a square grid with 1.2 km spacing with total area covered approximately 700 km 2 .Each detector has two layers of 1.2 cm thick plastic scintillator of 3 m 2 area each.
In this work, a method of composition studies with the Telescope Array surface detector (SD) data is suggested.A common way to obtain UHECR composition is the calculation of the X max , the average value of the atmospheric depth where the shower development reaches its maximum, which requires operation of fluorescence telescopes.The main advantage of SD data usage is that surface detectors obtain much more data than fluorescense telescopes, which can operate only on clear moonless nights (10 % duty cycle).Surface detectors, on the contrary, operate full duty cycle.

Multivariate analysis
Multivatiate analysis is a common name for a variety of statistical techniques, used to analyze data described by more than one variable.Given a set of observables, MVA transforms it into a single variable, usually called ξ, which then allows to apply conventional one-dimensional techniques.
In case of the EAS data, during the reconstruction procedure a number of observables is obtained, some of which are known to be composition-sensitive.After applying the MVA method, ξ disitribution for data may be compared with the corresponding distributions for Monte-Carlo (MC) event sets and as a result, average atomic mass ln A as a function of energy is derived.

Boosted Decision Trees
The Boosted Decision Trees (BDT) technique is used to build a p-Fe classifier based on multiple observables [3].The BDT classifier works as follows: 1.For each variable a splitting value with best separation is found.This value divides the full range of the values of the variable into two ranges, which are called branches.It will be mostly signal in one branch, and mostly background in another.The classifier is trained using the proton MC and iron MC training sets as background and signal respectively; 2. Then the algorithm is repeated recursively on each branch.It may use a new variable or reuse the same one; 3. The decision tree will iterate until the stopping criterion is reached (for example, number of events in a branch).The terminal node is called a leaf.
The concept of boosting allows one to create a good classifier using a number of weak ones.In this work, "adaptive boosting", or AdaBoost, was implemented [4].In AdaBoost, a weak learner is run multiple times on the training data, and each event is weighted by how incorrectly it was classified.An improved tree with reweighted events may be built now, and as a result, averaging over all trees allows to create a better classifier.
The classifier derives a single variable ξ for each event from the data and from the MC sets.This variable ξ is the main parameter which is subsequently used to distinguish between primaries.

Composition-sensitive variables
A set of fourteen composition-sensitive variables is used: 1. Linsley front curvature parameter, a.
Joint fit of lateral distribution function (LDF) and shower front is performed with 7 free parameters: x core , y core , θ, φ, S 800 , t 0 , a [6]: where t plane is a shower plane delay, S 800 is a scintillator signal density at 800 m core distance and a is the Linsley front curvature parameter.
Given a time resolved signal from a surface station, one may calculate it's peak value and area.
AoP (r) is fitted with a linear fit: where r 0 = 1200 m, α is AoP (r) value at 1200 m and β is it's slope.10.Asymmetry of the signal at the upper and lower layers of detectors.
11.Total number of peaks within all FADC traces.
12. Number of peaks for the detector with the largest signal.
13. Number of peaks present in the upper layer and not in the lower.
14. Number of peaks present in the lower layer and not in the upper.
In addition, zenith angle θ and S 800 are included in the data set.

ξ parameter conversion to ln A
After applying the BDT method, ξ parameter distribution is derived for proton and iron MC and for the data in each energy bin.The range between ln A = 0 (proton) and ln A = 4.02 (iron) is divided into 40 equal parts and at every point a "mixture" of protons and iron (e.g. 5 % p and 95 % Fe) is created.
The Kolmogorov-Smirnov (KS) distance between ξ parameter distribution of the each "mixture" and data is performed, and the case with the smallest KS-distance is chosen, thus allowing to determine ln A at a particular energy bin.
ln A is estimated as: where p is a fraction of protons in the mixture.

Data set and simulations
In this work, 9-year data collected by the Telescope Array surface detector is used (2008-05-11 -2017-05-10).Following cuts are applied: 1. event includes 7 or more triggered stations; 2. zenith angle is below 45 • ; 3. reconstructed core position inside the array with the distance of at least 1200 m from the edge of the array; 4. χ 2 /d.o.f. doesn't exceed 4 for both the geometry and the LDF fits; 5. χ 2 /d.o.f. doesn't exceed 5 for the joint geometry and LDF fit.
6. an arrival direction is reconstructed with accuracy less than 5 • ; 7. fractional uncertainty of the S 800 is less than 25 %.
After the cuts, the data set contains 18077 events.The analysis uses two Monte-Carlo event sets modeled for the primary p and Fe with CORSIKA and QGSJETII-03 hadronic model.The dethinning procedure [10]  reproduce the shower fluctuations while using thinning procedure [11] to reduce required computing resources.MC sets are split into three equal parts: (I) for training the classifier, (II) for MVA estimator calculation, (III) for the estimation of systematic uncertainty [12].

Results
Figure 1 shows ξ parameter distribution histograms for all the energy bins.In the Figure 2 the average atomic mass ln A is shown in comparison with the Telescope array hybrid results [13].In the Figure 3 the average atomic mass ln A is shown in comparison with the Pierre Auger Observatory surface detector results, based on muon X max and risetime asymmetry calculations, respectively [14].
The mass composition obtained is qualitatively consistent with both the Telescope Array fluorescence detector and the Pierre Auger Observatory results.Within the sensitivity of the method, no energy dependence may be seen in the chemical composition.If one assumes steady composition and the QGSJETII-03 hadronic model, the average atomic mass of primary particles corresponds to ln A = 1.52 ± 0.08(stat.)± 0.1(syst.).

Figure 1 .
Figure 1.ξ parameter distribution for different energy bins.Proton MC is shown with red lines, iron MC is shown with blue lines and black dots represent the data.

4. Number of detectors hit. 5 . 9 .
Number of detectors excluded from the fit of the shower front by the reconstruction procedure [8].6. χ 2 /d.o.f. 7. and 8. S b = S i × r b parameter [9] is the signal of i-th detector, r i is the distance from the shower core to this station in meters and r 0 = 1000 m -reference distance.The two S b for b = 3 and b = 4.5 are used in the analysis.The sum of the signals of all the detectors of the event.

Figure 2 .
Figure 2. Average atomic mass ln A in comparison with the Telescope Array hybrid results [13].

Figure 3 .
Figure 3. Average atomic mass ln A in comparison with the Pierre Auger Observatory surface detector results, based on muon X max and risetime asymmetry calculations[14].