Reconstruction and identification of tau leptons in CMS

Tau leptons constitute an important experimental signature for analyses at the CERN LHC related to Higgs boson, Standard Model, and beyond the Standard Model measurements. We describe the algorithm used by the CMS experiment to reconstruct and identify decays of tau leptons into hadrons and a neutrino during Run 1 of the CERN LHC. The performance of the algorithm is studied in proton-proton collisions recorded at a center-of-mass energy of 8 TeV, corresponding to an integrated luminosity of 19.7 fb−1. The algorithm achieves an identification efficiency of typically 50-60%, with misidentification rates for quark and gluon jets, electrons and muons that vary between per mille and percent levels.


Introduction
Tau leptons constitute an important experimental signature for analyses at the CERN LHC related to Higgs boson, Standard Model, and beyond the Standard Model measurements [1][2][3][4][5][6]. These analyses benefit from the high reconstruction and identification performance of the CMS tau algorithm [7], whose description is the main subject of these proceedings (ICNFP 2015).
The tau lepton has a mass of 1.777 GeV and a lifetime of 290.6 fs (cτ ≈ 87 μm) [8]. The branching fractions for the main tau decay modes are given in Table 1.

Table 1. Branching fractions (BR) of the main tau decay modes.
Decay channel | Resonance | BR (%)
τ − → e − ν̄ e ν τ | | 17.8

The tau lepton decays into hadrons in about two-thirds of the cases, typically into either one or three charged mesons (π or, less frequently, K) and up to two neutral pions. The π 0 decays almost instantaneously into γγ. In the remaining one-third of the cases the τ lepton decays into an electron or muon and two neutrinos. The electrons and muons originating from tau decays are reconstructed and identified using standard CMS algorithms [9,10], while the algorithm for the tau reconstruction and identification presented in this paper focuses on "hadronic" tau decays, denoted by τ h .
The main challenge in identifying τ h decays is the background from quark and gluon jets, which can mimic τ h decays and whose production cross section exceeds by many orders of magnitude the rate at which tau leptons are produced at the LHC. To mitigate the misidentification of quark and gluon jets as τ h , we exploit some of the tau properties: small track multiplicity, collimation and isolation of the decay products, and non-negligible lifetime. The misidentification of electrons or muons as τ h represents another possible source of background, and specific identification criteria have been developed to reduce it.

The CMS detector
The Compact Muon Solenoid (CMS) [11] is a general-purpose detector operating at the LHC. The natural coordinate frame used to describe the CMS detector geometry is a right-handed Cartesian system with its origin at the nominal interaction point, the x-axis pointing radially inward towards the centre of the LHC, the y-axis pointing up (perpendicular to the LHC plane), and the z-axis along the anticlockwise beam direction. The polar angle θ is measured from the positive z-axis and the azimuthal angle φ is measured in the transverse (x, y) plane. The pseudorapidity is defined as η = −ln[tan(θ/2)] and the radius as r = √(x² + y²). The CMS subdetectors, listed from the innermost to the outermost, are the tracker, the electromagnetic calorimeter (ECAL), the hadronic calorimeter (HCAL), and the muon systems. A detailed description of the CMS detector can be found in [11].
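As a minimal illustration of these coordinate conventions, the sketch below computes the pseudorapidity from the polar angle and the angular distance ΔR = √((Δη)² + (Δφ)²) that appears in the selections later in the text. The function names are hypothetical and chosen for this example only; they are not taken from any CMS software.

```python
import math

def pseudorapidity(theta):
    """eta = -ln(tan(theta / 2)), with theta in radians in the open interval (0, pi)."""
    return -math.log(math.tan(theta / 2.0))

def delta_phi(phi1, phi2):
    """Signed azimuthal difference phi1 - phi2, wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance in the (eta, phi) plane."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))
```

A particle emitted perpendicular to the beam (θ = π/2) has η = 0, and the wrapping in `delta_phi` keeps two directions on either side of φ = ±π close to each other.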

The hadronic tau reconstruction and identification
In this section we describe the hadrons-plus-strips (HPS) algorithm used by CMS to reconstruct and identify τ h decays. The algorithm proceeds in two main stages: the reconstruction step, in which charged and neutral particles are combined to look for specific τ h decays; and the identification step, in which discriminators are defined to separate τ h decays from quark and gluon jets, electrons, and muons.

The hadronic tau reconstruction
CMS uses a particle-flow (PF) algorithm [12,13] to provide a global event description. The information available in all CMS subdetectors is employed to reconstruct and identify individual particles: muons, electrons, photons, charged and neutral hadrons. These particles are then used to reconstruct higher-level objects such as isolation variables for leptons or the vector imbalance in transverse momentum in the event (whose magnitude will be referred to as E miss T ). Particles reconstructed by the PF algorithm are also used as input to reconstruct jets, using the anti-k t algorithm [14] with a distance parameter R = 0.5. Jets are the seeds for the HPS algorithm.
In order to reconstruct the tau decay modes, the HPS algorithm considers charged particles and neutral pions. The charged particles are the constituents of the seeding jet that have a transverse momentum, p T , of at least 0.5 GeV. Neutral pions are reconstructed as η − φ "strips": given the high probability for photons from π 0 → γγ to convert to e + e − pairs, the photon and electron constituents with p T > 0.5 GeV of the jet that seeds the tau reconstruction are clustered together. The strip size is 0.05 × 0.20 in η × φ, enlarged in the φ direction to account for the bending of e + e − pairs produced by photon conversions in the 3.8 T magnetic field of CMS. Only strips with p T > 2.5 GeV are considered.
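The strip clustering can be sketched as follows. The text only fixes the strip size (0.05 × 0.20) and the two p T thresholds, so the details here are assumptions made for illustration: the highest-p T unused electron or photon seeds each strip, the quoted sizes are treated as maximum distances from the strip centre, the centre is updated as a p T -weighted average, and the strip p T is a scalar sum. All names are hypothetical.

```python
import math

STRIP_DETA, STRIP_DPHI = 0.05, 0.20  # strip size quoted in the text
MIN_CONSTITUENT_PT = 0.5             # GeV, electron/photon threshold
MIN_STRIP_PT = 2.5                   # GeV, strip threshold

def wrap_phi(dphi):
    """Wrap an azimuthal difference into [-pi, pi)."""
    return (dphi + math.pi) % (2.0 * math.pi) - math.pi

def build_strips(candidates):
    """candidates: list of (pt, eta, phi) for the electrons and photons of one jet."""
    pool = sorted((c for c in candidates if c[0] > MIN_CONSTITUENT_PT), reverse=True)
    strips = []
    while pool:
        pt, eta, phi = pool.pop(0)  # assumption: highest-pT unused candidate seeds a strip
        merged = True
        while merged:
            merged = False
            for i, (cpt, ceta, cphi) in enumerate(pool):
                dphi = wrap_phi(cphi - phi)
                if abs(ceta - eta) < STRIP_DETA and abs(dphi) < STRIP_DPHI:
                    total = pt + cpt
                    # assumption: pT-weighted strip position, scalar pT sum
                    eta = (pt * eta + cpt * ceta) / total
                    phi = wrap_phi(phi + cpt * dphi / total)
                    pt = total
                    del pool[i]
                    merged = True
                    break
        if pt > MIN_STRIP_PT:
            strips.append((pt, eta, phi))
    return strips
```

With this sketch, two nearby photons of 2.0 and 1.0 GeV merge into one 3.0 GeV strip, while an isolated 1.5 GeV photon forms a strip that fails the 2.5 GeV threshold and is dropped.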
For each jet, charged hadrons and strips, from which the name of the algorithm hadrons-plus-strips originates, are combined to reproduce the hadronic tau decay modes of Table 1, with the exception of the h ± h ∓ h ± π 0 mode, which has a branching fraction of 4.8% and suffers from large contamination by jets. Mass window cuts are applied to test the compatibility of each hypothesis with the signatures expected for the different hadronic tau decay modes given in Table 1:
• h ± h ∓ h ± : Combination of three charged particles that sum to unit charge and have a mass 0.8 < m τ < 1.5 GeV.
• h ± π 0 π 0 : Combination of a single charged particle with two strips with 0.4 < m τ < 1.2 · √(p T [GeV]/100) GeV. The size of the mass window is enlarged for τ h candidates of high p T to account for resolution effects. For hadronic tau candidates of p T < 100 GeV (> 1111 GeV) the upper limit of the mass window is set to 1.2 GeV (4.0 GeV).
• h ± π 0 : Combination of one charged particle plus one strip with 0.… < m τ < 1.3 · √(p T [GeV]/100) GeV. For τ h candidates of p T < 100 GeV (> 1044 GeV) the upper limit of the mass window is set to 1.3 GeV (4.2 GeV).
• h ± : A single charged particle with no strips.
The four-momentum of each τ h candidate hypothesis (p T , η, φ, and mass) is given by the momentum sum of the charged particles and strips that are included in the respective decay mode combination. The two decay modes h ± π 0 and h ± π 0 π 0 are analysed together and referred to as h ± π 0 s. The distribution of tau decay modes reconstructed in Z/γ * → ττ events and of the reconstructed mass of the τ h candidates in those events are shown in Fig. 1.
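A minimal sketch of the four-momentum sum and of the p T -dependent mass window is given below. It assumes that the upper edge of the window grows as √(p T /100 GeV) and is clamped between the low-p T and high-p T values quoted in the text (1.2 and 4.0 GeV for h ± π 0 π 0 ). The function names are hypothetical, and only the h ± π 0 π 0 window is coded because it is the one whose two edges are both stated.

```python
import math

def upper_mass_limit(pt, scale, floor, ceiling):
    """Upper window edge scale * sqrt(pT / 100 GeV), clamped to [floor, ceiling] in GeV."""
    return min(max(scale * math.sqrt(pt / 100.0), floor), ceiling)

def four_vector(pt, eta, phi, mass):
    """Convert (pT, eta, phi, m) to (E, px, py, pz)."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz

def invariant_mass(constituents):
    """Mass of the summed four-momentum; constituents are (pT, eta, phi, m) tuples."""
    e = px = py = pz = 0.0
    for c in constituents:
        ce, cx, cy, cz = four_vector(*c)
        e, px, py, pz = e + ce, px + cx, py + cy, pz + cz
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def passes_two_strip_window(mass, pt):
    """Mass window for the one-charged-hadron-plus-two-strips hypothesis."""
    return 0.4 < mass < upper_mass_limit(pt, 1.2, 1.2, 4.0)
```

For example, a candidate mass of 1.5 GeV fails the window at p T = 50 GeV (upper edge 1.2 GeV) but passes at p T = 400 GeV (upper edge 2.4 GeV).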

The hadronic tau identification
Reconstructed τ h candidates are required to satisfy some identification criteria in order to mitigate the background related to quark and gluon jets or due to electrons and muons.
The HPS algorithm has two isolation discriminators: cut-based and MVA-based (where MVA stands for multivariate analysis).
Cut-based: The isolation of τ h candidates is computed by summing the transverse momenta of charged particles of p T > 0.5 GeV plus photons of E T > 0.5 GeV, reconstructed by the PF algorithm, within an isolation cone of size ΔR = 0.5 centered on the τ h direction. The effect of pileup is reduced by requiring the tracks associated with the charged hadrons considered in the isolation p T -sum to be compatible with the production vertex of the τ h candidate within a distance of 0.2 cm along the beamline and 0.03 cm in the transverse plane. Charged hadrons used to build the τ h candidate are excluded from the isolation p T -sum, as are photons used to build any of the strips. The effect of pileup on the photon isolation is compensated on a statistical basis by applying so-called Δβ corrections:
I τ = Σ p charged T (d Z < 0.2 cm) + max(0, Σ p γ T − Δβ). (1)
Two sets of cut-based tau isolation discriminators are computed, by requiring that at least 8 (HPS combined isolation 8-hit) or at least 3 (HPS combined isolation 3-hit) hits in the pixel plus silicon strip tracking detectors are associated with each track. Loose, medium, and tight working points are defined by requiring the isolation p T -sum defined by Eq. 1 not to exceed thresholds of 2.0, 1.0, and 0.8 GeV, respectively.
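The cut-based isolation can be sketched as below, assuming the isolation sum takes the commonly used form "charged sum plus pileup-corrected photon sum, floored at zero". The Δβ correction is passed in as a number because its computation is not spelled out in the text, and the input lists are assumed to contain only particles that already satisfy the cone, vertex-compatibility, and signal-constituent exclusion requirements. All names are hypothetical.

```python
WORKING_POINTS = {"loose": 2.0, "medium": 1.0, "tight": 0.8}  # GeV thresholds from the text

def isolation_pt_sum(charged_pts, photon_ets, delta_beta, min_pt=0.5):
    """Charged pT sum plus Delta-beta corrected photon ET sum (all in GeV).

    delta_beta is an externally supplied pileup correction (assumption).
    """
    charged = sum(pt for pt in charged_pts if pt > min_pt)
    neutral = sum(et for et in photon_ets if et > min_pt)
    return charged + max(0.0, neutral - delta_beta)

def passed_working_points(isolation):
    """Names of the working points whose threshold the isolation sum does not exceed."""
    return [name for name, threshold in WORKING_POINTS.items() if isolation <= threshold]
```

The `max(0.0, ...)` keeps an over-estimated pileup correction from producing a negative neutral contribution that would artificially improve the isolation.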
MVA-based: A boosted decision tree (BDT) discriminator [15] is used to discriminate hadronic tau decays ("signal") from quark and gluon jets ("background"). The variables used as inputs to the BDT are:
• The charged particle isolation p T -sum, Σ p charged T (d Z < 0.2 cm), and the neutral particle isolation p T -sum, Σ p γ T , defined in Eq. 1, as separate inputs.
• The reconstructed tau decay mode.
• The transverse impact parameter, d 0 , of the highest p T track of the τ h candidate and its significance.
The position of the primary event vertex is refitted after excluding the tracks associated with the τ h candidate. The inputs are augmented by the p T and η of the hadronic tau candidate and by the Δβ correction defined by Eq. 1. The purpose of the p T and η variables is to parametrise possible dependencies of the other input variables on transverse momentum and pseudorapidity. The Δβ correction parametrises the dependency on pileup of the neutral particle isolation p T -sum.
The BDT is trained on event samples produced using Monte Carlo (MC) simulation. Samples of Z/γ * → ττ, Higgs → ττ, Z′ → ττ, and W′ → τν τ events, covering the range 20-2000 GeV in τ h candidate p T , are used for the "signal" category. Reconstructed τ h candidates are required to match hadronic tau decays at generator level within ΔR < 0.3. W+jets and QCD multijet events are used for the "background" category. The τ h candidates that match leptons originating from the W boson decays are excluded from the training. Half of the available events are used for the training, the other half for evaluating the MVA performance and conducting overtraining checks.
Different working-points, corresponding to different τ h identification efficiencies and jet → τ h misidentification rates, are defined by varying the cut on the MVA output.
The performance of the various isolation discriminators described so far (3-hit, 8-hit, MVA with and without lifetime information) is compared in Fig. 2 in terms of efficiency versus fake-rate (probability of misidentification). The efficiency is computed separately in simulated Z/γ * → ττ (left) and Z′ → ττ (right) events. The fake-rate is measured in simulated samples of multijet events. The inclusion of tau lifetime information in the MVA discriminator reduces the jet → τ h fake-rate by about a factor of two, for the same tau identification efficiency, compared to the cut-based tau isolation discriminator.
Electrons and muons may be reconstructed as τ h in the h ± decay mode or, in the case of electrons that radiate a bremsstrahlung photon which subsequently converts, in the h ± π 0 decay mode. Specific discriminators have been developed to mitigate this contamination.
An MVA-based discriminator is used to separate hadronic taus from electrons. The discriminator is built using the energy deposits in the ECAL and HCAL calorimeters as inputs. Track-related variables such as the p T , η, or the number of hits in the tracking detector are also used. The discriminator against muons vetoes τ h candidates when signals in the muon system are found near the τ h direction or when the sum of the energies in the ECAL and HCAL is less than 0.2 times the momentum of the leading track of the τ h candidate.
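How a working point follows from a cut on the MVA output, and how one point of an efficiency versus fake-rate curve is obtained, can be sketched as below. This is an illustrative counting exercise on lists of discriminator scores, not the procedure used to produce Fig. 2; the function names and the convention that a candidate passes when its score is at or above the threshold are assumptions.

```python
def threshold_for_efficiency(signal_scores, target_efficiency):
    """Score cut at which roughly target_efficiency of the signal candidates pass."""
    ordered = sorted(signal_scores, reverse=True)
    n_pass = max(1, round(target_efficiency * len(ordered)))
    return ordered[n_pass - 1]

def efficiency_and_fake_rate(signal_scores, background_scores, threshold):
    """Fractions of signal and background candidates with score >= threshold."""
    efficiency = sum(s >= threshold for s in signal_scores) / len(signal_scores)
    fake_rate = sum(b >= threshold for b in background_scores) / len(background_scores)
    return efficiency, fake_rate
```

Scanning `target_efficiency` over a range of values and recording the resulting pairs traces out one curve of the kind compared between discriminators.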

√ s = 8 TeV
The performance of the τ h algorithm is studied in proton-proton collisions recorded at a center-of-mass energy of 8 TeV, corresponding to an integrated luminosity of 19.7 fb −1 . Different samples of events are used to check the τ h reconstruction and identification in data and to validate the rates with which quark and gluon jets, electrons, and muons get misidentified as τ h decays.

Measurement of the hadronic tau identification efficiency
The τ h identification efficiency measurement is performed on a high-purity sample of Z/γ * → ττ events in the μτ h final state. The events are selected by pairing an isolated muon of p T > 25 GeV with the highest p T jet that has p T > 20 GeV and |η| < 2.3 and is separated from the muon by ΔR = √((Δφ)² + (Δη)²) > 0.5. The muon and the highest p T track of the jet are required to have opposite charge and to be compatible with originating from the same vertex. The transverse mass of the muon and E miss T , m T (μ, E miss T ) = √(2 p μ T E miss T (1 − cos Δφ)), where Δφ is the azimuthal angle between the muon and the missing transverse momentum, is required to be lower than 40 GeV, and the topological cut p ζ − 1.85 p vis ζ > −15, described in [7], is applied. Events containing b quark jets or additional leptons are vetoed. No τ h identification criteria are directly applied during the event selection.
The τ h identification efficiency is obtained through a simultaneous fit of the number of Z/γ * → ττ events, N τ pass and N τ fail , with hadronic taus passing ("pass" region) and failing ("fail" region) the τ h identification discriminant. The τ h identification efficiency is taken as the parameter of interest μ in the fit. The number of Z/γ * → ττ events in the pass and fail regions as a function of μ is given by N τ pass = μ N Z/γ * →ττ and N τ fail = (1 − μ) N Z/γ * →ττ , respectively. The multiplicity of tracks within a cone of size ΔR < 0.5 centered on the τ h direction is used to perform the fit. For more details on the fitting procedure, the background contamination, and the estimation of the systematic uncertainties we refer to the main article [7].
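The transverse mass requirement can be illustrated with the short sketch below, which uses the standard definition m T = √(2 p T E miss T (1 − cos Δφ)). The function and argument names are hypothetical.

```python
import math

def transverse_mass(lepton_pt, missing_et, delta_phi):
    """Transverse mass in GeV; delta_phi is the lepton-to-missing-momentum azimuthal angle."""
    return math.sqrt(2.0 * lepton_pt * missing_et * (1.0 - math.cos(delta_phi)))

def passes_mt_cut(lepton_pt, missing_et, delta_phi, max_mt=40.0):
    """Upper cut on the transverse mass, 40 GeV in the selection described in the text."""
    return transverse_mass(lepton_pt, missing_et, delta_phi) < max_mt
```

The transverse mass is largest when the muon and the missing transverse momentum are back to back, which is typical of W+jets events; the upper cut therefore suppresses that background.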
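The relation between the yields in the two regions and the efficiency μ can be made concrete with the sketch below. It is a deliberate simplification: the measurement described above is a simultaneous template fit with backgrounds and systematic uncertainties, whereas this example assumes background-free signal yields and returns only a simple binomial uncertainty. The names are hypothetical.

```python
import math

def expected_yields(mu, n_total):
    """Signal yields in the pass and fail regions for efficiency mu."""
    return mu * n_total, (1.0 - mu) * n_total

def efficiency_from_yields(n_pass, n_fail):
    """Efficiency and binomial uncertainty from background-free yields (assumption)."""
    n_total = n_pass + n_fail
    mu = n_pass / n_total
    return mu, math.sqrt(mu * (1.0 - mu) / n_total)
```

Because both regions share the same total yield, moving μ up increases the pass-region prediction and decreases the fail-region prediction by the same amount, which is what constrains μ in the simultaneous fit.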
Figure 3 shows the results of the efficiency measurement as a function of p T for the cut-based (left) and MVA (right) isolation discriminators for different working points, considering both data (solid symbols) and MC simulation (open symbols).The MVA-based isolation performs better, reaching an efficiency of up to 70%.All data-to-MC ratios are compatible with unity within the estimated uncertainties of about 5%.
The efficiency to reconstruct and identify τ h decays of higher p T in denser hadronic environments is measured using t t̄ events. A detailed description of this measurement is reported in [7].

Measurement of the hadronic tau energy scale
The energy scale for τ h (referred to as τES) is defined as the average reconstructed τ h energy relative to the generator-level energy of the visible τ decay products. MC-to-data τES corrections are determined in Z/γ * → ττ events selected in the μτ h channel as described in Sec. 4.1, with the additional requirement that the selected jet passes the medium working point of the MVA-based τ h isolation.
The corrections are obtained by fitting the reconstructed mass of the τ h candidate, m τ h , and the visible mass of the muon and τ h candidate, m vis . The m τ h and m vis templates are computed by changing the τ h four-momentum as a function of τES, and recomputing m τ h and m vis after each such change.
The τES is measured separately for the reconstructed τ h decay modes and as a function of p T . The variable m τ h is seen to be the more sensitive observable compared to m vis , as indicated by smaller uncertainties. The τES correction is always found to be smaller than 2%. There is no indication of a dependence of the measured τES corrections on τ h p T .
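The template construction for the τES fit can be sketched as follows: the τ h four-momentum is multiplied by a trial scale factor and both masses are recomputed. Scaling all four components by the same factor is an assumption of this example, as are the function names; the actual fit then compares such shifted templates with data.

```python
import math

def four_vector(pt, eta, phi, mass):
    """Convert (pT, eta, phi, m) to (E, px, py, pz)."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return math.sqrt(px * px + py * py + pz * pz + mass * mass), px, py, pz

def mass_of(e, px, py, pz):
    """Invariant mass of a four-momentum, protected against rounding below zero."""
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def shifted_masses(tau, muon, tes):
    """Return (m_tau, m_vis) after scaling the tau four-momentum by tes.

    tau and muon are (pT, eta, phi, m) tuples; tes = 1.0 means no shift.
    """
    tau_p4 = [tes * component for component in four_vector(*tau)]
    muon_p4 = four_vector(*muon)
    m_tau = mass_of(*tau_p4)
    m_vis = mass_of(*(t + m for t, m in zip(tau_p4, muon_p4)))
    return m_tau, m_vis
```

Under this scaling m τ h changes in direct proportion to the scale factor, whereas m vis is diluted by the unchanged muon momentum, consistent with m τ h being the more sensitive observable.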

Measurement of the rate of jets, electrons, and muons misidentified as hadronic tau
The rate for quark and gluon jets to be misidentified as τ h decays is measured in W+jets and multijet events. W+jets events are selected by requiring the presence of a muon of p T > 25 GeV and |η| < 2.1, m T (μ, E miss T ) > 50 GeV, and at least one jet of p T > 20 GeV and |η| < 2.3 separated from the muon by ΔR > 0.5. Multijet events are selected by requiring the presence of two jets with p T > 20 GeV and |η| < 2.3. The misidentification rates measured in W+jets and in multijet events are shown in Fig. 4. In general, the misidentification rates are higher in W+jets than in multijet events. The difference is due to the higher fraction of quark jets in W+jets events; quark jets typically have a lower particle multiplicity and are more collimated than gluon jets. Overall, the jet → τ h misidentification rates vary between the percent level and about 10 −4 , decreasing as a function of jet p T because the particle multiplicity increases for higher energy jets. The MVA-based isolation, which includes τ lifetime information, performs better than the cut-based isolation.
Notable differences are observed between data and the MC expectation. The jet → τ h misidentification rates measured in data exceed the MC expectation at low p T , while the rates measured at high p T fall short of the simulation. The magnitude of the data-to-MC difference is ≈ 20%.
The fraction of electrons or muons that pass the τ h identification criteria of the dedicated discriminants against electrons or muons is measured with the tag-and-probe technique using Z/γ * → ee and Z/γ * → μμ events. The measured e → τ h (μ → τ h ) fake-rate is at the per-mille level for a τ h efficiency of 90% (98%). The rates measured in data exceed the MC prediction by up to a factor of 1.7 for electrons. The full details of these measurements are reported in [7].

√ s = 13 TeV
Preliminary results related to τ h have been obtained by analysing proton-proton collisions recorded at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 5.6 pb −1 . Z/γ * → ττ → ℓτ h events, with ℓ = e, μ, have been selected by requiring the presence of an isolated electron (muon) with p T > 20 (p T > 18) GeV and |η| < 2.1, and of an isolated τ h with p T > 20 GeV and |η| < 2.3 that passes the identification criteria against electrons and muons and has a charge opposite to that of the electron (muon). The requirement m T (μ, E miss T ) < 40 GeV is also applied. Figure 5 shows the visible mass distribution (left) and the fully reconstructed mass obtained with the SVFit [16] algorithm (right). The simulation is normalised to the total integrated luminosity for all contributions except the QCD background, which is estimated from same-sign data. Good overall agreement is observed.

Conclusions
The algorithm used by the CMS experiment for the reconstruction and identification of hadronic τ decays in Run 1 data from the LHC has been presented, and its performance validated with proton-proton collision data recorded at √s = 8 TeV, corresponding to an integrated luminosity of 19.7 fb −1 .

Figure 1. Distributions of the reconstructed tau decay mode (left) and of τ h candidate mass (right) in Z/γ * → ττ events selected in data compared to the Monte Carlo expectation.

Figure 2. τ h identification efficiency versus fake-rate for HPS cut-based isolation with 3-hit (green) and 8-hit (blue) requirements and MVA-based isolation with (red) and without (pink) tau lifetime information. For a given discriminator, the markers correspond to the different working points.

Figure 3. τ h identification efficiency measured in Z/γ * → ττ → μτ h events as function of p T for the cut-based (left) and MVA-based (right) tau isolation discriminants, for data and MC expectation.

Figure 4. Fraction of quark and gluon jets in W+jets (left) and multijet (right) events that pass the cut-based and MVA-based τ h isolation discriminants, as a function of jet p T .

Figure 5. The visible mass distribution (left) and fully reconstructed mass using the SVFit algorithm (right) obtained by analysing proton-proton collisions recorded at a center-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 5.6 pb −1 .