Studies of the tt̄H production at 13 TeV

Abstract. The latest results of searches for the Standard Model Higgs boson produced in association with a top quark-antiquark pair (tt̄H) are presented, where the Higgs boson decays into photons, a bottom quark-antiquark pair, or leptons via WW, ZZ and ττ. The analyses use the 13 TeV pp collision data recorded by the CMS experiment in 2015 and part of 2016. The results are presented as the best-fit signal strength (μ = σ/σ_SM), measured with respect to the Standard Model prediction, together with its expected and observed 95% CL upper limits.


Introduction
Following the discovery of a Higgs boson at the LHC, exploring its properties, such as its production and decay rates and its couplings to other Standard Model (SM) particles, has become the main focus of interest. The Higgs boson mass (m_H) has been measured by both the CMS and ATLAS experiments and amounts to 125.09 GeV [1]. Out of the four main production mechanisms (gluon-gluon fusion, ggH; vector boson fusion, VBF; production in association with a vector boson, VH; and production in association with a tt pair, ttH), this search focuses on ttH, a process which involves the coupling of the Higgs boson to fermions. The final signal strength measurement is obtained by combining the results of measurements in several Higgs decay channels: H → bb (which occurs in approximately 58% of the cases), H → WW* (in 20% of the cases), and H → τ+τ−, ZZ and γγ (in less than 10%). A direct measurement of the top Yukawa coupling is performed in channels where the Higgs boson is produced in association with a tt pair or a single top quark. Although this channel has a small cross section compared to gluon-gluon fusion, where this coupling cannot be measured directly, the results of the direct measurement can be used to put constraints on contributions from Beyond the Standard Model (BSM) processes.

Previous results and motivation
The ttH production process has already been analysed with LHC Run I data by both the CMS and ATLAS experiments, and the relevant signal strength measurements are listed in Tab. 1.
Comparing the signal strength measurements obtained for each of the Higgs production mechanisms with Run I CMS and ATLAS data [2], one can see that the ttH signal strength has the largest uncertainties of all the production modes, mostly due to the low cross section of the ttH process. The first 13 TeV combined measurement of the signal strength and its 95% confidence level upper limit, obtained with 2015 CMS data, is shown in Fig. 2.

Figure 1. Feynman diagrams showing ttH production with H → bb, including the subsequent decays of the top quark-antiquark pair in the lepton+jets channel (left), ttH production with the Higgs boson decaying into a pair of photons, i.e. H → γγ (middle), and one possible Feynman diagram for ttH production where the Higgs boson decays to WW*, producing a final state with four leptons (right).

Figure 2. Combined signal strength measurement of the ttH production process, µ_ttH, via decays of the Higgs boson to bb [3], γγ (ttH-tag categories only) [5] and multilepton [7], using 2.3−2.7 fb⁻¹ of data collected at √s = 13 TeV (left). Corresponding 95% confidence level upper limits on the rate of ttH production relative to the SM expectation, µ_ttH = 1 (right).

ttH, H → bb
The strategy of the ttH, H → bb analysis is based on the selection of events compatible with H → bb and one of the tt decay modes: lepton+jets or dilepton. Selected events are then categorized by the number of jets and the number of b-tags into 13 distinct categories with different background compositions (Figs. 3 and 4). The next step is to build a discriminator for each category. Two different approaches are used for this purpose: the physics-motivated matrix element method (MEM) and a machine learning method based on the boosted decision tree (BDT) algorithm. The main reason for using the MEM is that some of the hadronically decaying background components, such as tt + bb, are difficult to distinguish from the ttH → bb signal. The two methods are combined in one of three ways: BDT only, BDT with the MEM output included as an input, and a two-dimensional MEM-BDT discriminant. For each category, the approach with the best performance is used: BDT only for the ≥ 6 jets, 2 b-tags (lepton+jets) category and all dilepton categories; 2D MEM-BDT for the ≥ 4 jets, 4 b-tags; ≥ 5 jets, ≥ 4 b-tags; and ≥ 6 jets, ≥ 4 b-tags categories; and BDT including the MEM for the rest of the categories.
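The assignment of discriminants to categories described above can be written as a small lookup. The following is a minimal sketch and not the analysis code: the function name `choose_discriminant` and the label strings are hypothetical, and the category labels are copied from the text as written.

```python
# Illustrative lookup only; all names and label strings are hypothetical.
# Category labels follow the text of this section as written.
TWO_D_MEM_BDT = {
    ">=4 jets, 4 b-tags",
    ">=5 jets, >=4 b-tags",
    ">=6 jets, >=4 b-tags",
}
BDT_ONLY_LEPTON_JETS = {">=6 jets, 2 b-tags"}


def choose_discriminant(channel, category):
    """Return the discriminant type that the text assigns to a category."""
    if channel == "dilepton":
        return "BDT"  # all dilepton categories use the BDT alone
    if category in BDT_ONLY_LEPTON_JETS:
        return "BDT"
    if category in TWO_D_MEM_BDT:
        return "2D MEM-BDT"
    # every remaining lepton+jets category uses the MEM as a BDT input
    return "BDT with MEM input"
```

For instance, `choose_discriminant("lepton+jets", ">=6 jets, >=4 b-tags")` gives "2D MEM-BDT", while any dilepton category gives "BDT".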
Figure 5 shows the shape of the discriminator in one of the categories (the lepton+jets category with 5 jets and 3 b-tags). Good agreement is achieved between the data and the predicted distributions of the signal plus the various background components. Such distributions are used to extract the signal with a maximum likelihood fit combining the data in all the categories.

The results of the ttH, H → bb analysis with the first 13 TeV data are presented in the form of the measured signal strength for each of the categories and their combination (Fig. 6, left) and the corresponding 95% CL upper limits, both expected and observed (Fig. 6, right). The relevant numbers are listed in Tab. 2.

ttH, H → γγ
... [4], which is the ability to measure the diphoton mass with good resolution. The main backgrounds here are tt + γγ and tt + fake photons. The main event categories are chosen according to the tt decay: semileptonic and fully leptonic decays (both merged into one leptonic category) and hadronic tt decays. The ttH tagger performs with high efficiency in both the hadronic (∼77%) and leptonic (∼95%) categories [6].
The signal is extracted using a parametrized model of the Higgs boson mass shape obtained from simulation. The combined signal and background models for the two categories are shown in Fig. 8. The model used to describe the background is extracted from data with the discrete profiling method, which is designed to estimate the systematic uncertainty associated with choosing a particular analytic function to fit the background distribution. The method treats the choice of the background function as a discrete parameter in the likelihood fit to the data. The resulting systematic uncertainty is then calculated in a way analogous to the systematic uncertainties associated with other contributions [6].

The results of the ttH production measurements performed in the H → γγ channel are summarized in the plots shown in Fig. 9.
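The idea of treating the background function as a discrete parameter can be illustrated with a toy envelope of likelihood curves. This is a sketch under stated assumptions, not the procedure of [6]: the two parabolic curves, the names `function_a` and `function_b`, and the penalty of 0.5 per free parameter are all hypothetical choices made for illustration.

```python
import numpy as np


def discrete_profile(nll_curves, n_params, penalty=0.5):
    """Minimise over the discrete choice of function at each value of mu.

    nll_curves maps a function name to its negative log-likelihood on a
    common mu grid. The per-parameter penalty is an assumption of this
    sketch, used to avoid always favouring functions with more parameters.
    """
    names = list(nll_curves)
    stacked = np.array(
        [nll_curves[name] + penalty * n_params[name] for name in names]
    )
    envelope = stacked.min(axis=0)
    chosen = [names[i] for i in stacked.argmin(axis=0)]
    return envelope - envelope.min(), chosen


def interval(mu_grid, delta_nll, threshold=0.5):
    """Range of mu with delta NLL below the threshold (0.5 for 68% CL)."""
    inside = mu_grid[delta_nll <= threshold]
    return float(inside.min()), float(inside.max())


# Hypothetical toy curves standing in for two candidate background functions.
mu_grid = np.linspace(0.0, 4.0, 401)
nll_curves = {
    "function_a": 0.5 * ((mu_grid - 1.0) / 0.8) ** 2,
    "function_b": 0.5 * ((mu_grid - 1.6) / 0.7) ** 2 - 0.4,
}
n_params = {"function_a": 2, "function_b": 3}

delta_nll, chosen = discrete_profile(nll_curves, n_params)
lo, hi = interval(mu_grid, delta_nll)
single = nll_curves["function_a"] - nll_curves["function_a"].min()
single_lo, single_hi = interval(mu_grid, single)
```

In this toy the interval from the envelope extends further than the interval from `function_a` alone, which is how the freedom in the choice of function shows up as an additional uncertainty.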

ttH → multilepton
The third, complementary analysis channel is ttH with the Higgs boson decaying to final states with leptons [7,8]. The strategy of the ttH → multilepton analysis is to select events compatible with ttH production in which the Higgs boson decays to a pair of vector bosons (WW* or ZZ*) or a pair of τ leptons, producing final states with four leptons, three leptons, or two leptons with the same charge sign. The dominant backgrounds are tt+jets (with non-prompt leptons misidentified as prompt leptons) and ttV. Signal events are selected by applying a set of criteria, such as trigger and kinematic requirements, a linear discriminant based on MET, charge quality requirements, a Z veto and a conversion veto, as well as jet and b-tag multiplicity requirements.
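As an illustration of one of the selection criteria listed above, a Z veto can be sketched as follows. The text does not give its definition, so this sketch assumes a common form: reject events containing a same-flavour, opposite-sign lepton pair with an invariant mass close to the Z boson mass. The 10 GeV window, the dictionary layout of a lepton and the function names are hypothetical.

```python
import math
from itertools import combinations

Z_MASS = 91.19  # GeV, approximate Z boson mass


def invariant_mass(lep1, lep2):
    """Invariant mass of two leptons, treating them as massless."""
    m2 = 2.0 * lep1["pt"] * lep2["pt"] * (
        math.cosh(lep1["eta"] - lep2["eta"])
        - math.cos(lep1["phi"] - lep2["phi"])
    )
    return math.sqrt(max(m2, 0.0))


def passes_z_veto(leptons, window=10.0):
    """Return False if any same-flavour, opposite-sign pair is Z-like.

    The window (in GeV) is an assumption of this sketch.
    """
    for lep1, lep2 in combinations(leptons, 2):
        same_flavour = lep1["flavour"] == lep2["flavour"]
        opposite_sign = lep1["charge"] * lep2["charge"] < 0
        if same_flavour and opposite_sign:
            if abs(invariant_mass(lep1, lep2) - Z_MASS) < window:
                return False
    return True


# Hypothetical leptons: two back-to-back muons with a pair mass near 91 GeV.
mu_plus = {"pt": 45.6, "eta": 0.0, "phi": 0.0, "charge": +1, "flavour": "mu"}
mu_minus = {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": -1, "flavour": "mu"}
mu_plus_2 = {"pt": 45.6, "eta": 0.0, "phi": math.pi, "charge": +1, "flavour": "mu"}
```

With these toy leptons, the opposite-sign pair is rejected by the veto, while the same-sign pair passes because it cannot come from a Z boson decay.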
A multivariate method for lepton identification (lepton MVA) is used to identify the leptons from tt and Higgs decays (i.e. prompt leptons), and a final 2D MVA is used for signal extraction. The 2D BDT discriminant is obtained using simulated ttH events trained against the two main background components: tt and ttV. The two main event categories are two same-sign leptons (2lss) and at least three leptons (≥ 3l). To increase the sensitivity of the analysis, an additional categorization is performed by lepton flavour (ee, eµ and µµ in the 2lss final state), by the presence of 2 b-tags ("b-tight" denoting the category with two b-jets in the final state and "b-loose" the complementary category), by the presence of reconstructed hadronic τ leptons (in 2lss), and by the total lepton charge (motivated by the charge asymmetry present in several background processes). This amounts to 15 distinct categories in total.
Signal extraction is performed using the final discriminators obtained with the 2D MVA approach. The MVA outputs, trained separately against tt and ttV, are used to build a two-dimensional BDT discriminator, which is further split into bins chosen to minimize the uncertainty on the signal strength. The MVA takes into account several kinematic variables (such as the maximum pseudorapidity of the leading and sub-leading leptons), the jet multiplicity, the angular separation between lepton and jet and between jets, and the missing transverse energy.
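The step from two BDT outputs to a single binned discriminator can be sketched as a mapping from a point in the two-dimensional plane to one bin index. This is only an illustration: the rectangular grid and the edge values below are hypothetical, whereas in the analysis the binning is chosen to minimize the uncertainty on the signal strength.

```python
import numpy as np


def bin_index_2d(bdt_tt, bdt_ttv, edges_tt, edges_ttv):
    """Map a pair of BDT outputs to a single bin index.

    edges_tt and edges_ttv hold interior bin edges only, so n edges
    define n + 1 bins along that axis.
    """
    i = int(np.digitize(bdt_tt, edges_tt))    # bin along the BDT(tt) axis
    j = int(np.digitize(bdt_ttv, edges_ttv))  # bin along the BDT(ttV) axis
    return i * (len(edges_ttv) + 1) + j


# Hypothetical edges: 3 bins in BDT(tt) times 2 bins in BDT(ttV) = 6 bins.
edges_tt = [-0.2, 0.4]
edges_ttv = [0.0]
```

An event with both outputs low falls in bin 0, and an event with both outputs high falls in the last bin (index 5 with these edges), so the resulting one-dimensional histogram can be used directly in a binned fit.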
In addition, the MEM approach has recently been introduced in order to obtain better discrimination between the ttH signal and the irreducible ttW and ttZ backgrounds in the category with at least three leptons. The 2D MVA has been performed separately in 2lss and ≥ 3l, and the post-fit plots (Figs. 10 and 11) show good agreement between the data and the MC distributions. The final results are obtained by performing a simultaneous fit to the binned BDT output distributions. The final results obtained in the ttH → multilepton channel are presented in Fig. 12 and the corresponding numbers are listed in Tab
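The simultaneous fit to binned discriminant distributions can be sketched with a toy binned Poisson likelihood in which a single signal strength µ scales the signal in every category. This is a simplified illustration that assumes no nuisance parameters and uses a plain grid scan; the yields are hypothetical, and the pseudo-data are set equal to the µ = 1 expectation.

```python
import numpy as np


def poisson_nll(mu, signal, background, observed):
    """Binned Poisson negative log-likelihood, up to a mu-independent constant."""
    expected = mu * signal + background
    return float(np.sum(expected - observed * np.log(expected)))


def scan_signal_strength(categories, mu_grid):
    """Sum the NLL over all categories and scan the common signal strength."""
    total = np.array([
        sum(poisson_nll(mu, s, b, n) for s, b, n in categories)
        for mu in mu_grid
    ])
    best_mu = float(mu_grid[np.argmin(total)])
    return best_mu, total - total.min()


# Hypothetical yields per bin: (signal, background, observed) per category.
# The observed counts equal signal + background, i.e. pseudo-data for mu = 1.
categories = [
    (np.array([1.0, 2.0, 4.0]), np.array([50.0, 30.0, 10.0]),
     np.array([51.0, 32.0, 14.0])),
    (np.array([0.5, 1.5]), np.array([20.0, 5.0]),
     np.array([20.5, 6.5])),
]
mu_grid = np.linspace(0.0, 3.0, 301)
best_mu, delta_nll = scan_signal_strength(categories, mu_grid)
```

Because all categories share the same µ, bins with a high signal-to-background ratio drive the fit, while background-dominated bins mainly constrain the background in a fit that also includes nuisance parameters.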

Summary
Analyses have been performed to measure the ttH production rate at a center-of-mass energy of 13 TeV, targeting ttH, H → bb with 2015 CMS data only, and ttH, H → γγ and ttH → multilepton with 2015 and part of 2016 CMS data. The results, both separate and combined, are compatible with the SM predictions within the present uncertainties. The goal of Run II is to measure the top Yukawa coupling, which requires collecting data with a higher integrated luminosity. The excellent operating conditions of both the LHC and CMS, together with an increasing rate of data taking, have been maintained during 2016, with the objective of collecting more than 30 fb⁻¹ by the end of the year.