PDFs and LHC data: current and future constraints

The LHC has released an enormous number of precise measurements that provide a unique constraint on Parton Distribution Functions (PDFs). In this contribution we give a short overview of our current knowledge of the structure of the proton and illustrate the effect of including LHC data in PDF fits. In particular, we analyse the current measurements of the transverse momentum of the electroweak vector bosons by ATLAS and CMS, and of forward W production by LHCb, and assess their impact.


Introduction
All theoretical predictions at hadron colliders rely on the idea of collinear factorisation. In particular, for most inclusive observables measured at the LHC, the total cross section can be written as a convolution of parton distribution functions (PDFs), which parametrise the proton in terms of its elementary constituents, and the partonic cross section describing their interaction. While the short-range dynamics of the proton's constituents can be described by means of perturbative QCD, the low-energy behaviour cannot be obtained through perturbative methods. PDFs parametrise this unknown nonperturbative dynamics of the proton. Because they are universal and their dependence on scale is predicted by perturbative QCD via the DGLAP evolution equations, PDFs may be determined from available experimental data. They are then used to compute predictions for other experiments, thereby making QCD a predictive theory at hadron colliders that can be tested via comparison to data.
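For reference, the factorised cross section and the DGLAP evolution equation mentioned above can be written schematically as follows. The notation is the standard textbook one and is an assumption here, not taken from this contribution: $\mu_F$ and $\mu_R$ denote the factorisation and renormalisation scales, $\hat\sigma_{ab\to X}$ the partonic cross section, and $P_{ab}$ the splitting functions. Power-suppressed corrections are omitted.

```latex
\sigma_{pp\to X} = \sum_{a,b}\int_0^1 dx_1\,dx_2\;
  f_a(x_1,\mu_F^2)\,f_b(x_2,\mu_F^2)\;
  \hat\sigma_{ab\to X}\!\left(x_1,x_2,\mu_F^2,\mu_R^2\right),
\qquad
\mu^2\,\frac{\partial f_a(x,\mu^2)}{\partial \mu^2}
  = \sum_b \int_x^1 \frac{dz}{z}\,
    P_{ab}\!\left(z,\alpha_s(\mu^2)\right) f_b\!\left(\frac{x}{z},\mu^2\right)
```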
Parton Distribution Functions are an essential ingredient of LHC phenomenology. For several key processes, their uncertainty is comparable to or larger than the uncertainty due to missing higher orders in the perturbative computation of short-distance cross sections. This is the case for most Higgs production channels [1]. The impact of PDF uncertainties is especially important for BSM heavy particle production, see for instance Ref. [2]. Therefore, in order to precisely characterise the Higgs boson and to extend the reach of new physics searches, it is crucial to have reliable PDF determinations and to devise strategies to reduce their uncertainties. In this respect, thanks to the huge number of precise measurements, the LHC has the potential to provide the most in-depth information on parton densities to date.
In this contribution, we first provide a personal overview of the milestones that have been reached in recent years in determining PDFs. We then review the challenges that the PDF fitting community is facing due to the precision requirements of the LHC. We discuss the LHC data and the effect of their inclusion in recent PDF fits. Finally, we investigate the potential of including vector boson transverse momentum and forward rapidity distribution measurements in future parton determinations. For more detailed reviews, see Refs. [3,4].
a e-mail: ar578@cam.ac.uk; b e-mail: ubiali@hep.phy.cam.ac.uk (speaker)

State of the art
The accurate determination of parton densities in the proton is an ongoing effort, with several groups providing competing sets of parton distributions. The most recent PDF sets from these collaborations are ABM12 [5], CT10NNLO [7], HERAPDF1.5 [6], MSTW08 [8] and NNPDF3.0 [9]. For a recent benchmark between different PDF sets see Ref. [11].
In the past decade, huge progress has been made in this field. Up to fifteen years ago, only the central values of PDFs were determined, to verify the consistency of the collinear factorisation framework. With the increased precision of the experimental data and the progress made in the calculation of the hard cross sections, a careful estimate of the uncertainty associated with PDFs became essential. In particular, it was soon realised that the parametrisation of PDFs in the early fits was over-restrictive, resulting in unrealistically small uncertainty bands in regions that were not constrained by data. Significant progress has been made in this direction thanks to the unbiased PDF parametrisation proposed by the NNPDF collaboration [12] and to a number of insightful analyses performed in the context of traditional polynomial parametrisations [13,14]. A minimally biased parametrisation and a solid understanding of the statistical aspects of the fitting methodology are crucial to gain confidence in the robustness of the results of PDF analyses. The recent NNPDF3.0 set [9] is the first set to be fully determined from a methodology validated by a closure test. The latter was suggested [15] as a way to ensure that PDFs determined from pseudo-data generated from a known underlying law reproduce the statistical distribution of results expected on the basis of the assumed experimental uncertainties. A study pointing in the same direction was performed by the MSTW collaboration [16].
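The logic of a closure test can be illustrated with a deliberately simple toy, which is not the NNPDF procedure itself. In this sketch the "known underlying law" is assumed to be a straight line through the origin, the "fit" is a one-parameter least-squares fit, and all function names and default values are hypothetical. If the methodology is statistically sound, the pulls (fitted minus true value, divided by the estimated uncertainty) should have mean close to zero and standard deviation close to one.

```python
import math
import random


def closure_test_pulls(n_trials=2000, n_points=20, sigma=0.1,
                       true_slope=2.0, seed=0):
    """Toy closure test: fit pseudo-data generated from a known law.

    Hypothetical setup: the true law is y = true_slope * x, each point has
    an uncorrelated Gaussian error sigma, and the fit is a one-parameter
    least-squares fit of the slope.
    """
    rng = random.Random(seed)
    xs = [0.1 + 0.9 * i / (n_points - 1) for i in range(n_points)]
    sum_x2 = sum(x * x for x in xs)
    # Analytic uncertainty of the fitted slope for uncorrelated errors.
    slope_err = sigma / math.sqrt(sum_x2)
    pulls = []
    for _ in range(n_trials):
        # Generate pseudo-data by fluctuating the known law.
        pseudo = [true_slope * x + rng.gauss(0.0, sigma) for x in xs]
        # Least-squares estimate of the slope from the pseudo-data.
        fitted = sum(x * y for x, y in zip(xs, pseudo)) / sum_x2
        pulls.append((fitted - true_slope) / slope_err)
    return pulls


def mean_and_std(values):
    """Sample mean and (unbiased) sample standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, math.sqrt(var)


pull_mean, pull_std = mean_and_std(closure_test_pulls())
print(f"mean pull = {pull_mean:.3f}, std of pulls = {pull_std:.3f}")
```

A pull distribution significantly narrower or wider than unit width would signal over- or under-estimated uncertainties, which is the kind of failure a closure test is designed to expose.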
On the theory side, recent developments include the combination of QED and QCD corrections to the DGLAP evolution in a PDF fit: the NNPDF2.3qed set [21] presents a combined (N)NLO QCD and LO QED fit of the photon PDF based on LHC data. Moreover, whereas until recently only electroweak vector boson production had been computed at NNLO in QCD [22,23], the full top quark cross section [25] and the Higgs plus jet calculations [24] have now been completed, suggesting that the NNLO computation of vector boson plus jet production may soon become available. The NNLO corrections to jet production have also been computed in the gluon-gluon channel [26]. Even though the full calculation is not yet available, the recent comparison [28] between the exact predictions for the gg channel and an approximate prediction based on threshold resummation [27] helps to determine the kinematic regions in which the threshold approximation, used to include jet data in many NNLO PDF fits, is reliable. A related methodological improvement is the possibility of calculating collider observables in PDF fits without any approximation based on local K-factors, thanks to fast interfaces to NLO calculations such as APPLgrid [17], FASTnlo [18,19] and MCfast [20]. These tools allow for improved theoretical accuracy in parton fits, and there are proposals to extend the interfaces to NNLO calculations.
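The general idea behind such fast interfaces can be sketched with a toy example. PDF-independent weights are computed once on a grid in the momentum fractions, and any PDF can then be convolved with them cheaply. The code below is a simplified illustration under stated assumptions and does not use the API of any of the tools cited above: a single parton flavour, no scale dependence, a trapezoid rule instead of the interpolation used by real tools, and hypothetical function names throughout.

```python
def trapezoid_weights(nodes):
    """Trapezoid-rule integration weights for a sorted list of nodes."""
    weights = [0.0] * len(nodes)
    for i in range(len(nodes) - 1):
        half_width = 0.5 * (nodes[i + 1] - nodes[i])
        weights[i] += half_width
        weights[i + 1] += half_width
    return weights


def precompute_grid(nodes, partonic_xsec):
    """Slow step, done once: W[i][j] = w_i * w_j * sigma_hat(x_i, x_j)."""
    w = trapezoid_weights(nodes)
    return [[w[i] * w[j] * partonic_xsec(xi, xj)
             for j, xj in enumerate(nodes)]
            for i, xi in enumerate(nodes)]


def convolve(grid, nodes, pdf):
    """Fast step, repeated for each PDF: sum_ij W_ij f(x_i) f(x_j)."""
    f = [pdf(x) for x in nodes]
    n = len(nodes)
    return sum(grid[i][j] * f[i] * f[j] for i in range(n) for j in range(n))


# Toy setup (assumption): constant partonic cross section on a uniform grid.
nodes = [i / 100 for i in range(101)]
grid = precompute_grid(nodes, lambda x1, x2: 1.0)

# The same grid is reused for two different toy "PDFs".
sigma_a = convolve(grid, nodes, lambda x: x * (1 - x))
sigma_b = convolve(grid, nodes, lambda x: x * (1 - x) ** 2)
print(sigma_a, sigma_b)
```

Because the expensive partonic calculation enters only through the stored weights, the prediction can be re-evaluated exactly for every PDF candidate during a fit, instead of rescaling a lower-order result by a fixed K-factor.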
The treatment of heavy quarks has also been studied by various groups for a number of years. Several so-called General Mass Variable Flavor Number (GM-VFN) schemes have been devised. These combine power-suppressed heavy quark mass terms, which are relevant in the vicinity of heavy quark thresholds, with the resummation of the collinear logarithms that dominate at high energies. There is a vast literature on this topic, see for instance Ref. [31] for a detailed explanation and a benchmark between schemes. The effects of the treatment of heavy quarks and of other aspects entering PDF determinations, such as the use of nuclear corrections and the inclusion of higher twist terms, have been investigated in two recent analyses [32,33]. These studies indicate that Fixed Flavor Number (FFN) schemes lead to softer large-x gluons and harder quarks as compared to PDFs determined in a GM-VFN scheme, suggesting that the use of an FFN rather than a GM-VFN scheme might explain the difference between the ABM and the other global PDF analyses. Finally, the role of the values of the heavy quark masses used in PDF fits has been discussed by several collaborations [34][35][36]. The running charm mass $m_c(m_c)$ can be extracted from the combined HERA dataset by using the measurements of the heavy quark structure functions. To conclude, the values of other parameters entering the fit, in particular the value of the strong coupling constant $\alpha_s(M_Z)$, have also been investigated [31]. It turns out that the choice of $\alpha_s$ induces an uncertainty correlated with the PDF uncertainty that must be accounted for, especially for observables predominantly initiated by the gluon PDF, see for instance Refs. [37,38].
The excellent performance of the LHC demands an even higher level of precision in the determination of PDFs. There are several precision frontiers that will be challenged in the coming years. The first is the consistent inclusion of electroweak corrections in fits that include the photon PDF. It is well known that such corrections can be significant at large values of the invariant mass of the produced particles [39]. Therefore, in view of the second run of the LHC, they can no longer be neglected. The pure electroweak corrections have been included in the NNPDF3.0 fit. Their combination with the QED corrections is one of the next frontiers. Another aspect worth exploring is the impact of small-x and threshold resummations in PDF fits. Their inclusion may be key to decreasing the theoretical errors in parton determinations, especially in the extremely small- and large-x regions that are going to be explored in the second run of the LHC. Finally, up until now, PDF error bands have only reflected the uncertainty of the experimental data, while the uncertainty associated with the finite precision of the theoretical predictions used in the fit has never been assessed. A consistent definition of the theoretical error associated with PDFs would be a crucial step forward.
Before turning to the constraints from the LHC data, it is worth mentioning two important experimental issues that have been extensively debated, namely the treatment of correlated systematics and the way of dealing with inconsistent data. As far as correlations are concerned, the importance of correctly including the normalisation error, all correlated systematics and the cross-correlations among experiments was outlined and backed up in several studies [3,40]. The treatment of inconsistent data is a topic still under discussion. With more data to be included, it is a challenge to define the amount of "tolerance" needed to account for inconsistencies, or to define a maximal set of consistent data. On the other hand, the choice of data that enters a global PDF analysis has a significant impact on the resulting parton densities. Various definitions of a conservative PDF set have been advocated in the literature, such as the NNPDF2.3 collider-only fit [44] or the MRST2004 conservative partons [41]. The former is based on the assumption that collider data are more robust than fixed-target data, and the latter excludes datasets that may be affected by large nonperturbative corrections. An alternative definition of conservative partons, which is free from the theoretical bias affecting the aforementioned sets, was proposed in the latest NNPDF paper [9,10]. There, a criterion based on Bayesian reweighting [42,43] is devised to build a set of maximally consistent data, and the implications for LHC observables are explored.
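The effect of correlated systematics on the fit quality, mentioned earlier in this section, can be made concrete with a small sketch. It builds an experimental covariance matrix from uncorrelated statistical errors plus fully correlated additive systematics, and compares the resulting χ² with the one obtained by adding all errors in quadrature. The numbers and function names are hypothetical and chosen only for illustration. Multiplicative normalisation uncertainties require additional care to avoid biased fits, and that treatment is not shown here.

```python
import math


def build_covariance(stat_errors, correlated_systematics):
    """cov_ij = delta_ij * stat_i^2 + sum_k beta_k[i] * beta_k[j].

    Each entry of correlated_systematics is a list beta_k giving the shift
    of every data point under a one-sigma variation of systematic source k.
    """
    n = len(stat_errors)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        cov[i][i] = stat_errors[i] ** 2
        for j in range(n):
            cov[i][j] += sum(beta[i] * beta[j]
                             for beta in correlated_systematics)
    return cov


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def chi2(data, theory, cov):
    """chi2 = (d - t)^T cov^{-1} (d - t)."""
    diff = [d - t for d, t in zip(data, theory)]
    return sum(di * yi for di, yi in zip(diff, solve(cov, diff)))


# Hypothetical two-point measurement with one correlated systematic source.
data, theory = [1.0, 2.0], [1.1, 2.2]
stat, syst = [0.1, 0.1], [0.1, 0.2]

chi2_correlated = chi2(data, theory, build_covariance(stat, [syst]))
total = [math.sqrt(s ** 2 + b ** 2) for s, b in zip(stat, syst)]
chi2_quadrature = chi2(data, theory, build_covariance(total, []))
print(chi2_correlated, chi2_quadrature)
```

In this toy case the theory is shifted coherently with respect to the data, which the correlated systematic can absorb, so the χ² with full correlations is smaller than the quadrature one. In general the difference can go in either direction.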

Current constraints from the LHC
The precise Deep Inelastic Scattering (DIS) data taken at the HERA collider constitute the backbone of any modern PDF analysis. Recently, on top of the combined HERA-I dataset, several HERA-II inclusive cross section measurements have been released [45]. Further information will be provided by the combined HERA I+II data, thanks to the reduction of systematic uncertainties due to the cross-calibration among different datasets. On top of the HERA data, global PDF determinations still rely on relatively old fixed-target DIS and Drell-Yan (DY) data on nuclear targets to have a handle on the isospin triplet combination and on the strangeness. Tevatron data on jet and electroweak boson production also help in constraining the gluon at large x and several light quark combinations at medium x, respectively.
The LHC measurements have already started bringing in new information on PDFs. With the higher-statistics analyses, the amount of information provided by the early data is rapidly increasing, and it is expected to continue to grow in the coming years as more processes are studied. Traditional measurements such as jet production and electroweak boson production are now available in a wider range of x and $Q^2$ and with higher precision, including full breakdowns of the experimental systematic uncertainties. There are processes, such as vector boson production in association with heavy quarks and isolated photon production, that have been made available for PDF fitting only by the LHC experimental collaborations. Additionally, the LHC experiments have released high- and low-mass Drell-Yan production data and precise measurements of jets associated with vector bosons. The first PDF set to include LHC data was NNPDF2.3 [44]. A much wider set of LHC observables has been included in the NNPDF3.0 set [9]. Fig. 1 compares the kinematic coverage of the new LHC data to that of the data included in the previous analysis.
New LHC data include: the ATLAS high-mass Drell-Yan production data [49], based on an integrated luminosity of 4.9 fb$^{-1}$, which carry important information on the large-x quark-antiquark separation; the transverse momentum distribution of W bosons [62] measured by ATLAS, which provides a complementary constraint on the gluon in the medium-x region; the CMS charged W muon asymmetry based on the full statistics (5 fb$^{-1}$) of the 7 TeV run [50], which displays substantially reduced statistical and systematic uncertainties with respect to the previous measurement in the electron channel; the 7 TeV double differential Drell-Yan cross sections [50], which cover a large range of lepton pair invariant mass and provide an important handle on quark-flavor separation for a wide range of x; the W production in association with charm quarks measured by CMS [51]; and the 940 pb$^{-1}$ forward $Z \to e^+ e^-$ production data taken at LHCb in 2011 [52]. The W plus charm data directly probe the strangeness, which is the least well known of all light quark PDFs. The LHCb data provide a handle on the poorly known very small and very large-x regions. The 5 fb$^{-1}$ 7 TeV inclusive jet cross section dataset from CMS is also included [53], as well as the 2.76 TeV dataset from ATLAS [54], supplemented by its correlation with the 7 TeV data [55], and six measurements of the total top pair production cross section at 7 and 8 TeV provided by the ATLAS and CMS collaborations.
In Fig. 2 the impact of the LHC data in the NNPDF3.0 set is quantified. The impact is at the half-sigma level, both for central values and for uncertainties, and it always leads to an improvement in uncertainties, pointing to the consistency of the LHC data with the bulk of the data in global analyses. Central values are mostly affected for quarks at medium and large x, and to a lesser extent for the gluon. In particular, the LHC data reduce the uncertainty of the gluon at medium and large x, due to the jet and top data. The down and strange quarks are also affected by the inclusion of the W + c data and of the double differential Drell-Yan distribution from CMS. In the next sections we illustrate with two examples that the situation can be further improved in the future by including recently published data that can give a further handle on PDFs.
XLIV International Symposium on Multiparticle Dynamics (ISMD 2014)
The ABM collaboration has recently released the ABM12 set tuned to the LHC data [5]. They find that the fit of LHC W and Z production data improves the determination of the quark distributions at x ∼ 0.1 and especially constrains the d-quark distribution. The fit tuned to the top quark production data displays a reduction of the gluon uncertainty. Other global fitting collaborations are planning to release new parton sets with LHC data in the near future. While some of these studies have been carried out by PDF fitting groups, experimentalists have also developed an extensive program of PDF determinations from their own measurements, thanks to the open-source PDF fitting package HERAfitter [58]. While these sets do not replace the PDFs determined from a wide set of experiments, they provide an insightful view of the constraining power of individual measurements and open new avenues for measuring the observables that are most useful for PDF determinations.

W, Z transverse momentum distributions
The measurements of the W and Z transverse momentum distributions recently published by ATLAS and CMS have the potential to constrain the light quark distributions and the gluon in the medium-x region, see Ref. [59] for a thorough analysis. Only the W $p_T$ measurement from the 2010 run of the LHC at $\sqrt{s} = 7$ TeV, corresponding to an integrated luminosity of 31 pb$^{-1}$, has been included in the NNPDF3.0 analysis, while the Z measurements are left for future fits, once the higher-statistics measurements at 7 and 8 TeV are released. As an exercise, we have studied the effect of adding the ATLAS and CMS Z $p_T$ distributions and the ATLAS W $p_T$ distribution to the NNPDF2.3 fit. The ATLAS Z $p_T$ measurement is based on the combination of an integrated luminosity of 35 pb$^{-1}$ for $Z/\gamma^* \to e^+ e^-$ and 40 pb$^{-1}$ for $Z/\gamma^* \to \mu^+ \mu^-$. The CMS measurement at 7 TeV is based on an integrated luminosity of 36 pb$^{-1}$ of Drell-Yan muon and electron pairs in the Z-boson mass region. The $p_T$ ranges of the experimental data and references to the experimental papers are given in Table 1.
Fully differential inclusive boson-production cross sections can be obtained to second order in the strong coupling constant $\alpha_s$, i.e. up to next-to-leading order accuracy in the case of the $p_T$ distribution, by using FEWZ 3.1 [29] and DYNNLO 1.3 [30]. The former allowed us to compute NLO electroweak corrections, which we found to have no discernible effect in this case. In the large-$p_T$ region, the cross section is dominated by the radiation of high-$p_T$ gluons and fixed-order theoretical predictions describe the data well. At lower values of $p_T$, multiple soft-gluon emissions dominate and fixed-order predictions are no longer valid. The resummation of soft gluons is needed to reconcile data with theoretical predictions. This is illustrated by Figs. 3-4, in which data and fixed-order predictions are observed to depart from each other for $p_T \lesssim 10$ GeV. In the exercise we therefore exclude the $p_T$ bins for which NLO calculations do not adequately describe the data.
We apply the Bayesian reweighting procedure detailed in Refs. [42,43]. The prior probability is given by the NNPDF2.3 NLO set, which provides a Monte Carlo ensemble of N equally probable PDF replicas. The first step in reweighting the PDFs is to compare the predictions of each replica to the measured data. This gives each replica k a $\chi^2_k$ value, a measure of how well it agrees with the data. The $\chi^2$ is computed using the covariance matrix determined by the experimentalists. While full information on the correlated systematics is given for the ATLAS W and CMS Z measurements, in the case of the ATLAS Z $p_T$ distribution we had to add the systematic and statistical uncertainties in quadrature. The $\chi^2$ value is converted into a weight factor

$$ w_k = \frac{(\chi^2_k)^{(n-1)/2}\, e^{-\chi^2_k/2}}{\frac{1}{N}\sum_{j=1}^{N} (\chi^2_j)^{(n-1)/2}\, e^{-\chi^2_j/2}} \qquad (1) $$

which is used in computing the reweighted observables O through the equation

$$ \langle O \rangle = \frac{1}{N}\sum_{k=1}^{N} w_k\, O[f_k] \qquad (2) $$

where n is the number of experimental data points used and N is the number of replicas. The weight factor is a measure of how much each replica contributes to the new PDF. Replicas whose predictions agree well with the data have large weight factors. Prior to reweighting, the weight factor of every replica in an NNPDF ensemble is one, as every replica is equally probable. Having calculated the weight factors, we can reweight the PDFs using (2) with $O = f_a(x, Q^2)$. Fig. 5 shows the impact of the inclusion of the data on the gluon PDF. The figure shows a negligible reduction in the PDF uncertainty, visible only slightly in the region $x < 10^{-2}$. We find an effective number of replicas close to 100, suggesting that the data are only weakly constraining. However, we expect the data based on higher-statistics samples at 7 TeV and 8 TeV [64][65][66] to have a much stronger impact. It would also be interesting to explore the effect of the data that measure the ratio of the W and Z distributions [63], as suggested in Ref. [59].
EPJ Web of Conferences 07001-p.4
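The reweighting step can be sketched in a few lines. The sketch assumes the weight formula of Refs. [42,43], $w_k \propto (\chi^2_k)^{(n-1)/2} e^{-\chi^2_k/2}$ normalised so that the weights sum to N, together with the Shannon-entropy definition of the effective number of replicas, $N_{\rm eff} = \exp[\frac{1}{N}\sum_k w_k \ln(N/w_k)]$. Function names and the toy numbers are hypothetical. The χ² values must be the total ones, not divided by the number of data points.

```python
import math


def reweighting_weights(chi2_values, n_data):
    """Weights w_k proportional to chi2_k^((n-1)/2) * exp(-chi2_k / 2).

    chi2_values: total (unnormalised, strictly positive) chi2 per replica.
    n_data: number of data points n in the new dataset.
    The weights are normalised so that they sum to the number of replicas.
    """
    # Work with logarithms to avoid overflow or underflow for large chi2.
    log_w = [0.5 * (n_data - 1) * math.log(c) - 0.5 * c for c in chi2_values]
    shift = max(log_w)
    unnormalised = [math.exp(lw - shift) for lw in log_w]
    total = sum(unnormalised)
    n_rep = len(chi2_values)
    return [n_rep * u / total for u in unnormalised]


def effective_replicas(weights):
    """Shannon-entropy estimate N_eff = exp((1/N) sum_k w_k ln(N / w_k))."""
    n_rep = len(weights)
    entropy = sum(w * math.log(n_rep / w) for w in weights if w > 0.0)
    return math.exp(entropy / n_rep)


def reweighted_mean(values, weights):
    """Reweighted expectation <O> = (1/N) sum_k w_k O_k."""
    return sum(w * v for w, v in zip(weights, values)) / len(values)


# Toy ensemble (assumption): two replicas, ten data points.
weights = reweighting_weights([9.0, 30.0], n_data=10)
print(weights, effective_replicas(weights))
print(reweighted_mean([1.0, 3.0], weights))
```

When all replicas describe the new data equally well, every weight stays equal to one and N_eff equals N, which is the situation approached when the effective number of replicas remains close to the size of the prior ensemble.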

Forward rapidity distributions
The LHCb collaboration has recently released a measurement of the inclusive W to muon production cross section using data collected during the 7 TeV run of the LHC, based on an integrated luminosity of approximately 1 fb$^{-1}$ [68]. Using the reweighting methodology described in the previous section, we explore the impact these data have in a global fit, as compared to the low-statistics data [67] that have been included in previous global fits such as NNPDF2.3 [44]. Results are displayed in Fig. 6. The prior probability is given by the global NNPDF3.0 set, excluding the LHCb forward rapidity distribution data that have been superseded by the new $W \to \mu\nu$ data, which we add via reweighting.
The results are encouraging, as they show that the new data have a very significant impact on light quarks at small x. Although the prior is given by a global set in which such distributions are already well constrained by other data, the new data are competitive and their effect is visible.

Conclusions
In this contribution we have given a broad overview of our knowledge of the structure of the proton, emphasising the great progress that has been made in recent years from the methodological, theoretical and experimental points of view. We have tried to summarise the main challenges and opportunities to come. Thanks to an enlarged dataset and more precise theoretical settings, we can realistically hope that PDF uncertainties will significantly decrease in the coming years. We have explored the impact of two datasets, out of the plethora of data to be investigated, to indicate the potential of future LHC data.

Figure 1. From Ref. [9]. The kinematic coverage in the $(x, Q^2)$ plane of the NNPDF3.0 dataset. The green stars mark the data already included in NNPDF2.3, while the circles correspond to experiments that are novel in NNPDF3.0.

Figure 2. From Ref. [9]. Comparison of the down (top row) and gluon (bottom row) NNPDF3.0 NNLO PDFs at Q = 100 GeV to PDFs obtained using the same dataset, but with all LHC data excluded.

Figure 5. Comparison of the medium-x gluon for the prior set (NNPDF23_nlo_as_0118) and the reweighted set with the transverse momentum distribution data added to the fit. Results are shown at $Q^2 = 2$ GeV$^2$. The plot is made using the public APFEL PDF plotting library [69].

Figure 6. Comparison of the small-x up quark distribution for the prior set (NNPDF30_nlo_as_0118) and the reweighted set with the forward W → μν data added to the fit. Results are shown at $Q^2 = 2$ GeV$^2$. The plot is made using the public APFEL PDF plotting library [69].

Table 1. Range of transverse momentum measurements.