Parton distributions in the LHC era

Analyses of LHC (and other!) experiments require robust and statistically accurate determinations of the structure of the proton, encoded in the parton distribution functions (PDFs). The standard description of hadronic processes relies on factorization theorems, which allow a separation of process-dependent short-distance physics from the universal long-distance structure of the proton. Traditionally the PDFs are obtained from fits to experimental data. However, understanding the long-distance properties of hadrons is a nonperturbative problem, and lattice QCD can play a role in providing useful results from first principles. In this talk we compare the different approaches used to determine PDFs, and try to assess the impact of existing, and future, lattice calculations.


Introduction
The description of physical processes involving nucleons is based on factorization, i.e. on the separation of scales between short-distance (hard) partonic interactions, described in perturbation theory, and the large-distance nonperturbative effects that are responsible for the internal structure of the nucleon, see e.g. Ref. [1]. The nucleon structure is encoded in Parton Distribution Functions (PDFs), which are universal, i.e. they describe properties of the nucleon that do not depend on the physical process under study; e.g. the same PDFs describe the parton content of the nucleon both in Deep Inelastic Scattering (DIS) and in experiments at hadronic colliders like the LHC. A brief introduction to PDFs is presented in Sect. 2.
The lack of evidence for new particles at the LHC experiments suggests that signs of new physics beyond the Standard Model will only show up as small deviations from the Standard Model predictions, which can only be established by precision studies. Increased precision in the experimental results calls for more precise theoretical computations, which severely challenge the techniques currently in use. PDFs are a crucial input in any analysis of collider experiments involving hadrons, and therefore the precision in their determination needs to match the overall precision required in searches for new physics.
PDFs can be extracted from global fits to experimental data. The ever-increasing number of experiments that are included in these fits has led to better fits, with robust methodologies that allow the propagation of the experimental error from data to the fitted functions. Two main concerns arise as we enter the era of precision measurements at hadron colliders. In the kinematical regions that are constrained by data, the reduction in the statistical errors requires a careful assessment of the systematic errors in global fits. At the same time there are also kinematical regions that are not particularly constrained by data, but are relevant e.g. in searches for new physics, or precision Higgs measurements. In Sect. 3 we summarise the recent results of global fits, trying to assess the statistical and systematic errors, and the target precision that is necessary for lattice calculations to have an impact.
Numerical simulations of QCD allow the computation of PDFs directly from first principles. Recent theoretical developments have paved the way to significant contributions from lattice QCD, which can have an impact on the overall precision of the PDFs determination. We present a review of lattice results in Sect. 4, summarising the current status of each methodology. While we do not aim to present an exhaustive review of results, we hope to address the main theoretical questions that arise in these lattice calculations, and refer to the bibliography for a more exhaustive list of references.

Parton Distribution Functions
As mentioned above, a generic hadronic observable involves both long- and short-distance contributions, and cannot be obtained simply by a perturbative computation. Factorization theorems allow a separation of the long-distance and short-distance contributions. The latter are obtained by computing matrix elements at the partonic level in perturbation theory, while the distribution of partons in the hadron is encoded in the PDFs. The factorization theorems guarantee that different observables (structure functions, cross sections, ...) can be expressed as functions of the PDFs; inverting these relations allows the PDFs to be determined from experimental data. Moreover PDFs are universal, i.e. the same parton distributions enter in all processes involving a given hadron. In this work we will only consider the parton distributions inside the proton, and we will illustrate the relation between PDFs and data in the familiar environment provided by DIS experiments.

Deep Inelastic Scattering
Deep Inelastic Scattering is the scattering of a lepton with momentum k off a nucleon, more precisely a proton in our case, with momentum P, ℓ(k) + p(P) → ℓ(k′) + X, where k′ is the momentum of the outgoing lepton, and X denotes a generic hadronic final state. We will consider here the simplest case, namely the case where a photon is exchanged between the lepton and the proton, in order to illustrate the main ideas without unnecessary complications. The reaction is depicted in Fig. 1. The computation of the amplitude for this process involves the evaluation of the matrix element of the electromagnetic current between the initial and final hadronic states, which encodes the effects of QCD nonperturbative dynamics. The differential cross section is given by Eq. 2, where M_N denotes the mass of the proton, Q^2 = -q^2, and s = (p + k)^2. The leptonic tensor L_µν is readily computed in perturbation theory, while the hadronic tensor involves the nonperturbative matrix elements. Using the transformation properties of W_µν under Lorentz transformations and parity, and current conservation, the hadronic tensor can be expressed as a function of two independent scalar functions. Introducing the kinematical variables ν = p·q and x = Q^2/(2ν) yields a decomposition of W_µν in which F_1 and F_2 are dimensionless structure functions, depending on x and Q^2. The reaction is deeply inelastic when (p + q)^2 ≫ M_N^2. Eqs. 3 and 5 allow the cross section in Eq. 2 to be rewritten in terms of F_1 and F_2, so that a measurement of the cross section yields the structure functions; therefore the structure functions can be considered as physical observables, in particular they are independent of the renormalization scheme, or the renormalization scale.
Parton distributions appear in the factorization theorems for the structure functions. Introducing the convolution (C ⊗ f)(x) = ∫_x^1 (dy/y) C(x/y) f(y), the factorization theorem yields Eqs. 8 and 9. The sum in both equations is over all partons, labelled by a, where a runs over all quarks, antiquarks, and the gluon; for each parton a there is a PDF f_a(x, µ^2), and the observable structure functions are obtained by taking the convolution of the PDFs with the coefficient functions C_1a and C_2a.
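As a concrete illustration of the convolution entering Eqs. 8 and 9, the integral can be evaluated numerically in a few lines. The sketch below is a toy example under our own conventions (the function name `convolve` and the inputs are ours); the trivial coefficient function C(z) = 1 and PDF f(y) = y are chosen only so that the result can be checked analytically.

```python
import numpy as np

def convolve(C, f, x, n=4000):
    """Mellin convolution (C ⊗ f)(x) = ∫_x^1 dy/y C(x/y) f(y), midpoint rule."""
    h = (1.0 - x) / n
    y = x + (np.arange(n) + 0.5) * h
    return (C(x / y) * f(y) / y).sum() * h

# Toy check: with C(z) = 1 and f(y) = y the integrand is identically 1,
# so (C ⊗ f)(x) = ∫_x^1 dy = 1 - x.
approx = convolve(lambda z: np.ones_like(z), lambda y: y, 0.3)
```

For a realistic coefficient function the same routine applies unchanged; only the integrand becomes nontrivial.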

General properties of PDFs
A discussion of the quantities that enter these two equations provides an effective summary of the important properties of PDFs. First of all it is important to realise that Eqs. 8 and 9 are derived up to terms of order O(1/Q^2), i.e. factorization works up to corrections that are suppressed by powers of 1/Q^2. It is conventional to call the surviving term the leading-twist contribution to the structure functions.
The coefficient functions C_1a and C_2a describe the perturbative, hard, partonic scattering. They are computed at some given order in perturbation theory, in a given renormalization scheme, and for a given value of the factorization scale µ. Two important observations are in order. The same PDF appears in the expressions for the two distinct observables. This is the universality of PDFs that was already mentioned above. The structure of the nucleon does not depend on the process that we consider, while the hard scattering does. The second observation is also related to a point we already made above, namely that the structure functions are physical quantities and therefore independent of the details of the renormalization procedure. This means that the PDFs themselves have to be scheme and scale dependent in order to cancel the dependence in the coefficient functions. This is an important point to keep in mind: PDFs are defined in a given renormalization scheme and at some given value of the factorization scale. The dependence of the PDFs on the factorization scale can be computed in perturbation theory, and is encoded in the DGLAP equations, µ^2 (d/dµ^2) f_a(x, µ^2) = Σ_b ∫_x^1 (dy/y) P_ab(x/y, α_s(µ^2)) f_b(y, µ^2). The splitting functions P_ab are known in perturbation theory up to NNLO [3, 4]. The solution of the DGLAP equations yields the evolution of the PDFs with the factorization scale; a consistent treatment requires the evolution kernel to be computed at the same perturbative order, and in the same renormalization scheme, as the coefficient functions. The PDFs at a generic scale µ^2 are obtained by convoluting the PDFs at a reference scale µ_0^2 with a perturbative kernel. DGLAP evolution has been implemented in numerous codes that are publicly available, see e.g. [5-7].
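As an illustration of DGLAP evolution, the sketch below implements a single explicit Euler step of the LO non-singlet equation, with the plus-distribution structure of the P_qq splitting function written out explicitly. This is a toy implementation under our own conventions, not one of the public evolution codes [5-7]. A simple physical check: evolving upward in µ^2 radiates gluons, so the quark momentum fraction carried by the non-singlet distribution decreases.

```python
import numpy as np

CF = 4.0 / 3.0

def dglap_rhs(q, x, n=400):
    """LO non-singlet DGLAP right-hand side, dq/d ln(µ²) up to a factor α_s/(2π):
    C_F { ∫_x^1 dz (1+z²)/(1-z) [q(x/z)/z - q(x)] + q(x)[x + x²/2 + 2 ln(1-x)] },
    i.e. the plus-distribution pieces of P_qq integrated out explicitly."""
    h = (1.0 - x) / n
    z = x + (np.arange(n) + 0.5) * h              # midpoint rule, avoids z = 1
    integral = ((1 + z**2) / (1 - z) * (q(x / z) / z - q(x))).sum() * h
    local = q(x) * (x + 0.5 * x**2 + 2 * np.log1p(-x))
    return CF * (integral + local)

# One explicit Euler step in t = ln(µ²) for a toy valence-like input.
q0 = lambda y: np.sqrt(y) * (1 - y) ** 3
alpha_s, dt = 0.2, 0.1
xs = np.linspace(1e-3, 0.999, 400)
q1 = q0(xs) + (alpha_s / (2 * np.pi)) * dt * np.array([dglap_rhs(q0, x) for x in xs])

def momentum_fraction(xs, vals):                  # trapezoidal ∫ x q(x) dx
    y = xs * vals
    return 0.5 * ((y[1:] + y[:-1]) * np.diff(xs)).sum()

mom0 = momentum_fraction(xs, q0(xs))
mom1 = momentum_fraction(xs, q1)                  # smaller than mom0 after evolution
```

A production code would of course use a proper ODE integrator and an interpolated x-grid; the point here is only the structure of the convolution.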
The PDFs and the coefficient functions satisfy analogous evolution equations, so that the physical structure functions are scale-independent. Clearly when the coefficient functions and the splitting functions are computed to order α_s^N, we expect a residual scale dependence of the physical observables, of order α_s^{N+1}. Finally, note that Eqs. 8 and 9 can be rewritten in terms of the Mellin moments of the structure functions, and the Mellin moments of the PDFs and the coefficient functions, which are defined analogously. The factorization theorems in Mellin space take the simple multiplicative form F_i(N, Q^2) = Σ_a C_ia(N, Q^2/µ^2) f_a(N, µ^2).

Operator definition
The PDF for a quark of flavor a can be defined as the matrix element of a non-local operator (Eq. 15), where |P⟩ is the hadron state with momentum P = (P^+, m^2/(2P^+), 0_⊥) in light-cone coordinates, and γ^+ = (γ^0 + γ^3)/2. A full discussion of this definition can be found in Refs. [1, 8-10].
The operator defined in Eq. 15 needs to be renormalized. Explicit calculations, e.g. in the MS scheme, show that the renormalization procedure introduces a dependence on the scale µ. This dependence is described by the DGLAP equations discussed above. We shall come back to discussing the properties of such operators below.
It is interesting to remark that this definition is not unique [11]. A valid definition of a parton distribution is obtained by convoluting f_a with any kernel D_ab(x, Q/µ) that is perturbatively computable. A change in the definition of the PDFs entails a change in the coefficient functions, so that the physical observables can still be expressed via a factorization theorem, and remain unchanged. Care must be exercised when comparing PDFs to ensure that the same quantity is actually being computed and compared, especially when matching lattice quantities to quantities that are defined in Minkowski space.

Global fits from experimental data
Parton Distribution Functions are currently extracted from global fits to available experimental data, using factorization theorems to relate the PDFs to the physical observables. Before discussing the characteristics of these global fits, it is useful to summarise the basic ideas in a simple case, namely the non-singlet structure function for DIS, see e.g. [12].

The non-singlet PDF
The observable in this simple example is the non-singlet combination of the proton and deuteron structure functions F_2^p and F_2^d (Eq. 18). As shown explicitly in Eq. 19, the factorization theorem allows the non-singlet structure function to be expressed as the convolution of the non-singlet PDF with the coefficient function C_NS, which is known in perturbation theory [13]. Using DGLAP evolution, the structure function at any value of Q^2 can be written as a function of the PDFs at a single reference scale Q_0^2. Eq. 21 summarises the challenges that global fits are trying to address. Experimental data, which are correlated and have statistical and systematic errors, appear on the left-hand side of the equation, and are used to determine the PDF at the reference scale on the right-hand side. The problem is ill-defined in the sense that the continuous real functions f_a(x, Q_0^2) cannot be determined from a discrete set of data, no matter how copious this set is. In order to overcome this difficulty, a parametrization for f_a(x, Q_0^2) needs to be chosen; experimental data are then used in order to constrain the parameters that define the functional form. The parametrizations used for these fits need to be sufficiently flexible, so that they do not introduce a bias in the result of the fit. Moreover the error on the data needs to be propagated into an error on the fitted functions f_a(x, Q_0^2). It is also clear from Eq. 21 that data for F_2^NS can only constrain the non-singlet PDF, and do not provide information on individual flavor distributions. In order to constrain all PDFs a large variety of processes needs to be included in the analysis.
EPJ Web of Conferences 175, 01006 (2018) https://doi.org/10.1051/epjconf/201817501006 (Lattice 2017)
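The logic of Eq. 21 can be mimicked with a toy fit. The sketch below assumes the standard parametrization f(x) = A x^a (1-x)^b, generates pseudo-data for F_NS at leading order (where the coefficient function is a delta function, the convolution collapses to a product, and evolution is ignored), and recovers the parameters by linear least squares after taking logarithms. All names and numerical values are ours, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" non-singlet PDF with the standard parametrization f(x) = A x^a (1-x)^b.
A_true, a_true, b_true = 2.0, 0.7, 3.0
x = np.linspace(0.05, 0.8, 30)

# At leading order F_NS(x) = x f_NS(x); add 2% multiplicative pseudo-noise
# standing in for the experimental errors.
F = x * A_true * x**a_true * (1 - x) ** b_true * (1 + 0.02 * rng.normal(size=x.size))

# Taking logs makes the fit linear in the parameters (ln A, a, b):
#   ln(F/x) = ln A + a ln x + b ln(1 - x)
design = np.column_stack([np.ones_like(x), np.log(x), np.log1p(-x)])
(lnA_fit, a_fit, b_fit), *_ = np.linalg.lstsq(design, np.log(F / x), rcond=None)
```

Real global fits face exactly the issues listed in the text that this toy hides: correlated errors, a flexible enough functional form, and the propagation of data errors into errors on the fitted function.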

Global datasets
Factorization theorems allow most observables to be written as a convolution of hard partonic cross sections and PDFs. For example the cross sections for processes at hadron colliders can be written schematically as σ(H_1 H_2 → X) = Σ_{a,b} ∫ dx_1 dx_2 f_a^{H_1}(x_1) f_b^{H_2}(x_2) σ̂_ab(x_1, x_2) (Eq. 22), where H_1 and H_2 are the hadrons involved in the collision, and f_a and f_b are the parton distributions in these hadrons. Using DGLAP evolution again, it is clear that Eq. 22 yields an expression for observables as functions of the PDFs at the reference scale. The universality of PDFs makes it possible to combine data from different experiments to constrain the same PDFs. Different experiments will constrain different combinations of PDFs, and different kinematical regions in x. Being able to combine all the available data, including the rapidly increasing amount of data from the LHC, is crucial to get the best determination of PDFs.
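Numerically, the double integral in Eq. 22 is often organized in terms of a parton luminosity. The sketch below is a toy under our own conventions: flat "PDFs" are used so that the luminosity integral has a closed form to check against.

```python
import numpy as np

def parton_lumi(f_a, f_b, tau, n=5000):
    """Parton luminosity dL/dτ = ∫_τ^1 dx/x f_a(x) f_b(τ/x); Eq. 22 then reads
    σ = Σ_{a,b} ∫ dτ (dL_ab/dτ) σ̂_ab(τ s).  Midpoint rule."""
    h = (1.0 - tau) / n
    x = tau + (np.arange(n) + 0.5) * h
    return (f_a(x) * f_b(tau / x) / x).sum() * h

# Toy check with flat "PDFs": dL/dτ = ∫_τ^1 dx/x = ln(1/τ).
tau = 0.1
approx = parton_lumi(lambda x: np.ones_like(x), lambda x: np.ones_like(x), tau)
```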
As an example, the result of the latest global fit by the NNPDF Collaboration is shown in Fig. 2. Here we review briefly the new data included in these global fits in going from [14] to [15], trying to identify their impact on the determination of PDFs. The reason for focussing on this specific example is twofold: understanding the current level of precision in global fits, and highlighting the impact of recent LHC data.
Deep-inelastic scattering data are summarised in Tab. 1. The final HERA combination [16] provides stringent bounds on quark distributions at medium values of x. The bottom [17, 18] and charm [19] structure functions have been considered in order to constrain respectively the determination of the bottom mass, and the charm content of the proton. Tevatron data, reported in Tab. 2, include fixed-target Drell-Yan from the E605 [20] and E866 [21-23] experiments, weak boson production from CDF [24] and D0 [25] Z rapidity distributions, and inclusive jet production [26]. The very precise W lepton asymmetries in the electron [27] and muon [28] channels provide important information on the quark flavor separation at large x, as demonstrated in [29]. Nowadays an increasing number of LHC results have already been included in global fits. The recent NNPDF3.1 set of PDFs includes data for the Z boson double-differential distribution [30, 31]; the inclusive W+, W-, and Z rapidity distributions [32, 33]; the top-quark pair production normalized y_t distribution [34, 35]; the t t̄ total cross section [36-38]; the inclusive jet cross section [39, 40]; the low-mass Drell-Yan [41]; and the inclusive W, Z production [42, 43].
This long list of experimental data should give a feeling for the variety of data used in these fits, and for the LHC contribution to the determination of PDFs, with most PDFs being affected at the 1σ to 2σ level. The impact of these data is discussed in detail e.g. in Ref. [15]. A quantitative estimate of the error reduction due to new data is shown in Fig. 3, where the statistical errors for the gluon and the d quark PDFs are shown. An overall reduction of the error is seen for all values of x, sometimes by a factor of 2, bringing the relative uncertainty down to the level of 2%. More details can be found in the actual publications, but it is useful to keep in mind this order of magnitude as being typical of the uncertainty from global fits, in the regions that are reasonably constrained by the data. Clearly the error blows up at very small values of x.
The uncertainty on the PDFs is rapidly becoming one of the limiting factors in searches for new physics. An example of the impact of the PDF error can be found in Ref. [44], where the relative size of the NLL corrections for gluino pair production was computed. As shown in Fig. 4, the error in the relative size of the NLL corrections grows very quickly as the gluino mass is increased, mostly as a consequence of the large PDF errors at large values of x.
There are several collaborations currently producing global fits, using basically the same datasets, but different methodologies, see Refs. [15, 69-72] for recent updates. Here we would like to summarise what we believe are the important issues to keep in mind when engaging in lattice studies of PDFs. It is interesting to remark that global fits yield consistent results within errors, despite a wide variety of methodologies, which underscores the robustness of current studies. Based on the current global fits, the uncertainty in the central value of the PDFs is at the percent level in the central x region at Q = 100 GeV. LHC data have already had a significant impact in reducing the error bars, and will continue to do so as Run-2 data are added to the existing fits. The determinations of PDFs in the small-x and large-x regions have larger uncertainties, mostly due to the fact that reaching these kinematic regions is difficult in experiments. At the current level of precision the systematic errors in global fits will start to play a significant role, and will need to be assessed carefully. Reliable uncertainties are crucial both for precision Higgs physics, and for potential discoveries. First-principles calculations of PDFs can be performed using lattice QCD, which could thereby yield a significant contribution to the determination of PDFs.

Lattice computations
Recent work in lattice QCD has led to significant progress in the computation of PDFs from first principles. In this report, we will focus on recent developments in the extraction of light-cone PDFs from the so-called quasi-PDFs.

Quasi-PDFs
In order to discuss the quasi-PDFs, it is useful to review briefly the field-theoretical definition using bi-local operators of the form O_i(ζ) = ψ̄(ζ) Γ_i P exp(ig ∫_0^ζ dξ_µ A^µ(ξ)) ψ(0) (Eq. 23), where Γ_i is a matrix in spin space, and A^µ is the gauge potential. The path-ordered exponential in Eq. 23 ensures that these operators are gauge invariant. We focus here on the operator F^+ obtained by choosing Γ_i = γ^+, and integrating over a light-cone direction, ζ = (0, y^-, 0_⊥). F^+ needs to be properly renormalized, e.g. in the MS scheme at a renormalization scale µ; for a detailed discussion see e.g. Ref. [8].
The PDFs are obtained from the matrix element of F_R between hadronic states with momentum P (Eq. 26), where the scale dependence of the PDFs arises from the scale dependence of the renormalized bi-local operator on the right-hand side of Eq. 26.
Clearly the integral along the light-cone direction cannot be performed in Euclidean space. An interesting proposal to overcome this problem was put forward recently in Refs. [73, 74], where so-called quasi-PDFs are introduced by shifting the integration contour to a purely spatial direction, e.g. the z-direction: schematically, q̃(x, P_z, a) ∝ ∫ dz e^{i x P_z z} ⟨P| ψ̄(z) γ^z W[z, 0] ψ(0) |P⟩ (Eq. 27). Equation 27 defines a bare matrix element in the regularized theory, which can be evaluated by Monte Carlo simulations in Euclidean space. The dependence on the cutoff a has been made explicit in the expression above as a reminder of the fact that these quantities need to be properly renormalized.
In order to extract PDFs from the quasi-PDFs, the following steps are needed: the renormalization of the lattice operators, including potential power divergences; the relation between the Euclidean and the Minkowski space results; and a 'factorization theorem' relating the lattice renormalized quantity to the desired distributions. We shall now discuss these issues in turn.
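The kinematical content of Eq. 27 is a Fourier transform of the spatial matrix element in the separation z. The toy sketch below assumes a Gaussian shape for the bare matrix element (a stand-in for actual lattice data; all names and values are ours) and checks the normalization ∫ dx q̃(x) = h(0), which follows from Fourier inversion at z = 0.

```python
import numpy as np

def quasi_pdf(h, Pz, x_grid, L=10.0, n=2000):
    """Toy version of Eq. 27: q̃(x, Pz) = (Pz/2π) ∫ dz e^{-i x Pz z} h(z, Pz)."""
    z = np.linspace(-L, L, n)
    dz = z[1] - z[0]
    hz = h(z)
    # one Fourier integral per x value; the result is real for h(-z) = h(z)*
    return np.array([(Pz / (2 * np.pi)) * ((np.exp(-1j * x * Pz * z) * hz).sum() * dz).real
                     for x in x_grid])

# Hypothetical Gaussian matrix element; Fourier inversion at z = 0 implies
# the normalization ∫ dx q̃(x) = h(0) = 1.
x_grid = np.linspace(-4.0, 4.0, 801)
qt = quasi_pdf(lambda z: np.exp(-0.5 * z**2), Pz=2.0, x_grid=x_grid)
norm = qt.sum() * (x_grid[1] - x_grid[0])
```

Note that, unlike a light-cone PDF, the quasi-PDF has support outside 0 ≤ x ≤ 1; only the matching step discussed below relates it to the physical distribution.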

Renormalization
The renormalization of the operators O_i(ζ) in Eq. 23 has been studied both in perturbation theory and nonperturbatively.
A one-loop calculation has been presented in Refs. [75, 76]. Although ultimately the operators will need to be renormalized nonperturbatively, a perturbative calculation is helpful in clarifying the renormalization pattern.
The operators considered in Refs. [75, 76] are of the form of Eq. 23, with the Lorentz index µ set to one, and Γ_i spanning the entire Clifford algebra. These operators can be separated into 8 subgroups. The perturbative calculation highlighted the existence of finite mixings between pairs of the operators above under renormalization, and therefore the pattern of renormalization is O_i^R = Σ_j Z_ij O_j (Eq. 30), where the dependence of the renormalization constants on the regulator X and the renormalization scheme Y has been omitted to avoid cluttering the equation. We shall reintroduce this dependence below when we discuss conversion factors. The regularizations that we will consider here are dimensional regularization (DR) and the lattice formulation of gauge theories (LR). The renormalization schemes are MS and variants of the RI-MOM scheme, which we denote RI for simplicity. The elements of the renormalization matrix have been computed at one loop in perturbation theory; the pairs involved in the mixing, and the details of the one-loop calculation, can be found in Ref. [75]. Besides showing explicitly the mixing pattern, the one-loop calculation provides the conversion factors relating a nonperturbative scheme like RI-MOM to the MS scheme used in perturbative QCD, and in the global fits discussed in the section above. These conversion factors are independent of the regularization used. The nonperturbative renormalization of the operators O_i(z) proceeds along the usual lines. Having identified the renormalization pattern in Eq. 30, the matrix elements are computed by imposing a set of renormalization conditions on amputated correlators Λ(z, p), where Z_q is the fermion field renormalization, and the amputated correlators are defined as usual. Note that any divergence coming from the Wilson line is automatically taken into account in this framework.
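Once the relevant spin projections have been taken, the renormalization conditions for a mixing pair reduce to a small linear-algebra problem. The sketch below is purely schematic: the 2×2 vertex matrix is filled with hypothetical numbers, and the condition is the RI-type requirement that the renormalized amputated vertex equal its tree-level value at the scale µ.

```python
import numpy as np

# Hypothetical 2x2 amputated vertex matrix Λ(µ) for a mixing pair of
# operators: diagonal renormalization plus a small off-diagonal mixing.
Zq = 0.95                                  # fermion field renormalization
Lambda_tree = np.eye(2)                    # tree-level value of the vertex
Lambda_mu = np.array([[1.10, 0.04],
                      [0.03, 1.20]])       # "measured" bare vertex at scale µ

# RI-type condition (schematic): (1/Zq) Z Λ(µ) = Λ_tree  =>  Z = Zq Λ_tree Λ(µ)^{-1}
Z = Zq * Lambda_tree @ np.linalg.inv(Lambda_mu)

renormalized = (1.0 / Zq) * Z @ Lambda_mu  # recovers the tree-level vertex
```

In the actual calculation the entries of Λ are Monte Carlo estimates with errors, and the resulting Z matrix must subsequently be converted to MS using the perturbative conversion factors discussed above.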
Nonperturbative renormalization has been implemented recently by two groups, and first results have appeared in Refs. [77-79], where the interested reader can find a detailed discussion and results.
An alternative approach to the renormalization of these nonlocal operators has been proposed in Ref. [80]. Having different renormalization methodologies, and different lattice discretizations, should ultimately lead to cross-checks and robust results for the continuum extrapolation of the quasi-PDFs.

Power divergences
It has been suggested in Ref. [81] that the operator defined along a spatial direction in Euclidean space could have power divergences that do not appear in Minkowski space, where the integral is performed along a light-cone direction. This issue needs to be investigated in more detail, as it could potentially invalidate the lattice approach.

Euclidean/Minkowski definition
The matrix elements of interest, Eq. 27, are extracted in Monte Carlo simulations from correlators computed in Euclidean time. The latter can be written as a sum of decaying exponentials, where the decay rate is given by the discrete energy levels in the spectrum, and the coefficients of each term in the spectral decomposition yield the finite-volume matrix elements of operators. In the case of the quasi-PDFs it is clear from Eq. 27 that the operators of interest only involve fields at time t = 0, and should be completely agnostic about the space-time signature. Both the operators, and the eigenstates of the Hamiltonian, are independent of the choice of metric. However the procedure for extracting the matrix elements does depend on it.
For the case of quasi-PDFs the dependence of the matrix elements on the metric has been discussed in Refs. [82, 83]. Here we briefly summarise the argument in Ref. [83], from which we borrow the notation. In Euclidean space the matrix element is extracted from the coefficient of terms that decay exponentially in time. For a three-point function we obtain Eq. 35, where N(τ, P) is an interpolating operator with the quantum numbers of the proton and spatial momentum P, O_i is the operator of interest, E_P = √(P^2 + m^2), and the ellipses denote terms that decay faster for large values of the Euclidean time τ. In Minkowski space the same matrix element is extracted from the residue of a T-ordered product of the same fields at the pole corresponding to the two protons going on-shell.
A one-loop calculation in a scalar field theory toy model is discussed in full detail in Ref. [83], showing how the two procedures described above yield the same result for the matrix element.
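The Euclidean extraction described above can be mimicked with synthetic data. In the sketch below we build toy two- and three-point functions from a two-state spectral sum (all energies, overlaps and matrix elements are hypothetical numbers of our choosing) and verify that the standard ratio plateaus at the ground-state matrix element once the excited-state contamination has decayed.

```python
import numpy as np

# Two-state toy spectral sums; the numbers are hypothetical, not lattice data.
E = np.array([1.0, 1.8])            # energies E_0, E_1 (lattice units)
A = np.array([1.0, 0.7])            # overlaps of the interpolating operator
M = np.array([[0.50, 0.10],
              [0.10, 0.30]])        # matrix elements <n|O|m>

def c2pt(t):
    return sum(A[n] ** 2 * np.exp(-E[n] * t) for n in range(2))

def c3pt(t, tau):                   # operator inserted at time tau, sink at t
    return sum(A[n] * A[m] * M[n, m] * np.exp(-E[n] * (t - tau) - E[m] * tau)
               for n in range(2) for m in range(2))

# The ratio C3pt(t, t/2)/C2pt(t) approaches the ground-state matrix element
# <0|O|0> = 0.5 as the excited-state contamination dies off exponentially.
ratios = [c3pt(t, t / 2) / c2pt(t) for t in (2.0, 6.0, 20.0)]
```

In a real calculation the signal-to-noise ratio degrades at exactly the large source-sink separations where the contamination is small, which is what makes this extraction delicate at large nucleon momenta.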

Factorization/matching
Having performed a proper renormalization of the lattice operator, quasi-PDFs can be defined in the continuum limit (Eq. 36). These can be considered as bona fide observables, which are finite when the lattice spacing vanishes, and are independent of the lattice discretization used in their definition. We have emphasized in Eq. 36 the dependence on the renormalization scale µ and the momentum of the nucleon state P_z. Any extrapolation/interpolation to the physical values of the quark masses should also be performed at this stage. The continuum limits of the quasi-PDFs obtained using different discretizations could, and should, be compared. This would give an idea of the systematics of the lattice calculation before trying to relate the quasi-PDFs to the light-cone ones. It may be desirable at this stage to convert the renormalized quasi-PDFs to the MS scheme, in order to avoid any reference to the details of the lattice formulation beyond this stage.
Having determined the continuum limit of the quasi-PDFs, and having a robust control of the systematics, the final step in the process of computing the light-cone PDFs from first principles is the matching of the quasi-PDFs to the light-cone ones [84-87]. We expect to be able to write this last step in the form of a factorization theorem, relating a physical observable, in this case the continuum-limit quasi-PDFs, to the light-cone PDFs. The role of the physical scale of the observable is played by the momentum P_z, and therefore q̃(x, µ, M_N, P_z) = C_q(x, P_z/µ) ⊗ f(x, µ), where C_q is a coefficient function to be computed in perturbation theory, the analogue of the coefficient functions that we discussed earlier for the structure functions, and the corrections to this relation are suppressed by powers of the nucleon momentum. Introducing the kernel Z, Eq. 41 can be trivially inverted in Mellin space, and transformed back to x-space, where we assumed that P_z is large enough that the corrections to this formula are negligible. The kernel function can be computed in perturbation theory, the order of the computation defining the order of the PDF. The DGLAP evolution of f(x, µ) is obtained from the µ dependence of the kernel Z. It is worthwhile to emphasise that a large momentum P_z is required to suppress higher-twist contributions. As long as those are negligible, data for different values of P_z can be combined to extract PDFs.
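Since the matching is multiplicative in Mellin space, the inversion mentioned above is trivial moment by moment. The sketch below uses a hypothetical one-loop-like kernel Z(N), not the actual matching coefficient of Refs. [84-87], purely to illustrate the mechanics.

```python
import numpy as np

def mellin(f, N, n=4000):
    """Mellin moment f(N) = ∫_0^1 dx x^{N-1} f(x), midpoint rule."""
    h = 1.0 / n
    x = (np.arange(n) + 0.5) * h
    return (x ** (N - 1) * f(x)).sum() * h

alpha_s = 0.2
Z = lambda N: 1.0 + (alpha_s / (2 * np.pi)) / N     # hypothetical kernel, not the
                                                    # actual matching coefficient
f_true = lambda x: np.sqrt(x) * (1 - x) ** 3        # toy light-cone PDF

# In Mellin space the matching is multiplicative, q̃(N) = Z(N) f(N), so the
# light-cone moments follow by simple division.
N = 2
q_tilde_moment = Z(N) * mellin(f_true, N)           # "measured" quasi-PDF moment
f_moment = q_tilde_moment / Z(N)                    # inverted matching
```

The x-space statement of the same inversion is a convolution with the inverse kernel, which is what is done in practice on the lattice data.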

Conclusions
There has been a lot of recent progress in the determination of PDFs on the lattice, and it is impossible to give an extensive review of all the results here. Interesting results that we were not able to cover include the momentum smearing technique [91-93], the usage of the Feynman-Hellmann relation to compute the Compton amplitude [94], and the direct computation of the hadronic tensor of the nucleon [95]. While there are theoretical questions that still need to be settled, recent results suggest that lattice determinations of PDFs will become a reality in the near future. While it is theoretically comforting to be able to compute these quantities from first principles, it would be desirable for the lattice to provide some useful input in the study of physical processes at hadron colliders. For this purpose lattice computations need to match, and eventually improve, the accuracy of the determinations obtained from global fits. The latest global fits yield very precise predictions, at the percent level. There are however flavor combinations and/or kinematical regions that are poorly constrained by the data. A benchmarking exercise is currently being carried out, which will compare the different determinations, and assess the scope for progress [96]. It will provide an excellent platform for determining future directions in this field. The LHC is guiding us into an era of precision physics at hadronic colliders. Perturbative and nonperturbative QCD effects need to be known with great accuracy. As the statistical errors decrease, control over systematics becomes mandatory. The challenge for the theory community is set.

Figure 1 .
Figure 1. Kinematics of DIS: a lepton with momentum k scatters off a nucleon with momentum P. The final state contains the lepton with momentum k′ and some hadronic state X.

Figure 2 .
Figure 2. Results of the latest global fit by the NNPDF Collaboration. PDFs are shown at factorization scales of 10 GeV^2 (left) and 10^4 GeV^2 (right). PDFs at different factorization scales are related by DGLAP evolution. Note the size of the statistical errors for the different partons in different kinematical regions.

Figure 3 .
Figure 3. Error reduction in the gluon (left) and the d quark (right) PDFs due to the inclusion of recent LHC data in the NNPDF global fit. The relative error is shown as a function of x. The NNPDF3.1 global fit includes the extra data as discussed in the text above.

Figure 4 .
Figure 4. Impact of the PDF errors on the NLL corrections to gluino pair production at the LHC. The ratio σ_NLO+NLL/σ_NLO is shown as a function of the gluino mass. The error band is obtained from the propagation of the PDF error in the numerator. Plot courtesy of J. Rojo.

Table 1 .
Deep-inelastic scattering data included in NNPDF3.1. The EMC F_2^c data are in brackets because they are only included in a dedicated set but not in the default dataset. New datasets, not included in NNPDF3.0, are denoted (*). The kinematic range covered in each variable is given after cuts are applied. The total number of DIS data points after cuts is 3102/3092 for the NLO/NNLO PDF determinations (not including the EMC F_2^c data).

Table 2 .
Same as Table 1, for the Tevatron fixed-target Drell-Yan and W, Z and jet collider data. The total number of Tevatron data points after cuts is 345/339 for NLO/NNLO fits.

Table 3 .
Same as Table 1, for ATLAS, CMS and LHCb data from the LHC Run I at √s = 2.76 TeV, √s = 7 TeV and √s = 8 TeV. The ATLAS 7 TeV Z p_T and CMS 2D DY 2012 data are in brackets because they are only included in a dedicated study but not in the default PDF set. The total number of LHC data points after cuts is 848/854 for NLO/NNLO fits (not including ATLAS 7 TeV Z p_T and CMS 2D DY 2012).