Multi-boson block factorization of fermions

The numerical computations of many quantities of theoretical and phenomenological interest are plagued by statistical errors which increase exponentially with the distance of the sources in the relevant correlators. Notable examples are baryon masses and matrix elements, the hadronic vacuum polarization and the light-by-light scattering contributions to the muon g-2, and the form factors of semileptonic B decays. Reliable and precise determinations of these quantities are very difficult if not impractical with state-of-the-art standard Monte Carlo integration schemes. I will review a recent proposal for factorizing the fermion determinant in lattice QCD that leads to a local action in the gauge field and in the auxiliary boson fields. Once combined with the corresponding factorization of the quark propagator, it paves the way for multi-level Monte Carlo integration in the presence of fermions opening new perspectives in lattice QCD. Exploratory results on the impact on the above mentioned observables will be presented.


Introduction
Over the last three decades we have had an extraordinary conceptual, algorithmic and technical progress in numerical lattice gauge theory which have led to the simulation of Quantum Chromodynamics (QCD) with quark masses at the physical point, see Ref. [1] for a recent review. Lattice QCD became a theoretical femtoscope for studying the dynamics of the strong interactions in Nature. It opened the window on quantities not accessible to experiments which may help understanding the underlying dynamical mechanisms of the theory. The interesting chiral regime of QCD became accessible to non-perturbative computations.
The femtoscope, however, is still rather crude. Often we compute what we can and not what we would like to. With state of the art techniques, numerical computations of hadronic correlation functions suffer from signal-to-noise ratios which decrease exponentially with the time separation of the sources, notable exceptions being the propagators of non-singlet pseudoscalar mesons. For connected Wick contractions, the problem can be traced back to the fact that, on a typical gauge configuration, the quark propagator decreases approximatively as exp{−M π |y − x|/2} at asymptotically Speaker, e-mail: Leonardo.Giusti@cern.ch. Preprint numbers: CERN-TH-2017-195, DESY 17-170, HIM-2017-07 large distances |y− x|, while the expectation value of a generic hadron correlator decays much faster [2,3]. This problem afflicts many computations at the forefront of research in lattice QCD: the hadronic vacuum polarization and light-by-light scattering contributions to the muon g − 2, the amplitudes of leptonic and semileptonic B decays, masses and matrix elements of (multi) baryons states, etc. It is timely to solve this problem so to be able to extract the maximum information from the new experimental results expected in the coming years.
The conceptual framework for a solution has already been introduced in bosonic theories. The multi-level Monte Carlo integration takes advantage of the fact that, when the action and the observables depend locally on the integration variables, the degradation of the signal-to-noise ratio with the distance of the sources can be avoided by measuring independently the local building blocks of the observables. This leads to an impressive acceleration of the simulations [4][5][6][7][8][9], and fully solves the problem in some cases.
It is not straightforward, however, to formulate multi-level algorithms for systems with fermions. Once they have been analytically integrated out in the path integral, the manifest locality of the action and of the observables is lost. The fermion determinant and propagator are non-local functionals of the background gauge field. The aim of this talk is to review a recently proposed factorization of the fermion determinant in lattice QCD that leads to a bosonic theory with a local action in the block gauge, pseudofermion and multi-boson fields [10]. Together with the factorization of the fermion observables presented in Ref. [11], this opens the way for multi-level simulations of QCD. Exploratory results on the impact on the above mentioned computations will also be reviewed.

Signal/noise ratio in lattice QCD
At large time distances |y 0 − x 0 |, the zero-momentum propagator of a non-singlet pseudoscalar meson and its variance decay as C π (y 0 , x 0 ) = W π (y 0 , x) ∝ e −M π |y 0 −x 0 | , σ 2 π (y 0 , x 0 ) ∝ e −2M π |y 0 −x 0 | , where and the Hermitian-Dirac operator is defined as 1 Q = γ 5 D. This is so because the mean and the width of the distribution of the positive stochastic variable Tr Q −1 (y, x)[Q −1 (y, x)] † decay exponentially with the same exponent at large distances, which suggests that configuration by configuration in the representative ensemble it holds This is confirmed by numerical results on the lattice. As a consequence, the typical size of a connected Wick contraction at large time distances is exp{−nM π |y 0 − x 0 |/2}, with n being the number of quark propagators, while the expectation value of a generic hadron correlator decays much faster because hadron masses are naturally much larger than M π [2,3]. For disconnected contractions, the problem is even worse due to the vacuum contribution to the variance. As we will see in the next sections, the cause of the problem, i.e. Eq. (3), is also a key ingredient of its solution.
Today the exponential degradation of the signal-to-noise ratio sets the limits of many computations of theoretical and phenomenological interest. In the remaining part of this section we list some examples which at present are the object of an intense theoretical and experimental research activity.

Baryon correlators
The nucleon two-point function at zero momentum C N is the prototype example of this sort. The signal-to-noise ratio squared decreases as where |y 0 − x 0 | is the time-distance of the sources and (2M N −3M π ) is as big as 7.4 fm −1 at the physical point. The number of configurations needed to reach a given statistical precision thus increases with that exponential factor. Analogous considerations hold for three-point (and higher) baryonic correlation functions. For a precise and accurate determination of g A at the physical point, for instance, the chiral effective theory suggests that a time separation of 2.0 − 2.5 fm is needed between the axial vector current and the nucleon interpolating operators [12][13][14]. At present, typical time separations affordable in numerical calculations are, instead, between 1.0 and 1.5 fm. The problem becomes more and more severe for correlation functions of fields with higher and higher baryon number.

Vector correlators
For the non-singlet vector two-point function at zero momentum C ρ , the signal-to-noise ratio squared goes as where m ρ is the lightest asymptotic state in that channel. In the singlet case, the exponential degradation is again worse due to the vacuum contribution to the variance. This fact prevents a precise determination of, among other quantities, the Hadronic vacuum polarization (HVP) and the Hadronic light-by-light (HLbL) contributions to the muon g − 2 on the lattice at the physical point. The HVP can indeed be written as [16] a HVP where α is the electromagnetic coupling constant, K(x 0 ; m µ ) is a known analytic function, and G(x 0 ) is the correlation function of two electromagnetic currents at a temporal distance x 0 , see Ref. [15] for unexplained notation. The non-singlet contribution to a HVP µ of the u and d quarks is shown in Fig. 1 for pion masses of approximatively 190 (top) and 270 MeV (bottom) respectively 2 [15]. In those plots the effect of the exponential decrease of the signal-to-noise ratio with the distance x 0 of the sources is evident. As a result, the contribution to the integral in Eq. (6) is computed from data up to x cut 0 = 1.1-1.4 fm only, while the rest is estimated by the Gounaris-Sakurai based extension of the vector correlator [15]. The final statistical and dominant systematic errors turn out to be approximatively 5 and 2.5 percent respectively. If the signal could be kept well under control up to time distances of 2.5-3.0 fm or more, then one would be able to reach the percent precision or better within QCD.

Non-zero momentum correlators
Non-zero momentum correlators suffer from the exponential degradation of the signal-to-noise ratio as well. For the pseudoscalar mesons, the signal-to-noise ratio squared goes as An example of effective energies corresponding to |p| ≈ 840 (left) and |p| ≈ 1200 MeV (right) from Ref. [18] are shown in Fig. 2. The exponential decrease of the signal-to-noise ratio is evident in these data. The same would happen if the momentum is given to the mesons by imposing twisted boundary conditions in the spatial directions. Needless to say this is one of the basic building blocks entering the Wick contractions of hadronic and semileptonic decays, see below.

Static-light correlation functions
For a static-light two-point correlation function C B , the signal-to-noise ratio squared goes as where E stat is the ground-state energy of the B-meson which diverges linearly with the inverse of the lattice spacing. This degradation is the bottleneck in the computation of the leptonic decay constant of the B-meson in the static limit [20]. The three-point functions needed for the semileptonic decays B→π(K)lν, B→K(K * )ll, etc. have as basic building blocks the propagators of a static-light meson on one side and of a relativistic meson with a (large) momentum on the other side. As we have seen, both of them suffer from an exponential degradation of the signal-to-noise ratio. At present this sets the limit for the computation of these three-point functions, and prevent us from determining the form factors at small invariant lepton masses Q 2 . An example of the difficulties encountered is shown in Fig. 3. There the ratio of the threepoint function of a vector current with the pseudoscalar interpolating operators of a B s and a K meson over the square roots of the corresponding two-point functions at approximatively Q 2 = 20 GeV 2 is plotted [19].

Domain decomposition preliminaries
To find a solution to the signal-to-noise problem, we will start by making use of several decompositions of the global lattice in non-overlapping and overlapping domains [21,22]. Without loss of generality, we will consider a lattice with periodic and open boundary conditions in the space and time directions, respectively [23]. The first decomposition of the lattice is in three non-overlapping thick time-slices Λ i , i = 0, 1, 2, with the inner and outer (time-slice) boundaries indicated by ∂Λ i and Figure 4. Examples of non-overlapping and overlapping domain decompositions considered in these proceedings.
∂Λ * i respectively, see Fig. 4. It is useful to define projection operators onto the subspaces of quark fields supported on the domains Λ i as and analogously for P ∂Λ i and P ∂Λ * i . In the rest of these proceedings we will use the above symbols irrespectively of the dimension of the full space on which the projectors act. The Hermitian O(a)improved massive Wilson-Dirac operator can then be written in the block form 3 Maybe the simplest decomposition of the lattice in overlapping domains is obtained by defining Fig. 4. Projection operators on those domains and their boundaries can be defined analogously to Eq. (9). In each of these domains, the operator Q takes the block form Another domain decomposition which turns out to be instrumental in the following is the two-block non-overlapping partitioning of the lattice with Γ = Λ 0 ∪ Λ 2 and Γ * = Λ 1 . Notice that Γ is a disconnected domain.

Quark propagator and locality
The quark propagator between two points x and y, or better Q −1 (y, x), is formally a non-local functional of the background gauge field over the entire lattice. Our intuition, however, suggests that Q −1 (y, x) should depend weakly on the values that the gauge field takes far away from the region between x and y. To formalize this insight, we consider the case of x, y ∈ Λ 0 , decompose the lattice in the two non-overlapping blocks Γ = Λ 0 ∪ Λ 2 and Γ * = Λ 1 , and choose the thickness ∆ of Λ 1 so that M π ∆ 1. The Schur complement, defined as usual as is then given by By noticing that after a few steps of algebra one obtains where Our intuition is formalized in Eq. (15). The first term on the r.h.s. (continuous line in Fig. 5) does not depend on the gauge field in Λ 2 . That dependence comes only from the second term (dashed line in Fig. 5) which propagates a quark from x ∈ Λ 0 to the region Λ 2 and back to y ∈ Λ 0 . The contribution from these paths is suppressed proportionally to e −M π ∆ thanks to Eq. (3). The very same domain decomposition sheds light also on the gauge-field dependence of Q −1 (y, x) when x and y are in distant blocks, e.g. x ∈ Λ 0 and y ∈ Λ 2 . By following an analogous derivation, one arrives to Figure 6. Continuous and dashed lines represent the first two contributions to the quark propagator in Eq. (18), corresponding to n = 0 and 1.
From Eq. (16) it is clear that the operator w propagates a quark from the inner boundary of Λ 0 to Λ 2 and back to ∂Λ 0 , and it is therefore suppressed proportionally to e −M π ∆ . By neglecting its contribution on the r.h.s of Eq. (17), the bulk of the quark propagator turns out to have a factorized dependence on the gauge field in Λ 0 and Λ 2 . The Eqs. (15) and (17) provide further insight into the dependence of the quark propagator from the gauge field. In Eq. (17), for instance, we can expand the factor (1 − w) −1 into the Neumann series to obtain We recognize on the r.h.s the result of a Schwarz alternating procedure (SAP) with overlapping domains Ω * 0 and Ω * 1 . The propagator is written as a series of terms, each having a factorized gauge-field dependence of increasing complexity. The index n counts the number of times a quark loops from the inner boundary of Λ 0 to ∂Λ 2 and back to ∂Λ 0 before arriving in y. The contribution of these paths is suppressed proportionally to e −nM π ∆ . The thickness ∆ of the overlapping region regulates the rate of convergence of the associated Neumann series, making SAP with overlapping domains also a valid alternative for computing the quark propagator in lattice QCD with respect to the case of non-overlapping domains 4 [21].

Block decomposition of the determinant
The domain decomposition of the lattice in the two blocks Γ = Λ 0 ∪ Λ 2 and Γ * = Λ 1 is also the starting point for the factorization of the gauge-field dependence of the quark determinant. The LU decomposition of the associated 2 by 2 block form of the Dirac operator leads to which, thanks to Eqs. (14), can be re-written as By noticing that the last determinant on the r.h.s can be reduced to the one of a matrix acting on one of the boundaries only, it is easy to show that and therefore For the first three determinants on the r.h.s, the goal has been reached: det Q −1 11 depends on the gauge field in the block on the gauge field in Ω * 1 . The (small) remaining determinant det [1 − w] still depends on the gauge field over the whole lattice. As in Eq. (18), we can expand the factor [1 − w] −1 into the Neumann series and obtain where N is chosen to be even, u k = e i 2πk N+1 (k = 1, . . . , N) are the roots of the approximant polynomial N k=0 w k , the remainder polynomial is R N+1 (1 − w) = w N+1 , and C is an irrelevant numerical constant. For the last equality we have used the fact that the roots of the approximant polynomial come in complex conjugate pairs and that w is similar to w † [10]. We recognize in Eq. (23) a specific implementation of Lüscher's original multi-boson proposal [24] generalized to complex matrices [25][26][27], see Ref. [10] for a more general discussion. By defining the matrix we can perform the reverse substitution of the one in Eq. (21), and finally obtain It is this expression with W z acting on ∂Λ 0 and ∂Λ 2 that will allow in the next section for a fully factorized domain decomposition of the fermion action. For the determination of the approximation, however, it has been advantageous to work with the operator w (acting on ∂Λ 0 only) since the order of the polynomial N is reduced by about a factor of 2 for a given accuracy.

Multi-level integration with fermions
By introducing auxiliary pseudofermion and multi-boson fields, for two flavors of quarks we can finally represent the determinants in Eqs. (22) and (25) where C is another irrelevant numerical constant. Each pseudofermion field φ i is confined to the corresponding region Λ i , i = 0, 1, 2. The N multi-boson fields χ k live on the outer boundaries of region Λ 1 . We can decompose them as χ k = η k + ξ k , with η k = P ∂Λ 0 χ k and ξ k = P ∂Λ 2 χ k , and split explicitly the contributions from the inner boundaries of regions Λ 0 and Λ 2 as The dependence of the bosonic action from the gauge field in block Λ 0 and Λ 2 is thus factorized. Interestingly, the terms in Eq. (27) which contribute to the forces in region Λ 0 always start (or end) on the inner boundary of Λ 2 and vice versa. The matrices in Eq. (27) contain one boundary to boundary propagator which is suppressed exponentially in ∆, and so are the corresponding forces. The factorization of the gauge-field dependence in the bosonic action has been achieved by treating differently the contributions from the various quark paths to the fermion determinant. Those with no loops around the inner boundaries of Λ 0 and Λ 2 have a factorized dependence on the gauge field in Λ 0 and Λ 2 , and can then be included by introducing the pseudofermion fields φ i in each of the three blocks. The contributions from quark paths with 1 up to N loops around ∂Λ 0 and ∂Λ 2 are introduced via the multi-boson fields living on these boundaries and their interactions. The contributions from higher loops are either negligible within the precision required, or can be associated to the observables in the form of a reweighting factor, see below.
We are now in the position to formulate a multi-level numerical integration for lattice QCD. A given correlation function of a string of fields O can be written as where O fact is a rather precise factorized approximation of O that can be obtained by expressing the quark propagators in the fermionic Wick contractions following Eqs. (15) and (17) as in Ref. [11], and · N indicates the expectation value in the theory defined by N multi-boson fields. Since both the action and the observable are factorized, the expectation value O fact N can be computed with a multi-level algorithm by generating gauge field configurations with the multi-boson action at finite N. 5 The identity det Q −1 can be used to speed up the simulation when region 1 is active. All other quantities in Eq. (28) can be computed with a standard one-level Monte Carlo procedure. For two flavors, the reweighting factor W N is This expression is easily evaluated as where the exponent can be computed by a Taylor expansion, and as usual the integral over η can be replaced by random samples.

A crucial numerical test
The feasibility of the whole proposal hinges crucially on the assumption that the spectrum of the operator (1 − w) is confined into a disk around 1 in the complex plane, with a radius significantly below unity. Only in this case, a small number of bosonic fields N in Eq. (27) leads to a good enough approximation at a reasonable computational cost. To test this assumption, 200 configurations with the Wilson gluonic action and with two flavors of nonperturbatively O(a)-improved Wilson quarks have been generated in Ref. [10], with β = 6/g 2 0 = 5.3, T × L 3 = 64 × 32 3 a 4 and open boundary conditions. The lattice spacing is a = 0.0652(6) fm, while the pion mass is aM π = 0.1454(5) corresponding to 440(5) MeV, see Ref. [10] for more details.
For ∆/a = 8, 12 and 16, 60 approximate eigenvalues δ i of w with the largest absolute value have been computed with the Arnoldi algorithm. In the left plot of Fig. 7 all eigenvalues for all 200 configurations are shown for ∆ = 12 a. As expected, they are either real or appear in complex conjugate pairs. The blue circles in these plots have radiusδ and 2δ, whereδ = exp{−M π ∆}. The distribution of the eigenvalue with the largest magnitude is shown in green in the right plot of Fig. 7. It is peaked at a value slightly smaller thanδ, denoted by a vertical blue line, and extends up to ≈ 2δ. The results for the largest eigenvalue norm computed over the 200 configurations, its average value and the estimate of its standard deviation are also reported in Table 1. In the right plot of Fig. 7 we also report in grey the distribution of the absolute value of the eigenvalues limited to those with |δ i | > 0.35δ. A clear picture emerges from these data. The largest eigenvalue of the relevant operator w decreases proportionally to exp{−M π ∆} in this range of values of ∆, with a prefactor of order 1. This in turn implies that (1 − w) has a large gap if ∆ is properly tuned. The relative error on the determinant at various values of N compares well with |δ| N+1 max configuration by configuration [10]. No big prefactors appear because the eigenvalues do not accumulate near the maximum one, and the approximation gets exponentially more precise toward the center of the circle. The reweighting factor, as defined in Eq. (30) for N = 12 and estimated with 4 random sources per configuration, deviates from 1 by at most 4.5 · 10 −6 , again in line with the expectation. At the level of precision of most contemporary simulations the impact of the reweighting factor is therefore negligible.

Numerical tests of MB-DD-HMC
The effective action in Eq. (26) can be simulated by using variants of the hybrid Monte Carlo algorithm [28]. The introduction of multi-boson fields and the resulting multi-boson domain-decomposed hybrid Monte Carlo (MB-DD-HMC) do not pose particular problems, see Ref. [10] for more details on its implementation. For a first test of its potentiality, a subset of n 0 = 32 configurations spaced by at least 80 molecular dynamics units (MDUs) among the 200 described in Section 7 has been selected in Ref. [10]. Starting from each of them, n 1 = 45 level-1 configurations spaced by 4 MDUs have been generated by keeping fixed the spatial links on the boundaries ∂Λ 0 and ∂Λ 2 and all the links in between. The region Λ 1 extends between time slices 24 and 35, corresponding to a thickness of ∆ ≈ 0.8 fm and M π ∆ ≈ 1.7. Maybe the simplest observables to be computed for a first test of the algorithm are the one-and two-point gluonic correlation functions of the energy and the topological charge densities summed over the time slices with F µν (x) being the gluon field strength tensor, see Ref. [10] for more details. The two-level estimates of these quantities have been carried out by first averaging, for each of the n 0 configurations, the densities over the n 1 level-1 background fields. This gives the n 0 measurements of the improved estimator for the one-point function, while for the two-point correlators the n 0 measurements are obtained by multiplying the improved densities. The figure of merit is the variance of the estimators. In the situation where autocorrelations among the n 0 level-0 configurations can be neglected, the square root of the variance divided by √ n 0 gives the error of the measurement. Since the cost of the simulation scales linearly in n 1 , the variance itself should decrease with n 1 to break even. The square root of the variance of C e (x 0 ) as a function of x 0 is shown in the left panel of Fig. 8 for various values of n 1 . In the central region, the links are frozen during the level-1 updates. We therefore do expect the same variance as in the level-0 estimator, but with a larger error since in this case the number of level-0 configurations is 32 instead of the 200 used in the standard case. Once the density moves into the active regions Λ 0 and Λ 2 , however, the variance of the estimator is clearly improved, in agreement with what is expected from ideal scaling, i.e.
In the right panel the same analysis is shown for the two-point function C qq , and analogous results are obtained for C ee . Here the full benefit of the method can be realized, because an improved estimator can be constructed by averaging for each of the n 0 fields the densities in regions Λ 0 and Λ 2 independently before constructing the two-point function. As optimal scaling in this case we expect a reduction of the square root of the variance, and therefore the error, with 1/n 1 . The numerical data are in agreement with such a reduction once x 0 and y 0 are in two different active regions.
These results are in line with expectations. In the region where the links are frozen during the level-1 updates no benefit from the multi-level is observed. As soon as the densities are in the active regions Λ 0 and Λ 2 , the square root of the variances of the one-and two-point functions are reduced by 1/ √ n 1 and 1/n 1 respectively. The two-level Monte Carlo works at full potentiality in these regions, with a net gain of a factor n 1 in the signal-to-noise ratio of the two-point function. This in turn implies that links in the active regions Λ 0 and Λ 2 are regularly updated during the level-1 MB-DD-HMC, and no particular freezing induced by multi-boson fields is observed.

Tests of two-level integration for fermionic correlators
So far the effectiveness of two-level integration for fermionic correlation functions has been tested mostly in the quenched approximation of two flavour QCD. The reason being that the generation of the gauge field backgrounds is much cheaper, while keeping the essence of the signal-to-noise ratio problem. Two-level integration has been implemented for zero momentum correlators of two singlet pseudoscalar densities [10,11], two non-singlet vector currents [29], a baryon propagator [11], and meson propagators at non-zero momentum [29]. In all these exploratory studies an impressive gain in the statistical precision has been observed when a two-level integration is at work. Due to lack of space, in this section we will briefly summarize the main results for the correlators of two non-singlet vector currents and for the baryon propagator only. Those correlators have been computed by discretizing gluons and fermions with the Wilson action, and by imposing open and periodic boundary conditions in the time and spatial directions respectively [23,30]. The inverse coupling constant is fixed to β = 6/g 2 0 = 6.0, the length of each spatial direction to L = 24 a, and the time extent to T = 64 a. The lattice spacing is a = 0.093 fm as fixed by assuming a physical value of 0.5 fm for the Sommer scale r 0 /a = 5.368 [31]. The up and down quarks are taken to be degenerate with a mass fixed by the hopping parameter value k = 0.1560, corresponding to a pion of approximatively 455 MeV [32]. A 1000 level-0 independent gauge-field configurations have been generated with the HMC, and for some of them level-1 configurations have been produced subsequently, see below and Ref. [11] for more details.

Non-singlet vector two-point function
For n 0 = 50 of the level-0 configurations, n 1 = 30 level-1 gauge fields have been generated by updating independently the gauge field in Λ 0 and Λ 2 while freezing the links in Λ 1 . The latter includes the time slices between 16 and 23, corresponding to a thickness of ∆ ≈ 0.7 fm and M π ∆ ≈ 1.7. On all those configurations the exact Wick contraction of the non-singlet vector-vector correlator has been computed, for more details see Ref. [29].
On the left plot in Fig. 9, it is shown the effective mass of the vector correlator as a function of the sink coordinate y 0 (the source is kept fixed at x 0 = 8a) with (red) and without two-level integration (black). The data are cut when the relative error reaches 10%. On the right plot it is shown the standard deviation of the correlator with (red) and without (black) two-level integration both normalized to the standard deviation with n 1 = 1.
A picture similar to the one for the gluonic observables emerges. When the source and the sink are deep in the two active regions, i.e. y 0 > 24a, the square root of the variance is reduced by ≈ 1/n 1 signaling that the two-level Monte Carlo is working at full potentiality. With two-level integration, the plateau in the effective mass turns out to be approximatively 1 fm longer than the one computed in the standard way. No attempt was made to reach the value of n 1 at which the reduction of the variance starts to slow down.
These results suggest that, when applied to full QCD with light quark masses, the two-level integration can indeed solve the problem of large statistical errors in the lattice determination of the hadron contributions to the muon g − 2.

Baryon propagator
For n 0 = 50 level-0 configurations, n 1 = 20 level-1 gauge fields have been generated by freezing the links in Λ 1 which, in this case, includes the time slices between 16 and 32, corresponding to a thickness of 6 ∆ ≈ 1.4 fm and M π ∆ ≈ 3.4. The gauge fields in Λ 0 and Λ 2 have then been updated independently. On all those configurations the exact Wick contractions for the baryon propagator C N , a factorized approximation C fact N , and the remainder defined configuration by configuration by have been computed, see Ref. [11] for more details. All of them have been determined starting from local sources on the time-slice at x 0 = 4a. Extensive numerical tests show that the factorized correlator approximates the exact one at the level of 5 − 10%. The final results for the correlator with (filled black squares) and without (open squares) the twolevel integration are shown in the left plot of Fig. 10, together with the factorized contribution only (red circles). Thanks to the two-level Monte Carlo, the signal-to-noise ratio for the factorized contribution remains larger than 1 for 10 additional time-slices with respect to the standard evaluation. When the remainder, C rest N , is added the gain reduces to 5 additional points. The effectiveness of the two-level integration is better seen on the right plot of Fig. 10, where the standard deviations of the various contributions are normalized to the one of the exact correlator. For completeness we report also the normalized standard deviation on our best two-level estimate of the full correlator.
At large time distances, the statistical error on the standard estimate of C N is dominated by the one on C fact N . Once the two-level integration is switched on, the error on C fact N decreases 7 roughly as n −1 1 , while the one on the remainder continues to scale as n −1/2 1 . The multi-level therefore works at its best until the red curve on the right plot of Fig. 10 hits the green curve. After that point the statistical error on the two-level estimate of the correlator is dominated by the one on the remainder, and increasing n 1 is not profitable anymore.
Multi-level simulations of baryon correlation functions solve the problem of the exponential degradation of the signal-to-noise ratio, and open new perspective for the computation of baryon masses and matrix elements in lattice QCD.

Conclusions
The decomposition of the lattice in overlapping domains leads to a factorization of the gauge-field dependence of the fermion determinant in QCD. Thanks to a multi-boson representation of the (small) interaction among gauge fields on distant blocks, the resulting action is local in the block scalar and gauge fields. It can be efficiently simulated by variants of the standard hybrid Monte Carlo algorithm. The measurements of local gluonic observables, such as the energy and the topological charge densities, reveal a good efficiency of the algorithm in updating the gauge field. No particular freezing of the links is observed. When combined with the factorization of the fermion propagator, these results pave the way for multi-level Monte Carlo integration in the presence of fermions, opening new perspectives in lattice gauge theory.
The numerical tests on gluonic and fermionic correlation functions carried out so far prove that the signal-to-noise ratio in those computations increases exponentially with the time distance of the sources when a two-level integration is at work instead of the standard one-level Monte Carlo. This represents a turning point for the computation of many interesting quantities sensitive to Standard Model and hopefully to beyond Standard Model physics: baryon masses and matrix elements (g A , . . . , < x > u−d ), the hadronic contributions to the muon g − 2, leptonic and semi-leptonic B decays, ρ , η , etc.
The factorization does not require a particular shape of the domains, nor does each of them need to be connected. What matters is a minimum distance of ≈ 0.5 fm among the blocks which are active during the level-1 updates. It is therefore already quite clear that its generalization to four dimensions would localize the simulations of theories with fermions, allow for very large volumes to be generated in master-field simulations [33], and open lattice QCD to a new class of physics problems.