Testing algorithms for critical slowing down

We present the preliminary tests on two modifications of the Hybrid Monte Carlo (HMC) algorithm. Both algorithms are designed to travel much farther in the Hamiltonian phase space for each trajectory and reduce the autocorrelations among physical observables thus tackling the critical slowing down towards the continuum limit. We present a comparison of costs of the new algorithms with the standard HMC evolution for pure gauge fields, studying the autocorrelation times for various quantities including the topological charge.


Introduction
One of the frontiers of recent Lattice QCD studies is providing precise, sub-percent, Standard Model predictions in the heavy quark sector to compare against the experiments. Accurately including dynamical heavy quarks with mass approaching several GeV in the discretised theory simulations requires fine lattice spacings, below 0.05 fm, approaching the continuum limit.
The continuum limit of a discretised theory is described by a critical point of second order where all the correlation lengths and the correlation times diverge with some power of the inverse lattice spacing, τ int ∝ a −ε . This problem is typically referred as critical slowing down. In general the (integrated) autocorrelation time is going to depend on the algorithm used to generate the Markov chain and on the observable under study. For the Langevin algorithm it has been proven [1] that this power is universal, ε = 2. For the Hybrid Monte Carlo scheme (HMC, sometimes referred as Hamiltonian Monte Carlo) [2] we do not know the actual exponents and there are claims that these may not be universal since the algorithm is non-renormalisable for the φ 4 theory [3].
Critical slowing down depends on how much the observable will couple to the low modes of the evolution algorithm in the configuration space. Some observables like the topological charge will be heavily affected by the increasing autocorrelation time. The cost to generate ensembles whose sampling is in practice ergodic will increase accordingly.
There are several studies in the literature on how to reduce the cost of configuration generation for theories close to the continuum limit [4][5][6][7]. In these proceedings we discuss two modifications of the HMC algorithm that have the potential to reduce the total cost of the expensive dynamical simulations with fermions. The first modification, called Riemannian Manifold HMC (RMHMC), modifies the kinetic term of the HMC Hamiltonian to speed up the evolution of the slowest modes. The second, referred as Look Ahead HMC (LAHMC), drops the detailed balance requirement in favour of a generalized principle with the effect of an increased trajectory length.
The first step is to show effective reductions of the autocorrelation times and related algorithm cost in the simple cases of SU(3) pure gauge theory and CP N models, that already exhibit critical slowing down [6,[8][9][10][11]].

Description of the algorithms
In this section we describe the details of the algorithms starting with a brief summary of the HMC to fix the notation.

Markov Chain Monte Carlo
The basic problem of simulating a quantum field theory is that we have to sample field configurations distributed according to some target probability p(x) ∝ exp(−S (x)) where S (x) is the corresponding action functional of the field x. If we are able to generate a Markov chain sequence with transition probabilities T (x |x) from the state x to state x that satisfies i.e. p(x) is a fixed point of the transition matrix 1 , then the expectation value of an observable can be computed by simply averaging the measurements over the generated ensemble. For this purpose, a sufficient but not necessary condition is for T (x |x) to satisfy the detailed balance relation: i.e. that for the pair x, x , the probability of going from state x to x is the same as the probability of reaching state x from x . Equation (2) automatically implies the fixed point condition, but introduces a random walk component in the HMC. This relation is satisfied by the accept-reject step in the HMC.

Hybrid Monte Carlo
The Hybrid Monte Carlo scheme [2] for the generation of correctly distributed configurations for a field φ extends the space with a fictitious time coordinate and generates new configurations according to the distribution q(φ): where P are the conjugate momenta for φ in the Hamiltonian evolution and it enters as a trivial Gaussian factor. An HMC update step can be easily described as an application of a sequence of operators. In order to fix our notation these operators are • L( , M), any integrator that is area preserving and reversible. Runs the integration for M steps of size .
Using these operators we can write the HMC in a compact form as follows. The x (t,n) stands for the configuration state (P, φ) at the point t of the Markov chain and at the step n of the evolution sequence to generate the new Markov chain state. We have 3 sub-states n = 0, 1, 2 in the HMC for the generation of a new state.
1. x = FLx (t,0) , where x (t,0) is the initial state. Then compute the acceptance probability using the Metropolis step and The FL step is necessary for the HMC in order to cancel the contribution of forward and backward transition probabilities in the Metropolis-Hastings acceptance probability of state x , π(x|x )

Riemannian Manifold HMC
One issue in the evolution induced by HMC algorithm is that in general, in the configuration space we have fast modes of the transition matrix evolving quickly and some slow modes that will take longer time to complete a cycle and decorrelate. Fast and slow corresponds to large and small gauge forces in the HMC and the speed refers to the MD time required to evolve over a full period. The former will limit the integration step size from above, otherwise the integrator becomes unstable. The divergence of the ratio of the highest and the lowest frequencies determines the degree of critical slowing down. In light of this observation, an obvious solution is trying to make all the modes to evolve with the same frequency. We now follow the ideas of Duane et al. [12,13] on Fourier acceleration of slow modes in the evolution of the Markov chain. This leads to a non-separable Hamiltonian in the HMC which we integrate with the generalised leapfrog integrator [14] as suggested by Girolami and Calderhead [15]. The resulting algorithm acknowledges that the Hamiltonian manifold where the evolution takes place is not flat and modifies the kinetic term with a metric accordingly, e.g. with the inverse of the Laplacian operator.
Consider the simplest example of acceleration of the slow modes in a free scalar field theory with mass m that have characteristic frequencies ω 2 (p) = m 2 + p 2 . Let us study the evolution of the modified Hamiltonian with a new parameter M [13]: The Hamiltonian H has now characteristic frequencies: that we can tune in order to have a function independent on p and have perfect acceleration. The inverse in the new kinetic term is easily written in the momentum space, hence the name Fourier acceleration. The acceleration parameter for interacting theories will have to be determined, but a good choice is the renormalized mass of the scalar field [12]. Gauge theories pose several complications in this context, see [13]. First we need a covariant version of the kinetic term, that we are going to denote as the kinetic kernel. There are many possible choices for the kinetic kernel, in this proceedings we are going to use the covariant Laplacian, that for fields in the adjoint representation is: for its discretised version, U(x) being the gauge field. In the RMHMC Hamiltonian, the operator M acting on fields in the adjoint representation is written as where d is the dimensionality of the theory and we get maximal decoupling (in the free field theory) for κ → 1. Zero modes of the Laplacian operator constrain κ 1.
The resulting Hamiltonian equations are now non-separable and as such the standard Störmer-Verlet leapfrog integration algorithms (and their higher order versions) are not useful any more because they are not reversible and the Jacobian of the transformation does not have unit determinant, i.e. it is not volume preserving.
Leimkuhler and Reich [14] proposed a generalised leapfrog scheme that is reversible. In terms of the gauge field theory variables (π, U) it is written U n+1 = exp 2 δH δπ (U n , π n+ 1 2 ) + δH δπ (U n+1 , π n+ 1 2 ) U n (10) We can eventually concatenate more than one of these steps by symmetrising this sequence to make it reversible (new implicit steps in the update of π will appear). The implicit equations can be solved iteratively. The equation (9) is general, valid even when fermions are included. In particular notice that the implicit evaluation of the force term would not require any additional computation of the δH δU (U n ) force term during the iterative process.
The modification of the kinetic term introduces the unwanted determinant of the matrix M in the path integral. To cancel this term we introduced a new set of auxiliary bosonic fields φ µ in the adjoint representation with action Tr[φ µ Mφ µ + φ 2 ], see [12,13].
We implemented the RMHMC with implicit versions of the leapfrog and the minimum norm 2nd order integrators in Grid [16]. Furthermore, the code is ready for simulations with more complex actions.

Look Ahead HMC
It has already been shown that increasing the trajectory length in the HMC algorithm can improve the autocorrelation times, for the same or reduced cost [6,7]. The basic idea is to move further in the integration space before randomizing the momenta and let the slow modes complete their cycle.
In order to suppress the random walk behaviour induced by the introduction of the accept-reject step needed to ensure the detailed balance it has been suggested to modify the Metropolis acceptreject step [17,18]. The resulting probability distribution satisfies the fixed point equation but drops the detailed balance condition. The final effect would be a reduction of the rejection rate for the same step size and effective increase of the trajectory length, and eventually an optimization of the costs.
Using the notation introduced in section 2.2 the new LAHMC algorithm [17,18] consists in the following steps: • Repeat the integration accepting with the modified probabilities π L K (x (t,0) ) for a maximum number K of times.
The transition probabilities x → L a (x) are defined in order to satisfy the fixed point equation and are: where p(x) is the same probability function as in equation (4). These transition probabilities do not satisfy the detailed balance relation but a more generalized set of identities, called generalized detailed balance [18]. The target distribution is still a fixed point of the evolution algorithm. Please refer to equations (28)-(33) of [17] for a straightforward demonstration of this fundamental result. Notice that for K = 1 the LAHMC reduces to the usual HMC. The practical implementation of equation (13) involves a solution of an implicit set of equations. The LAHMC has been implemented in Grid [16] and the two new parameters are K, the maximum number of repetitions, and α, the mixing factor in the momenta generation. Prolonging the integration has a high chance of decreasing the rejection rate since we are working with symplectic integrators that fluctuate around the correct integration surface during the evolution.

Preliminary numerical results
We now present some of our preliminary results on the efficiency and cost of these two algorithms in the case of the pure gauge SU(3) and the CP N model with N = 10.
The gauge action is the standard Wilson action with β = 6.2 (a = 2.9 Gev −1 = 0.068 fm). The lattice size is L 4 = 16 4 . The integrator is always the 2nd order minimum norm [19], in its standard or implicit version (RMHMC). We define N MD as the number of molecular dynamics steps in a trajectory and τ as the trajectory length. We measure the observables every 5 generated configurations and all the autocorrelation times are consistently reported in these units. For the LAHMC we tested different values of α but these had no appreciable effect and are not included in the plots due to larger errors. The parameters of the run LA1 were chosen to reduce the cost and Ref0 (standard HMC) to match the acceptance rate.
Runs for the CP N model are performed at two different β and volumes: β = 0.7, L = 42 and β = 0.9, L = 90 respectively. Measurements are performed after every trajectory. The average cost for an observable O in molecular dynamics steps units per configuration, C O , is defined as: where N traj and N conf are respectively the number of trajectories generated (in units of τ) and the number of new configurations generated per ensemble. The integrated autocorrelation time of O is indicated by τ int (O). For the HMC and the RMHMC case N traj and N conf are the same. This is not the case for the LAHMC algorithm where for each new configuration we can travel more than one trajectory length. C represents the average number of molecular dynamics steps per new configuration. Travelling for one molecular dynamics step costs exactly the same for the algorithmsin terms of computations of the force terms δH δU (S (U n )) -so use this quantity here for a fair comparison (also in view of the application of these algorithms to the more expensive simulations with dynamical fermions). In the case of the RMHMC we are not including the cost of the inversion of the Laplacian and the implicit steps. The Laplacian inversion is relatively cheap, converging in O (20) iterations with very mild dependence on κ. The implicit steps converge in less than 10 iterations and require no additional computations of the force terms relative to the action S (U), very expensive in the case of fermion actions. These steps would be negligible when dynamical fermions at the physical point are included in the simulation, since the fermion force term does not have to be evaluated for each implicit step.
In the SU(3) case the observables are measured after applying some smearing (Wilson Flow, τ flow = 2.0): they are the average action S (U) and Q 2 , that is proportional to the topological susceptibility. Integrated autocorrelation times are estimated following [20]. The observables for the CP N model are the zero momentum projected 2-point function, G 0 , and the topological charge.
The results for the absolute autocorrelation time and the cost estimate in the SU(3) pure gauge case are reported in figure 1. The cost estimate shows promising results for both the RMHMC and the LAHMC that are almost equivalent at the maximal acceleration.
The results for the CP N model in figure 2 are even more striking showing a reduction of one order of magnitude of the autocorrelation times of G 0 . It is important to notice that the RMHMC algorithm effectiveness is increasing towards the continuum limit β → ∞.

Summary
We presented our preliminary investigation on two variants of the Hybrid Monte Carlo [2] in order to reduce the cost of configuration generation in the region of the parameter space useful for high  precision flavour physics. The Riemannian Manifold HMC (RMHMC) [12,13,15] modifies the kinetic term in the HMC Hamiltonian to acknowledge the presence of faster and slower modes during the evolution. The Look Ahead HMC (LAHMC) drops the detailed balance condition in favour of a generalized detailed balance that would reduce the random walk behaviour that the acceptance-reject step introduces and increase the effective trajectory length at a lower cost.
The preliminary results presented in these proceedings show that both algorithms have the potential to reduce the computations required to generate independent configurations and be ergodic.
Moreover the two algorithms act in orthogonal directions and thus can be easily combined for even better decorrelation, a possibility to be tested. Another direction of research, once we confirm the viability of these methods, is the combination of these algorithm with other methods proposed in the literature to improve the ergodicity by keeping the costs under control.
We are currently increasing the statistics of our runs and probing the hyper-parameter space. In the pure gauge case we have runs for lattices L 4 = 32 4 for both algorithms at β = 6.4 in order to check the scaling toward the continuum limit as done already for the CP N case.