Status of Complex Langevin

I review the status of the Complex Langevin method, which was invented to make simulations of models with complex action feasible. I discuss the mathematical justification of the procedure, as well as its limitations and open questions. Various pragmatic measures for dealing with the existing problems are described. Finally I report on the progress in the application of the method to QCD, with the goal of determining the phase diagram of QCD as a function of temperature and baryonic chemical potential.


Sign problem
It is well known that in many instances the functional measure in Euclidean Quantum Field Theory is not positive, making standard importance sampling impossible. Examples are
• the real-time Feynman path integral,
• theories with topological terms or a nonzero vacuum angle θ,
• theories with nonzero chemical potential corresponding to nonzero density.
In such cases the functional measure is described by a complex (or nonpositive) density ρ on a real configuration space M. The following motto describes a strategy to deal with this 'sign problem':

When the measure is complex, complexify the fields!

This means the following: for holomorphic observables O, one tries to represent the complex measure by a measure on the complexified configuration space M_c which is either nonnegative or at least only mildly oscillating, making the sign problem less severe. If that measure is described by a density P, consistency requires that for all holomorphic observables O

⟨O⟩_ρ ≡ ∫_M O(x) ρ(x) dx = ∫_{M_c} O(x + iy) P(x, y) dx dy ≡ ⟨O⟩_P .   (1)

Complex Langevin
After the method was proposed in 1983 by G. Parisi [19] and independently by J. Klauder [20], the 1980s and 1990s saw many studies, sometimes showing success; but sometimes the method failed to converge, and sometimes, even worse, it converged to an incorrect limit [21,22] not satisfying Eq.(1). This negative finding almost killed interest in the method, but interest was rekindled by a paper by J. Berges and I.-O. Stamatescu [23], which showed some success in computing correlation functions of a quantum field theory in real time.

Recall real Langevin
Real Langevin, also known as Stochastic Quantization, was proposed by G. Parisi and Y.-S. Wu in 1981 [24] and developed further in particular by Batrouni et al. [25]; a comprehensive review is due to Damgaard and Hüffel [26]. It should not be confused with Stochastic Quantum Mechanics, invented already in 1966 by E. Nelson [27] (for a brief pedagogic review and comparison of the two approaches see [28]).
The principle is the following: consider a real action S(x) on some configuration space M (for simplicity we assume M = R^N). Then the real Langevin equation is

dx = −∇S(x) dt + dw ,

where dw is the increment of N-dimensional Brownian motion; the corresponding Fokker-Planck equation, describing the time evolution of the probability density P, is

Ṗ = ∇·(∇P + P ∇S) ≡ L^T P .

Under rather general conditions it can be shown that
• P converges to the unique invariant probability density P_∞ ∝ exp(−S);
• the process is ergodic.
(Ergodicity means that there is only one measure invariant under the time evolution.) The justification of the real Langevin method is well known: by a similarity transformation, −L^T is converted into a positive semidefinite operator,

H_FP ≡ e^{S/2} (−L^T) e^{−S/2} = (−∇ + ½∇S)·(∇ + ½∇S) ≥ 0 ,

so the spectrum of −L^T lies on the nonnegative real axis. H_FP has a unique ground state |0⟩ if and only if the process is ergodic; then P(t) → P_∞ ∝ exp(−S) as t → ∞. But there are counterexamples, as we will see below! They are related to zeroes of ρ = exp(−S), which can of course mean that S has branch points and the drift K has poles.
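As a concrete illustration, the real Langevin equation can be discretized by a simple Euler scheme; the following sketch (a toy example of mine, using the Gaussian action S(x) = x²/2, so that P_∞ is a unit Gaussian) adopts the noise normalization ⟨dw²⟩ = 2dt used here:

```python
import numpy as np

# Euler discretization of dx = -S'(x) dt + dw, with <dw^2> = 2 dt.
# Toy action S(x) = x^2/2, so the invariant density P_inf ∝ exp(-S)
# is a unit Gaussian.
def real_langevin(S_prime, x0=0.0, eps=0.01, n_steps=200_000, seed=1):
    rng = np.random.default_rng(seed)
    noise = np.sqrt(2 * eps) * rng.standard_normal(n_steps)
    x, samples = x0, np.empty(n_steps)
    for i in range(n_steps):
        x += -S_prime(x) * eps + noise[i]
        samples[i] = x
    return samples

xs = real_langevin(lambda x: x)   # S(x) = x^2/2
tail = xs[len(xs) // 2:]          # discard thermalization
mean, var = tail.mean(), tail.var()
print(mean, var)                  # should approach 0 and 1
```

The long-time histogram of the trajectory reproduces exp(−S) up to O(ε) discretization errors.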

Go complex!
If S is complex, Klauder and Parisi simply postulate stochastic equations of the same form as before:

dz = K(z) dt + dw ,   K(z) = −∇S(z) ,

where dw is still the real Wiener increment, i.e. dw = η(t) dt, where η is white noise with covariance ⟨η(t)η(t′)⟩ = 2δ(t − t′). Obviously the trajectories of the process will wander into the complexification M_c = C^N. The process, however, is still a real stochastic process, but now on M_c, as seen by writing it for the real and imaginary parts z = x + iy:

dx = Re K(x + iy) dt + dw ,   dy = Im K(x + iy) dt .

But the unavoidable question is: why should this be right?
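For a Gaussian toy model with complex action S(z) = σz²/2 (drift K(z) = −σz; a stable case since Re σ > 0), the complexified process can be written down immediately, and its average reproduces the exact ⟨z²⟩ = 1/σ; a sketch (my toy example, not taken from the references):

```python
import numpy as np

# Complex Langevin for S(z) = sigma*z^2/2, drift K(z) = -sigma*z.
# The real noise acts only on Re z; Im z is driven by the drift alone.
sigma = 1 + 1j
eps, n_steps = 0.01, 400_000
rng = np.random.default_rng(2)
noise = np.sqrt(2 * eps) * rng.standard_normal(n_steps)
z = 0j
zz = np.empty(n_steps, dtype=complex)
for i in range(n_steps):
    K = -sigma * z
    z = complex(z.real + K.real * eps + noise[i],
                z.imag + K.imag * eps)
    zz[i] = z * z
est = zz[n_steps // 2:].mean()
print(est, 1 / sigma)   # exact result: <z^2> = 1/sigma = 0.5 - 0.5j
```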

Justification of Complex Langevin
Early attempts [29,30] tried to justify the method using a relation involving analytic continuation of P, formally implemented by the shift operator exp(i y·∂_x) acting on P(x, y; t), requiring consistency with the complex evolution and taking time derivatives. One problem here is that in general one does not know whether P has the analyticity needed to make sense of such expressions, especially since we typically want to start with P(x, y; 0) = δ(x)δ(y).
Our approach [32] to the justification is different: let O be a holomorphic 'observable' of which we want to find the average with respect to the density ρ, and assume that the drift K = (∇ρ)/ρ is at least meromorphic.
The evolution of the observable averaged over the process is then (according to Itô calculus) governed by the Langevin operator L,

∂_t O(z; t) = L O(z; t) ,   L = (∇_z + K(z))·∇_z ,

which has the formal solution O(z; t) = exp(tL) O(z). O(z; t) will be holomorphic wherever K is, i.e. away from the poles of K, so there the Cauchy-Riemann (CR) equations hold: ∂_{z̄} O(z; t) = 0. The positive density P on M_c evolves according to a Fokker-Planck equation

Ṗ = L^T P ,   L^T P = ∇_x·(∇_x P − (Re K) P) − ∇_y·((Im K) P) ,

reflecting the fact that the noise acts only in the real directions. The complex density ρ, on the other hand, evolves according to

ρ̇ = L_c^T ρ ,   L_c^T = ∇_x·(∇_x − K) ,

where L_c^T is the complex Fokker-Planck operator, which can be naturally extended to act on functions on M_c.
The question is then whether the two evolutions are consistent in the sense that they lead to identical evolutions of expectation values for holomorphic observables, i.e. whether

⟨O⟩_{P(t)} ≡ ∫_{M_c} O(x + iy) P(x, y; t) dx dy   and   ⟨O⟩_{ρ(t)} ≡ ∫_M O(x) ρ(x; t) dx   (16)

remain equal if they agree at t = 0. Consider the two differential equations determining the two evolutions Eq.(16): it can be shown that the right-hand sides agree if we can use integration by parts without any boundary terms [32]; that would ensure the desired consistency. A crucial role in the argument is played by the CR equation obeyed by O(z; t). Let us list the obvious assumptions that were used:
• agreement of the initial conditions (irrelevant for t → ∞ if the process is ergodic),
• absence of boundary terms in the integrations by parts, at ∞ and at the poles of K.

We sketch the idea of the proof:
1. The initial conditions agree.
2. F(t, τ) ≡ ∫ P(x, y; t − τ) O(x + iy; τ) dx dy interpolates between ⟨O⟩_{P(t)} and ⟨O⟩_{ρ(t)}: F(t, 0) = ⟨O⟩_{P(t)} and F(t, t) = ⟨O⟩_{ρ(t)} (the last equation involves integration by parts).
3. Formally the quantity F(t, τ) is independent of τ, by integration by parts and the holomorphy of O(z; τ).

We assumed again that in the integration by parts there are no boundary terms, at ∞ and at the poles of K. In equilibrium we obtain the so-called consistency conditions (CC)

⟨L O⟩ = 0 ,   (23)

expressing the stationarity of observables averaged over the process. These relations are equivalent to the Schwinger-Dyson equations [33]. If integration by parts without boundary terms is allowed, they are also equivalent to L^T P_∞ = 0, i.e. they express the fact that the equilibrium measure solves the stationary real Fokker-Planck equation. Together with some additional conditions, the CC Eq.(23) are indeed sufficient to ensure correctness of the equilibrium measure (see [33]):

Theorem: Consider a compact configuration space M. Assume that
• Eq.(23) holds for a dense (in supremum norm on M) set of observables O,
• 0 is a nondegenerate eigenvalue of L_c^T.
Then the equilibrium measure of the CL process is correct, i.e. ⟨O⟩_{P_∞} = ⟨O⟩_{ρ_∞}.
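Schematically, the interpolation argument can be collected in formulas as follows (a compact transcription; the precise statements and conditions are in [32]):

```latex
\begin{aligned}
F(t,\tau) &\equiv \int_{M_c} P(x,y;\,t-\tau)\, O(x+iy;\,\tau)\, dx\, dy,
  \qquad 0 \le \tau \le t,\\
F(t,0) &= \langle O \rangle_{P(t)}, \qquad F(t,t) = \langle O \rangle_{\rho(t)},\\
\partial_\tau F(t,\tau) &= \int_{M_c}\Big[ -\big(L^{T}P\big)\,O + P\,\big(L\,O\big) \Big]\, dx\, dy = 0
  \quad \text{(integration by parts, no boundary terms)}.
\end{aligned}
```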
The hardest point to check is the second condition of the theorem, but in some cases one can obtain negative evidence from its violation [33,40].

Mathematical problems
There are two serious mathematical problems connected with the CL method:
(1) Existence and uniqueness of the stochastic process, as well as of the time evolutions generated by L, L_c, L^T, L_c^T for all t ≥ 0, are not known. In numerical applications, however, there never seems to be a problem, provided one uses a suitable adaptive step size as described in [34].
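The adaptive step size can be sketched as follows (a minimal one-dimensional toy version, not the actual scheme of [34]; the drift with a pole is a hypothetical choice for illustration): the step is reduced whenever the drift is large, so that |K|·ε stays below a fixed bound.

```python
import numpy as np

# Adaptive step size: shrink the Langevin step whenever the drift is large,
# keeping |K(z)| * eps below a fixed bound; this tames the runaways that a
# fixed step size produces near poles of the drift.
def adaptive_step(z, K, eps_max=1e-2, bound=0.1, rng=None):
    rng = rng or np.random.default_rng()
    drift = K(z)
    eps = min(eps_max, bound / max(abs(drift), 1e-12))
    z_new = complex(z.real + drift.real * eps
                    + np.sqrt(2 * eps) * rng.standard_normal(),
                    z.imag + drift.imag * eps)
    return z_new, eps

K = lambda z: -z + 2.0 / (z - 1j)   # hypothetical toy drift with a pole at z = i
rng = np.random.default_rng(0)
z = 0.5 + 0j
for _ in range(10_000):
    z, eps = adaptive_step(z, K, rng=rng)
print(z, eps)                        # the trajectory stays finite
```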
(2) Convergence of the positive density P to an equilibrium measure is not proven mathematically. What we would need is some information about the spectra of L and L^T, namely that the spectrum is contained in the left half of the complex plane, with a unique eigenvalue at the origin.
Already in 1985 Klauder and Peterson [35] remarked on the "conspicuous absence of general theorems" for such non-selfadjoint and non-normal operators. To my knowledge, the statement is still valid today.
We should, however, be pragmatic and not be deterred by this lack of mathematical rigor: numerically an equilibrium measure seems to exist in all interesting cases; a necessary condition is the existence of an attractive fixed point of the drift K, which holds in all cases of physical interest.

Practical problems
Being pragmatic, we still have to worry about some problems. These are:
(1) boundary terms at ∞ and at the poles of K (if present),
(2) lack of ergodicity.
Both problems may lead to failure of the CL method, as we will show below. The simulations therefore have to be monitored for possible boundary terms.
Other authors have also formulated criteria for failure, for instance [36]. These criteria are interesting, but it seems they can be subsumed under those mentioned above. Salcedo [37] proves constraints on the support of the positive measure on M_c that in some simple models allow one to predict failure of CL without the need to actually carry out a simulation.

Boundary terms at ∞
In typical cases of lattice gauge theories the configuration space M is compact whereas its complexification M_c is not, for example M = SU(3)^k and M_c = SL(3,C)^k, where k is the number of links. It is a well-known fact that nonconstant holomorphic functions grow at ∞, hence the drift K as well as the observables will grow at ∞. This can lead to "skirts" or "tails" of the distributions of P and |K O P| on M_c. The presence or absence of boundary terms at ∞ under integration by parts will depend on the decay of the distribution of |K O P| as we move towards ∞. In some toy models we are in luck: the equilibrium distribution is confined to a strip, i.e. at least the imaginary part remains bounded. An example is the one-parameter model studied in [38]: there the equilibrium distribution is confined to a strip provided the model parameter B satisfies B < √3, and the CL simulation gives correct results.
The plot Fig.1, taken from [38], shows the emergence of skirts as B increases, crossing the critical value √3; it displays histograms of the partially integrated equilibrium density. The tails behave like O((x² + y²)^{−3}) and the results deteriorate as B grows beyond √3.
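Monitoring such power-law tails in practice amounts to histogramming a radial variable and fitting the decay exponent on a log-log scale; a sketch, using synthetic heavy-tailed data with a known exponent in place of actual CL samples:

```python
import numpy as np

# Estimate the power-law exponent p of a histogram tail, hist(r) ~ r^{-p},
# by a log-log fit restricted to well-populated bins.
def tail_exponent(r, r_min, bins=30, min_count=5):
    tail = r[r > r_min]
    hist, edges = np.histogram(tail, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = hist > min_count
    slope, _ = np.polyfit(np.log(centers[mask]), np.log(hist[mask]), 1)
    return -slope

rng = np.random.default_rng(0)
r = rng.pareto(5.0, 200_000) + 1.0   # synthetic data, density ~ 5 r^{-6}
p = tail_exponent(r, 2.0)
print(p)                             # should come out close to 6
```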

Boundary terms at poles
This section follows largely our paper [40]. If ρ has zeroes in M_c, K has poles there. The evolution Ȯ = L_c O then generically produces essential singularities at these locations.
In equilibrium, poles may be
• outside of the distribution: in this case they are harmless;
• at the edge of the distribution: the behavior of |K P| determines success or failure;
• inside the distribution: success or failure depends on the behavior of |K P| and on ergodicity.
Nagata, Nishimura and Shimasaki [39] propose to monitor skirts of |K P| at poles as well as at ∞. This is an excellent idea, allowing one to analyze both kinds of boundary terms in the same way.
We study these issues first in a simple one-pole model, given by the density ρ(z) = (z − i y_p)^{n_p} e^{−βz²} with drift K(z) = n_p/(z − i y_p) − 2βz, choosing y_p = 1 for concreteness. The situation is illustrated in Fig.2: for β < 2n_p/y_p² we have P(x, y) > 0 for 0 < y < y_p and the pole is on the edge of the equilibrium distribution, whereas for β > 2n_p/y_p², P(x, y) > 0 if and only if 0 < y < y_−, and the pole is outside the equilibrium distribution; the upper strip is transient.
This behavior can be understood by looking at the "classical" flow diagrams, see Fig.5, and it can be shown rigorously to be as stated [38,40]. Fig.4 shows some results of CL simulations for n_p = 1 and β = 0.8, 1.6, 3.2 and 4.8, as well as for n_p = 2 and β = 1.6, 3.2. In both cases the results improve dramatically with increasing β; for n_p = 1, β = 1 is already large enough, whereas for n_p = 2 the CL results are still quite wrong for this β. The reason for this difference is that n_p multiplies the pole term in the drift, so with increasing n_p the pull towards the pole becomes stronger.
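The change of the flow pattern with β can be checked directly. The following sketch assumes the one-pole density ρ(z) = (z − i y_p)^{n_p} e^{−βz²} (my reading of the model, chosen to reproduce the threshold 2n_p/y_p² quoted above) and counts the fixed points of the classical flow on the imaginary axis between the origin and the pole:

```python
import numpy as np

# Drift of the one-pole model rho(z) = (z - i*y_p)**n_p * exp(-beta*z**2):
#   K(z) = rho'(z)/rho(z) = n_p/(z - i*y_p) - 2*beta*z.
# On the imaginary axis z = i*y the flow of y is dy/dt = Im K(i*y).
def n_fixed_points(beta, n_p=1, y_p=1.0):
    ys = np.linspace(0.01, 0.99, 99) * y_p
    flow = np.array([(n_p / (1j * y - 1j * y_p) - 2 * beta * 1j * y).imag
                     for y in ys])
    return np.count_nonzero(np.diff(np.sign(flow)))

# Above the threshold beta = 2*n_p/y_p**2 an attractive fixed point y_- and a
# repulsive one y_+ appear below the pole (pole outside the distribution);
# below the threshold there is no fixed point between 0 and y_p.
print(n_fixed_points(4.8), n_fixed_points(1.6))   # 2 and 0
```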
Let us now focus on n_p = 2. For β = 3.2 and 4.8 there is excellent agreement between simulation results and the exact values. For β = 4.8 (pole outside the distribution) this is to be expected, but it may be surprising that for β = 3.2, with the pole on the edge, CL still produces excellent results, unlike the situation for β = 1.6. The reason is the different behavior of the distribution near the poles.

To see what happens with poles inside the distribution, we turn to another simple model: the one-link U(1) model, given by

ρ(z) = e^{β cos z} [1 + κ cos(z − iμ)]^{n_p} ,   (30)

where the last factor is a mockup of the fermion determinant in a lattice model. We show the classical flow portrait for β = 0.3 and three values of n_p in Fig.6. Note that besides the attractive fixed point of the flow at Im z = 0 there is a secondary one at Im z = ±π.
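Assuming the determinant mockup has the one-link form d(z) = 1 + κ cos(z − iμ) (my reconstruction; the precise definition is in [40]), its zeroes, and hence the poles of the drift, sit at z = ±arccos(−1/κ) + iμ:

```python
import numpy as np

# Zeros of the determinant mockup d(z) = 1 + kappa*cos(z - i*mu):
# cos(z - i*mu) = -1/kappa, so z = +-arccos(-1/kappa) + i*mu.
kappa, mu = 2.0, 1.0
x0 = np.arccos(-1 / kappa)          # = 2*pi/3 for kappa = 2
zeros = [x0 + 1j * mu, -x0 + 1j * mu]
for z in zeros:
    print(z, abs(1 + kappa * np.cos(z - 1j * mu)))   # residual ~ 0
```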
The equilibrium distributions for the same cases are shown in Fig.7. For n_p = 1, 2 they show two almost separated regions G_+ and G_−, defined by the sign of the real part of the determinant factor, visible as the 'head' and the 'ears', respectively. For n_p = 4 we only see one region, contained in G_+. It can be seen that the process tends to avoid the poles, creating bottlenecks there. But it does not produce correct results (see [40]). The reason will be discussed in the next section, where we find that the CL process restricted to G_+ or G_− actually simulates correctly a different complex measure.
It is also interesting to study the effect of β here as well. We compare the results of simulations for β = 0.3 and β = 5 with the exact results in Fig.8 and see the failure of the simulation for β = 0.3, whereas for β = 5 there is perfect agreement, even though in both cases the drift has a pole in the same place. This is of course due to the fact that the large value of β makes the attraction of the fixed point at x = 0 very strong.

Failure of ergodicity
Consider a special case of the one-pole model in which the pole lies on the real axis (y_p = 0), so that ρ(x) = x^{n_p} e^{−βx²} with n_p even is real and nonnegative. The Fokker-Planck Hamiltonian related to this model (see Section 1.1) has two ground states, supported on x > 0 and on x < 0, respectively. Corresponding to these ground states are two equilibrium probability densities, proportional to ρ restricted to either half-line. The pole at the origin is a bottleneck, obstructing the crossing of the real CL process. Returning now to the one-link U(1) model Eq.(30) (Section 2.1), recall that there the poles are also bottlenecks. We venture a bold generalization:

Poles inside the distribution tend to form bottlenecks!
In the simulations for n_p = 1, 2, the process occasionally manages to cross between the regions G_+ and G_−, but this might be entirely due to the nonzero step size. For n_p = 4 no such crossing was observed, even in very long runs (Langevin time 25000).
Restricting the simulation process to G_+ (G_−), it turns out that it correctly simulates the integral along a path C_+ (C_−) from one pole to the other, contained in G_+ (G_−)! C_+ is the path starting at the pole with negative real part and ending at the one with positive real part, whereas C_−, using the periodicity of the system, starts at the latter pole and moves right, ending at the first pole. This is shown in Fig.9 for n_p = 1 and in Fig.10 for n_p = 2. So probably for n_p = 1, 2 and definitely for n_p ≥ 4 there are two invariant (equilibrium) distributions and the process is nonergodic! In this simple model one can actually obtain correct results for the original problem by combining the two restricted processes with suitable, not always positive, weights w_±:

⟨O⟩ = w_+ ⟨O⟩_+ + w_− ⟨O⟩_− ,

where ⟨O⟩_± are the results of the CL simulation restricted to G_±. w_+ and w_− are easily determined: they are

w_± = Z_± / (Z_+ + Z_−) ,   Z_± ≡ ∫_{C_±} ρ(z) dz .

For our parameter set β = 0.3, κ = 2, μ = 1 the weights are (cf. [40]):
n_p = 1 : w_+ = 1.09551, w_− = −0.09551,
n_p = 2 : w_+ = 0.97267, w_− = 0.02733,
n_p = 4 : w_+ = 0.99699, w_− = 0.00301.
So the 'ears' region G_− already has rather small weight for n_p = 1, getting even smaller with increasing n_p, so that for n_p = 4 the 'head' region G_+ alone gives results so close to the exact ones as to be practically indistinguishable.
The point to take home from this discussion is: absence of boundary terms, i.e. sufficiently fast vanishing of the distribution approaching a pole, is necessary for correctness. It is not, however, sufficient, because nonergodicity may mean that a restriction of the desired complex measure is being simulated.

(Fig.10: as Fig.9, but for n_p = 2, with weights w_+ = 0.97267, w_− = 0.02733.)

Poles in QCD
The fermion determinant in QCD,

det(D(U) + M) ,   (40)

in a finite volume is a polynomial in the matrix elements of the link variables, which live in the direct product of SL(3,C) groups forming M_c; so it always has zeroes somewhere in M_c. These zeroes are of course not isolated points, but rather form submanifolds of M_c of codimension 2. It should be expected that this may also lead to boundary terms and/or nonergodicity, which might be the reason for failure in some cases.
But there is some good news: Sexty [41] and Aarts et al. [40] find that the eigenvalues avoid 0, at least for the parameters studied, so at least there apparently are no boundary terms here.
It can be hoped that in many cases the situation is as in the one-link U(1) model for n_p = 4, i.e. that the region G_+, where Re det(D(U) + M) ≥ 0, already gives results sufficiently close to the correct ones.

Gauge cooling (GC)
Gauge cooling [42] is a necessary (though not always sufficient) procedure for obtaining stable results from CL simulations of lattice gauge theories. It starts with the observation that under complexification of the configuration space of a lattice gauge theory, the group of gauge transformations gets complexified as well. Let us be concrete and focus on lattice QCD. After integrating out the fermions, the configuration space is M ≡ SU(3)^{N_l}, which gets complexified to M_c ≡ SL(3,C)^{N_l}, where N_l is the number of links. The gauge group G ≡ SU(3)^{N_s} is complexified to G_c ≡ SL(3,C)^{N_s}, with N_s the number of sites.

The point is that observables O, analytically continued to M_c and invariant under G, are also invariant under the larger, noncompact group G_c. So in the ideal world of mathematics the expectation values of O will not change under gauge transformations from G_c; they also would not change if the Langevin process were modified by adding to the drift K a component tangential to the gauge orbit. Nevertheless, such a modification of the process is necessary, for the following reason: the Langevin process will move, due to the buildup of rounding errors, exponentially fast into noncompact directions, far away from the unitary submanifold M, because the drift K produced from a gauge invariant action is gauge covariant and has no component restricting the movement along the gauge orbits of G_c. The link variables will thus soon become very large, whereas the gauge invariant observables composed of them will involve huge cancellations, so that their values become unreliable. The process becomes numerically uncontrolled. This exponential growth is seen in Fig.12, taken from [42] (which will be explained later). So some stabilization of the noncompact directions is definitely necessary.
This can be done either by interspersing some 'gauge cooling steps' between 'dynamical updates' [42] or by adding a suitable cooling term to the drift, as advocated in [43]. A cooling procedure first requires defining a suitable distance from the unitary submanifold M, called the 'unitarity norm'; a simple choice is

F({U}) ≡ (1/N_l) Σ_{x,μ} Tr[ U_{x,μ} U_{x,μ}† − 1 ] ,   (41)

where the sum is over the links of the lattice; but of course there are other possibilities (see for instance [39,44]), which are sometimes preferable. F({U}) vanishes if and only if all the U's are unitary. Dynamical updates are defined, using a simple Euler discretization, by

U_{x,μ} → exp[ i Σ_a λ_a (ε K_{a,x,μ} + √ε η_{a,x,μ}) ] U_{x,μ} ,   (42)

where the λ_a, a = 1, …, 8, are the Gell-Mann matrices forming a basis of the Lie algebra of SU(3), and the drift is given by

K_{a,x,μ} = −D_{a,x,μ} S ,   (43)

with the derivation D acting on a function f as

D_{a,x,μ} f({U}) = (∂/∂δ) f({U(δ)}) |_{δ=0} ,   (44)

where {U(δ)} means the variable U_{x,μ} has been replaced by exp(iδλ_a) U_{x,μ}, with all other variables unchanged. We define a 'gauge gradient' of the unitarity norm by differentiating F along the gauge directions,

f_{a,x} ≡ D_{a,x} F({U}) ,   (45)

where now D_{a,x} generates a gauge transformation at site x. The gauge cooling updates of the configuration are then gauge transformations of the form

U_{x,μ} → Ω_x U_{x,μ} Ω_{x+μ̂}^{−1} ,   Ω_x = exp[ −α̃ λ_a f_{a,x} ] ,   (46)

where α̃ ≡ εα; α determines the strength of the GC force, whereas ε is the discretization parameter as in Eq.(42). Note that even if α̃ is not small, Eq.(46) is still a gauge transformation; it just might not be optimal for reducing F.

Gauge cooling was first shown to work in a simple Polyakov loop model, given by a 1D lattice consisting of N links with periodic boundary conditions [42]. Analytically this model reduces to a trivial one-link integral, but it is a useful laboratory.
Its action is of the form S = β_1 Tr(U_1 U_2 ⋯ U_N) + β_2 Tr(U_1 U_2 ⋯ U_N)^{−1}, with β_{1,2} complex. Simulating this model using 'uncooled' CL for the N links, one discovers that already for N = 16 one fails to reproduce the correct results. With sufficient gauge cooling, however, correct results are obtained.
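The mechanics of gauge cooling can be illustrated on a two-link toy chain, with SU(2)/Pauli matrices standing in for SU(3)/Gell-Mann matrices purely for brevity (a hypothetical miniature, not the implementation of [42]). A gauge transformation Ω at the interior site acts as U1 → U1 Ω^{-1}, U2 → Ω U2, leaves the gauge invariant Tr(U1 U2) unchanged, and is chosen along the non-compact directions exp(Σ_a f_a σ_a), f_a real, so as to reduce the unitarity norm:

```python
import numpy as np
from scipy.linalg import expm

# Pauli matrices: generators of the SU(2) stand-in for the gauge group.
sigma = [np.array([[0, 1], [1, 0]], complex),
         np.array([[0, -1j], [1j, 0]], complex),
         np.array([[1, 0], [0, -1]], complex)]

def unitarity_norm(links):
    # F = sum_links Tr(U U^dagger - 1); zero iff every link is unitary
    return sum(np.trace(U @ U.conj().T - np.eye(2)).real for U in links)

def cool_step(U1, U2, alpha=0.05, d=1e-5):
    # forward-difference gauge gradient of F along the non-compact directions
    F0 = unitarity_norm([U1, U2])
    grad = []
    for s in sigma:
        Om = expm(d * s)
        grad.append((unitarity_norm([U1 @ np.linalg.inv(Om), Om @ U2]) - F0) / d)
    Om = expm(-alpha * sum(g * s for g, s in zip(grad, sigma)))
    return U1 @ np.linalg.inv(Om), Om @ U2

# start from unitary links pushed off the unitary manifold by a non-compact
# gauge transformation, so cooling can bring F back to (nearly) zero
V1, V2 = expm(0.4j * sigma[0]), expm(-0.3j * sigma[2])
G = expm(0.5 * sigma[1])
U1, U2 = V1 @ np.linalg.inv(G), G @ V2
invariant0 = np.trace(U1 @ U2)
F_start = unitarity_norm([U1, U2])
for _ in range(200):
    U1, U2 = cool_step(U1, U2)
F_end = unitarity_norm([U1, U2])
print(F_start, F_end)                        # the unitarity norm decreases
print(abs(np.trace(U1 @ U2) - invariant0))   # gauge invariant unchanged
```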

GC for QCD and QCD inspired models
A more interesting application of GC is the study of so-called heavy-dense QCD (HDQCD), a model obtained by dropping all spatial links from the Wilson fermion action. The plot in Fig.12 is from a study of this model [42]. This model had been studied in [45] by reweighting, which is an exact procedure, but unfortunately limited to small lattices; here, however, those results serve as a standard against which to judge the correctness of CL simulations. In Fig.13 we show a comparison of the reweighting results with those of CL with GC on a 6⁴ lattice, taken from [42]. It is clearly seen that for β ≥ 5.7 there is excellent agreement, whereas for β ≤ 5.6 there are considerable deviations.
A natural explanation is that these deviations are due to poles in the drift at the zeroes of the determinant, leading to boundary terms or nonergodicity; just as in the toy models discussed before, increasing β makes the process stay away from those dangerous places.
For full QCD there are also some results, some good, some not so good. In an exploratory study at β = 5.9 on a small (4⁴) lattice, Aarts et al. [46] used the expansion in the spatial hopping parameter κ_s for Wilson fermions, truncated at rather high order (up to 50) to check convergence; the hopping parameters were chosen as κ_s = κ = 0.12. Agreement of CL for the truncated expansion with CL for the full model supports correctness for this rather large value of β.
Sexty [41] and later Aarts et al. [40] produced results for staggered quarks on lattices of size up to 12³ × 4; there are deviations at smaller β.
A very detailed study by Fodor et al. [47] compared CL with GC for QCD with staggered fermions to multi-parameter reweighting results on lattices of size up to 16³ × 8 and again found deviations for smaller β.
Kogut and Sinclair [48,49] applied CL to QCD in a situation where the results can be checked by comparison with ordinary Monte Carlo simulations. By increasing β to 5.7 the last discrepancy disappeared, in accordance with our general observation that larger β values improve the situation. Their results show rather large values of the unitarity norm, which already gives reason to suspect that the results are not correct (see the next section).
Finally I should mention the very recent work by Bloch and Schenk [50], which uses a better matrix inversion method ('selected inversion') for the Dirac operator. This greatly improves the stability of the CL simulations, but brings out the deviations at smaller β even more clearly.
This suggests that the incorrect results for QCD at smaller β are not due to numerical instabilities, but rather to a more fundamental problem of CL, as discussed earlier in this talk. Some more encouraging results for larger β can be found, however, in [51].

Limitations of gauge cooling and how to deal with them
We have seen that CL, even with gauge cooling, has some issues, which are quite likely due to the zeroes of the fermion determinant. Various cures for this problem have been tried: Nagata et al. [39] suggest using different unitarity norms, and Bloch et al. [52] propose a variation of the GC procedure; in both cases problems at small β remain. Bloch [53] suggests reweighting of CL trajectories; while this works well in simple models, on larger lattices one has to expect overlap problems, as always with reweighting.
In my view the most promising, though not sufficiently understood, cure is the so-called dynamical stabilization proposed by F. Attanasio and B. Jäger [54,55,56]. The plot Fig.14, taken from [57] and showing once more a simulation of HDQCD, gives some indication of what is going on: if we focus on the purple data, showing the evolution of a certain observable under CL with GC, we see that there is some kind of metastability up to t ≈ 70, with the data fluctuating around the correct average value. Then the process wanders off, fluctuating around a different (and incorrect) value.
Conclusions
• Insufficient decay, both at ∞ and at poles, may lead to boundary terms spoiling the correctness of a CL simulation. Therefore 'skirts' or 'tails' have to be monitored carefully.
• Gauge cooling eliminates some of the 'skirts', but there may still be insufficient decay, so monitoring them remains necessary.
• Poles are harmless if the process stays away from them; monitoring this in QCD is unfortunately costly.
• The hopping expansion can sometimes avoid poles (see [46]), but it may still hit problems when κ approaches the continuum value.
• The best hope for curing the remaining problems seems to be 'dynamical stabilization', but a better understanding of why it works seems necessary.
• In the context of the previous point, it seems highly desirable to gain a deeper understanding of the reason for the observed metastability in QCD and, more generally, to understand the essentials of the fixed point and pole structure.