Selected inversion as key to a stable Langevin evolution across the QCD phase boundary

We present new results of full QCD at nonzero chemical potential. In PRD 92, 094516 (2015) the complex Langevin method was shown to break down when the inverse coupling decreases and enters the transition region from the deconfined to the confined phase. We found that the stochastic technique used to estimate the drift term can be very unstable for indefinite matrices. This may be avoided by using the full inverse of the Dirac operator, which is, however, too costly for four-dimensional lattices. The major breakthrough in this work was achieved by realizing that the inverse elements necessary for the drift term can be computed efficiently using the selected inversion technique provided by the parallel sparse direct solver package PARDISO. In our new study we show that no breakdown of the complex Langevin method is encountered and that simulations can be performed across the phase boundary.


Introduction
The lattice simulations of QCD at nonzero quark chemical potential are strongly hampered by the sign problem, caused by the complex fermion determinant. The complex Langevin (CL) method has drawn a lot of attention in recent years as a potential solution to this problem [1]. Nevertheless, careful studies have shown that the method can break down or, even worse, can converge to the wrong solution, if the trajectories make excursions too far into the SL(3,C) plane or come too close to a singularity of the drift [2,3]. Although conditions were derived that have to be satisfied for the CL solutions to be valid, the matching of these conditions can only be verified a posteriori. Therefore, it cannot be excluded that the violation of the validity condition is due to numerical inaccuracies rather than to a theoretical deficiency of the method for the model being considered.
The first successful application of the CL method to QCD was made in the heavy dense approximation [4]. For full QCD, the method was shown to work correctly in the deconfined phase, when the inverse coupling β is large enough; however, it breaks down when β gets smaller and the system crosses the phase boundary, such that no solutions are found in the confined phase [5]. Other studies at larger β and larger volumes also seem to converge to incorrect solutions [6]. Recently there were suggestions to modify the CL evolutions through dynamical stabilization [7] or deformation [8], but the extrapolations needed to recover the original theory are not yet well controlled.
Speaker, e-mail: jacques.bloch@ur.de CL? Figure 1. Sketch of the QCD phase diagram as a function of baryon chemical potential and temperature. The simulations of [5] are performed at µ/T = 1 with decreasing β, which follows the gray arrow across the roof of the phase transition. Figure 1 gives a sketch of the QCD phase diagram as a function of temperature and chemical potential. The simulations reported by Fodor et al. [5] follow the gray arrow through the roof of the phase transition; however, the CL simulations break down when crossing the phase boundary, and no results were found inside the confined phase. In these simulations, the temperature was lowered by decreasing β on an 8 3 × 4 lattice with µ/T = 1 and m = 0.05. For these parameter values the critical temperature corresponds to β c ≈ 5.04 at µ = 0. The results for the Polyakov loop and its inverse, the temporal and spatial plaquettes, the chiral condensate, and the quark number density published in [5] are reproduced in Fig. 2. In these plots the CL results are compared with data reweighted from the µ = 0 ensemble, however, CL results are only available for β ≥ 5.1 as the CL simulations became unstable below this value.
In this presentation we will show that the breakdown observed in [5] can be cured and stable CL solutions can be found for smaller β when the drift is computed exactly, rather than being estimated with stochastic techniques.

Complex Langevin for QCD
The lattice QCD partition function is given by with Wilson gauge action S g , staggered Dirac operator D, and links with Gell-Mann matrices λ a and link parameters z axν . After discretization of the Langevin time, the CL evolution of the links in SL(3, C) is described by where, in the stochastic Euler discretization,  with Langevin step and Gaussian noise η axν . The evolution is driven by the drift (5) with complex action S = S g − log det D. In the following we will focus on the fermionic drift, where ∂ axν D is the partial derivative of D wrt the variables z axν .

Fermionic drift term and selection inversion
When looking at the breakdown of the CL method in [5] it is not immediately clear if the violation of the CL validity conditions is genuine or rather due to numerical approximations. The traces in the fermionic drift (6) involve the inverse Dirac operator. Computing the full inverse, for example, with Lapack, is too expensive for four-dimensional QCD, both in terms of CPU time and storage space. Until now, the explicit computation of D −1 was avoided by using stochastic estimators for the traces, Although the merit of this technique is undisputed for positive-definite matrices, it can be problematic when applied to indefinite matrices, as the number of noise vectors needed to get a good and stable estimate may be extremely large. As current CL algorithms typically estimate traces using a single noise vector, they rely on choosing tiny, such that consecutive CL steps are highly correlated and effectively provide an improved estimator to the trace after many Langevin steps. A potential danger is that this strategy may destabilize the discrete time evolution beyond repair.
To clarify this situation we decided to investigate how using exact traces affects the stability of the CL evolution. As already mentioned, the exact traces require the inverse of the Dirac matrix, which is too costly to compute in full. Below we will analyze the drift term further and show that a relatively new numerical method, called selected inversion, can be used to compute the drift exactly, while saving CPU time and storage space. This method also allows us to use a larger , as the very small values used before were only needed to stabilize the drift when using the stochastic technique.
Consider the fermionic drift K f axν of Eq. (6). The derivative matrix ∂ axν D is zero except for two 3 × 3 blocks at the positions of the link U xν , so that the drift looks like In the matrix product the sparse derivative matrix effectively selects out the two corresponding columns of D −1 such that this can be rewritten as where C x+ν × B and C x × F, respectively, give the columns x and x +ν of D −1 ∂ axν D. Only two 3 × 3 blocks of D −1 contribute to the trace of this matrix product, so this simplifies further to Each drift term can thus be written as and, hence, the computation of the drift only requires elements of D −1 where D itself is nonzero. Note that, the same statement holds for the computation of the fermionic observables. This observation is what brought us to consider the selected inversion technique to compute the drift term. The method consists of a sparse LU-factorization followed by a selected inversion, which exactly computes selected elements of the inverse D −1 of a general matrix D. The subset of selected elements is defined by the set of nonzero entries in D. The method is based on the fact that this specific subset of D −1 can be evaluated without computing any inverse entry from outside of the subset. This is what substantially speeds up the computation compared to routines computing the full inverse. Moreover, as only the inverse elements of this subset are computed, they can be stored in sparse format. The parallel implementation of the selected inversion technique, as described in [9], can be found in the latest version of the parallel sparse direct solver PARDISO [10].
It is worthwhile to note that the selected inversion method is optimally used in the CL evolution, as the selected inversion method precisely yields all the elements of D −1 needed for the drift terms and for the fermionic observables.
In contrast to the stochastic technique, the selected inversion allows us to compute the drift exactly, and this in a way that is much more efficient than Lapack, both in terms of CPU times and storage space. In Fig. 3 we compare the scaling of the wall-clock time between the selected inversion from PARDISO, the full Lapack inversion, and the stochastic technique (where we used 100 intermediate steps due to the smaller choice of as explained after (7)) as a function of the lattice volume. There is a clear performance gain when using the selected inversion, which is a factor of 100 compared to the Lapack dense inverse for a lattice size of 8 4 . The comparison with the stochastic technique is more delicate as the stability of the CL evolution can be affected by the latter and merely comparing timings does not tell the whole story.
The volume scaling of the selected inversion seems to be somewhat better than N 3 . From other applications, the selected inverse is known to scale like N 2 for three-dimensional problems, however, as this is the first application to a four-dimensional problem the scaling has to be investigated further.

CL evolution for varying β
In this section we present the first results of the CL method using the selected inversion. The simulations were performed on the Xeon cluster at the ICS, Lugano. We performed the same study as in [5] and investigate QCD across the phase boundary as sketched in Fig. 1, decreasing the temperature from the deconfined to the confined phase by varying β for constant µ/T = 1 on an 8 3 × 4 lattice  with m = 0.05 (β c ≈ 5.04 at µ = 0). Whereas the CL method broke down below β = 5.1 in the original study, we are now able to generate stable CL trajectories for all investigated β values without any further tuning. Moreover, this was done using = 0.001, which is much larger than before. The length of the trajectories is 30 Langevin time (Lt) of which 5 Lt are discarded as thermalization. From the results, shown in Fig. 4, we see that the simulations perform well across the phase boundary, and the complete range from β = 5.45 to β = 4.6 can be simulated without any problem. The numerical implementation includes gauge cooling, to avoid excessive excursions in SL(3,C), and an adaptive step size such that the continuum trajectory is properly followed, even when the drift is large.
The fact that a stable solution is found does not necessarily mean that it is correct, as the CL method can converge to the wrong solution in some instances. In order to validate the CL results we show the histograms for the chiral condensate and the Polyakov loop for β = 5.4 and β = 4.8, in Fig. 5. The CL results are only to be trusted if the tails of the histograms decay exponentially. For the chiral condensate it seems that the validity might be problematic for β = 4.8, as the tails of the histogram are somewhat broad, but even for β = 5.4 the exponential decay in the tails is not so clear, even though the results agree with the reweighting results. The histograms for the Polyakov loop look  fine so far. From these data we conclude that the results may be incorrect below the phase transition, even though the histograms do not give a clear cut way to validate or invalidate CL measurements.

CL evolution for varying µ
As we found stable CL evolutions when decreasing β through the phase boundary, we also performed a partial study of the phase diagram and investigated QCD for varying µ at two values of the temperature. We chose β = 5.0 and m = 0.05 for which the pion mass is am π = 0.5588 ± 0.0002 and a = (0.3045 ± 0.0001) fm, such that m π ≈ 362 MeV. We work on an 8 3 × N t lattice and consider temporal extents N t = 4 and N t = 8, corresponding to T = 161.74 MeV and T = 80.87 MeV, respectively.
In Fig. 6 we show the results for the various observables as a function of µ for both temperatures. Although the results do not show any conspicuous behavior, there are no other results to compare with when the sign problem becomes large at nonzero µ. One value that can easily be checked is the value at µ = 0 as it can be computed using standard importance sampling method. From the figure it is clear that the chiral condensate and even the plaquette measured from the CL simulations is incorrect for µ = 0, even though they result from a convergent CL evolution. For µ = 0 the Langevin evolution should be real, and the wrong result is merely due to numerical inaccuracies. This is in fact easily remedied by reunitarizing the links after each Langevin step; however, we decided against this to be consistent with the µ 0 simulations. It is interesting to note that the measurements at nonzero chemical potential smoothly connect to the wrong µ = 0 value, and so we expect all of them to be incorrect. This argument is further supported through the lack of Silver Blaze phenomenon, as one would have expected a very slow µ-evolution of the observables up to the phase transition. The validity of the CL results can again be verified using the histograms of the measured observables. The histograms are very similar to those of the bottom row of Fig. 5: that of the chiral condensate does not seem to have the required exponential falloff, while that of the Polyakov loop does not show a problem. Still, the histograms are not what one would brand as extremely broad, and so the decision of the validity is a difficult call, even though the chiral condensate at µ = 0 is wrong by a factor three.

Summary and outlook
In this presentation we have argued that the breakdown of the CL method at the QCD phase boundary observed in [5] is in fact a numerical artifact due to the stochastic estimation of the drift. To compute the drift exactly, one needs the exact inverse of the Dirac operator, which cannot be computed in full with standard direct methods because it is too expensive in terms of both computer time and storage. However, we showed that the drift and the fermionic observables only require those elements of the inverse Dirac operator at the positions where the Dirac operator itself is filled, and, therefore, the selected inversion method, implemented in the sparse direct solver library PARDISO, can be applied. This allows for the exact computation of the drift term in a much faster way, using little storage space.
We observed that the Langevin evolution became stable and convergent when the exact drift term was used. This allowed us to study the QCD phase transition across the roof of the phase diagram, i.e., when decreasing temperature from the deconfined to the confined phase at constant µ/T = 1. Moreover, we were able to measure QCD observables as a function of µ at two values of the temperature below the deconfinement temperature.
Although these are preliminary results, there are clear indications that the CL results obtained at small mass in the confined region are incorrect, even though they are stable. There is therefore a need for further investigation of these CL results to understand if the validity problems are of a fundamental or numerical nature. The fact that we now get stable CL trajectories should allow us to dig deeper into this problem and to search for improved methods.