Applications of Lattice Gauge Equivariant Neural Networks

The introduction of relevant physical information into neural network architectures has become a widely used and successful strategy for improving their performance. In lattice gauge theories, such information can be identified with gauge symmetries, which are incorporated into the network layers of our recently proposed Lattice Gauge Equivariant Convolutional Neural Networks (L-CNNs). L-CNNs can generalize better to differently sized lattices than traditional neural networks and are by construction equivariant under lattice gauge transformations. In these proceedings, we present our progress on possible applications of L-CNNs to Wilson flow or continuous normalizing flow. Our methods are based on neural ordinary differential equations, which allow us to modify link configurations in a gauge equivariant manner. For simplicity, we focus on simple toy models to test these ideas in practice.


Introduction
In the past decade, neural networks (NNs) have been established as an essential tool with numerous applications in e.g. computer science and the natural sciences. The crucial role played by symmetries in a large number of scientific problems has motivated the idea that including such symmetries in the NN architecture could enhance performance. For example, convolutional neural networks (CNNs) are based on the inclusion of translational symmetry as an inherent property of their architecture. In computer vision problems such as image classification, this proved to be a powerful idea, because the position of a particular feature that has to be detected is irrelevant and only the presence of the feature matters. This approach has been generalized to include other symmetries in group equivariant convolutional neural networks (G-CNNs) [1], which take into account not just translational, but e.g. also rotational and reflection symmetry. Recently, this idea has been further extended to local symmetries [2]. The more general framework dealing with symmetries in neural networks is called geometric deep learning [3].
In theoretical physics, and more specifically in lattice field theories with global symmetries, CNNs or G-CNNs have been successfully applied to solving regression problems and detecting phase transitions [4][5][6][7][8][9][10]. In the context of Abelian and non-Abelian gauge theories, which exhibit local symmetries, there has been progress in the direction of incorporating gauge symmetry in the network architecture [11][12][13][14][15]. For example, gauge equivariant normalizing flows [11,12,15,16] can be used in place of Monte Carlo simulations to sample uncorrelated gauge configurations while retaining gauge symmetry. Similarly, a lattice gauge equivariant convolutional neural network (L-CNN) was proposed in our paper [17], in which the elementary layers of the architecture individually preserve gauge symmetry. L-CNNs have been used successfully for regression tasks and in principle can also be employed for the generation of gauge configurations. The continuous flow approach proposed in [18,19] provides a continuous generalization of normalizing flows applied to lattice field theory. In contrast to normalizing flows, this continuous formulation allows for a straightforward inclusion of exact global symmetries. At their core, continuous flows are an application of neural ordinary differential equations (NODEs) [20], which are ordinary differential equations (ODEs) parametrized by NNs.
In these proceedings, we first review the basics of lattice gauge theory and the L-CNN architecture, and we show how NODEs can be modified to study Wilson flow [21] and exemplify it with an SU(2) toy model.

Lattice gauge equivariant neural networks
Lattice gauge theory is a discretized version of SU($N_c$) Yang-Mills theory, in which spacetime is approximated by a periodic hypercubic lattice in $D + 1$ dimensions with imaginary time and lattice spacing $a$. In the lattice discretization, the continuous gauge fields $A_\mu$ are replaced by the link variables $U_{x,\mu}$ via the following definition:
$$U_{x,\mu} = \mathcal{P} \exp\left( i g \int_{x}^{x+\hat\mu} dx'^{\nu} A_\nu(x') \right),$$
where $\mathcal{P}$ denotes path ordering, $g$ is the coupling constant and the integral is performed over the straight line connecting the site $x$ to the site $x + \hat\mu$. The gauge fields are elements of the su($N_c$) algebra, while the links can be interpreted as the parallel transporters along the lattice edges and live in the SU($N_c$) group. In practice, we employ the fundamental representation of $U_{x,\mu}$, where links are represented as complex $N_c \times N_c$ matrices. Adjacent links can be multiplied, and repeating this operation leads to arbitrary Wilson lines. If the start and end point of a Wilson line coincide, it forms a closed loop, called a Wilson loop. The simplest Wilson loop on a hypercubic lattice is the plaquette, given by
$$U_{x,\mu\nu} = U_{x,\mu}\, U_{x+\hat\mu,\nu}\, U^\dagger_{x+\hat\nu,\mu}\, U^\dagger_{x,\nu}.$$
The Wilson action [22], formulated in terms of plaquettes, is equivalent to the Yang-Mills action in the continuum limit $a \to 0$. A general lattice gauge transformation applied to links,
$$U_{x,\mu} \to \Omega_x\, U_{x,\mu}\, \Omega^\dagger_{x+\hat\mu}, \qquad \Omega_x \in \mathrm{SU}(N_c),$$
induces a local transformation of the plaquettes,
$$U_{x,\mu\nu} \to \Omega_x\, U_{x,\mu\nu}\, \Omega^\dagger_x.$$
These transformations leave the Wilson action unchanged, meaning the theory is invariant under SU($N_c$) lattice gauge transformations.
[EPJ Web of Conferences 274, 09001 (2022), XVth Quark Confinement and the Hadron Spectrum, https://doi.org/10.1051/epjconf/202227409001]
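The transformation behaviour of the plaquette can be checked numerically. The following is a minimal sketch in plain Python, assuming a small two-dimensional periodic lattice with random SU(2) links; the helper names (`random_su2`, `shift`, `plaquette`, `gauge_transform`) and the lattice size are illustrative choices, not part of the L-CNN code.

```python
import math
import random

def dagger(a):
    """Conjugate transpose of a 2x2 complex matrix stored as nested lists."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def matmul(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def random_su2(rng):
    """Random SU(2) matrix u0*1 + i*(u1*s1 + u2*s2 + u3*s3) with unit-norm u."""
    u = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in u))
    u0, u1, u2, u3 = (c / norm for c in u)
    return [[complex(u0, u3), complex(u2, u1)],
            [complex(-u2, u1), complex(u0, -u3)]]

def shift(site, mu, size):
    """Neighbouring site in direction mu with periodic boundary conditions."""
    moved = list(site)
    moved[mu] = (moved[mu] + 1) % size
    return tuple(moved)

def plaquette(links, site, mu, nu, size):
    """U_{x,mu} U_{x+mu,nu} U^dagger_{x+nu,mu} U^dagger_{x,nu}."""
    forward = matmul(links[site, mu], links[shift(site, mu, size), nu])
    backward = matmul(dagger(links[shift(site, nu, size), mu]),
                      dagger(links[site, nu]))
    return matmul(forward, backward)

def gauge_transform(links, omega, size):
    """U_{x,mu} -> Omega_x U_{x,mu} Omega^dagger_{x+mu} for every link."""
    return {(site, mu): matmul(matmul(omega[site], u),
                               dagger(omega[shift(site, mu, size)]))
            for (site, mu), u in links.items()}

rng = random.Random(0)
SIZE = 3  # illustrative lattice extent
sites = [(x, y) for x in range(SIZE) for y in range(SIZE)]
links = {(s, mu): random_su2(rng) for s in sites for mu in range(2)}
omega = {s: random_su2(rng) for s in sites}
transformed = gauge_transform(links, omega, SIZE)

# The plaquette trace should be unchanged up to floating-point error.
max_trace_change = max(
    abs(trace(plaquette(links, s, 0, 1, SIZE))
        - trace(plaquette(transformed, s, 0, 1, SIZE)))
    for s in sites)
print(max_trace_change)
```

The plaquette itself is not invariant but transforms locally at its base point, which is why a trace is needed to obtain a gauge invariant quantity.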
In order to build up an L-CNN [17], we can consider its individual layers, each of which is designed to respect gauge equivariance. First, the input consists of the set of gauge links $U$ of a particular lattice configuration and locally transforming objects $W$, which in practice we choose to be the plaquettes, but which can also be the Polyakov loops (closed Wilson lines wrapping around the periodic boundary of the lattice). The L-Conv layer is a gauge equivariant convolution, which acts as a parallel transporter of locally transforming objects $W$, while L-Bilin performs a multiplication of such objects (more specifically, it is a bilinear layer). We proved that the repeated application of these two operations can grow arbitrarily sized Wilson (or Polyakov) loops. Moreover, it is possible to introduce non-linearity via L-Act layers, which behave like activation functions in traditional CNNs. The Trace layer yields a gauge invariant output that can be passed to a traditional CNN. A possible realization of such a network is depicted in Fig. 1. Because they can generate arbitrary loops and include non-linearities, L-CNNs can be seen as universal approximators of gauge-equivariant functions on the lattice.
Among the relevant results found in [17], it is worth mentioning that the L-CNNs performed very well on the regression of Wilson loops up to a size of 4×4 and simple observables such as the topological charge density, beating traditional CNNs on the same task.

Adaptation of NODEs to lattice gauge theory
NODEs are ODEs parametrized by neural networks [20]. As in the original paper, we will focus on first order ODEs:
$$\frac{dz(t)}{dt} = f(z(t), \theta, t).$$
The unknown function $z(t)$ is a time-dependent $D$-dimensional vector and $f(z(t), \theta, t)$ is a $D$-dimensional function parametrized by a priori unknown weights $\theta$. In particular, one can choose $f(z(t), \theta, t)$ to be represented by a NN. NODEs can be understood as generalizations of residual networks [23] with continuous depth, where the time coordinate $t$ is used in place of the discrete depth of the network. Starting with an input state $z_0 = z(t_0)$ at $t = t_0$, the NODE can be formally solved by
$$z(t_1) = z_0 + \int_{t_0}^{t_1} dt\, f(z(t), \theta, t),$$
which provides predicted states $z(t_1)$ at some final time $t = t_1$. In this manner, NODEs map arbitrary input states to output states, similar to generic NNs. The mapping depends on the NN architecture and the weights $\theta$. NODEs can thus be used to solve regression problems: given a dataset characterized by the initial conditions $z^i_0 = z^i(t_0)$ (where $i \in \{1, \dots, N_\mathrm{samples}\}$), which are used as input, and the desired output vectors $\tilde z^i_1$, which represent the labels, the weights $\theta$ can be optimized such that the final states approximate the labels as accurately as possible. In practice, this is done with the aid of an ODE integrator, such as Euler or Runge-Kutta. We can require that the discrepancy between the labels and the predicted final states is minimized by introducing a loss function such as the mean squared error (MSE), $L(\theta) = \sum_i (\tilde z^i_1 - z^i(t_1))^2 / N_\mathrm{samples}$, and run the training procedure in order to optimize the weights $\theta$.
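The forward pass and the MSE loss described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming an explicit Euler integrator and a hypothetical one-parameter right-hand side `decay` in place of a neural network; the names `euler_solve`, `decay`, `mse_loss` and the toy data are assumptions made for the example, and no gradient-based training is shown.

```python
import math

def euler_solve(f, z0, theta, t0, t1, n_steps):
    """Integrate dz/dt = f(z, theta, t) from t0 to t1 with explicit Euler steps."""
    z = list(z0)
    dt = (t1 - t0) / n_steps
    for step in range(n_steps):
        t = t0 + step * dt
        dz = f(z, theta, t)
        z = [zi + dt * dzi for zi, dzi in zip(z, dz)]
    return z

def decay(z, theta, t):
    """Hypothetical stand-in for the network: linear decay with rate theta."""
    return [-theta * zi for zi in z]

def mse_loss(theta, inputs, labels, n_steps=1000):
    """Mean over samples of the squared distance between label and final state."""
    total = 0.0
    for z0, target in zip(inputs, labels):
        pred = euler_solve(decay, z0, theta, 0.0, 1.0, n_steps)
        total += sum((a - b) ** 2 for a, b in zip(target, pred))
    return total / len(inputs)

# Toy dataset: labels are the exact solution at t1 = 1 for a known rate.
THETA_TRUE = 1.5
inputs = [[1.0, -2.0], [0.5, 3.0], [-1.5, 0.25]]
labels = [[zi * math.exp(-THETA_TRUE) for zi in z0] for z0 in inputs]

loss_at_truth = mse_loss(THETA_TRUE, inputs, labels)
loss_off = mse_loss(0.5, inputs, labels)
print(loss_at_truth, loss_off)
```

The loss at the true rate is limited only by the discretization error of the integrator, while a wrong rate gives a much larger value; an optimizer would exploit exactly this difference.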
While the approach described above only uses the final state labels in the optimization problem, it is also possible to include the discrepancies of the whole state evolution (i.e. more points t j along the trajectory z(t j )) in the loss function for successful training.If we only use the final states at t 1 , then it is crucial that the dataset provides sufficient information to reconstruct the underlying dynamics.
We can adapt the previous scheme to study continuous flow applied to lattice gauge configurations:
$$\frac{dU_{x,\mu}(\tau)}{d\tau} = i H_\mu[U(\tau), \theta, \tau]\, U_{x,\mu}(\tau), \tag{8}$$
where $U_{x,\mu} \in$ SU($N_c$) are gauge links, $\tau$ is flow time and $H_\mu[U(\tau), \theta, \tau]$ is a NN parametrized by the weights $\theta$ with a traceless and Hermitian output. This last requirement guarantees that the gauge links do not leave the group during the evolution. In order to retain gauge equivariance, $H_\mu$ can be modeled with an L-CNN.
Our dataset consists of the initial conditions $U^i_{x,\mu,0} = U^i_{x,\mu}(\tau_0)$ and the desired output configurations $\tilde U^i_{x,\mu,1}$, which define input and labels respectively. A standard ODE integrator would in general break the group structure, so we make use of the iterative application of the exponential map for time evolution. Since $H_\mu$ is traceless and Hermitian, i.e. it can be understood as an su($N_c$) algebra element, the links remain in the group manifold. The final configuration is then compared with the labels in a loss function built from the Frobenius norm $\| \dots \|$ of the difference between the predicted and the desired links.
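The difference between an exponential-map update and a naive Euler update can be seen on a single SU(2) link. The sketch below is written under stated assumptions: `toy_h` is a hypothetical stand-in for the network output $H$ (it simply returns three real Pauli coefficients read off from the link), the step size and number of steps are arbitrary, and the closed form $\exp(i \epsilon\, h_j \sigma_j) = \cos(\epsilon |h|)\,\mathbb{1} + i \sin(\epsilon |h|)\, h_j \sigma_j / |h|$ is specific to SU(2).

```python
import math

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def pauli_combo(h):
    """h1*s1 + h2*s2 + h3*s3: a traceless Hermitian 2x2 matrix for real h."""
    h1, h2, h3 = h
    return [[complex(h3, 0.0), complex(h1, -h2)],
            [complex(h1, h2), complex(-h3, 0.0)]]

def exp_i_h(h, eps):
    """Closed form of exp(i*eps*h.sigma), an SU(2) matrix."""
    norm = math.sqrt(sum(c * c for c in h))
    if norm == 0.0:
        return [[1 + 0j, 0j], [0j, 1 + 0j]]
    angle = eps * norm
    c, s = math.cos(angle), math.sin(angle) / norm
    sig = pauli_combo(h)
    return [[c * (i == j) + 1j * s * sig[i][j] for j in range(2)]
            for i in range(2)]

def toy_h(u, tau):
    """Hypothetical stand-in for the network output H[U, theta, tau]."""
    return (u[0][1].imag, u[0][1].real, u[0][0].imag)

def flow_exponential(u, n_steps, eps):
    """Repeated update U <- exp(i*eps*H) U, which stays inside SU(2)."""
    for step in range(n_steps):
        u = matmul(exp_i_h(toy_h(u, step * eps), eps), u)
    return u

def flow_euler(u, n_steps, eps):
    """Naive update U <- U + i*eps*H U, which drifts away from the group."""
    for step in range(n_steps):
        hu = matmul(pauli_combo(toy_h(u, step * eps)), u)
        u = [[u[i][j] + 1j * eps * hu[i][j] for j in range(2)]
             for i in range(2)]
    return u

def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def unitarity_defect(a):
    """Largest entry of |a a^dagger - 1|."""
    adag = [[a[j][i].conjugate() for j in range(2)] for i in range(2)]
    prod = matmul(a, adag)
    return max(abs(prod[i][j] - (i == j)) for i in range(2) for j in range(2))

# Starting link with u0 = u1 = u2 = u3 = 1/2, which is an SU(2) element.
u_start = [[complex(0.5, 0.5), complex(0.5, 0.5)],
           [complex(-0.5, 0.5), complex(0.5, -0.5)]]
u_exp = flow_exponential(u_start, 10, 0.1)
u_eul = flow_euler(u_start, 10, 0.1)
print(abs(det(u_exp) - 1), abs(det(u_eul) - 1))
```

Each Euler step multiplies the determinant by $1 + \epsilon^2 |h|^2$, so the deviation from the group accumulates, whereas the exponential update is unitary with unit determinant at every step.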

SU(2) Wilson flow toy model
We test the adaptation of the NODE approach to Wilson flow [21] using a toy model consisting of a single SU(2) link characterized by the action $S[U] = \mathrm{Re}\,\mathrm{Tr}(U^2)$. Starting with randomly distributed initial matrices $U^i_0 \in$ SU(2), we generate a dataset of flowed matrices $\tilde U^i_1$ by applying gradient descent on $S[U]$ using group derivatives akin to Wilson flow in lattice gauge theory. The action exhibits two minima, $\pm\mathbb{1}$, toward which Wilson flow lets the links evolve depending on the initial conditions. If $\mathrm{Tr}\,U > 0$, the link is flowed toward the north pole ($+\mathbb{1}$), otherwise the dynamics is directed toward the south pole ($-\mathbb{1}$). For links with $\mathrm{Tr}\,U = 0$, the dynamics are stuck. Our goal is to use NODEs to reconstruct these dynamics via the flow Eq. (8).
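The role of $\mathrm{Tr}\,U$ in these dynamics can be made explicit with a short calculation. The following is a sketch that assumes the standard SU(2) parametrization $U = u_0 \mathbb{1} + i u_j \sigma_j$ with real coefficients and $u_0^2 + \vec u^{\,2} = 1$; the symbols $u_0$ and $\vec u$ are introduced here for illustration.

```latex
\begin{align}
U^2 &= \left(u_0^2 - \vec u^{\,2}\right)\mathbb{1} + 2 i\, u_0\, u_j \sigma_j, \\
\mathrm{Re}\,\mathrm{Tr}\left(U^2\right) &= 2\left(u_0^2 - \vec u^{\,2}\right)
  = 4 u_0^2 - 2 = \left(\mathrm{Tr}\,U\right)^2 - 2 .
\end{align}
```

Under this assumption the action depends on the link only through $\mathrm{Tr}\,U = 2 u_0$. Its derivative with respect to $u_0$ vanishes at $\mathrm{Tr}\,U = 0$, which is consistent with links on the equator being stuck, and for any other starting point the sign of $\mathrm{Tr}\,U$ decides which of the two poles $\pm\mathbb{1}$ the link moves toward.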
In order to visualize the dataset, we use the following parametrization of SU(2):
$$U = u_0 \mathbb{1} + i u_i \sigma_i, \qquad u_0^2 + u_1^2 + u_2^2 + u_3^2 = 1,$$
where $\sigma_i$ are the Pauli matrices. We then normalize $u_0$, $u_1$ and $u_2$ by introducing $\tilde u_j = u_j / \sqrt{u_0^2 + u_1^2 + u_2^2}$ for $j \in \{0, 1, 2\}$, so that the point $(\tilde u_0, \tilde u_1, \tilde u_2)$ lies on the unit sphere in three dimensions, while the remaining parameter $u_3$ determines the color of a point, as shown in Fig. 2. As anticipated, the link variables, which are initially homogeneously distributed on the sphere, flow toward one of the two minima, which in Fig. 2 correspond to the north and south pole of the sphere.
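The mapping from a link to a point on the sphere plus a color value can be sketched as follows. This is a minimal illustration in plain Python; the function names `su2_from_components`, `components` and `sphere_point` are hypothetical, and the matrix layout follows from inserting the Pauli matrices into the parametrization above.

```python
import math

def su2_from_components(u0, u1, u2, u3):
    """Build U = u0*1 + i*(u1*s1 + u2*s2 + u3*s3) as a nested-list matrix."""
    return [[complex(u0, u3), complex(u2, u1)],
            [complex(-u2, u1), complex(u0, -u3)]]

def components(u):
    """Read (u0, u1, u2, u3) back from the matrix entries."""
    return (u[0][0].real, u[0][1].imag, u[0][1].real, u[0][0].imag)

def sphere_point(u):
    """Return the normalized point (u0, u1, u2)/r and the color value u3.

    Undefined for u3 = +-1, where r = sqrt(u0^2 + u1^2 + u2^2) vanishes.
    """
    u0, u1, u2, u3 = components(u)
    r = math.sqrt(u0 * u0 + u1 * u1 + u2 * u2)
    return (u0 / r, u1 / r, u2 / r), u3

# Example link with unit-norm coefficients.
u3_value = math.sqrt(1.0 - 0.1 ** 2 - 0.2 ** 2 - 0.4 ** 2)
link = su2_from_components(0.1, 0.2, -0.4, u3_value)
point, color = sphere_point(link)
print(point, color)
```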
Training NODEs in principle needs backpropagation through the ODE solver. The problem with standard backpropagation is that it requires storing the whole evolution of the system, which leads to increased memory consumption. For more complicated systems, this can easily saturate memory. A solution to this problem lies in the adjoint sensitivity method [20], which avoids having to store the entire history of the Wilson flow by solving the evolution backwards. Our implementation of this method is still a work in progress, so for this simple system we rely on standard backpropagation.
For our model, the matrix $H$ in Eq. (8) is constructed with the following steps: the complex entries of $U$ are split into real and imaginary parts. They are then fed into a multi-layer perceptron with real weights. The output is recombined into a complex matrix $C$, which is generally neither Hermitian nor traceless. Therefore, we take its anti-hermitian traceless part, $[C]_\mathrm{ah} = \frac{C - C^\dagger}{2i} - \mathbb{1}\, \frac{\mathrm{Tr}(C - C^\dagger)}{2 i N_c}$, which projects the output onto the su(2) algebra. The application of the exponential map yields a matrix in SU(2). This guarantees that the evolution of $U$ takes place without leaving the group.
We choose the Frobenius norm averaged over the lattice as our loss function and train on a dataset of 50000 samples using a batch size of 100 and a learning rate of $10^{-3}$ for 100 epochs. The multi-layer perceptron we employed has four hidden layers with 16, 64, 32 and 16 nodes respectively. We use $\tanh(x)$ as an activation function after every layer except the last.
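The construction steps for $H$ can be traced in a short sketch. It is written in plain Python under stated assumptions: `toy_mlp` is a hypothetical, untrained stand-in for the multi-layer perceptron, and the helper names `split_real_imag`, `recombine` and `traceless_hermitian_part` are illustrative. The projection follows the formula for $[C]_\mathrm{ah}$ and works for any matrix size $N_c$.

```python
import math

def split_real_imag(u):
    """Flatten a complex matrix into real parts followed by imaginary parts."""
    flat = [entry for row in u for entry in row]
    return [z.real for z in flat] + [z.imag for z in flat]

def recombine(values, n):
    """Inverse of split_real_imag: rebuild an n x n complex matrix."""
    half = n * n
    flat = [complex(values[k], values[half + k]) for k in range(half)]
    return [flat[i * n:(i + 1) * n] for i in range(n)]

def toy_mlp(features):
    """Hypothetical stand-in for the trained network (real input, real output)."""
    return [math.tanh(0.7 * x + 0.1 * k) for k, x in enumerate(features)]

def dagger(c):
    n = len(c)
    return [[c[j][i].conjugate() for j in range(n)] for i in range(n)]

def traceless_hermitian_part(c):
    """(C - C^dagger)/(2i) minus its trace part, i.e. [C]_ah from the text."""
    n = len(c)
    cd = dagger(c)
    anti = [[(c[i][j] - cd[i][j]) / 2j for j in range(n)] for i in range(n)]
    tr = sum(anti[i][i] for i in range(n)) / n
    return [[anti[i][j] - (tr if i == j else 0) for j in range(n)]
            for i in range(n)]

# Example SU(2) link with u0 = u1 = u2 = u3 = 1/2.
u = [[complex(0.5, 0.5), complex(0.5, 0.5)],
     [complex(-0.5, 0.5), complex(0.5, -0.5)]]
c = recombine(toy_mlp(split_real_imag(u)), 2)
h = traceless_hermitian_part(c)
print(h)
```

Because the projection is applied after the network, the output is traceless and Hermitian for any choice of weights, so the group constraint does not have to be learned.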
After training, we test on 4000 samples with the same final Wilson flow time $\tau = 1$. The results are shown in Fig. 3. The left panel shows the trajectories of the ground truth (blue) and the predicted trajectories (red) during the NODE flow. The right panel shows the MSE as a function of flow time. Since the loss of $5 \cdot 10^{-6}$ is very small, the two evolutions are visually indistinguishable.
Since the loss function in Fig. 3 (b) seems to increase quadratically as a function of flow time, we investigate the loss at larger times $\tau > 1$ outside the training interval. In Fig. 4 we test 4000 samples and extrapolate to flow times up to $\tau = 10$. The deterioration of the performance is clear, since the loss jumps up to values that are three orders of magnitude larger than the highest loss found during testing in the interval $\tau \in [0, 1]$. Investigating our data more closely, we found two types of mispredictions which can contribute to a large loss. For one specific sample in our test dataset, the predicted trajectory moves in the opposite direction of the actual one. There are also some trajectories at large times $\tau$ that tend to overshoot the ground truth values. In both cases, we found that these trajectories originate from points that lie within a thin neighborhood of the equator ($\mathrm{Tr}\,U \approx 0$) and are in general very difficult for the network to evolve correctly. Despite these flaws, the results are encouraging, considering that we employed a simple multi-layer perceptron which is not adapted to the symmetries of the problem. Therefore, a network structure incorporating additional symmetries ($U \to \Omega U \Omega^\dagger$ with $\Omega \in$ SU(2)) could further improve the performance in this toy model.

Conclusions and outlook
In these proceedings, we reviewed the structure of L-CNNs and their successful application in regression tasks. These architectures are very flexible and their layers can be composed to modify gauge link configurations. We discussed how this can be achieved in the context of NODEs and tested it on an SU(2) Wilson flow toy model with a single link. Based on these experiments, we intend to extend the toy model to actual lattice configurations and apply L-CNNs to Wilson flow.

Figure 2: Visualization of 1000 samples from the dataset. Left: distribution of the initial conditions $U^i_0$. Right: distribution of the labels $\tilde U^i_1$, found by evolving the initial conditions according to the action $S[U] = \mathrm{Re}\,\mathrm{Tr}(U^2)$ up to $\tau_1 = 1$.

Figure 3: Test results. (a) Evolution of 30 samples projected on the three-dimensional sphere. The ground truth and the prediction lie on top of each other. (b) The corresponding Frobenius loss as a function of flow time.

Figure 4: Results of our extrapolation up to $\tau = 10$ based on training up to $\tau = 1$. (a) Extrapolated evolution of 30 samples. The ground truth and the prediction are very close. (b) Loss as a function of flow time. At large times the loss increases by up to three orders of magnitude compared to the original interval $\tau \in [0, 1]$.