openQ*D simulation code for QCD+QED

The openQ*D code for the simulation of QCD+QED with C$^\star$ boundary conditions is presented. This code is based on openQCD-1.6, from which it inherits the core features that ensure its efficiency: the locally-deflated SAP-preconditioned GCR solver, the twisted-mass frequency splitting of the fermion action, the multilevel integrator, the 4th order OMF integrator, the SSE/AVX intrinsics, etc. The photon field is treated as fully dynamical and C$^\star$ boundary conditions can be chosen in the spatial directions. We discuss the main features of openQ*D, and we show basic test results and performance analysis. An alpha version of this code is publicly available and can be downloaded from http://rcstar.web.cern.ch/ .


Introduction
The calculation of isospin-breaking corrections to hadronic observables from lattice simulations requires a theoretically sound definition of charged-hadron states in finite volume. As discussed in ref. [1], a possible way to preserve locality and gauge-invariance is to employ C boundary conditions in the space directions.
We present here the openQ*D package, which can be used to simulate QCD+QED or QCD in isolation with C boundary conditions. In the context of isospin-breaking corrections, the simulation of QCD in isolation is useful within the framework of the RM123 method [2], in which observables are calculated order-by-order in the electromagnetic coupling.
An alpha version (openQ*D-0.9a2) is publicly available and can be downloaded from http:// rcstar.web.cern.ch/. This version of the code has already passed a large number of tests, performed by means of check programs that are provided along with the code. On top of this, several core featurs of openQ*D-0.9a2 have been compared with an independently developed code based on HiRep and presented by one of the authors in another talk of this conference [3]. We are currently working towards a fully-tested and stable release, which we plan to make available before the end of the year.
The general features of the openQ*D code are discussed in section 2. In sections 3 and 4 we discuss some exploratory runs, and selected issues related to simulations with C boundary conditions, with or without dynamical U(1) field.
2 openQ*D code: general features openQ*D is an extension of the openQCD code [4,5], from which it inherits the core features, and most notably the Dirac operator and the solvers. The openQ*D-0.9a2 code supports: • Simulation of the SU(3) gauge theory (inherited) and the SU(3)×U(1) gauge theory (extension).
• A one-parameter family of SU(3) gauge actions, built with plaquettes and planar double-plaquettes.
• A one-parameter family of U(1) gauge actions, built with plaquettes and planar double-plaquettes (extension).
• O(a)-improved Wilson quarks [8] in the fundamental representation of the SU(3) gauge group (inherited) and generic electric charge (extension).
• Rational approximation to a generic fractional power of the fermion determinant with or without twisted-mass reweighting (extension).
• A choice of solvers (including multi-shift conjugate gradient and highly optimized deflated solvers) for the Dirac equation, separately configurable for each force component and pseudofermion action (inherited).
In future versions of the code, a non-compact U(1) action and the Fourier acceleration for the U(1) field will be included. In this section the distinctive features of the openQ*D code will be described in some detail.

C boundary conditions
openQ*D can simulate C boundary conditions in one or more space directions, by means of an orbifold construction. In short, we say that a space direction k is C-direction (resp. P-direction), if fields satisfy C (resp. periodic) boundary conditions along the direction k.
Consider a L 0 × L 1 × L 2 × L 3 lattice, which is referred to as physical lattice, with C boundary conditions along direction 1 and possibly along directions 2 and 3. The code simulates an L 0 × (2L 1 ) × L 2 × L 3 lattice, which is referred to as extended lattice. The sets of points of the extended and physical lattice are denoted by Λ ext and Λ phs respectively, i.e.
The physical lattice is identified as a subset of the extended lattice. The set of points Λ ext \ Λ phs is referred to as the mirror lattice. The fundamental fields in the openQ*D-0.9a2 code are the SU(3) link variable U(x, µ) and the real photon field A(x, µ). Only the compact formulation of QED is considered here, therefore all observables are written in terms of the U(1) link variable and the real photon field can be restricted to −π ≤ A(x, µ) < π. C boundary conditions along direction 1 on the physical lattice are given by The charge-conjugation matrix C satisfies the following conditions On the extended lattice, points x and x + L 1ê1 do not coincide, so eqs. (4) have to be interpreted as constraints which defines the admissible gauge and fermion fields. Eqs. (4) are referred to as the orbifold constraints. Admissible gauge fields in the mirror lattice are completely determined by the value of the gauge field in the physical lattice via eqs. (4a). The integration measure over the manifold of admissible gauge fields is given by where the products are restricted over the physical lattice. The orbifold constraint has a slightly different meaning for fermion fields. On the physical lattice ψ andψ are independent Grassmanian variables. On the extended lattice one can choose the value of ψ in each point as a complete set of independent variables. The fieldψ on the extended lattice is completely determined as a function of the field ψ via eqs. (4b). By introducing the translation operator T as the relation between ψ andψ can be conveniently rewritten as The integration measure for the fermion field is given by Since the square of the charge-conjugation operation is the identity, all fields must obey periodic boundary conditions along the extended direction 1, i.e.
C boundary conditions in directions k = 2, 3 are implemented by modifying the global geometry of the torus. If k = 2, 3 is a C direction, then shifted boundary conditions (see fig. 1) are imposed When combined with the orbifold constraint (4), shifted boundary conditions are equivalent to C boundary conditions in direction k = 2, 3. For instance for the SU(3) gauge field, 1 C dir. k 1 1 P dir. k 1 Figure 1. Global geometry of extended lattice. The top diagram represents a section of the extended lattice along a (1, k) plane where k = 2, 3 is a C direction. All fields are periodic along the extended direction 1. C boundary conditions in the direction k = 2, 3 are replaced by shifted boundary conditions in the extended lattice. Shifted boundary conditions are imposed by properly defining the nearest neighbours of boundary sites. Empty circles in the red (resp. green, blue) rectangle have to be identified with the corresponding solid circles in the red (resp. green, blue) rectangle. The bottom diagram represents a section of the extended lattice along a (1, k) plane where k = 2, 3 is a periodic direction. In both diagrams, the black circles represent the sites of the physical lattice, and the grey circles represent the sites of the mirror lattice.

Gauge actions
For simplicity, periodic boundary conditions in the time direction will be assumed. The SU(3) and compact U(1) gauge actions are respectively: Given a path C on the lattice, U(C) and z(C) denote the SU(3) and U(1) parallel transports along C. S 0 and S 1 are the sets of all oriented plaquettes and all oriented 1 × 2 planar loops respectively.
The overall weight ω C is 1 if no C boundary conditions are used. With C boundary conditions ω C = 1/2 corrects for the double counting introduced by summing over all plaquette and double- plaquette loops in the extended lattice instead of the physical lattice. The coefficients c 0,1 satisfy the relation c 0 + 8c 1 = 1. The Wilson action is obtained by choosing c 0 = 1, the Lüscher-Weisz action is obtained by choosing c 0 = 5 3 , and the Iwasaki action is obtained by choosing c 0 = 3.648. The parameter g 0 is the bare SU(3) gauge coupling, and e 0 is the bare U(1) gauge coupling in terms of which the bare fine-structure constant is given by In the compact formulation of QED, all electric charges must be integer multiples of some elementary charge q el which is defined in units of the charge of the positron. As discussed in ref. [1], q el appears as an overall factor in the gauge action and essentially sets the normalization of the U(1) gauge field in the continuum limit. Even though in infinite volume q el = 1/3 would be an appropriate choice in order to simulate quarks, in finite volume with C boundary conditions one needs to choose q el = 1/6 in order to construct gauge-invariant interpolating operators for charged hadrons.

Dirac operator
The Dirac operator can be written as where D w is the (unimproved) Wilson-Dirac operator, δD sw is the Sheikholeslami-Wohlert (SW) term, and δD b is the time boundary O(a)-improvement term. For simplicity, periodic boundary conditions in the time direction will be assumed, which means δD b = 0 In presence of electromagnetism, the Dirac operator depends on the electric charge of the quark field. Let q be the physical electric charge in units of e (i.e. q = 2/3 for the up quark, and q = −1/3 for the down quark). In the compact formulation of QED, all electric charges must be integer multiples of an elementary charge q el , which appears as a parameter in the U(1) gauge action (14). It is useful to introduce the integer parameter The Wilson-Dirac operator can be written as where the covariant derivatives are defined as Notice that the integer parameterq appears in the hopping term of the Wilson-Dirac operator. The SW term is given by The SU(3) field tensor F µν (x) and the U(1) field tensor A µν (x) are constructed in terms of the clover plaquette. The explicit expression of the SU(3) field tensor used in openQ*D can be found in ref. [18], while the U(1) field tensor is given here, The normalization is chosen in such a way that −ie 0Âµν (x) is the canonically-normalized field tensor in the naive continuum limit, and the discretized field tensors are defined to be anti-hermitian. In the openQ*D code, it is possible to choose the values for all parameters of the Dirac operator independently for each flavour. In particular notice that, in presence of electromagnetism, the values of c SU (3) sw and c U(1) sw must depend on the electric charge in order to obtain O(a) improvement.

Pseudofermion action
The sum over the extended lattice introduces a double counting which can be corrected with an extra 1/2 factor in the fermion action: where the orbifold constraint (8) has been used. By using the properties of the C matrix one easily proves that If the gauge field respects the orbifold constraint, and T is the translation operator defined in (7), one trivially gets By combining the previous two equations, and by using the fact that C is anti-symmetric while T is symmetric, one gets that the matrix CT D is anti-symmetric. The integration over the fermion field yields the Pfaffian of CT D up to an irrelevant overall factor which will be reabsorbed in the definition of the fermionic integration measure, In the continuum limit, the Pfaffian of CT D is positive [1]. However at fixed lattice spacing, the Pfaffian is shown to be real but it can be negative on rough enough gauge configurations. The absolute value of the Pfaffian of CT D has a representation in terms of the determinant of a positive operator (notice that det C = det T = 1) The openQ*D-0.9a2 code uses even-odd preconditioning. After decomposing the lattice in even and odd sites in the standard way, the Dirac is represented in block form as In terms of the even-odd preconditioned Dirac operator the absolute value of the Pfaffian can be written as In order to stabilize the configuration generation, the openQ*D-0.9a2 code allows to introduce a parameterμ 2 and to replace which can be corrected by introducing a properly-defined reweighting factor in the observables (following the strategy of ref. [5]). Let R be a rational approximation of order The openQ*D-0.9a2 inherits from openQCD-1.6 the frequency splitting for the RHMC [5]. If the rational approximation is written explicitly as where the finite sequences µ j and ν j are assumed to be monotonically increasing, one can always define a factorization where the factors P k contain all the zeroes and poles with indices in a given range {J k , J k +1, . . . , J k+1 − 1}, i.e.
The openQ*D-0.9a2 code simulates the absolute value of the Pfaffian by means of the pseudofermion action given by The pseudofermion φ k is a complex field with gauge and spinor indices, which lives on the even sites of the extended lattice. It satisfies periodic boundary conditions in direction 1, and possibly shifted boundary condition in directions k = 2, 3. It is completely unrestricted over the extended lattice. Notice that, since D oo is diagonal in the lattice index, its determinant can be calculated exactly.

Molecular dynamics
The momentum fields associated to the SU(3) and U(1) gauge field are denoted by Π(x, µ) and π(x, µ) respectively. The momentum Π(x, µ) lives in the Lie algebra of SU (3), where Π a (x, µ) are taken to be real. The momentum field π(x, µ) is taken to be real, like the gauge field A(x, µ). With C boundary conditions, the momentum fields must satisfy the orbifold constraint Summing over all momenta in the extended lattice instead of the physical lattice introduces a double counting which can be corrected with an extra 1/2 factor in the MD Hamiltonian: In deriving the MD equations, one needs to take into account the orbifold constraint. It is convenient to define the forces as the derivative of the action S (U, A) thought as a function of the unconstrained fields on the extended lattice By using the orbifold constraint and the chain rule, the MD equations are found to be Since the Hamiltonian is invariant under translations and charge-conjugation, the orbifold constraint is preserved by the MD. In fact the orbifold constraint is also preserved by the discrete integrators used by the openQ*D code.

QCD: some tests
We have performed some test runs in order to assess two issues: • the overhead due to the orbifold construction; • the effectiveness of deflation with C boundary conditions.
The results presented here allow only to scratch the surface of these issues. The drawn conclusion are far from definitive and should be taken with a grain of salt. All runs presented in this section share the following parameters: • QCD with N f = 2 degenerate flavours; • Physical lattice 64 × 32 3 with periodic boundary conditions in time; • Wilson action with β = 5.2; • Non-perturbatively O(a) improved Wilson fermions with κ = 0.1359 and c sw = 2.017147; • MD trajectory lenght τ = 2; • a 0.08 fm, m π 380 MeV, m π L 4.7.
The parameters and the estimates for the lattice spacing and the pion mass are taken from the A4 ensemble in ref. [19] (see table 2). The c sw improvement coefficient is calculated by using eq. (2.25) in ref. [20]. In all cases we have used a twisted-mass reweighting withμ = 0.001.

Overhead due to the orbifold construction
A possible and direct way to implement C boundary conditions is to pack the quark and antiquark into a doublet which transform under the ×¯ representation of the gauge group. The boundary conditions swap the two components of Ψ. When the pseudofermion action is derived, pseudofermions will have two components as well. Let us denote by D C,2c the Dirac operator with C boundary conditions in the 2-component formulation. The Dirac operator acts independently on the two components in the bulk and mixes the two components at the boundary.
In openQ*D-0.9a2, C boundary conditions are implemented through an orbifold construction. The Dirac operator D C,orbi acts on single-component pseudofermions which are defined on a lattice that is twice as large as the physical lattice.
The orbifold construction was preferred to the two-component formulation for the openQ*D-0.9a2 code, since it required no modification of the Dirac operator and solvers of the original openQCD-1.6 code. The implementation of C boundary conditions via the orbifold construction is almost trivial. However, one may think that simulations with the orbifold construction are much more expensive. The question we want to address is: how more expensive is the orbifold construction with respect to the two-component construction?
If the physical volume V = T L 3 is kept constant it is obvious that where D P is the standard operator with periodic boundary conditions, which acts on single-component pseudofermions. Therefore, the application of the Dirac operator and the solution of the Dirac equation has exactly the same cost in the two possible implementations of C boundary conditions. The orbifold construction loses for two reasons: • The gauge fields are evolved twice, in the physical and mirror lattice, see fig. 1. This operation is relatively cheap, but it is done many times (at the innermost level of the integrator).
• The forces in the physical and mirror lattices need to be summed in the evolution equations for the momenta, see eqs. (43), and this requires MPI communications. The openQ*D-0.9a2 code implements two solutions to mitigate this issue. First we notice that the gauge force (which is integrated more often) satisfies automatically the orbifold constraint. The sum of forces in eqs. (43) gives a trivial factor of two for the gauge force. As a second measure, the MPI ranks are organized in such a way that a point x and its mirror point x + L 1ê1 belong to different MPI processes, but end up on the same multi-core node (assuming an MPI implementation for which MPI processes residing on a node are numbered consecutively). In this case the MPI communications needed for eqs. (43) should not go through the network.
The cost for the gauge field and momenta evolution is identical in the 2-component implementation of the C boundary conditions and in the periodic case. Assuming that the volume is large enough, the cost of the simulation of the theory with periodic boundary conditions is going to be equal to the cost of the simulation of the theory with C boundary conditions implemented with the 2-component formalism, provided that all parameters of the algorithm are identical (in particular the RHMC algorithm is used in both cases, and the rational approximation is the same). Then the overhead due to the orbifold construction is obtained by comparing the run with periodic boundary conditions (QCD2) with the run with C boundary conditions implemented with the orbifold 10 EPJ Web of Conferences 175, 09005 (2018) https://doi.org/10.1051/epjconf/201817509005 Lattice 2017 construction (QCD3). For details about these runs, refer to app. A. By comparing the simulation times listed in table 1, we conclude that the orbifold construction produces a relative overhead in cost of about 1%. The relative overhead is expected to decrease at smaller quark masses (since the solution of the Dirac equation becomes more expensive relatively to the gauge field and momenta evolution).

RHMC vs. HMC
In the case of N f = 2 degenerate flavours and periodic boundary conditions in space, one can use the HMC algorithm. We have generated the QCD1 ensemble with the HMC and twisted-mass Hasenbusch preconditioning, with splitting of the various terms in different integration levels.
In case of C boundary conditions one is forced to use the RHMC since the Pfaffian needs to be simulated. We use a representation of the Pfaffian of the type with a rational approximation for (D † D) −1/4 , as this is more similar to the setup needed for the inclusion of isospin-breaking corrections. We have generated the QCD3 ensemble with C boundary conditions, the RHMC with a rational approximation for (D † D) −1/4 and frequency splitting in different integration levels.
For details about the QCD1 and QCD3 runs, refer to app. A. By comparing the simulation times listed in table 1, we see that the QCD3 run is about 5.5 times slower than the QCD1 run. This big factor is most likely an indication of the fact that the QCD3 run has not been optimized as well as the QCD1 run. We plan to invest more time into the optimization of the RHMC runs, and in particular we thank Kate Clark for providing useful suggestions during the conference.

QCD+QED: some tests
We have also used openQ*D-0.9a2 code to perform test runs that include the dynamical degrees of freedom of the U(1) gauge field. The aim of these runs was twofold: • to examine whether any unexpected features appear if compact QCD+QED simulations are performed with Wilson fermions and C boundary conditions; • to get a first insight into autocorrelations of SU(3) and U(1) gauge observables.
The openQ*D-0.9a2 code currently implements only the compact formulation of QED; hence, we perform the initial tests with the compact QED action. The test runs presented in this section share the following parameters: • QCD+QED with N f = 2 + 1 fermion flavours; • C bc's in all space directions; • α em = 0.05 ≈ 7α phys em ; • Lüscher-Weisz SU(3) gauge action and Wilson U(1) gauge action; • Wilson fermion action with SU(3) SW-term coefficients determined at α em = 0; • Tree-level coefficients for the U(1) SW-term; • MD trajectory length τ = 0.7071 (to be able to compare with HiRep code [3]). The first run (QCD+QED1) takes over the parameters from the H200 ensemble of the N f = 2 + 1 CLS [21], except that the lattice extent is halved in each of the space-time directions. The dynamical U(1) degrees of freedom contribute to the renormalization of the bare parameters, hence the estimate for the lattice spacing and pion mass cannot be taken from the CLS ensembles 1 , but rather need to be estimated independently.
The parameters characteristic for each of the performed QCD+QED runs are the following: Although openQ*D-0.9a2 code allows for twisted-mass reweighting, the runs described in this section do not use that option (μ = 0.0). In both runs all three bare sea quark masses are taken to be the same. However, due to the differences in quark charges we end up with a degenerate pair of quarks (down and strange) with q = −1/3, and a single quark (up) with q = 2/3 in our simulations; hence, we are essentially simulating N f = 2 + 1 theory.

Spectral ranges of the Dirac operators
We monitored the smallest and the largest eigenvalue of |γ 5 D u | and |γ 5 D d/s | throughout the performed QCD+QED runs, in order to confirm that the spectral ranges of the rational approximations have been

Autocorrelations of SU(3) and U(1) gauge observables
We use Wilson flow to define the global topological charge and the action density for SU(3) and U(1) gauge fields (see app. B and ref. [22]). In the left panel of fig. 3

A Some details on the QCD test runs
We have produced three ensembles which differ by the space boundary conditions (and consequently by the simulated lattice) and by the algorithm used, as summarized in table 1. In all cases, a 3-level integrator has been used, following the strategy used in ref. [21]: • 1 steps of 4th order OMF integrator in the innermost level (level 0); • 1 step of 4th order OMF integrator in the intermediate level (level 1); • 12 step of 2nd order OMF integrator with λ = 1/6 in the outermost level (level 2).
Only the gauge force is integrated in level 0, while the fermionic forces are distributed in levels 1 and 2, in different ways depending on the particular algorithm used. In all cases the acceptance rate is found to be between 90% and 95%. Let us look now in detail at the algorithms used for each run listed in table 1.

QCD1.
In case of periodic boundary conditions, one can use the HMC algorithm with even-odd and Hasenbusch twisted-mass preconditioning. We use a pseudofermion action of the form We choose (µ 1 , µ 2 , µ 3 ) = (0.005, 0.05, 0.5). The force associated to S pf,1 is integrated in level 1, while the force associated to S pf,2 is integrated in level 2. The Dirac equation is solved by means of a conjugate gradient for twisted mass µ = 0.5 and by means of the deflation-accelerated solver in all other cases. QCD2. In case of periodic boundary conditions, one can use the RHMC algorithm with multiple copies of pseudofermion fields. In particular, we construct the optimal rational approximation with relative precision of 10 −6 for  where N pf = 4 copies of pseudofermion fields have been used in order to reproduce the correct number of flavours. QCD3. In case of C boundary conditions, one must use the RHMC algorithm with multiple copies of pseudofermion fields. We use the same setup as for the QCD2 run (see table 2) with the only difference that in this case the correct number of flavours is obtained by using N pf = 2 copies of pseudofermion fields.

B Some details on the QCD+QED test runs
We have produced two ensembles which differ by the time boundary conditions and by the lattice size. The algorithm used in all cases is the same (RHMC) and involves a 3-level integrator: • 1 step of 4th order OMF integrator in the innermost level (level 0); • 3 steps of 4th order OMF integrator in the intermediate level (level 1); • 10 steps of 4th order OMF integrator in the outermost level (level 2).
The U(1) force is integrated in level 0, the SU(3) force in level 1, and all the fermionic forces are integrated in the outermost level (level 2). The two runs are simulating C bc's in space and the Dirac equation is solved by means of CG in all cases (multi-shift CG is used for all the factors in Table 3. N f = 2 + 1 QCD+QED test runs. N represents the order of the corresponding rational approximation, and [λ min , λ max ] is the range that is assumed to include the spectrum of |γ 5D | in both cases.  table 3 for details on the degrees and spectral ranges of the rational approximation used for the (D † D) −1/4 (up quark) and (D † D) −1/2 (down and strange quarks).

Run name
The SU(3) and U(1) action densities are defined as: where F µν (x, t) and A µν (x, t) are the clover-type discretization of the SU(3) and U(1) field strength tensors respectively, at positive flow time. The global topological charge for SU(3) gauge fields is given by: