Dual-Parameterized Quantum Circuit GAN Model in High Energy Physics

Generative models, and Generative Adversarial Networks (GANs) in particular, are being studied as possible alternatives to Monte Carlo simulations. It has been proposed that, in certain circumstances, simulation using GANs can be sped up by using quantum GANs (qGANs). We present a new design of qGAN, the dual-Parameterized Quantum Circuit (PQC) GAN, which consists of a classical discriminator and two quantum generators which take the form of PQCs. The first PQC learns a probability distribution over N-pixel images, while the second generates normalized pixel intensities of an individual image for each PQC input. With a view to HEP applications, we evaluated the dual-PQC architecture on the task of imitating calorimeter outputs, translated into pixelated images. The results demonstrate that the model can reproduce a fixed number of images of reduced size, as well as their probability distribution, and we anticipate that it should allow us to scale up to real calorimeter outputs.


Introduction
The next High Luminosity Large Hadron Collider (HL-LHC) phase will collect an overwhelming amount of data, with complex physics and small statistical error. To analyse this data, high-precision methods which use only limited resources are needed. Traditional Monte Carlo based simulation, such as Geant4 [1,2] and the GeantV prototype [3] for full simulation of particle transport, is however very time-consuming. New approaches using deep neural networks have therefore been studied for fast simulation.
Generative Adversarial Networks (GANs) are a strong candidate for such fast simulations. Based on two neural networks, a generator and a discriminator, trained alternately, GANs have been widely explored thanks to their ability to generate images with complex structures at much higher speed. In HEP, GAN variants such as CaloGAN [4] and 3DGAN [5] have achieved performance similar to full Monte Carlo based simulation, but in less time.
At the same time, quantum computing has emerged as another important pillar in modern research attracting the attention of many researchers due to its potential to execute certain tasks with an exponentially reduced amount of resources both in time and space compared to classical processors [6]. It has already shown promising results in various fields, such as optimization [7,8] and cryptography [9].
Advances in both deep learning and quantum computing suggest merging the two to benefit from the advantages of both, leading to a new field of study called Quantum Machine Learning (QML). Quantum Generative Adversarial Networks, the quantum version of GANs, are one example. Several quantum GAN models have been investigated in the last few years, but further exploration is still needed before such models can be applied to more realistic use cases.
In this paper, we propose for the first time a Dual Parameterized Quantum Circuit GAN model (dual-PQC GAN model) as one of the improvements to overcome the remaining limitations of quantum GANs. This model uses two parameterized quantum circuits, which share the role of a single quantum generator: the first PQC learns the distribution over image samples, while the second PQC determines the amplitude distribution over the pixels of a single image. Thanks to this separation, it is possible to exploit the continuous nature of probability distributions over output states in quantum circuits to represent continuous variables.
This paper is organized as follows. Section 2 summarizes the application of Generative Adversarial Networks in HEP. We then present a short overview of quantum GANs in Section 3. In Section 4 and Section 5, the first prototype of a dual-PQC GAN model is proposed, together with the results of its simulation. The paper concludes with Section 6, which summarizes the work and gives an outlook on future work.

Applications of GANs in HEP
GANs, designed by I. Goodfellow et al. in 2014 [10], are deep generative models which aim to generate new data resembling a given training set. They are characterized by two deep neural networks, Generator G and Discriminator D, which are trained alternately. During the training, G progressively generates data similar to the real data, while D increases the probability of assigning the correct labels to both real and fake data. Numerous improvements have been made since the initial proposal and, in particular, GANs have achieved remarkable success in image processing and generation, via variations such as Deep Convolutional GAN (DCGAN) [11], Auxiliary Classifier GAN (ACGAN) [12], Progressive GAN [13], etc.
The evolution of GANs has attracted strong interest in the high energy physics domain. In HEP, the detectors can be described as 3D cameras, recording pictures of particle collisions. Calorimeters, in particular, measure the energies deposited by a shower of the particles that traverse them. They generally consist of alternate arrays of active sensor material and passive dense layers to ensure that the incoming (primary) particle will deposit most of its energy inside their volume. These energy depositions can be compared to the monochromatic pixel intensities of a 3D image. Because of their high granularity, the detailed simulation of a calorimeter is particularly time-consuming. As a result, GANs come into the limelight to allow fast simulation of particle showers with high fidelity.
One possible application of GAN in HEP is 3DGAN [5,14], which is a 3D extension of GAN, using 3D (de-)convolutional layers to capture the whole 3D energy profile. It simultaneously performs two additional tasks of estimating the incoming particle energy and measuring the total deposited energy to enhance the stability and convergence of networks. Details on 3DGAN architecture and its performance validation are available in [5].

Quantum Generative Adversarial Networks
The possibility of combining machine learning and quantum computing also led to the generalization of GAN to quantum systems by S. Lloyd and C. Weedbrook [15]. The main mechanism of the model, the adversarial training, is reproduced, but different scenarios are possible: the input data can be either quantum data or classical data embedded in quantum states and the discriminator/the generator can also be either classical or quantum.
Since its initial proposal, several quantum GAN (qGAN) variations have been suggested to generate either classical data [16][17][18] or quantum data [19][20][21][22]. Zoufal et al. [16] proposed a hybrid qGAN model composed of a quantum generator and a classical discriminator to train on classical data. During the training, the generator learns an arbitrary probability distribution over discrete variables, which is encoded in the amplitudes of the final quantum state. This model was applied to quantum finance and demonstrated on real quantum hardware, IBM Q Boeblingen. Anand et al. [23] also present similar results, but run on real Rigetti quantum hardware, Aspen-4-2Q-A. Unlike the aforementioned models treating classical data, Situ et al. [18] propose a qGAN model which aims to approximate an unknown pure quantum state with a quantum generator and a quantum discriminator. One problem with quantum machine learning models is the apparent difficulty of training PQCs, captured by the vanishing gradient and barren plateau problems. Fortunately, methods to improve the performance of qGANs have also been studied, for instance Quantum Multiplicative Matrix Weight (QMMW), which helps to avoid mode collapse or the vanishing gradient problem in qGANs [24].
The results from the previous research are impressive, showing the potential of qGAN for the near-term quantum hardware. However, additional investigations are needed in order to fully understand the quantum advantages of qGAN and generate not only simple probability distributions but also more complex image samples.

Dual-PQC GAN model
Our preliminary experiments involved training a qGAN, as conceptualised in [16], for the calorimeter problem. However, this immediately revealed a problem: as the image itself is encoded in the amplitudes of the computational basis states, only the average of all the training samples could be learned, and the GAN therefore did not sample typical images. To put this in more precise terms, in order to achieve the exponential compression of representing an $N = 2^n$ pixel image using $n$ qubits, the qGAN prepares a quantum state of the form
$$|\psi\rangle = \sum_{j=0}^{N-1} \sqrt{I_j}\,|j\rangle,$$
where $I_j$ is the normalized intensity of the $j$-th pixel. So we can see that, in a sense, the qGAN encodes a single image as a probability distribution, and there is no room left to also encode a probability distribution, representing the full dataset, from which to sample images. This problem does not arise in classical GANs, where a single sample from an $O(N)$-sized neural network generates a single image in one go.
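As a minimal illustration of the amplitude encoding described above, the following NumPy sketch maps a 4-pixel image to a 2-qubit state vector and recovers the pixel intensities as measurement probabilities. The function name `amplitude_encode` and the pixel values are hypothetical, and real, non-negative amplitudes are assumed.

```python
import numpy as np

def amplitude_encode(image):
    """Map an image with N = 2**n pixels to an n-qubit state vector.

    The squared amplitudes equal the normalized pixel intensities.
    """
    image = np.asarray(image, dtype=float).ravel()
    n_pixels = image.size
    if n_pixels == 0 or (n_pixels & (n_pixels - 1)) != 0:
        raise ValueError("number of pixels must be a power of two")
    if np.any(image < 0) or image.sum() <= 0:
        raise ValueError("intensities must be non-negative and not all zero")
    n_qubits = n_pixels.bit_length() - 1
    probabilities = image / image.sum()
    return np.sqrt(probabilities), n_qubits

# Hypothetical 4-pixel energy profile (arbitrary units).
image = np.array([1.0, 6.0, 2.0, 1.0])
state, n_qubits = amplitude_encode(image)

# Measuring in the computational basis returns pixel j with probability I_j.
decoded = state ** 2
print(n_qubits, decoded)
```

A single state vector of this kind carries exactly one pixel distribution, which is why a generator trained in this way can at best reproduce one (average) image.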
In this section we describe our solution, a new type of quantum GAN, the dual Parameterized Quantum Circuit (PQC) GAN model (dual-PQC GAN model). It aims to reproduce a set of image samples from real training data while preserving the exponential compression achieved by amplitude encoding. The work in [25] follows from similar motivation.
The dual-PQC GAN is a hybrid qGAN architecture which has one classical discriminator and two parameterized quantum circuits, PQC1 and PQC2, sharing the role of the generator. PQC1, with $n_1$ qubits, learns the probability distribution over image samples and PQC2, with $n_2$ qubits, learns the amplitude distribution over the pixels of each image. The classical discriminator takes the training set and the images generated by PQC2, and classifies them as real or fake. The predicted labels are used to tune, in alternation, $\phi_1$, $\phi_2$ and $\theta$, the parameters of PQC1, PQC2 and the discriminator, respectively.
In this study, both PQCs consist of alternating layers of single-qubit Pauli rotations and a set of two-qubit entangling gates, as shown in Fig. 1. This structure is widely used in quantum machine learning thanks to its strong expressive power, offering an effective way of reconstructing an expected behaviour [26,27]. We use RY rotation gates and CZ entanglement gates, but other choices are possible. Consider a training set $X \subset \mathbb{R}^{2^n}$ of $N = 2^n$ pixel images. To begin, the output state of PQC1 is measured, producing $n_1$ bits. Then, via a set of Pauli-X gates, this bit string is used to initialise PQC2 in the corresponding computational basis state of the $2^{n_1}$-dimensional Hilbert space, $|i\rangle \in \{|0\rangle, \ldots, |2^{n_1} - 1\rangle\}$. Then, by repeatedly measuring $n$ output qubits of PQC2, the probability distribution over the computational basis of the $2^n$-dimensional Hilbert space, $|j\rangle \in \{|0\rangle, \ldots, |2^n - 1\rangle\}$, is constructed and interpreted as an image of size $2^n$ for each input state.
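The layered RY/CZ structure can be sketched with a small state-vector simulation. This is not the authors' implementation: the ordering of the layers (an initial RY layer followed by blocks of a nearest-neighbour CZ chain and an RY layer) and all function names are assumptions, since the layout of Fig. 1 is not reproduced here.

```python
import numpy as np

def apply_ry(state, theta, qubit, n_qubits):
    """Apply RY(theta) to one qubit (qubit 0 is the most significant bit)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    psi = np.moveaxis(state.reshape([2] * n_qubits), qubit, 0)
    rotated = np.stack([c * psi[0] - s * psi[1], s * psi[0] + c * psi[1]])
    return np.moveaxis(rotated, 0, qubit).reshape(-1)

def apply_cz(state, q1, q2, n_qubits):
    """Apply CZ: flip the sign of every amplitude where both qubits are 1."""
    psi = state.reshape([2] * n_qubits).copy()
    index = [slice(None)] * n_qubits
    index[q1], index[q2] = 1, 1
    psi[tuple(index)] *= -1
    return psi.reshape(-1)

def pqc_probabilities(params, basis_index=0):
    """Output distribution of the circuit applied to |basis_index>.

    params has shape (depth + 1, n_qubits); row 0 is the initial RY layer.
    """
    params = np.atleast_2d(np.asarray(params, dtype=float))
    n_qubits = params.shape[1]
    state = np.zeros(2 ** n_qubits)
    state[basis_index] = 1.0
    for layer, thetas in enumerate(params):
        if layer > 0:  # entangling block before every RY layer but the first
            for q in range(n_qubits - 1):
                state = apply_cz(state, q, q + 1, n_qubits)
        for q, theta in enumerate(thetas):
            state = apply_ry(state, theta, q, n_qubits)
    return state ** 2  # RY and CZ are real, so the amplitudes stay real

# Two qubits, depth 2, small random angles (hypothetical values).
rng = np.random.default_rng(0)
params = rng.uniform(-0.1, 0.1, size=(3, 2))
probs = pqc_probabilities(params)
print(probs)
```

Passing a different `basis_index` mimics how a bit string measured on PQC1 selects the input basis state of PQC2.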
Note that since PQC2 performs a unitary operation, and since its inputs are always computational basis states, the output quantum states are necessarily orthogonal. This puts an unwanted restriction on the possible images. Relying on the Stinespring dilation theorem, we can remove this restriction by choosing $n_2 > n_1, n$ and using some ancilla qubits which are discarded at the end. We will return to this point at the end of the section.
Let $p_g^i$ be the probability that state $|i\rangle$ is measured at the output of PQC1; let $I_i$ denote the normalized image produced by PQC2 when given the input state $|i\rangle$; and let $I_{i,j}$ be the normalized intensity of the $j$-th pixel of $I_i$, with $i \in \{0, \ldots, 2^{n_1} - 1\}$ and $j \in \{0, \ldots, 2^n - 1\}$. Then the output states generated by PQC1, $G_{1,\phi_1}$, and PQC2, $G_{2,\phi_2}$, are explicitly given as
$$G_{1,\phi_1}|\psi_{\mathrm{initial}}\rangle = \sum_{i=0}^{2^{n_1}-1} \sqrt{p_g^i}\,|i\rangle, \qquad G_{2,\phi_2}|i\rangle = \sum_{j=0}^{2^n-1} \sqrt{I_{i,j}}\,|\Psi_j\rangle|j\rangle,$$
where $|\psi_{\mathrm{initial}}\rangle$ is the input state of PQC1, fixed during the whole training, and $|\Psi_j\rangle$ are some $(n_2 - n)$-qubit states that we discard. During the training, PQC1 learns the distribution $p_g(i)$ over the images $I_i$, so that it approaches the real distribution $p_{\mathrm{real}}$ over $X$. PQC2, on the other hand, learns the amplitudes over $2^n$ pixels for $2^{n_1}$ images, to make each $I_i$ as close as possible to the real images. At the end of the training, the true/fake probability predicted by the discriminator for $I_i$, $D(I_i)$, should converge to 1/2.
For the following simulations, we use a modified min-max loss, Eq. (4), where $m$ is the batch size, $x_i$ the real data and $z_i$ the random input. The first equality in Eq. (4) gives the definition of the loss in the classical GAN and the second equality the practical formula used in dual-PQC GAN simulations. Based on the calculated loss, the parameters of the quantum generator are tuned by gradient descent using analytic quantum gradients [28], while the discriminator is optimized in exactly the same way as in a classical GAN. Further details on the method used for the analytic quantum gradient are given in Refs. [29,30].
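Assuming that the analytic quantum gradient refers to the parameter-shift rule, which holds for rotation gates such as RY, the derivative of a measurement probability with respect to one rotation angle can be obtained from two evaluations of the circuit at shifted angles. In the sketch below, `prob_10` is a hypothetical stand-in for a circuit output: the probability of measuring |10> after RY(theta_0) and RY(theta_1) act on two unentangled qubits prepared in |00>.

```python
import numpy as np

def prob_10(params):
    """P(|10>) for RY(theta_0) on qubit 0 and RY(theta_1) on qubit 1, from |00>."""
    theta_0, theta_1 = params
    return np.sin(theta_0 / 2) ** 2 * np.cos(theta_1 / 2) ** 2

def parameter_shift_gradient(f, params):
    """Gradient of f using shifts of +/- pi/2 on one angle at a time."""
    params = np.asarray(params, dtype=float)
    grad = np.zeros_like(params)
    for k in range(params.size):
        plus, minus = params.copy(), params.copy()
        plus[k] += np.pi / 2
        minus[k] -= np.pi / 2
        grad[k] = 0.5 * (f(plus) - f(minus))
    return grad

params = np.array([0.7, -0.3])
grad = parameter_shift_gradient(prob_10, params)

# Closed-form derivatives of the toy function, for comparison.
exact = np.array([
    0.5 * np.sin(params[0]) * np.cos(params[1] / 2) ** 2,
    -0.5 * np.sin(params[1]) * np.sin(params[0] / 2) ** 2,
])
print(grad, exact)
```

Unlike a finite-difference estimate, the two shifted evaluations give the exact derivative, up to the shot noise of the measurements. The gradient of the full loss would follow from the chain rule through the discriminator, which this sketch does not cover.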
Ultimately, the dual-PQC GAN model can generate $2^{n_1}$ images of size $2^n$. Increasing the number of qubits used in PQC1 and PQC2 makes it possible to increase both the number and the size of the produced images. This model shows an advantage in terms of computational resources by using only $O(\log N)$ qubits, compared to classical neural networks with $O(N)$ neurons, to reproduce an image of size $N$. Specifically, the potential advantages are threefold: firstly, there is an exponential reduction in the space (memory) requirement; secondly, the resultant exponential reduction in the number of tunable parameters (i.e. the number of tunable parameters is proportional to the number of gates in the PQC or neurons in the classical neural network) suggests that training could be performed more efficiently; finally, it is potentially advantageous to have the images encoded in quantum states if further processing is to be performed (for example if that image processing can itself be performed more efficiently using quantum computing). It should, however, be noted that if all one wants to do is to generate an image, then the number of samples required from PQC2 is exponential in the number of qubits.
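The closing remark on the number of samples can be illustrated with a short sketch in which repeated measurements of PQC2 are mimicked by multinomial sampling. The pixel values and shot counts are hypothetical; the point is only that the image is reconstructed statistically, so its accuracy depends on the number of shots.

```python
import numpy as np

def estimate_image(pixel_probs, shots, rng):
    """Estimate pixel intensities from `shots` computational-basis measurements."""
    counts = rng.multinomial(shots, pixel_probs)
    return counts / shots

rng = np.random.default_rng(1)
n_qubits = 2                                 # N = 2**n pixels need only n qubits
true_image = np.array([0.1, 0.6, 0.2, 0.1])  # hypothetical normalized image

errors = {}
for shots in (100, 10_000, 1_000_000):
    estimate = estimate_image(true_image, shots, rng)
    errors[shots] = np.abs(estimate - true_image).max()
    print(shots, errors[shots])
```

Since every one of the $2^n$ pixels has to be observed often enough to be resolved, the required number of shots grows with the number of pixels even though the number of qubits grows only logarithmically.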

How many ancillas are needed?
Since we want to be able to reproduce any collection of images, we have to use some ancillary qubits. Stinespring's theorem gives an upper bound on the number of ancillas, but it is not tight. We will now show that $n_2 = 2n$ suffices when we assume, for simplicity, that $n_1 = n$.
Let $U$ denote the unitary matrix corresponding to PQC2. When $U$ is applied to the input state $|0\rangle^{\otimes n} \otimes |i\rangle$, where $|i\rangle$ is an $n$-qubit basis state, the resulting state is always one of the columns of $U$; in particular, it is always one of the first $2^n$ columns. We will construct an example of $U$ which realises $2^n$ arbitrary images. Let $|I_i\rangle = (\sqrt{I_{i,0}}\,e^{\mathrm{i}\phi_{i,0}}, \ldots, \sqrt{I_{i,2^n-1}}\,e^{\mathrm{i}\phi_{i,2^n-1}})^T$ be a state whose amplitudes encode the pixels of image $i$. Consider the state $|U(i)\rangle = |i\rangle \otimes |I_i\rangle$, and observe that if its first $n$ qubits are measured in the computational basis and discarded, then the remaining quantum state is precisely $|I_i\rangle$. Since $\langle U(i)|U(j)\rangle = \delta_{ij}$, i.e. the states are orthonormal, we can construct the required unitary matrix $U$ by setting the first $2^n$ columns to be $|U(i)\rangle$ for $i \in \{0, \ldots, 2^n - 1\}$ and choosing the remaining columns as any orthonormal completion. Since these remaining columns correspond to states which will not be selected by any input to PQC2, they do not matter. It should be noted that it is unlikely that the trained dual-PQC GAN would actually converge on unitaries of such a form, and thus this construction is given merely to demonstrate that $2n$ qubits suffice; further improvements are surely possible.
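The construction can be checked numerically for small $n$. The sketch below is only an illustration under stated assumptions: all phases are set to zero, the remaining columns are filled by a QR-based orthonormal completion, and the function names and image values are hypothetical. Two of the example images are deliberately identical, which would be impossible without ancillas.

```python
import numpy as np

def build_unitary(images, seed=0):
    """Build U whose first 2**n columns are |i> (x) |I_i>.

    images has shape (2**n, 2**n); row i is the normalized image I_i.
    """
    images = np.asarray(images, dtype=float)
    k, n_pix = images.shape
    dim = k * n_pix
    cols = np.zeros((dim, k))
    for i in range(k):
        # |i> (x) |I_i>: only block i of the column is non-zero.
        cols[i * n_pix:(i + 1) * n_pix, i] = np.sqrt(images[i])
    # Complete to an orthonormal basis; the trailing columns are arbitrary.
    rng = np.random.default_rng(seed)
    filler = rng.normal(size=(dim, dim - k))
    q, _ = np.linalg.qr(np.hstack([cols, filler]))
    return np.hstack([cols, q[:, k:]])

def image_from_input(u, i, n_pix):
    """Apply U to |0...0>|i>, discard the first register, return pixel probabilities."""
    out = u[:, i]  # the basis state |0...0>|i> has index i
    probs = (out ** 2).reshape(-1, n_pix)  # rows: first register, cols: pixel
    return probs.sum(axis=0)

# n = 2: four 4-pixel images, images 0 and 1 identical (non-orthogonal states).
images = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.00, 0.20, 0.70, 0.10],
])
U = build_unitary(images)
print(np.allclose(U.T @ U, np.eye(16)))
print(image_from_input(U, 1, 4))
```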

Training dual-PQC GAN
This section tests the dual-PQC GAN described in Section 4 and shows its potential to generate a set of image samples from a training set with a certain degree of fidelity. We emphasize that, in order to work with a manageable number of qubits, this study simplifies the original 3DGAN problem by reducing it to a 1D problem: reproducing the energy pattern along the calorimeter depth. In other words, the training dataset is composed of 1D energy profiles along the calorimeter z dimension, averaged over $N = 4$ pixels. It should be noted that such drastic reductions in problem size are commonplace in quantum machine learning, in order to obtain useful proof-of-principle results.
In order to evaluate the performance of the model, the original data set of 20,000 sample images is classified into $2^n$ classes via K-means clustering [31], as shown in Fig. 3a for the case $n = 2$. The average image for each class, displayed in Fig. 3b, gives an insight into the shape of the images which should be produced by PQC2. Note that this clustering is purely for the evaluation of the results; raw data are used for the training. Although, theoretically, it is sufficient to take $n_1 = n = 2$, several preliminary simulations have shown that $n_1 = 2n = 4$ gives better stability in the results. Therefore, for the following simulations, PQC1 takes $n_1 = 4$ but still builds a probability distribution over $2^2 = 4$ images, by measuring only $n = 2$ of the four qubits. PQC1 is initialized with an equiprobable superposition over the computational basis, $\{|0\rangle, \ldots, |2^{n_1} - 1\rangle\}$ with $n_1 = 4$. Furthermore, the initial parameters for both PQC1 and PQC2 are sampled from a uniform distribution over $[-\delta, \delta]$, with $\delta = 10^{-1}$. The discriminator is implemented in PyTorch, using an input layer with 4 nodes, two hidden layers with 256 and 128 nodes, respectively, and a single-node output layer. A Leaky ReLU function [32] with $\alpha = 0.2$ follows the first two layers, and a sigmoid function [33] is applied after the output layer. In addition, a gradient penalty [34] for real images is added to help the stability and convergence of the model, with the parameters $\lambda = 7$, $k = 0.01$ and $c = 1$. The dual-PQC GAN is trained using the AMSGRAD optimizer with an initial learning rate of $10^{-4}$ for PQC1 and the discriminator and $10^{-3}$ for PQC2.
Fig. 4 displays the progress of the loss functions and the relative entropy, as well as the average of the generated images at the end of the training, weighted with their probability, i.e. $\sum_i p_g^i I_i$. Both cases, $d_{g,2} = 6$ and $d_{g,2} = 16$, exhibit convergence of the mean energy distribution towards the target, as well as convergence of the generator and discriminator losses.
Furthermore, the relative entropy between the real and generated mean images falls below $10^{-4}$, as shown in Fig. 4c and Fig. 4f. Note that after the loss functions have converged, the relative entropy keeps decreasing in the case of $d_{g,2} = 16$, while it starts to oscillate with large amplitude in the case of $d_{g,2} = 6$, reflecting a certain degree of instability in the simulation.
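The two quantities monitored here, the probability-weighted mean image and its relative entropy with respect to the real mean image, can be computed as in the following sketch. All numerical values are made up for illustration, and the direction of the relative entropy (real with respect to generated) is an assumption.

```python
import numpy as np

def weighted_mean_image(p_g, images):
    """Average of the generated images, weighted by the PQC1 probabilities."""
    return np.asarray(p_g, dtype=float) @ np.asarray(images, dtype=float)

def relative_entropy(p, q, eps=1e-12):
    """Relative entropy D(p || q) = sum_j p_j log(p_j / q_j)."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical PQC1 distribution and PQC2 images (one image per row).
p_g = np.array([0.2, 0.3, 0.1, 0.4])
generated = np.array([
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.2, 0.3, 0.4, 0.1],
])
real_mean = np.array([0.15, 0.40, 0.35, 0.10])  # hypothetical target

mean_generated = weighted_mean_image(p_g, generated)
print(mean_generated, relative_entropy(real_mean, mean_generated))
```

A small value of this quantity only constrains the weighted average, so it cannot by itself confirm that each individual image is realistic.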
As an analysis of the mean images is not enough to validate the GAN result, it is necessary to evaluate the individual images produced by PQC2 as well as the probability distribution generated by PQC1. The images generated by PQC2 with $d_{g,2} = 16$, displayed in Fig. 5c, are similar to the mean real images shown in Fig. 3b, with peaks at $x = 2$ or $x = 3$. On the other hand, those generated by PQC2 with $d_{g,2} = 6$, shown in Fig. 5a, include two images, $I_0$ and $I_2$, which are far from the real image profile.
The probability distribution generated by PQC1 can explain this discrepancy. As shown in Fig. 5b, the weights for $I_0$ and $I_2$ are negligible compared to those for $I_1$ and $I_3$, meaning that their labels, produced by the classical discriminator, are suppressed in the loss function given by Eq. (4). Therefore, the mean images and the generator loss can converge to the correct value despite considerable errors in the produced images themselves. In contrast, in the case of $d_{g,2} = 16$, the probability distribution in Fig. 5d shows that all 4 images have non-negligible weights, leading to consistency between the quality of the mean and individual images. This result highlights the importance of choosing a suitable structure for the dual-PQC GAN model in order to prevent any bias during training.
Figure 5: Images generated by PQC2 (a,c) and their distribution obtained from PQC1 (b,d) with $n = 2$, $n_1 = n_2 = 4$ and $d_{g,1} = 2$, but with different $d_{g,2}$: 1) $d_{g,2} = 6$ (a,b), 2) $d_{g,2} = 16$ (c,d). Although the mean images shown in Fig. 4 are close to the real value in both cases, PQC2 with $d_{g,2} = 6$ cannot correctly imitate the real data (cf. Fig. 3b), while the one with $d_{g,2} = 16$ succeeds in reproducing very similar images.
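The suppression effect can be made concrete with a small numerical sketch. It assumes, as one plausible reading of the description of Eq. (4), a non-saturating generator loss $L_G = -\sum_i p_g^i \log D(I_i)$ and a discriminator loss $L_D = -[\frac{1}{m}\sum_k \log D(x_k) + \sum_i p_g^i \log(1 - D(I_i))]$. The exact form used in the simulations may differ, and the discriminator outputs below are made-up numbers.

```python
import numpy as np

def dual_pqc_losses(d_real, d_fake, p_g, eps=1e-12):
    """Assumed losses: fake-image terms are weighted by the PQC1 probabilities."""
    d_real = np.clip(np.asarray(d_real, dtype=float), eps, 1 - eps)
    d_fake = np.clip(np.asarray(d_fake, dtype=float), eps, 1 - eps)
    p_g = np.asarray(p_g, dtype=float)
    loss_d = -(np.mean(np.log(d_real)) + np.sum(p_g * np.log(1 - d_fake)))
    loss_g = -np.sum(p_g * np.log(d_fake))
    return float(loss_d), float(loss_g)

d_real = np.array([0.5, 0.5, 0.5, 0.5])
d_fake = np.array([0.05, 0.5, 0.05, 0.5])  # images 0 and 2 are spotted as fake

skewed = np.array([0.01, 0.49, 0.01, 0.49])  # PQC1 almost never selects 0 and 2
uniform = np.full(4, 0.25)

_, loss_g_skewed = dual_pqc_losses(d_real, d_fake, skewed)
_, loss_g_uniform = dual_pqc_losses(d_real, d_fake, uniform)
print(loss_g_skewed, loss_g_uniform, np.log(2))
```

With the skewed weights the generator loss stays close to its equilibrium value of log 2 even though two of the four images are poor, which is the kind of bias discussed above.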
Finally, the quality of individual images is evaluated by calculating the relative entropy between the real images Set $i$, shown in Fig. 3b, and the generated images $I_i$ for $i = 0, 1, 2, 3$, as displayed in Fig. 6. Considering the lowest relative entropy values across the last 4 epochs (200 epochs are run in total), a bijection between real images and generated images can be constructed: $I_0 \to$ Set0, $I_1 \to$ Set3, $I_2 \to$ Set1 and $I_3 \to$ Set2. This result gives quantitative evidence that PQC2 can reproduce four different sets of images in the real training data. Despite this optimistic finding, the large instability of the relative entropy is the main point that should be improved in future studies.
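One way to obtain such a one-to-one correspondence programmatically is to search over all assignments of generated images to class averages and keep the one with the smallest total relative entropy. This is a sketch with hypothetical data and function names, not necessarily the procedure used for Fig. 6; the class averages below are invented and the generated images are simply a shuffled copy of them.

```python
import itertools
import numpy as np

def relative_entropy(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two normalized images."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def match_images(generated, class_means):
    """Return {generated index: class index} minimizing the total relative entropy."""
    k = len(generated)
    cost = np.array([[relative_entropy(c, g) for c in class_means]
                     for g in generated])
    best = min(itertools.permutations(range(k)),
               key=lambda perm: sum(cost[i, perm[i]] for i in range(k)))
    return dict(enumerate(best))

# Hypothetical class averages (Set0..Set3), one per row.
class_means = np.array([
    [0.10, 0.60, 0.20, 0.10],
    [0.10, 0.20, 0.60, 0.10],
    [0.30, 0.40, 0.20, 0.10],
    [0.05, 0.45, 0.45, 0.05],
])
# Generated images: the same profiles in a different order.
generated = class_means[[0, 3, 1, 2]]

mapping = match_images(generated, class_means)
print(mapping)
```

The exhaustive search is only practical for a handful of images, since the number of permutations grows factorially with the number of classes.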
Unfortunately, the number of qubits in the quantum generator scales not only with the number of pixels in one image but also with the number of images that can be produced, whereas in the original GAN the system size mainly scales with the image size. This is the largest limitation of the model and requires further improvement. We are currently investigating a solution that feeds extra noise to the remaining $n_2 - n_1$ qubits of PQC2, while PQC1 keeps generating a probability distribution over the zero-noise images.
Figure 6: Relative entropy of generated images $I_0, \ldots, I_3$ with respect to the averages of the real image classes, shown in Fig. 3b. The images are generated via the dual-PQC GAN with $n = 2$, $n_1 = n_2 = 4$, $d_{g,1} = 2$ and $d_{g,2} = 16$. According to the minimum relative entropy at the end of the training, a one-to-one correspondence can be established between real and generated images.

Conclusion
This work presents a dual-PQC GAN model, a prototype of a quantum GAN with two quantum generators sharing the role of a single generator. One of the generators is responsible for reproducing the distribution over images, while the other builds the amplitude distribution over the pixels of a single image. If we supplement the input of PQC2 with a sample from PQC1 and some (continuous) noise, then we can see that PQC1 really is a distribution over the means of different classes of images, and the noise allows the generation of typical samples from each class. The results obtained show that this model can generate individual image samples and their probability distribution, similar to the training set.
It is also worth noting that, as the number of possible images (or classes of images if noise is added) grows exponentially with n 1 , the fact that we sample from a finite set of images (or classes of images if noise is added) rather than a continuum of typical images (as in the corresponding classical case) is unlikely to seriously compromise performance.
An interesting question for future research is how to reproduce an arbitrary number of outputs with the dual-PQC GAN model. One possible way is to introduce complex-valued noise to the remaining qubits of PQC2. Using a fixed sampler instead of a trained quantum circuit for PQC1, to pass an entangled quantum state to PQC2, is another possible approach. In future studies, we look forward to building a more advanced quantum GAN model to imitate the performance of classical GANs.