Online and Offline Pattern Recognition in PANDA

PANDA is one of the four experiments that will run at the new FAIR facility being built in Darmstadt, Germany. It is a fixed-target experiment: a beam of antiprotons collides with a proton jet target, with a maximum center-of-mass energy of 5.46 GeV. The interaction rate will be 2 MHz at start-up, with the goal of reaching 20 MHz at full luminosity. The antiproton beam will be essentially continuous. PANDA will have no hardware trigger, only a software trigger, to allow maximum flexibility in the physics program. These characteristics pose severe challenges for the reconstruction code, which 1) must be fast, since it has to be validated up to a 20 MHz interaction rate, and 2) must be able to reject fake tracks caused by remnant hits belonging to previous or later events in slow detectors, such as the straw tubes in the central region. The Pattern Recognition (PR) of PANDA will have to run both online, for a first fast selection, and offline, at a lower rate, for a more refined selection. The PR code in PANDA is continuously evolving; this contribution shows its present status. I give an overview of three examples of PR that follow different strategies and/or are implemented on different hardware (FPGAs, GPUs, CPUs) and, where available, report their performance.


Introduction
PANDA [1] is an experiment that will run at the FAIR accelerator facility in Darmstadt, Germany, starting in 2021. A beam of antiprotons circulating in an accumulation ring, with momentum ranging from 1.5 to 15 GeV/c, will collide with a hydrogen jet target. The interaction rate will be 2 MHz at the beginning, with the goal of reaching 20 MHz when the accelerator delivers the full luminosity.
The PANDA event selection will be entirely software-based. This choice was made to achieve the maximum possible flexibility in the selection of physics channels and to exploit the detector capabilities to the fullest. On the other hand, it poses a great challenge to the Pattern Recognition (PR) software. The PR must keep up with the initial 2 MHz interaction rate (corresponding to a raw data rate of 20 GB/s) and, eventually, with the final 20 MHz interaction rate (raw data rate of 200 GB/s). It must also reject the fake tracks produced by the pile-up of hits from previous or subsequent events. A large amount of PR code has been developed so far in PANDA. It falls essentially into three categories: 1) online PR running on FPGAs; 2) code running on GPUs for online/offline PR; 3) code running on CPUs for offline PR.
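The quoted rates can be cross-checked with simple arithmetic. Dividing each raw data rate by the corresponding interaction rate gives the implied average raw event size. This sketch assumes 1 GB = 10⁹ bytes and uses the 2 MHz start-up rate given in the abstract. The function name is illustrative only.

```python
def raw_event_size_bytes(data_rate_gb_per_s, interaction_rate_mhz):
    """Average raw bytes per interaction = data rate / interaction rate (1 GB = 1e9 B)."""
    return data_rate_gb_per_s * 1e9 / (interaction_rate_mhz * 1e6)


# (interaction rate in MHz, raw data rate in GB/s) as quoted in the text
for rate_mhz, data_gb_s in [(2.0, 20.0), (20.0, 200.0)]:
    size_kb = raw_event_size_bytes(data_gb_s, rate_mhz) / 1e3
    print(f"{rate_mhz:.0f} MHz, {data_gb_s:.0f} GB/s -> {size_kb:.0f} kB per interaction")
```

Both operating points give about 10 kB per interaction. The two data-rate figures therefore scale linearly with the interaction rate.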
For lack of space, only three examples of PR code are described in the following: one running on FPGAs, one on GPUs and one on CPUs. A short summary of the other PR algorithms developed in PANDA is given at the end. First, a very brief description of the PANDA central detector is given, so that the reader can follow the PR algorithms described later.
The PANDA detector is shown in Figure 1. The antiproton beam circulates in the ring along the z-axis in the positive direction. The central part of the detector is inside a 2 T superconducting solenoid. The target system injects frozen hydrogen into the beam line. The following detectors are located inside the solenoid:
• the MicroVertex Detector (MVD), between about 3 cm and 15 cm from the beam axis, consisting of silicon pixel planes and double-sided strip planes (see Figure 2);
• the Straw Tube Tracker (STT), made of proportional straw tubes (Figure 3);
• a time-of-flight scintillator tile detector;
• a silica ring imaging Čerenkov detector;
• a scintillating crystal electromagnetic calorimeter.
The muon proportional chambers are inserted between the layers of the solenoid iron yoke. The central part is completed by three planes of gas electron multiplier (GEM) detectors covering the very forward angles (θ < 11°), a scintillating crystal calorimeter covering the large backward angles (θ > 140°) and a forward silica disk ring imaging Čerenkov detector. The forward section of the experimental setup is placed downstream of the solenoid. It consists of a dipole magnet with proportional straw chambers inside, a time-of-flight wall made of plastic scintillators, an aerogel ring imaging Čerenkov detector, and an electromagnetic and a hadronic calorimeter.

An example of PR running on FPGAs
Here I describe an example of a very fast algorithm written for online selection and designed to run on FPGAs. Tracks are searched for in the central part of the detector, inside the solenoid, with a road-finding algorithm that uses only the STT straw hits. The continuous data taking produces a stream of STT hits. First, the information on each hit (straw layer and position number, time of arrival of the hit) is packed into 16 bits per hit in the FPGA memory (see Figure 4). Then the following steps are implemented in the FPGA:
1. Tracklets are found by combining neighbouring STT hits in the x-y-plane (see Figures 3 and 4), starting from the innermost STT axial layers and moving outwards.
2. The particle trajectory projected on the x-y-plane is a circle. Its radius and center position are determined by fitting the tracklets with a circle (χ² minimization) in 2 iterations. The p⊥ of the particle at the origin is then calculated.
3. The stereo-layer hits are associated with the tracklets, and the pz of the particle is found with a χ² minimization in two iterations.
The algorithm has been implemented in VHDL on 140 Xilinx Virtex-5 FPGAs and runs in parallel. The square root operations and the divisions have been sped up using look-up tables.
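Step 2 can be illustrated with a short sketch. The FPGA code performs an iterative χ² minimization. The version below instead uses the closed-form algebraic (Kåsa) circle fit, which reduces to a linear least-squares problem. It converts the radius to transverse momentum with p⊥ [GeV/c] ≈ 0.3 · B [T] · R [m] for a unit-charge track in the 2 T solenoid field. It is written in floating point rather than fixed-point VHDL, and the function names are hypothetical.

```python
import math

C_GEV_PER_T_M = 0.299792458  # p_T [GeV/c] = 0.2998 * B [T] * R [m] for unit charge


def solve3(m, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [list(row) + [r] for row, r in zip(m, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_circle(points):
    """Algebraic (Kasa) least-squares fit of x^2 + y^2 + D*x + E*y + F = 0.
    Coordinates are centred first to improve numerical conditioning."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for x, y in points:
        row = (x - mx, y - my, 1.0)
        target = -(row[0] ** 2 + row[1] ** 2)
        for i in range(3):  # accumulate the normal equations A^T A p = A^T b
            atb[i] += row[i] * target
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    d, e, f = solve3(ata, atb)
    xc, yc = -d / 2.0, -e / 2.0  # centre in centred coordinates
    radius = math.sqrt(xc * xc + yc * yc - f)
    return xc + mx, yc + my, radius


def pt_from_radius(radius_m, b_tesla=2.0):
    """Transverse momentum (GeV/c) of a unit-charge track of radius R in field B."""
    return C_GEV_PER_T_M * b_tesla * radius_m


# Usage: noise-free hits on a p_T = 1 GeV/c track coming from the origin.
r_true = 1.0 / pt_from_radius(1.0)
hits = [(r_true * (1 - math.cos(p)), r_true * math.sin(p)) for p in (0.09, 0.13, 0.17, 0.21, 0.25)]
print(pt_from_radius(fit_circle(hits)[2]))
```

For a 1 GeV/c track in 2 T the radius is about 1.67 m. The hits in the usage example, placed between about 0.15 and 0.42 m from the origin, therefore sample only a short arc. Centring the coordinates before fitting keeps the normal equations well conditioned.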
The algorithm was tested with a stream of STT hits simulated with the standard PANDA simulation package. Events with six muon tracks were generated uniformly in space. An interaction rate of 20 MHz was assumed, thereby stressing the code at the highest rate foreseen in PANDA. The high interaction rate also produces pile-up in the recorded hits, because the STT maximum drift time (≈ 200 ns) is longer than the mean time between interactions (50 ns at 20 MHz). Table 1 details the clock counts used in each step. The total time per 7-track event is 7 μs. The momentum resolution obtained is σ(p⊥) ≈ 3.2% for tracks with p⊥ = 1 GeV/c, and σ(pz) ≈ 3.8% for tracks with p⊥ = pz = 1 GeV/c. This algorithm will be extended to include the MVD hits in the road-finding procedure.
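The pile-up and throughput figures above follow from two one-line estimates:
• The mean number of interactions falling within one maximum drift time is rate × drift time.
• By Little's law, the number of events that must be processed concurrently is rate × processing time per event.
The sketch uses the ≈ 200 ns drift time and the 7 μs per event quoted above. It assumes, as a simplification, the same per-event time at 2 MHz. The names are illustrative.

```python
def mean_interactions_in_window(rate_hz, window_s):
    """Mean number of interactions in a time window (Poisson mean = rate x window)."""
    return rate_hz * window_s


def events_in_flight(rate_hz, latency_s):
    """Little's law: mean number of events being processed at the same time."""
    return rate_hz * latency_s


STT_MAX_DRIFT_S = 200e-9  # approximate STT maximum drift time quoted in the text
FPGA_TIME_S = 7e-6        # total processing time per event quoted in the text

for rate_hz in (2e6, 20e6):
    spacing_ns = 1e9 / rate_hz
    overlap = mean_interactions_in_window(rate_hz, STT_MAX_DRIFT_S)
    in_flight = events_in_flight(rate_hz, FPGA_TIME_S)
    print(f"{rate_hz / 1e6:.0f} MHz: {spacing_ns:.0f} ns between interactions, "
          f"{overlap:.1f} interactions per drift window, {in_flight:.0f} events in flight")
```

At 20 MHz this gives about 4 overlapping interactions per drift window and about 140 events in flight. At 2 MHz the figures are 0.4 and 14. This illustrates why a parallel implementation is needed.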

An example of PR running on GPUs
This algorithm is based on a Hough transform method. After the information on the STT and MVD hits is loaded into the GPU memory, the algorithm executes the following steps:
1. In the x-y-plane, the x and y coordinates of all MVD hits are transformed into coordinates u and v by the conformal transformation
$$ u = \frac{x}{x^2 + y^2}, \qquad v = \frac{y}{x^2 + y^2} \qquad (1) $$
Consequently, a circular trajectory originating from (0,0) in the x-y-plane transforms into a straight line going through the MVD points.
2. The same conformal transformation is applied to the drift circles of the STT axial straw hits. These drift circles transform again into circles in the u-v-plane, which are tangent to the transformed particle trajectory (see Figure 5).
3. The Hough plot in the variables θ and R is constructed using the MVD hits and the STT axial straw hits. For each MVD hit, a bundle of straight lines (in the u-v-plane) passing through the hit is generated, with the parametrization
$$ R = u_i \cos\theta + v_i \sin\theta , $$
where u_i and v_i are the MVD hit coordinates. For each STT axial straw hit a similar parametrization is used, with lines tangent to the transformed drift circle:
$$ R = u_c \cos\theta + v_c \sin\theta \pm \rho , $$
where (u_c, v_c) and ρ are the center and radius of the transformed drift circle.
The algorithm was run on NVIDIA Tesla K20K hardware driven by a dedicated CPU. It was written in plain CUDA. The data were first loaded into the CPU and then transferred into the GPU memory. The performance figures, shown in Table 2, refer to the PR algorithm running on data already deposited in the shared memory of the GPUs. So far this algorithm delivers only the p⊥ of the tracks. It will be completed with the calculation of the pz of the tracks, using MVD hits and STT stereo hits.
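Steps 1 to 3 can be sketched as follows, assuming the standard conformal mapping u = x/(x²+y²), v = y/(x²+y²). A circle through the origin with center (a, b) maps to the line a·u + b·v = 1/2. This is a line at distance R = 1/(2ρ) from the u-v origin, where ρ is the trajectory radius. A drift circle that does not contain the origin maps to another circle. The accumulator binning (180 θ bins, R bins of 0.005 in inverse length units) and all names are hypothetical. This serial Python version only illustrates the voting scheme; it makes no attempt to mirror the CUDA implementation.

```python
import math
from collections import Counter


def conformal(x, y):
    """u = x/(x^2+y^2), v = y/(x^2+y^2): circles through (0,0) become lines."""
    r2 = x * x + y * y
    return x / r2, y / r2


def conformal_drift_circle(xs, ys, r):
    """Image (uc, vc, rho) of a drift circle of centre (xs, ys) and radius r.
    Valid when the origin lies outside the drift circle (d > 0)."""
    d = xs * xs + ys * ys - r * r
    return xs / d, ys / d, r / d


def hough_peak(points_uv, circles_uv, n_theta=180, r_bin=0.005):
    """Fill a (theta, R) accumulator and return (theta, R, votes) of the fullest cell.
    Points vote for R = u cos(theta) + v sin(theta); drift circles vote for
    both tangent lines R = uc cos(theta) + vc sin(theta) +/- rho."""
    acc = Counter()
    for k in range(n_theta):
        theta = math.pi * k / n_theta
        c, s = math.cos(theta), math.sin(theta)
        for u, v in points_uv:
            acc[(k, round((u * c + v * s) / r_bin))] += 1
        for uc, vc, rho in circles_uv:
            base = uc * c + vc * s
            acc[(k, round((base + rho) / r_bin))] += 1
            acc[(k, round((base - rho) / r_bin))] += 1
    (k, r_index), votes = acc.most_common(1)[0]
    return math.pi * k / n_theta, r_index * r_bin, votes


def radius_from_hough(r_line):
    """A line at distance R from the u-v origin is a circle of radius 1/(2|R|) through (0,0)."""
    return 1.0 / (2.0 * abs(r_line))
```

For example, MVD points and tangent drift circles generated on a circle through the origin all vote into a single cell. The R of that cell returns the trajectory radius through `radius_from_hough`.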

An example of PR running on CPUs
In this section I describe a PR algorithm that uses the central detector hits to find tracks. It combines the road-finding strategy with the Hough transform method to maximize the efficiency and the rejection of spurious hits.
It was initially designed for offline PR, running on CPUs. However, parallelization and implementation on GPUs are foreseen for possible online use as well. The first step is finding the tracklets in the x-y-plane using only the STT axial straw hits. They are formed by starting from hits at the boundary of the STT detector (red-circled hits in Figure 7) and looking for adjacent hits. The figure shows the large number of spurious hits not belonging to any real track. These are remnants of tracks from previous or subsequent events. This is one of the major problems encountered by PR algorithms using the STT in PANDA. It is caused by the large maximum drift time of the straws (about 200 ns). Figure 7 shows an example of a pile-up event in the (extreme) case of a 20 MHz interaction rate, a situation that may occur only at a later stage of PANDA. At the start-up of PANDA the interaction rate will be only 2 MHz.
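The tracklet-building step can be sketched as a breadth-first search over neighbouring straws. The sketch makes several assumptions:
• hits are identified by (layer, tube) indices, similar to the Layer ID / Tube ID convention of Figure 4;
• the neighbour rule is a simplified, hypothetical one;
• the seeds are the hits of one boundary layer chosen by the caller (`seed_layer`).
The real STT layout is hexagonal, with a different number of tubes per layer.

```python
from collections import deque


def are_adjacent(h1, h2):
    """Hypothetical neighbour rule on (layer, tube) indices: same or adjacent layer
    and tube index differing by at most one."""
    (l1, t1), (l2, t2) = h1, h2
    return h1 != h2 and abs(l1 - l2) <= 1 and abs(t1 - t2) <= 1


def build_tracklets(hits, seed_layer, min_hits=3):
    """Grow tracklets from the hits in `seed_layer` by breadth-first search over
    adjacent hits; each hit is used at most once and short tracklets are dropped."""
    unused = set(hits)
    tracklets = []
    for seed in sorted(h for h in unused if h[0] == seed_layer):
        if seed not in unused:  # already absorbed by an earlier tracklet
            continue
        unused.discard(seed)
        tracklet, queue = [seed], deque([seed])
        while queue:
            current = queue.popleft()
            for other in sorted(unused):  # sorted() copies, so discarding is safe
                if are_adjacent(current, other):
                    unused.discard(other)
                    tracklet.append(other)
                    queue.append(other)
        if len(tracklet) >= min_hits:
            tracklets.append(sorted(tracklet))
    return tracklets


print(build_tracklets([(7, 10), (6, 10), (5, 11), (4, 11), (2, 50)], seed_layer=7))
```

Because each hit is consumed by at most one tracklet, the result depends on the seed order. Isolated pile-up hits that are not reachable from any seed are simply left unused.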
The algorithm then calculates the conformal transformation of Equation (1), transforming the circular trajectory in the x-y-plane into a straight line in the u-v-plane. A first fast estimate of the radius and center of the circle (the projection of the trajectory helix) is obtained with a Hough transform of the straight trajectory in the u-v-plane. Using this first result, the algorithm attempts to add to the tracklet more STT axial straw hits and MVD hits that lie close enough to the trajectory. A χ²-minimization fit to all STT and MVD hits is then performed, and the p⊥ and charge of the particle are finally established.
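The hit-attachment step amounts to a proximity test against the first circle estimate. For an STT straw, the relevant quantity is the distance between the drift circle and the trajectory circle, ||d − R| − r|. Here d is the distance from the straw center to the circle center and r is the drift radius; the quantity vanishes when the two circles are tangent. For MVD points it is simply |d − R|. The cut values (10 mm and 2 mm, in meters below) and all names are hypothetical.

```python
import math


def drift_residual(circle, straw):
    """Distance between the trajectory circle (xc, yc, R) and a straw drift circle
    (xs, ys, r_drift); zero when the two circles are tangent."""
    xc, yc, radius = circle
    xs, ys, r_drift = straw
    d = math.hypot(xs - xc, ys - yc)
    return abs(abs(d - radius) - r_drift)


def point_residual(circle, point):
    """Distance of an MVD point from the trajectory circle."""
    xc, yc, radius = circle
    return abs(math.hypot(point[0] - xc, point[1] - yc) - radius)


def attach_hits(circle, straws, mvd_points, max_straw=0.010, max_mvd=0.002):
    """Keep the STT and MVD hits lying within hypothetical cuts (metres) of the circle."""
    stt = [s for s in straws if drift_residual(circle, s) < max_straw]
    mvd = [p for p in mvd_points if point_residual(circle, p) < max_mvd]
    return stt, mvd
```

Using the drift radius rather than the straw center matters because the drift circle, not the wire, is what the trajectory must touch.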
The pz is determined by associating the STT stereo straw hits with the helix. The stereo angle is only 3°, which helps select the true hits belonging to the track: spurious stereo straws intersect the trajectory cylinder in unphysical regions (too far backward or too far forward). On the surface of the trajectory cylinder, unrolled onto a plane, the helix is a straight line (see Figure 8). The MVD hits appear as points. Because of the small stereo angle, the intersections of the stereo STT straws appear as ellipses squeezed in the vertical direction, each with two possible points of intersection with the helix trajectory (blue points in Figure 8). A straight-line χ² fit delivers the pz of the track. All possible combinations of the stereo STT intersection points are tried in the fit, and the solution with the best χ² is chosen. At this point the trajectory is completely determined, and a last round of elimination of spurious hits is performed with a simple proximity selection.
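The pz step can be sketched as follows. On the unrolled cylinder a helix satisfies z = z0 + s · pz/p⊥, where s is the arc length along the circle. The slope of a straight-line fit in the (s, z) plane therefore gives pz = p⊥ · dz/ds. Each stereo straw contributes two candidate (s, z) points. As in the text, every combination is tried and the one with the smallest χ² is kept. The sketch assumes equal errors on all points, so χ² is the plain sum of squared residuals. The names are hypothetical.

```python
from itertools import product


def line_fit(points):
    """Unweighted least-squares fit z = z0 + slope * s; returns (slope, z0, chi2)."""
    n = len(points)
    s_sum = sum(s for s, _ in points)
    z_sum = sum(z for _, z in points)
    ss_sum = sum(s * s for s, _ in points)
    sz_sum = sum(s * z for s, z in points)
    slope = (n * sz_sum - s_sum * z_sum) / (n * ss_sum - s_sum * s_sum)
    z0 = (z_sum - slope * s_sum) / n
    chi2 = sum((z - z0 - slope * s) ** 2 for s, z in points)  # unit errors assumed
    return slope, z0, chi2


def best_pz(mvd_sz, stereo_options, pt):
    """mvd_sz: (s, z) points from the MVD; stereo_options: one pair of candidate
    (s, z) points per stereo straw. Every choice of one candidate per straw is
    fitted and the lowest-chi2 solution is kept. Returns (pz, chosen points)."""
    best = None
    for choice in product(*stereo_options):
        pts = list(mvd_sz) + list(choice)
        slope, _, chi2 = line_fit(pts)
        if best is None or chi2 < best[0]:
            best = (chi2, slope, pts)
    _, slope, pts = best
    return pt * slope, pts  # pz = p_perp * dz/ds


mvd = [(0.05, 0.05), (0.10, 0.10), (0.15, 0.15)]
options = [((0.30, 0.30), (0.30, 0.50)), ((0.35, 0.55), (0.35, 0.35))]
print(best_pz(mvd, options, pt=1.0)[0])
```

The number of combinations grows as 2ⁿ for n stereo straws (256 for n = 8). This is still small, but it is the part of the step that grows fastest with the number of hits.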
Because of the pile-up mentioned earlier, the algorithm may find many spurious tracks (Figure 7 shows the worst case, a 20 MHz interaction rate). Two cleanup procedures have therefore been written. They reject a track if its hits are not spatially continuous in the MVD (first procedure) or in the MVD and STT (second procedure). The performance of this PR is summarized in Tables 3 and 4. These were obtained with simulated events containing 1 to 8 muon tracks, assuming an interaction rate of 2 MHz (and the corresponding pile-up of hits). Results are given for both cleanup procedures.
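The text does not specify the continuity criterion, so the following is only a hypothetical illustration. A track is rejected when its hits skip more than `max_gap` consecutive detector layers. The check is applied either to the MVD alone (first procedure) or to the MVD and STT layers combined (second procedure). The layer indexing, the four-layer MVD offset and the dictionary keys are assumptions.

```python
def is_continuous(layers, max_gap=1):
    """True if consecutive occupied layers never skip more than `max_gap` empty layers."""
    occupied = sorted(set(layers))
    return all(b - a - 1 <= max_gap for a, b in zip(occupied, occupied[1:]))


def cleanup(tracks, use_stt=True, n_mvd_layers=4, max_gap=1):
    """First procedure (use_stt=False): MVD continuity only.
    Second procedure (use_stt=True): MVD and combined MVD+STT continuity.
    Each track is a dict with hypothetical keys 'mvd_layers' and 'stt_layers'."""
    kept = []
    for track in tracks:
        ok = is_continuous(track["mvd_layers"], max_gap)
        if ok and use_stt:
            # STT layers are numbered after the MVD layers in one combined index.
            combined = list(track["mvd_layers"]) + [n_mvd_layers + layer for layer in track["stt_layers"]]
            ok = is_continuous(combined, max_gap)
        if ok:
            kept.append(track)
    return kept
```

A track built from a real particle crosses consecutive layers. A track assembled from pile-up remnants tends to show holes, which is what such a test exploits.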
Table 5 shows the CPU time spent per track per event on one 3.4 GHz core of a 64-bit Intel Core i7-2600K, for 2 MHz and 20 MHz interaction rates. The code is not yet parallelized. A further speed-up by a factor of 5 to 10 is expected when running on a GPU.

Other algorithms implemented in PANDA
Many other pattern recognition and tracking algorithms have been written in PANDA over the years; describing them all would require too much space. The following is a partial list.
• A PR using the MVD hits to find the radius and center of the helix cylinder of a track not necessarily produced at the primary vertex. This algorithm is based on the projection of the circular trajectory onto a Riemann sphere (or paraboloid) and can, in principle, also be used to find secondary vertices.
• Cellular automaton algorithms that use the MVD and STT hits, implemented on CPUs and GPUs.
• A Hough transform method implemented on GPUs that uses only the MVD and STT axial straw hits. This procedure uses an alternative parametrization of the particle circular trajectory in the x-y-plane.

Table 5 data (CPU time per track per event, MVD+STT hit cleanup procedure):
                 2 MHz interaction rate   20 MHz interaction rate
1-track event    1.6 ms                   6.5 ms
4-track event    5.0 ms                   9.5 ms
8-track event    9.5 ms                   13.5 ms

Connecting The Dots 2016
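The first item of the list of other algorithms mentions a circle fit based on projecting hits onto a Riemann sphere (or paraboloid). A minimal sketch of the paraboloid variant follows. Each hit (x, y) is lifted to (x, y, x² + y²); hits on a circle then lie on a plane. That plane is fitted by orthogonal least squares, with its normal taken as the eigenvector of the smallest covariance eigenvalue, and then mapped back to a circle center and radius. Unlike the conformal mapping, nothing forces the circle through the origin, which is why this kind of fit suits tracks not produced at the primary vertex. Hits are unweighted, and the small Jacobi eigen-solver is included only to keep the sketch dependency-free. The names are hypothetical.

```python
import math


def jacobi_eigen(a, sweeps=50):
    """Eigen-decomposition of a symmetric 3x3 matrix with cyclic Jacobi rotations.
    Returns (eigenvalues, V) with the eigenvectors stored as the columns of V."""
    a = [list(row) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(3) for j in range(3) if i != j) < 1e-30:
            break
        for p in range(3):
            for q in range(p + 1, 3):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(3):  # A <- A J (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(3):  # A <- J^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(3):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(3)], v


def riemann_circle_fit(points):
    """Lift (x, y) to (x, y, x^2 + y^2), fit a plane a*x + b*y + c*w + d = 0 by
    orthogonal least squares and convert it back to a circle (xc, yc, radius)."""
    lifted = [(x, y, x * x + y * y) for x, y in points]
    n = len(lifted)
    mean = [sum(p[i] for p in lifted) / n for i in range(3)]
    cov = [[sum((p[i] - mean[i]) * (p[j] - mean[j]) for p in lifted) / n
            for j in range(3)] for i in range(3)]
    eigenvalues, vectors = jacobi_eigen(cov)
    k = min(range(3), key=lambda i: eigenvalues[i])  # smallest spread = plane normal
    a, b, c = vectors[0][k], vectors[1][k], vectors[2][k]
    d = -(a * mean[0] + b * mean[1] + c * mean[2])   # plane passes through the centroid
    # c*(x^2 + y^2) + a*x + b*y + d = 0 is a circle when c != 0
    xc, yc = -a / (2.0 * c), -b / (2.0 * c)
    return xc, yc, math.sqrt(xc * xc + yc * yc - d / c)


hits = [(0.5 + 0.4 * math.cos(0.2 * k), -0.3 + 0.4 * math.sin(0.2 * k)) for k in range(8)]
print(riemann_circle_fit(hits))
```

The sign of the fitted normal is arbitrary. Flipping it changes a, b, c and d together, so the recovered circle is unaffected.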

Figure 1. The full PANDA detector.

Figure 4. Pattern recognition code running on FPGAs. Top left: part of the STT axial straws, with the convention used to identify each tube uniquely. Top right: an example of tracklets found in the STT detector, in the Layer ID versus Tube ID plot. Bottom: detail of the sequentially stored hit information.

Figure 5. Simulation of a particle trajectory produced at the origin in the x-y-plane and conformal-transformed as described in the text.

Figure 6.

Figure 7. An example of a PANDA single-track event projected on the x-y-plane. The green line is the true simulated track; the red line is the reconstructed track. Most of the hits in this figure are spurious, remnants of previous (or subsequent) events. The (extreme) case of a 20 MHz interaction rate is shown here; at the beginning, the PANDA interaction rate will be only 2 MHz. The red-circled hits are the STT axial hits from which the tracklets are started.

Figure 8. Left: cartoon showing an example of the intersection of a stereo STT straw tube (red cylinder) with the cylinder on which the helix trajectory lies (yellow surface). Right: projection of a helix trajectory onto the surface of the helix cylinder. The stereo STT straw (≡ skew STT) intersections look like ellipses squeezed in the vertical direction because of the small stereo angle (3°). One spurious MVD hit, eliminated by an annealing-filter type of algorithm, can be seen.

Table 2. Performance of the parallel algorithm run on NVIDIA Tesla K20K GPUs.

Table 5. CPU time consumption per track per event for an interaction rate of 2 and 20 MHz; MVD+STT hit cleanup procedure.