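A High speed data link optimization for digitalized transfer to processing FPGA

J. Collado (1,2), V. Gonzalez (2), and A. Gadea (1)
(1) ETSE Universidad de Valencia, Spain
(2) IFIC-CSIC, Spain
javier.collado@uv.es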

State-of-the-art detector arrays that require digital processing may have a sizeable number of digitized signal links. This is the case in several experimental nuclear physics instruments. Moreover, the data rate of the sampled signals, defined primarily by the signal bandwidth of the individual detectors, may not exhaust the capabilities of a single FPGA transceiver input. The preprocessing is usually carried out in a modern FPGA with transceiver data rates over 10 Gbps. At the same time, cost-effective FPGAs have a limited number of transceivers for a given processing capability. The topic of the present work is a cost-effective and efficient solution to the mismatch between these two data rates that also optimizes the use of FPGA resources. We have developed a solution based on Time Domain Multiplexing (TDM) link aggregation, in the form of a mezzanine board. This mezzanine combines four channels of up to 2.5 Gbps each, from an optical or copper input, into one channel of up to 10 Gbps, and serves them to the FPGA via the mezzanine connector. The board is controlled by a small FPGA through the Two Wire Interface (TWI) protocol and behaves as a standalone intelligent device, so minimal slow control is needed. As an alternative implementation, the solution has also been developed as a motherboard housing a SOM and an FMC connector. Associated firmware, based on the JESD204 communication protocol, has been developed to de-aggregate the data inside the FPGA and recover the original sampled data. The method has been validated, and applications beyond the development of the AGATA electronics may be envisioned.


I. INTRODUCTION
THE main motivation for this work is the development of the new Phase 2 electronics for the AGATA experiment [1]. The AGATA detector is an array of segmented high-purity germanium crystals for gamma spectroscopy. The segmentation gives each crystal position sensitivity, which, combined over the whole array, makes it possible to run tracking algorithms. This is a major advance in gamma spectroscopy for nuclear science because it allows the study of gamma multiplicities and angular correlations, in addition to providing unprecedented energy resolution. Each crystal has 36 segment contacts plus a core contact. Locating an event within the crystal requires processing the pulse shape of each segment [2], which implies digitizing each contact signal. In the case of AGATA, the sampling is performed with 14 bits (for an ENOB of 12) at 100 MHz, in free-running operation, because the bandwidth of the detector and preamplifier response is limited to about 30 MHz. Each ADC has a link to the preprocessing system at 2 Gbps using the JESD204 protocol [3]. As in much of the instrumentation electronics for this kind of experiment, the information is only needed when an event arrives at the detector; therefore triggering is performed in the preprocessing electronics [4]. Several quantities, such as the energy and the timing, are also needed for each event. The triggering, energy extraction, packaging of traces and readout are performed in real time in an FPGA. In AGATA each FPGA is able to perform this preprocessing for one entire crystal, which implies that 38 transceivers are needed to receive the serialized data from the ADCs. There are detector arrays with an even higher number of ADCs to be processed in the FPGA. All these issues lead to the main problem addressed in this work.

II. PROBLEM
The problem addressed in this electronics development is the large bandwidth mismatch between the ADC boards and the processing FPGA. As already mentioned, the detector preamplifier response is limited to about 30 MHz, and the digitizer samples at 100 MHz to cover this bandwidth with the data precision required by the positioning algorithms. A serialized data output of 8 to 16 bits at this sampling rate requires about 1-2 Gbps. In the case of AGATA there is one 2 Gbps JESD204-encoded differential line per ADC. Since a pair of ADCs is integrated in each IC, it is also possible to send data at up to 4 Gbps on a link shared by the pair. AGATA is only one example; there are many configurations with similar bandwidth on the high-speed data link side, depending on the ADC sample rate, the number of bits and the number of digitizers per integrated circuit.
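The data rates quoted above can be reproduced with a short calculation. The following Python sketch is illustrative only: it assumes that each 14-bit sample is padded to a 16-bit word on the link and that the link uses 8b/10b line coding (10 line bits per 8 payload bits). The function name and parameters are hypothetical and are not part of the AGATA firmware or of any JESD204 library.

```python
def serial_line_rate_gbps(sample_rate_mhz, bits_on_link, adcs_per_link=1):
    """Estimate the serial line rate of an 8b/10b-coded ADC link in Gbps.

    Assumptions (not taken from the paper): every sample occupies
    `bits_on_link` bits on the wire, and 8b/10b coding adds 25% overhead.
    """
    payload_bps = sample_rate_mhz * 1e6 * bits_on_link * adcs_per_link
    line_bps = payload_bps * 10 / 8  # 8b/10b: 10 line bits per payload byte
    return line_bps / 1e9


# 100 MHz sampling, 14-bit samples padded to 16 bits, one ADC per link
one_adc = serial_line_rate_gbps(100, 16)
# Same settings with the two ADCs of one IC sharing a link
two_adcs = serial_line_rate_gbps(100, 16, adcs_per_link=2)
# Lower bound of the 8 to 16 bit range mentioned in the text
eight_bit = serial_line_rate_gbps(100, 8)

print(one_adc, two_adcs, eight_bit)
```

Under these assumptions the sketch gives 2 Gbps per ADC, 4 Gbps for a shared pair and 1 Gbps for 8-bit words, matching the figures in the text.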
All cases require a certain number of links between the ADCs and the destination FPGA for real-time processing. In segmented array detectors, or similar instrumentation, this becomes a critical element. While a single FPGA can process a large number of channels in parallel, the number of input transceivers on the device is limited and has a direct impact on cost. Nevertheless, FPGA transceivers have a maximum input data rate much higher than 1-5 Gbps; in modern FPGAs it is about 10-40 Gbps, depending on the technology used [5], [6].
Optimizing the aggregated bandwidth benefits two aspects of the development. The first benefit is a direct reduction of FPGA cost and design effort, since it allows the use of a lower-performance FPGA. In the case of AGATA, a single mid-to-low performance FPGA, such as a Xilinx Zynq UltraScale+ XZU15 [7], is able to preprocess the 38 channels of a segmented crystal. The second benefit is that it opens the possibility of using commercial off-the-shelf (COTS) SOM boards. Such SOM modules (using high-speed connectors or FMC) provide a reduced number of high-speed channels, depending on the model and connector.

III. SOLUTION
A logical solution to the described problem is to multiplex the incoming channels. The simplest option is Time Domain Multiplexing (TDM). In this case several lines of 1-5 Gbps can be combined into one higher-speed link of about 5-20 Gbps.
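The principle of TDM aggregation can be illustrated with a minimal Python sketch. It assumes a plain round-robin interleave of equal-length word streams, one word per input lane per time slot; the function names are hypothetical, and the framing and lane marking used by a real aggregation device are not modelled here.

```python
def tdm_mux(lanes):
    """Interleave equal-length lanes word by word (round robin)."""
    if len({len(lane) for lane in lanes}) != 1:
        raise ValueError("lanes must be non-empty in number and equal in length")
    # Each time slot carries one word from every lane, in lane order.
    return [word for slot in zip(*lanes) for word in slot]


def tdm_demux(stream, n_lanes):
    """Recover the original lanes from a round-robin interleaved stream."""
    if len(stream) % n_lanes != 0:
        raise ValueError("stream length must be a multiple of n_lanes")
    return [stream[i::n_lanes] for i in range(n_lanes)]


# Four slow lanes combined into one stream that runs four times faster
lanes = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [1000, 2000, 3000]]
stream = tdm_mux(lanes)
recovered = tdm_demux(stream, n_lanes=4)
print(stream)
print(recovered == lanes)
```

The aggregated stream must run at the sum of the input rates, which is why four lines of up to 2.5 Gbps fit into one link of up to 10 Gbps.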
The solution could be implemented in several ways. The first possibility is to implement the TDM in low-cost FPGAs with a limited number of transceivers. The second possibility is to look for commercial ASICs that perform TDM in other applications.
This second option led us to explore Ethernet ASICs for communication systems. In particular, there is a TDM family from Texas Instruments that performs 2:1, 4:1 or 8:1 multiplexing of 1-5 Gbps serial links. Their primary function is to aggregate Ethernet links in order to reduce the number of connections. For our implementations the TLK10002 and TLK10022 [8] were selected. They perform a 4:1 aggregation of 1-2.5 Gbps links into one link of up to 10 Gbps.
These ASICs are meant to work in pairs, with one device aggregating and sending data through a long transmission line and another device at the receiver recovering the original links; each of them has a Tx and an Rx part to operate in full-duplex mode. Using a single device therefore leaves a considerable gap to cover on the receiving side. Our development includes FPGA firmware for the reception side that disaggregates the data inside the FPGA, so that the reduction in the number of transceivers is actually obtained.
As mentioned before, the use of our development in different instruments and applications has been considered. For that purpose, two implementations have been developed, which extend its use to several configurations of input and output data rates.

IV. IMPLEMENTATION
We have implemented two options for delivering data to the FPGA. The first is a mezzanine following the FMC VITA 57.1 standard [9], which connects to a custom motherboard carrying the selected FPGA or to an evaluation board if one is used during development. The second is a motherboard with a SOM socket to host the selected FPGA, which also includes an FMC connector for the readout from the FPGA, as in the case of AGATA.
The data input is done by either optical or copper links from the ADC boards. To maintain compatibility, both boards have been designed to cope with these two possibilities, optical and copper connection.
Both implementations have been developed to comply with the standards as far as possible. They have been designed and simulated to work with very low jitter, synchronized clocking and stable data links.

A. FMC version
The FMC version is called IDM (Input Data Mezzanine) [10]. This board is built to minimize the form factor. Nevertheless, it does not meet the standard FMC form factor, being slightly wider. This issue is expected to be corrected in future versions of the board.
This mezzanine uses megArray 9x9 [11] connectors for the input data and is fully compatible with miniPod [12] optical transceivers. The miniPod can easily be converted to 12-line MTP optical fibers or to multiple LC connectors if needed. Each of the 4 input connectors can receive 12 lines. Although only five TLK10022 devices are used, each of them can perform two aggregations (channels A and B). Therefore, up to 40 lines can be processed and aggregated into 10. These 10 high-speed serial lines are connected to a High Pin Count (HPC) FMC connector as defined in VITA 57.1.
The board also includes a small form factor, low-power and low-cost FPGA that performs slow control and autonomous initialization. From the FMC I2C lines the board is seen as a single device, with all its elements memory-mapped in this FPGA. It also provides alerts for temperature and voltage anomalies.
In the AGATA environment the board has been tested aggregating 40 differential lines at 2 Gbps into 10 differential serial lines at 8 Gbps [13].
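The link budget of this configuration can be checked with a small Python sketch. It assumes that the output rate is simply the aggregation ratio times the input rate, with no additional framing overhead on the high-speed side, and that each device provides two aggregation channels, as stated above for the TLK10022. The function name, parameters and defaults are hypothetical.

```python
import math


def plan_aggregation(n_inputs, input_rate_gbps, ratio=4,
                     max_output_gbps=10.0, channels_per_device=2):
    """Return (output links, output rate in Gbps, devices) for a TDM setup.

    Assumption: output rate = ratio * input rate (no extra overhead).
    """
    output_rate = ratio * input_rate_gbps
    if output_rate > max_output_gbps:
        raise ValueError("aggregated rate exceeds the output link limit")
    n_links = math.ceil(n_inputs / ratio)               # FPGA transceivers used
    n_devices = math.ceil(n_links / channels_per_device)
    return n_links, output_rate, n_devices


# Full board capacity: 40 input lines at 2 Gbps
full_board = plan_aggregation(40, 2.0)
# One AGATA crystal: 38 ADC channels at 2 Gbps
one_crystal = plan_aggregation(38, 2.0)
print(full_board, one_crystal)
```

With these assumptions, both the 40-line and the 38-channel case need 10 FPGA transceivers at 8 Gbps and five aggregation devices, instead of 38 to 40 transceivers at 2 Gbps.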

B. Motherboard version
The second implementation is built as a motherboard (called PACE-CAP) with a socket for a TE0808 SOM from Trenz Electronic GmbH [14], demonstrating the possibility of integrating aggregation and processing on the same board. In our case, this board is the final design for AGATA Phase 2 electronics and implements an FMC connector to send the processed data through Ethernet using a custom board called STARE [15].

C. Firmware
One of the main developments is the firmware that recovers the data in the FPGA. In this case it has been written for the JESD204B [16] protocol, but the disaggregation IP is applicable to any 8b/10b-based communication protocol.
The firmware is also able to align the 38 ADC channels of the AGATA digitizers.
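As an illustration of what channel alignment involves, the following Python sketch lines up several recovered lanes on a common alignment marker. It is not the firmware described here: it assumes that each lane is a list of (kind, value) symbols, that the 8b/10b control character K28.3 (used by JESD204B as the /A/ multiframe alignment character) serves as the marker, and that the skew between lanes is smaller than the marker spacing. All names are hypothetical.

```python
K28_3 = ("K", 0x7C)  # 8b/10b control character K28.3, used here as marker


def align_lanes(lanes, marker=K28_3):
    """Trim lanes so that the first marker sits at the same index in all.

    Assumes the first marker found in each lane belongs to the same
    multiframe, i.e. the lane-to-lane skew is below the marker spacing.
    list.index raises ValueError if a lane contains no marker.
    """
    first = [lane.index(marker) for lane in lanes]
    earliest = min(first)
    # Drop the extra leading symbols of the lanes that arrived later.
    shifted = [lane[pos - earliest:] for lane, pos in zip(lanes, first)]
    common = min(len(lane) for lane in shifted)
    return [lane[:common] for lane in shifted]


payload = [("D", value) for value in (0x11, 0x22, 0x33, 0x44)]
idle = ("D", 0x00)
lane_a = [idle] + [K28_3] + payload                # marker at index 1
lane_b = [idle, idle, idle] + [K28_3] + payload    # same data, 2 symbols late
aligned = align_lanes([lane_a, lane_b])
print(aligned[0] == aligned[1])
```

In hardware the same effect is usually obtained with per-lane elastic buffers that delay the early lanes rather than discarding symbols from the late ones; the list trimming above is only the offline equivalent.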

V. RESULTS
The IDM board is built using a High Density Interconnect process to keep it as small as possible and close to the FMC size. Both boards were carefully designed, with prior simulation of the high-speed signal paths.
A test bench with a setup of AGATA digitizers has been designed for each of the boards. In the IDM case, the mezzanine is connected to an evaluation board, a Trenz TEBF0808, with a TE0808 SOM hosting the FPGA. The data is extracted by reading internal memory.
The test bench for the PACE-CAP uses a set of AGATA Phase 2 digitizers in their final configuration. This board is also able to perform the slow control of the ADCs and the general clocking.
Both boards maintained stable links in tests over periods longer than 24 h. The IDM has been tested with optical connections and the PACE-CAP with copper links. Future tests may include copper connections for the IDM and optical ones for the PACE-CAP.
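To give a sense of scale for a 24 h run, the Python sketch below computes the number of bits carried by one aggregated 8 Gbps link and the bit error rate (BER) upper limit that a completely error-free run of that length would support. This is a hypothetical estimate: an error count is not reported here, the zero-error case is an assumption, and the standard relation BER < -ln(1 - CL) / N for N error-free bits at confidence level CL is used. Function names are illustrative.

```python
import math


def bits_transferred(rate_gbps, hours):
    """Number of bits sent over a link at `rate_gbps` during `hours`."""
    return rate_gbps * 1e9 * hours * 3600


def ber_upper_limit(n_bits, confidence=0.95):
    """BER upper limit supported by n_bits received with zero errors."""
    return -math.log(1.0 - confidence) / n_bits


n_bits = bits_transferred(8.0, 24)   # one 8 Gbps aggregated link, 24 h
limit = ber_upper_limit(n_bits)      # valid only if no error was seen
print(f"{n_bits:.3e} bits, BER < {limit:.2e} at 95% confidence")
```

Under these assumptions a single link carries about 6.9e14 bits in 24 h, so an error-free run would bound the BER at a few times 1e-15.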

VI. CONCLUSIONS
Our implementation of Time Domain Multiplexing has proven to be a valid method to aggregate data from a large number of ADCs (30-40) while optimizing the FPGA transceiver occupancy. The 40 serial lines of 2 Gbps are aggregated into 10 links, and the data is recovered and synchronized. The two boards are valid and establish a solution that can serve other complex instrumentation involving many channels of sampling electronics. Future developments are under way to fit the IDM into a standard FMC form factor and to update the input connections to extend compatibility. For the PACE-CAP, some implementations are under study to extend its capacity to new configurations of line aggregation beyond 2:1, opening its use to different applications.