The TOTEM DAQ based on the Scalable Readout System (SRS)

The TOTEM (TOTal cross section, Elastic scattering and diffraction dissociation Measurement at the LHC) experiment at the LHC has been designed to measure the total proton-proton cross-section and to study elastic and diffractive scattering at LHC energies. To cope with the increased machine luminosity and the higher statistics required by the extension of the TOTEM physics programme approved for the LHC's Run Two phase, the previous VME-based data acquisition system has been replaced with a new one based on the Scalable Readout System. The system features an aggregated data throughput of 2 GB/s towards the online storage system. This makes it possible to sustain a maximum trigger rate of ∼24 kHz, to be compared with the 1 kHz rate of the previous system. The trigger rate is further improved by implementing zero-suppression and second-level hardware algorithms in the Scalable Readout System. The new system fulfils the requirements for increased efficiency, providing higher bandwidth and increasing the purity of the recorded data. Moreover, full compatibility has been guaranteed with the legacy front-end hardware, as well as with the DAQ interface of the CMS experiment and with the LHC's Timing, Trigger and Control distribution system. In this contribution we describe in detail the architecture of the full system and its performance measured during the commissioning phase at the LHC Interaction Point.


Introduction
The TOTEM (TOTal cross section, Elastic scattering and diffraction dissociation Measurement at the LHC) [1] experiment at the LHC has been designed to measure the total proton-proton cross-section with a luminosity-independent method, based on the optical theorem, and to study elastic and diffractive scattering at LHC energies.
To perform these measurements, TOTEM requires good acceptance for particles produced at very small angles with respect to the beam. TOTEM's coverage in pseudo-rapidity spans the ranges 3.1 ≤ |η| ≤ 4.7 and 5.3 ≤ |η| ≤ 6.5 on both sides of the Interaction Point (IP); this is accomplished by two gas detector telescopes, named T1 and T2. The T1 and T2 telescopes use Cathode Strip Chambers (CSC) and triple Gas Electron Multiplier (GEM) chambers, respectively, which are able to detect inelastically produced charged particles. The inelastic telescopes are complemented by silicon detectors housed in special movable structures embedded in the beam pipe, called Roman Pots (RP). Two stations, each composed of 6 RPs, are placed at about 210 m and 220 m on both sides of the IP. The TOTEM RPs are designed to detect leading protons down to a few mm from the beam centre. The layout of the TOTEM apparatus is shown in Figure 1.
All TOTEM detectors adopt the same front-end chip despite their different technologies. The chip, called VFAT2 [2], has 128 analogue input channels and a digital serial readout interface.
Figure 1: TOTEM experiment apparatus layout.
TOTEM has already measured the elastic, total and diffractive dissociation cross sections at the energies explored during the LHC's Run One phase. The future physics programme of TOTEM requires an increase in statistics by a factor of 10 to 100. This goal has to be reached while minimizing the data-taking time; for TOTEM, this means taking maximum advantage, in terms of statistics, of the few special runs in which a machine optics configuration reserved for TOTEM is provided. In this framework a consolidation programme [3] has been approved for TOTEM. The programme includes the data acquisition system upgrade, whose main requirement is to increase the experiment trigger rates by more than one order of magnitude.
e-mail: michele.quinto@cern.ch

Data acquisition system upgrade
In the legacy TOTEM DAQ architecture, optical receiver mezzanines (OptoRx), which collect data from the detectors, are hosted on Versa Module Eurocard (VME) boards. Each OptoRx is able to handle 12 Gigabit Optical Hybrid (GOH) [4] links running at 800 Mb/s. With up to 16 VFAT readout chips connected to one GOH, the amount of data processed by the OptoRx is ∼4.7 kB per event. Table 1 shows the raw event data size in detail for each of the TOTEM detectors. The data throughput on the VME bus is the bottleneck of the system: the maximum VME transfer rate of 23 MB/s translates into a maximum trigger rate of 1 kHz for the experiment.
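As a rough cross-check of these figures, the per-event payload of one OptoRx and the sum of its input line rates can be estimated in a few lines of Python. This is only a sketch: the 192-bit VFAT2 frame length is an assumption that is not stated in this text, and all constant and function names are illustrative.

```python
# Back-of-the-envelope estimate for one OptoRx; not part of the TOTEM software.
GOH_LINKS_PER_OPTORX = 12     # from the text
VFATS_PER_GOH = 16            # upper limit quoted in the text
GOH_LINE_RATE_BPS = 800e6     # 800 Mb/s per GOH link, from the text
VFAT_FRAME_BITS = 192         # ASSUMPTION: VFAT2 frame length, not given in the text


def optorx_payload_bytes() -> int:
    """Raw VFAT payload collected per event by one fully populated OptoRx."""
    n_vfats = GOH_LINKS_PER_OPTORX * VFATS_PER_GOH
    return n_vfats * VFAT_FRAME_BITS // 8


def optorx_input_rate_bytes_per_s() -> float:
    """Sum of the raw line rates of all GOH links feeding one OptoRx."""
    return GOH_LINKS_PER_OPTORX * GOH_LINE_RATE_BPS / 8


if __name__ == "__main__":
    print("payload per event [bytes]:", optorx_payload_bytes())
    print("input line rate [bytes/s]:", optorx_input_rate_bytes_per_s())
```

Under the stated assumption the payload comes to 4608 bytes, close to the quoted ∼4.7 kB; the remainder would be framing overhead, which is not specified here. The summed line rate of 1.2 GB/s is roughly 50 times the 23 MB/s VME transfer rate, which illustrates why the bus limits the system.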
A new DAQ architecture, shown in Figure 3, was proposed to remove the system bottleneck by replacing the VME interface with Scalable Readout System (SRS) [5] components, which provide a faster and cost-effective transmission medium. In the new system, OptoRx modules are plugged onto a custom-designed card, named Opto-FEC, which allows the connection with the SRS Front-End Concentrator (FEC) board. Figure 2 shows a FEC card and an Opto-FEC card connected via a PCI edge connector. An OptoRx mezzanine is plugged onto the Opto-FEC.
Data received from the OptoRx are processed in the FEC and formatted using the User Datagram Protocol (UDP). Up to 16 FECs are read out and controlled by 4 PCs via point-to-point connections. A Scalable Readout Unit (SRU) is connected to all FECs by means of Data, Trigger, Clock and Control (DTCC) links [6] and acts as system master, fulfilling the following tasks:
• receive data from the LHC Timing, Trigger and Control (TTC) system;
• distribute the machine clock, the Level One trigger signal and the fast commands (Resynch, Bunch Crossing zero) to each FEC and to each OptoRx;
• receive the Trigger Throttling System (TTS) data from each FEC;
• merge 16 TTS connections into a single one and send it back to the TTC system.
Figure 3: TOTEM DAQ architecture scheme using the SRS.

System design and verification
The SRS firmware for TOTEM has been developed using SystemVerilog [7]. This made it possible to integrate hardware description and verification in the same standard language. Although SystemVerilog is relatively new and the Electronic Design Automation (EDA) tools supporting it are not fully mature, the language is gradually gaining attention in the industry thanks to its compactness and its syntax structures, which translate into more reusable and less error-prone code. The use of SystemVerilog has implied a complete refactoring of the firmware shipped with the SRS system.
The architecture of the new FEC firmware is presented in Figure 4. It consists of two blocks: the System Unit and the Application Unit. The System Unit provides a set of common interfaces and services to the application-specific part, which is included in the Application Unit. This distinction allows independent and parallel development of the modules in the System Unit and of the data processing part enclosed in the Application Unit. The interconnections between all modules are implemented using standard buses from the Advanced Microcontroller Bus Architecture (AMBA) family [8]:
• Advanced High-performance Bus (AHB), dedicated to memory-mapped modules such as registers and peripherals;
• Advanced eXtensible Interface 4 Stream (AXI4-Stream), a unidirectional data push interconnection for modules exchanging streams of packets.
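To illustrate what a unidirectional push interconnection means in practice, the sketch below models the AXI4-Stream handshake cycle by cycle in Python: a data beat moves from source to sink only in a clock cycle in which both TVALID and TREADY are asserted. It is a behavioural toy model, not the TOTEM firmware, and the function name and arguments are hypothetical.

```python
from typing import List, Sequence, Tuple


def stream_transfer(beats: Sequence[str],
                    ready_per_cycle: Sequence[bool]) -> List[Tuple[int, str]]:
    """Simulate a source pushing `beats` to a sink over a valid/ready handshake.

    `ready_per_cycle[i]` is the sink's TREADY in clock cycle i. The source
    keeps TVALID asserted as long as it still has data to send.
    Returns (cycle, beat) for every completed transfer.
    """
    pending = list(beats)
    transfers = []
    for cycle, tready in enumerate(ready_per_cycle):
        tvalid = bool(pending)        # source has data to offer
        if tvalid and tready:         # handshake: beat is consumed this cycle
            transfers.append((cycle, pending.pop(0)))
    return transfers


if __name__ == "__main__":
    # The sink stalls in cycle 1, so the second beat is delayed by one cycle.
    print(stream_transfer(["hdr", "data", "crc"], [True, False, True, True]))
```

The sink can throttle the stream simply by deasserting TREADY, which is why this kind of interface suits chains of packet-processing modules.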
Although the firmware is mainly developed in SystemVerilog, the design still uses legacy, verified code.
A first SRS-based demonstrator was built and tested before the end of the LHC's Run One. The new DAQ platform was proved to run stably for several hours, reaching an average trigger rate of ∼24 kHz. This limit is set by the Gigabit Ethernet bandwidth, which saturates at about 120 MB/s when jumbo packets of 4.7 kB are transferred.
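The relation between link bandwidth, event size and sustainable trigger rate can be checked with a short Python calculation. This is a minimal sketch that assumes one event per packet and ignores protocol overhead; the function and constant names are illustrative.

```python
def max_trigger_rate_hz(bandwidth_bytes_per_s: float, event_size_bytes: float) -> float:
    """Upper bound on the trigger rate a link can sustain for a given event size."""
    if event_size_bytes <= 0:
        raise ValueError("event size must be positive")
    return bandwidth_bytes_per_s / event_size_bytes


GBE_SATURATION_BYTES_PER_S = 120e6   # ~120 MB/s, from the text
EVENT_SIZE_BYTES = 4.7e3             # 4.7 kB jumbo packet, from the text

if __name__ == "__main__":
    rate = max_trigger_rate_hz(GBE_SATURATION_BYTES_PER_S, EVENT_SIZE_BYTES)
    print(f"upper bound: {rate / 1e3:.1f} kHz")
```

The bound comes out at about 25.5 kHz, consistent with the measured average of ∼24 kHz once overheads that are not modelled here are taken into account.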

Hardware data processing
To overcome the rate limit of 24 kHz, and with the aim of increasing not only the trigger rates but also the quality of the acquired data samples, data processing algorithms have been studied and implemented on Field Programmable Gate Arrays (FPGA). The algorithms are based on cluster reconstruction and pattern recognition techniques tailored to the Roman Pot silicon detector layout. The cluster reconstruction consists in translating the VFAT channel hit pattern into a list of clusters. Each cluster is encoded in 2 bytes: the first byte contains the cluster starting strip position, the second byte contains the number of adjacent strips that are active. A study of data collected by TOTEM during Run One, under beam conditions similar to the ones expected in Run Two, showed that the cluster reconstruction is effective in reducing the event size by up to one order of magnitude. Figure 5 shows the histogram of the number of clusters in 10 RP detector planes extracted from a real data sample. The distribution shows that most of the events have fewer than ∼24 clusters, leading to a reduction of the event size by a factor of 5 to 10.
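The cluster encoding described above can be sketched in software as follows. This is an illustration in Python, not the FPGA implementation: it assumes that clustering is done per VFAT on its 128 channels, so that both the start position and the width fit in one byte, and the function names are hypothetical.

```python
from typing import List, Tuple

N_CHANNELS = 128  # channels per VFAT2 chip, from the text


def find_clusters(hits: List[bool]) -> List[Tuple[int, int]]:
    """Return (start_strip, width) for each run of adjacent active strips."""
    clusters = []
    start = None
    for strip, active in enumerate(hits):
        if active and start is None:
            start = strip                            # a new cluster begins
        elif not active and start is not None:
            clusters.append((start, strip - start))  # cluster just ended
            start = None
    if start is not None:                            # cluster reaching the last strip
        clusters.append((start, len(hits) - start))
    return clusters


def encode_clusters(clusters: List[Tuple[int, int]]) -> bytes:
    """Pack each cluster into 2 bytes: start position, then number of strips."""
    out = bytearray()
    for start, width in clusters:
        out += bytes((start, width))
    return bytes(out)


if __name__ == "__main__":
    hits = [False] * N_CHANNELS
    for strip in (10, 11, 12, 40):
        hits[strip] = True
    encoded = encode_clusters(find_clusters(hits))
    print(len(encoded), "bytes instead of", N_CHANNELS // 8)
```

In this example two clusters are encoded in 4 bytes, compared with the 16 bytes needed for the full 128-channel hit map; the gain shrinks as occupancy grows, which is why the achievable reduction depends on the cluster multiplicity.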
Pattern recognition algorithms have been studied as well, in order to further improve data purity by introducing a second-level trigger event selection. These algorithms are able to identify particle track segments by looking at coincidences between detector planes. The reconstruction of track segments within detector planes is the first step in identifying particle track candidates that are intercepted by more than one pot along the beam line; a similar pattern recognition technique is implemented in the TOTEM offline analysis framework. The hardware algorithms were implemented in the FEC Virtex 5 FPGA and were proved to be as efficient as the offline reconstruction algorithms. A comparative analysis between software and hardware algorithms was possible thanks to advanced firmware simulation techniques which exploit SystemVerilog in the verification environment. Universal Verification Methodology (UVM) [9] test benches were built to simulate the full DAQ chain from the VFAT output up to the FEC board, using real data as test vectors. This made it possible both to test the design functionality extensively and to tune the hardware selection algorithm on real data samples.
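A coincidence-based selection of this kind can be illustrated with a simple majority logic in Python. The sketch below is not the TOTEM algorithm: the road width `tolerance`, the threshold `min_planes` and the assumption that all planes measure the same coordinate are hypothetical choices made only for the example.

```python
from typing import List


def has_track_segment(planes: List[List[int]],
                      tolerance: int = 2,
                      min_planes: int = 3) -> bool:
    """Majority coincidence between planes of equal strip orientation.

    `planes[i]` holds the cluster positions (strip numbers) found in plane i.
    A segment candidate exists if, around some cluster position, at least
    `min_planes` planes have a cluster within +/- `tolerance` strips.
    """
    seeds = {pos for plane in planes for pos in plane}
    for seed in seeds:
        n_matching = sum(
            any(abs(pos - seed) <= tolerance for pos in plane)
            for plane in planes
        )
        if n_matching >= min_planes:
            return True
    return False


if __name__ == "__main__":
    aligned = [[100], [101], [], [99], [300]]   # three planes agree near strip 100
    noise = [[10], [200], [], [390], []]        # isolated hits, no coincidence
    print(has_track_segment(aligned), has_track_segment(noise))
```

Events without any candidate could then be rejected before transmission, which is how a selection of this type improves the purity of the recorded sample.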
The online algorithms will be further extended with a view to implementing an online event selection at the SRU level, in which tracking information from the full Roman Pot detector can be combined.

System commissioning and operation
The full system was installed and commissioned in the experimental area during August 2015. Figure 6 shows part of the SRS installation. The system is composed of 16 FEC modules and one SRU. Every FEC module is equipped with one OptoRx receiving data from 3 full RP detectors containing about 120 VFATs. Up to 4 FECs are read out by a standard PC running the DATE software developed by the ALICE Collaboration [10]. The readout PCs stream the data to 12 event builder processes running on 5 different storage servers. Each server has a storage capacity of ∼70 TB and can achieve up to 1 GB/s data throughput in write mode. This configuration allows TOTEM to run continuously for a few days at full rate.
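The running time allowed by this storage configuration can be estimated with a short Python calculation. The sketch assumes that data are written continuously at the aggregated 2 GB/s throughput quoted in the abstract and that the full nominal capacity is usable; the names are illustrative.

```python
N_SERVERS = 5                         # storage servers, from the text
CAPACITY_PER_SERVER_BYTES = 70e12     # ~70 TB each, from the text
AGGREGATE_THROUGHPUT_BYTES_S = 2e9    # 2 GB/s towards storage, from the abstract
SECONDS_PER_DAY = 86400


def continuous_run_days(n_servers: int,
                        capacity_per_server: float,
                        throughput: float) -> float:
    """Days of continuous data taking before the storage is full."""
    total_capacity = n_servers * capacity_per_server
    return total_capacity / throughput / SECONDS_PER_DAY


if __name__ == "__main__":
    days = continuous_run_days(N_SERVERS, CAPACITY_PER_SERVER_BYTES,
                               AGGREGATE_THROUGHPUT_BYTES_S)
    print(f"about {days:.1f} days at full throughput")
```

With these inputs the 350 TB of total capacity lasts about two days at the full raw throughput; any reduction of the event size, for example through zero-suppression, would extend this proportionally.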
The system is being operated during TOTEM's special runs. It has allowed the experiment to reach an average trigger rate of ∼77 kHz, which is about two orders of magnitude higher than the performance of the TOTEM legacy DAQ. Figure 7 shows a screenshot of the DATE data acquisition software GUI during data taking. The two columns show the mean performance of two out of sixteen Local Data Collector (LDC) readout processes; each software instance reads out data from a single FEC board.

Figure 5: Histogram of the number of clusters per plane in the TOTEM RP detector.

Figure 6: TOTEM DAQ installation of two SRS crates and one SRU module at IP5.

Figure 7: Screenshot of the DATE software GUI. Three LDC status displays showing the trigger rates achieved reading out 12 FECs.

Table 1: TOTEM detectors raw data frame size.
Detector    VFAT per ORx    N. of ORx    Ev. size [kB]