AREUS: A Software Framework for ATLAS Readout Electronics Upgrade Simulation

Abstract. The design of the readout electronics for the LAr calorimeters of the ATLAS detector to be operated at the future High-Luminosity LHC (HL-LHC) requires a detailed simulation of the full readout chain in order to find optimal solutions for the analog and digital processing of the detector signals. Due to the long duration of the LAr calorimeter pulses relative to the LHC bunch crossing time, out-of-time signal pileup needs to be taken into account. For this purpose, the simulation framework AREUS has been developed. It models analog-to-digital conversion, gain selection, and digital signal processing at bit precision, including digitization noise and detailed electronics effects. Trigger and object reconstruction algorithms are taken into account in the optimization process. The software implementation of AREUS, the concepts of its main functional blocks, as well as optimization considerations will be presented. Various approaches to introducing parallelism into AREUS will be compared against each other.


Introduction
In order to reach the ambitious goal of collecting data equivalent to 4 ab⁻¹ of integrated luminosity by 2038, the LHC is scheduled to undergo two major upgrades [1]. These will take place during two long shutdown phases: LS2 in 2019–2020 and LS3 in 2024–2026. The end of LS3 marks the beginning of the High-Luminosity LHC phase (HL-LHC). After this point, the peak luminosity is expected to reach a maximum of 75 nb⁻¹ s⁻¹, a value about 4 times higher than today. At the same time, the number of proton-proton interactions per bunch crossing is expected to increase by a factor of 4 to 5, up to a value of 200.
In order to keep up with the increased luminosity, the ATLAS detector [2] will undergo upgrades at the same time as the LHC: the Phase-I upgrade during LS2 [3] and the Phase-II upgrade during LS3 [4]. During the Phase-II upgrade, the full liquid argon (LAr) calorimeter readout will be replaced. This is necessary because the old electronics have only been qualified for a radiation exposure equivalent to 1 ab⁻¹ of integrated luminosity. Furthermore, the new electronics are to supply more fine-grained information to the trigger. Finally, they must accommodate an increase in the trigger rate (from 100 kHz to 1 MHz) and a larger data buffer due to an increased overall latency of the lowest-level trigger (from 2.5 µs to at least 10 µs).
The greatest challenge for the precise energy measurement of signals in the LAr calorimeter subsystem (shown in Figure 1a) is the increased number of interactions per bunch crossing. These bunch crossings occur once every 25 ns, whereas the electronic pulses caused by cell hits extend over roughly 450 ns, i.e. 18 bunch crossings, so that pulses from successive collisions overlap. Techniques that reduce pileup effects often exacerbate electronics noise and vice versa, so an optimum has to be found. In order to minimize the combined impact of electronics noise and pileup effects, the LAr upgrade requires a detailed simulation of the new readout electronics. This simulation needs to be modular so that new filtering algorithms can be implemented and studied in a timely manner. It also needs to be flexible enough that simulated electronics can be replaced with data from real prototype components as they become available over time.
This simulation is provided by the AREUS framework, which is described in Section 2. In Section 3, implementation details of AREUS' architecture are given. In Section 4, a custom smart-pointer class is presented as an example of the optimizations considered in AREUS. In Section 5, various approaches to parallelizing AREUS are considered and weighed against each other. Finally, a summary is given in Section 6.


The AREUS Framework
AREUS is written in C++ [10]. It directly depends on the Boost [11] and ROOT [12] libraries for common functionality and on Git [13] for version control. It can be built using the GNU build system [14][15][16] or CMake [17]. Other dependencies are bundled into the source repository, either literally or as Git submodules.
In addition, it provides a Python [22] and an ATHENA [23] interface.

Figure 3. Exemplary output of AREUS: a correlation plot between E_in, the energy of the cell hits, and E_out, the corresponding reconstructed energy, taking the filter-induced delay into account. The color represents the number of entries in each bin; the bins directly at the X-axis from 0 to 3 GeV show undetected events.

The input to AREUS consists of detector cells that have been hit with a certain amount of energy (cf. Figure 1b). AREUS simulates each of the calorimeter cells separately and forwards each hit to the corresponding cell. Figure 2 illustrates how AREUS processes each cell hit. It calculates the pulse induced by the hit, accounting for effects such as nonlinearity of the amplification stage or electronics noise that follows an arbitrary spectral density. This pulse is then overlaid with any pulses from previous hits in order to simulate pileup effects¹. After this, the pulse is digitized both in time and amplitude, considering quantization noise and the ADC's finite range and resolution. The samples are subsequently sent to the digital filters, which can be chained by the user in an arbitrary manner. The result of this filter chain should be close to the original series of hits, except for a small, filter-induced delay. AREUS compares these two sequences on-line and produces a collection of diverse histograms detailing the performance of the filter chain. Two examples of these histograms are shown in Figure 3 and Figure 4. Figure 5 shows an analysis that aggregates the results of multiple AREUS runs and has been used in the technical design report for the LAr Phase-II upgrade [4, p. 32].
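As a rough illustration of the digitization step described above, the following sketch (hypothetical code, not taken from AREUS) quantizes analog samples with an ADC of finite range and resolution. Values outside the range saturate, and rounding to the nearest code is the origin of the quantization noise mentioned in the text.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the digitization step: quantize an analog
// amplitude (arbitrary units) with an ADC of given bit resolution and
// input range. Out-of-range values saturate, mirroring the ADC's
// finite dynamic range.
std::uint32_t digitize(double amplitude, double fullScale, unsigned bits) {
    const std::uint32_t maxCode = (1u << bits) - 1u;
    // Map [0, fullScale] onto the ADC codes; rounding to the nearest
    // code introduces quantization noise.
    double code = std::round(amplitude / fullScale * maxCode);
    code = std::clamp(code, 0.0, static_cast<double>(maxCode));
    return static_cast<std::uint32_t>(code);
}

// Digitize a whole pulse, sampled once per bunch crossing (25 ns).
std::vector<std::uint32_t> digitizePulse(const std::vector<double>& pulse,
                                         double fullScale, unsigned bits) {
    std::vector<std::uint32_t> out;
    out.reserve(pulse.size());
    for (double s : pulse) out.push_back(digitize(s, fullScale, bits));
    return out;
}
```

The same clamping and rounding would apply per sample in a real simulation; gain selection and noise injection are omitted here for brevity.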

Implementation of AREUS
The modularity of AREUS is achieved through the Observer pattern [26, pp. 293-303] as well as a strict separation of concerns [27, p. 61]. The core library, called Common, contains only utility classes used by all other libraries as well as message classes used to interface the other libraries (cf. Figure 6). The other libraries communicate with each other using the message classes from Common (cf. Figure 7). The advantage of this is twofold: 1. the libraries do not need to know about each other, only about Common; 2. the user can combine objects from these libraries into arbitrary chains (and even graphs) as long as the message types sent between any two connected objects match. The second point is most important for the digital-filter chain, which may consist of many small and independent algorithms using the same message class. Figure 8 shows the exact procedure through which AREUS connects objects from different libraries.

Figure 5. Total noise due to both electronics noise and pileup effects. Each point represents an individual AREUS run. Two parameters have been varied: µ, the average number of proton-proton interactions as a measure of pileup effects; and τ, the RC time constant of the bandpass filter employed in the LAr analog electronics.

Figure 7. The Observer pattern as implemented in AREUS. Execution starts in a root object that loops over events. On each event, it notifies its observers, which recursively notify their observers. Pre- and post-processing of events is done via inheritance from the special class EventListener, which is backed by a global registry object.

¹ To do so, cells need to retain their state between events for a duration at least as long as an electronics pulse: 450 ns, or equivalently 18 bunch crossings.
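A minimal sketch of this message-passing style of the Observer pattern might look as follows (all names are hypothetical; the real AREUS classes differ). Stages only know a common message type and notify their observers, so arbitrary chains can be built without the libraries knowing about each other.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Stand-in for a message class from the "Common" library.
struct Message {
    double value;
};

// Base class holding a list of observers; notifying forwards the
// message to every attached observer.
class Observable {
public:
    void attach(std::function<void(const Message&)> observer) {
        observers_.push_back(std::move(observer));
    }
protected:
    void notify(const Message& m) const {
        for (const auto& obs : observers_) obs(m);
    }
private:
    std::vector<std::function<void(const Message&)>> observers_;
};

// A processing stage both receives and re-emits messages, which lets
// the user chain stages (or build graphs) as long as message types match.
class ScaleStage : public Observable {
public:
    explicit ScaleStage(double factor) : factor_(factor) {}
    void onMessage(const Message& m) { notify(Message{m.value * factor_}); }
private:
    double factor_;
};
```

Chaining two such stages and feeding a message into the first one propagates it recursively through the chain, in the same spirit as the event loop described for Figure 7.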

AREUS Smart Pointers
Due to the inter-library communication in AREUS, many of its runtime objects do not have a clear hierarchy of ownership, so reference counting has to be used throughout the code base. To this end, AREUS uses a custom smart-pointer system based on two pointer classes: CSharedPtr and CPersistentPtr. Though unconventional, this design decision has several advantages over using smart pointers from Boost or the standard library:
• AREUS' smart pointers are intrusive, i.e. the reference count is part of the object that is being reference-counted. This choice avoids the allocation of many small memory blocks and so decreases the chance of heap fragmentation.
• The reference counts are regular integers if AREUS is compiled for single-threaded usage (cf. Section 5 for details) and atomic integers otherwise. In contrast, the standard-library class std::shared_ptr always uses atomic integers. Accessing atomic integers is usually slightly slower, though on some modern architectures this additional cost may be avoided via hardware lock elision [28].
• AREUS' smart pointers provide a runtime mechanism to guarantee data consistency. Whenever a CPersistentPtr starts pointing at an object, a flag inside the object is set that marks the object as immutable. Any further modification of such an object results in an assertion failure. As a result, objects become immutable as soon as any code assumes they are immutable, and violations of this assumption are caught early, with a clear error message.
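The two ideas above, an intrusive reference count and an immutability flag checked by assertions, can be combined in a short sketch. This is a simplified illustration under assumed names, not the actual CSharedPtr/CPersistentPtr implementation.

```cpp
#include <atomic>
#include <cassert>

// Intrusive base class: the reference count (and the "frozen" flag)
// live inside the object itself, so no separate control block is
// allocated on the heap.
class RefCounted {
public:
    void addRef() const { ++count_; }
    void release() const { if (--count_ == 0) delete this; }
    void freeze() const { frozen_ = true; }   // mark immutable
    bool frozen() const { return frozen_; }
protected:
    virtual ~RefCounted() = default;
private:
    // In a single-threaded build a plain int would suffice; atomics
    // are used here as in the multithreaded configuration.
    mutable std::atomic<int> count_{0};
    mutable std::atomic<bool> frozen_{false};
};

// Minimal intrusive smart pointer over any RefCounted type.
template <typename T>
class IntrusivePtr {
public:
    explicit IntrusivePtr(T* p = nullptr) : p_(p) { if (p_) p_->addRef(); }
    IntrusivePtr(const IntrusivePtr& o) : p_(o.p_) { if (p_) p_->addRef(); }
    ~IntrusivePtr() { if (p_) p_->release(); }
    IntrusivePtr& operator=(const IntrusivePtr& o) {
        if (o.p_) o.p_->addRef();
        if (p_) p_->release();
        p_ = o.p_;
        return *this;
    }
    T* operator->() const { return p_; }
    T& operator*() const { return *p_; }
private:
    T* p_;
};

// Example payload: setters assert that the object is still mutable,
// mirroring the consistency check described in the text.
struct Sample : RefCounted {
    void setValue(double v) { assert(!frozen()); value = v; }
    double value = 0.0;
};
```

A persistent pointer in this sketch would simply call freeze() on the object it is given; any later setValue() would then trip the assertion.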

Parallelization Opportunities
By default, AREUS runs in single-threaded mode, taking a set of input files and producing a set of output files. On modern machines with multiple CPU cores, this is often wasteful and makes the run time longer than necessary: single-core clock speeds stopped increasing more than a decade ago, and most performance gains nowadays result from parallelization and vectorization [29, p. 9]. There are several methods to make use of more than one core with AREUS, each with its advantages and disadvantages:

Multiprocessing: This is the simplest method of parallelization and requires no changes to the program at all. It simply means splitting the user's data into batches and running one instance of AREUS on each of them in parallel. When comparing different setups, one may also run multiple instances of AREUS on the same data, each using a different configuration file. One disadvantage of this method is that the user has to do additional data pre- and post-processing. Another is that computing time is wasted if the run times of the parallel AREUS configurations vary a lot: most cores will be idle while the user waits for the longest run to finish.
Vectorization: Another method is data-level parallelism, i.e. using a CPU's SIMD instructions (single instruction, multiple data), which are available on virtually all modern processors. This technique may speed up array-based algorithms by a factor between two and four. It can be used on a small scale and replace unvectorized algorithms without modifying the overarching architecture of the program. In certain situations, optimizing compilers can even apply it automatically (auto-vectorization). AREUS makes use of it where possible, e.g. by using VDT [21] and by writing algorithms in a way that facilitates auto-vectorization. However, this method is inherently limited in scope and cannot, for example, parallelize independent portions of the simulation.
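As an illustration of code written to facilitate auto-vectorization, consider a simple FIR filter over contiguous arrays (a hypothetical example, not AREUS code). The inner multiply-accumulate loop has independent, branch-free iterations with contiguous memory access, which an optimizing compiler can map onto SIMD instructions.

```cpp
#include <cstddef>
#include <vector>

// FIR filter written in an auto-vectorization-friendly style:
// plain indexed loops, no aliasing between input and output, no
// branches in the hot inner loop.
std::vector<double> firFilter(const std::vector<double>& samples,
                              const std::vector<double>& coeffs) {
    const std::size_t n = samples.size();
    const std::size_t k = coeffs.size();
    std::vector<double> out(n, 0.0);
    if (n < k || k == 0) return out;
    for (std::size_t i = 0; i + k <= n; ++i) {
        double acc = 0.0;
        // Contiguous loads and multiply-adds: a classic candidate for
        // the compiler's auto-vectorizer.
        for (std::size_t j = 0; j < k; ++j)
            acc += coeffs[j] * samples[i + j];
        out[i + k - 1] = acc;  // output aligned to the newest sample
    }
    return out;
}
```

Compiling with optimization reports (e.g. -fopt-info-vec with GCC) is a common way to check whether such a loop was actually vectorized.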
Task-level multithreading: AREUS' Thread library, which can be enabled with a compile-time flag, allows the user to parallelize the calorimeter cell simulation. This is possible because each of these simulations is independent of the others. However, no appreciable performance gain could be observed when introducing this library, possibly because the program was not profiled sufficiently and the actual bottlenecks lie elsewhere. But even if there were a gain, the considerable number of users that simulate only a single cell per run would not profit from this optimization.
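The idea of running independent cell simulations concurrently can be sketched with standard C++ tasks (hypothetical code; AREUS' actual Thread library is implemented differently). Because the per-cell simulations share no state, each one can run as its own task.

```cpp
#include <future>
#include <vector>

// Stand-in for the per-cell simulation; returns a dummy "energy".
double simulateCell(int cellId) {
    return static_cast<double>(cellId) * 0.5;
}

// Launch one task per cell and collect the results in cell order.
// The tasks are independent, so they may run on separate threads.
std::vector<double> simulateAllCells(int nCells) {
    std::vector<std::future<double>> tasks;
    tasks.reserve(nCells);
    for (int id = 0; id < nCells; ++id)
        tasks.push_back(std::async(std::launch::async, simulateCell, id));
    std::vector<double> results;
    results.reserve(nCells);
    for (auto& t : tasks) results.push_back(t.get());
    return results;
}
```

In practice a thread pool would be preferable to one thread per cell, since ATLAS' LAr calorimeters have far more cells than a machine has cores.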
Stage-level multithreading: Another approach, which has yet to be investigated for AREUS, would go a step further and also parallelize the different stages of AREUS' processing chain. AREUS' architecture is already well-suited to such an approach due to its extensive use of shared ownership and message passing.
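Such a stage-level pipeline could, for example, connect each processing stage to the next through a thread-safe queue, with every stage running on its own thread. The following is a minimal sketch under these assumptions, not an existing AREUS feature.

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

// A simple thread-safe channel: upstream stages push messages,
// downstream stages pop them; close() signals end-of-stream.
template <typename T>
class Channel {
public:
    void push(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_all();
    }
    // Returns an empty optional once the channel is closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        T v = std::move(q_.front());
        q_.pop();
        return v;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Two toy stages ("digitize" then "filter"): stage 1 runs on its own
// thread and feeds stage 2 through the channel.
std::vector<int> runPipeline(const std::vector<int>& input) {
    Channel<int> digitized;
    std::thread producer([&] {
        for (int x : input) digitized.push(x * 2);  // stage 1
        digitized.close();
    });
    std::vector<int> out;
    while (auto v = digitized.pop()) out.push_back(*v + 1);  // stage 2
    producer.join();
    return out;
}
```

A single queue per link preserves message order, which matters for a simulation whose filters operate on time-ordered samples.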

Conclusions
The High-Luminosity LHC is expected to start operation in 2026, and the electronics of ATLAS' liquid-argon calorimeters will be upgraded for this purpose in two phases. AREUS is a valuable tool that has been applied successfully in technical design studies for both the Phase-I upgrade [3, pp. 97-103] and the Phase-II upgrade [4, pp. 31-36]. It faithfully simulates pileup effects and allows developers to quickly implement and investigate digital-filter algorithms through its flexible and modular design. Its biggest challenge with respect to computing is handling parallelization in the face of strong data dependencies and a deep stack of function callbacks.
The most promising approach to improve simulation speed is data-level parallelism through algorithms that facilitate automatic vectorization by the compiler. Another, more radical approach is parallelizing the distinct simulation stages through multithreading.
This work was supported in part by the German Bundesministerium für Bildung und Forschung (BMBF) within the FIS research grant 05H15ODCA9.
Copyright 2018 CERN for the benefit of the ATLAS Collaboration. Reproduction of this article or parts of it is allowed as specified in the CC-BY-4.0 license.