Implementation of the ATLAS trigger within the multi-threaded AthenaMT framework

Abstract. We present an implementation of the ATLAS High Level Trigger (HLT) that provides parallel execution of trigger algorithms within the ATLAS multithreaded software framework, AthenaMT. This development will enable the HLT to meet future challenges from the evolution of computing hardware and upgrades of the Large Hadron Collider (LHC) and ATLAS Detector. During the LHC data-taking period starting in 2021, luminosity will reach up to three times the original design value. In the following data-taking period (2026) upgrades to the ATLAS trigger architecture will increase the HLT input rate by a factor of 4-10, while the luminosity will increase by a further factor of 2-3. AthenaMT provides a uniform interface for offline and trigger algorithms, facilitating the use of offline code in the HLT. Trigger-specific optimizations provided by the framework include early event rejection and reconstruction within restricted geometrical regions. We report on the current status, including experience of migrating trigger selections to this new framework, and present the next steps towards a full implementation of the redesigned ATLAS trigger.


Introduction
Individual collision events in a particle collider can be treated as independent of one another. In practice, there are time-dependent variations in detector response arising from e.g. out-of-time pile-up and bunch train position, but events can still be simulated or reconstructed independently from one another (taking appropriate account of detector conditions). LHC computing therefore parallelises naturally by event, and indeed historically LHC event processing frameworks have processed single events serially on a single CPU core, with events distributed between independent processing nodes. This allowed for the success of grid computing, with event processing streamed to different grid sites around the world. Unfortunately, processing one event on each CPU core does not match current trends in computer architecture: the number of CPU cores available in a standard compute node has increased, but the amount of memory per core is increasing at a lower rate. Reducing the memory requirement is only possible by sharing memory between CPU cores. Sharing event processing between multiple threads ("multi-threading") allows more memory to be shared between cores, reducing the overall memory footprint per core. Additionally, co-processors such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) can be used most effectively for asynchronously offloading compute-intensive tasks, freeing up CPU cores for work better suited to the CPU. Reducing the memory usage per core and effectively using co-processors is important for future LHC running conditions in Run 3 (2021-2023) and Run 4 (2026-2029). The future running conditions and upgrades to the ATLAS [1] detector are detailed in [2] and [3], while the adaptation of existing software to a multi-threaded environment requires changes at the framework level, as detailed in the next section.
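The event-level parallelism described above can be illustrated with a minimal sketch (the conditions data, event structure and calibration values here are purely hypothetical, not ATLAS code): worker threads process independent events while sharing a single in-memory copy of the conditions data, whereas independent processes would each need their own copy.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical conditions data: with threads, one copy serves all workers;
# with one process per core, each process would hold its own copy.
DETECTOR_CONDITIONS = {"calo_calibration": 1.02, "pixel_alignment": 0.001}

def reconstruct(event):
    """Toy per-event reconstruction reading the shared conditions."""
    scale = DETECTOR_CONDITIONS["calo_calibration"]
    return {"id": event["id"], "energy": event["raw_energy"] * scale}

events = [{"id": i, "raw_energy": 50.0 + i} for i in range(4)]

# Events are independent, so they can be handed to worker threads in any
# order; all threads read the same shared conditions dictionary.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(reconstruct, events))
```

This is only a schematic of the memory argument; real reconstruction additionally has to manage thread-safe access to mutable state, which is one of the framework-level changes discussed below.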

ATLAS Software in Run 2
The ATLAS software framework Athena [4] is mainly written in C++, with a configuration layer written in Python. The underlying Gaudi framework [5] is shared with the LHCb experiment. Gaudi defines the basic classes used for event processing, and also provides the scheduler for optimal algorithm execution. In Run 2, the Gaudi scheduler was used by offline reconstruction, but the HLT used a custom layer known as "Trigger Steering". This was needed to implement additional HLT functionality: data flow, control flow and regional reconstruction to minimise processing time. Data flow refers to the configuration of each sequence of trigger algorithms with explicit input and output data, to remove duplicate algorithms and minimise processing time. Control flow refers to the ability to flag each event for acceptance/rejection, and to apply the flag for each algorithm. Finally, regional reconstruction refers to the ability to restrict event processing to a smaller geometrical Region of Interest (RoI). The Trigger Steering layer fulfilled these requirements successfully during Run 1 and Run 2 but with a significant development and maintenance overhead.

ATLAS Software for Run 3
During Run 2 a design effort for a new software framework, suitable for Run 3 and beyond, was undertaken. The key requirements were:
• to reduce the maintenance overhead;
• to make effective use of hardware.
As already discussed, effective use of modern multi-core machines requires multi-threading. To reduce the maintenance overhead, a common multi-threaded framework, AthenaMT [6], was proposed. From the beginning, the framework was designed to meet both offline and trigger requirements, eliminating the need for a custom trigger-specific layer. Data flow and control flow, as well as regional reconstruction, were designed to be part of the scheduler.
Data flow is expressed via data dependencies, in the form of ReadHandles and WriteHandles. An algorithm takes one or more input ReadHandles (either from a previous algorithm or directly from the initial detector data) and performs a transformation to create one or more output objects, published via WriteHandles. This information is known before the execution of the first event, and is used by the Run 3 Gaudi scheduler [7] to generate a directed acyclic graph of dependencies controlling the order in which algorithms run for each event. The graph is fixed during initialisation and is then the same for all processed events. Algorithms with no common dependencies (e.g. tracking and calorimeter reconstruction) can run in parallel. A simple example is shown in figure 1 for an electron algorithm with independent tracking and calorimeter reconstruction.
Control flow is a set of conditions (AND/OR) which allow the scheduler to avoid unnecessary processing. For instance, if an event contains only low momentum jets, subsequent algorithms (e.g. jet substructure calculations) can be skipped. An example for jet reconstruction is shown in figure 2. Note that control flow also applies if a complete selection is disabled during the run; in this case the selection remains in the graph but is always rejected.
Regional reconstruction is expressed via EventViews, which confine an algorithm to a given geometric region. This is designed to be entirely transparent to the algorithm, which simply requests data via ReadHandles and receives only the data in the given region.
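The scheduling principle can be sketched as follows. This is not AthenaMT code: the algorithm names and the dictionary of declared inputs/outputs are invented for illustration, standing in for ReadHandle/WriteHandle declarations. From them one can derive the dependency graph before the first event, obtain a valid execution order, and identify which algorithms are immediately runnable in parallel.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical algorithms with declared input/output data keys, mimicking
# ReadHandle/WriteHandle declarations known before the first event.
ALGORITHMS = {
    "CaloClustering":  {"reads": ["CaloCells"],          "writes": ["Clusters"]},
    "TrackFinding":    {"reads": ["PixelHits"],          "writes": ["Tracks"]},
    "ElectronMatcher": {"reads": ["Clusters", "Tracks"], "writes": ["Electrons"]},
}

def build_dependency_graph(algs):
    """Map each algorithm to the set of algorithms producing its inputs.
    Keys with no producer (e.g. raw detector data) impose no ordering."""
    producer = {out: name for name, io in algs.items() for out in io["writes"]}
    return {name: {producer[k] for k in io["reads"] if k in producer}
            for name, io in algs.items()}

graph = build_dependency_graph(ALGORITHMS)

# A fixed, valid serial execution order for every event.
order = list(TopologicalSorter(graph).static_order())

# Algorithms with no pending dependencies may run concurrently: here the
# first wave contains both tracking and calorimeter reconstruction.
ts = TopologicalSorter(graph)
ts.prepare()
first_wave = set(ts.get_ready())
```

As in the electron example of figure 1, the matcher must wait for both upstream algorithms, while the two upstream algorithms are independent and can be dispatched to separate threads.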

Menu and configuration
During Run 2, more than 2000 unique selections (e.g. one electron above 20 GeV, one electron and one muon, one electron and missing energy) were used. Each unique selection is known as a chain, and the set of chains is known as a menu. The configuration for each selection was generated by a large (several hundred thousand lines) Python package known as TriggerMenu. It is important to know exactly which menu is used both for data-taking and for the production of simulated samples, so the configuration is stored in a database [8]. This configuration system also allows chains to be enabled or disabled during the run, permitting e.g. more expensive selections at lower luminosity. As part of the software upgrade for Run 3,
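The chain/menu structure can be made concrete with a minimal sketch. The chain names and fields below are invented for illustration (they do not reproduce the TriggerMenu schema); the point is that disabled chains remain part of the stored menu, so the recorded configuration stays complete while only enabled chains run.

```python
# Hypothetical minimal menu: each chain is a named selection with an
# enabled flag that can be toggled during the run.
MENU = {
    "e20":      {"selection": "one electron above 20 GeV",        "enabled": True},
    "e10_mu10": {"selection": "one electron and one muon",        "enabled": True},
    "e15_xe50": {"selection": "one electron and missing energy",  "enabled": False},
}

def active_chains(menu):
    """Chains currently enabled. Disabled chains stay in the menu, so the
    configuration recorded in the database describes the full set."""
    return [name for name, cfg in menu.items() if cfg["enabled"]]
```

For example, `active_chains(MENU)` here yields only the two enabled chains, while the third remains recorded and can be re-enabled when the luminosity drops.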

Integration with the online infrastructure
The AthenaMT software framework must interact with the online trigger and data-acquisition infrastructure, as well as new hardware triggers installed for Run 3. During Run 2, each node in the HLT forked multiple sub-processes, sharing memory between them via the copy-on-write mechanism [9]. In Run 3, each forked sub-process will additionally run with multiple threads, and potentially process multiple events across these threads. This has a few implications:
• Tuning of the number of forks, threads and concurrent events will be necessary to ensure maximum performance.
• If a sub-process crashes or takes too long to process an event, all concurrently processed events must be saved for offline reprocessing.
Due to rolling replacements of hardware, there are several different generations of machine in operation in the ATLAS HLT farm. In principle the different parameters could be tuned according to machine generation.
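The tuning trade-off can be sketched with a toy memory model. All numbers and function names below are hypothetical, chosen only to show the shape of the calculation: memory shared across forks via copy-on-write is paid once per node, while each concurrent event slot adds its own working memory, so the affordable number of forks depends on the memory of the machine generation.

```python
# Toy model: shared_mb is memory shared across forks via copy-on-write,
# slot_mb is per-event-slot working memory. Values are illustrative only.
def node_memory_mb(n_forks, slots_per_fork, shared_mb=1000, slot_mb=300):
    """Estimated total memory use of one HLT node."""
    return shared_mb + n_forks * slots_per_fork * slot_mb

def max_forks(total_mb, slots_per_fork, shared_mb=1000, slot_mb=300):
    """Largest fork count fitting a node of a given memory size, as one
    might tune separately per machine generation."""
    return (total_mb - shared_mb) // (slots_per_fork * slot_mb)
```

Under this model a node with more memory per core supports more forks (or more event slots per fork), which is why the parameters could in principle be tuned per machine generation; it also makes explicit that a crash costs all events currently held in that sub-process's slots.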
A simplified example of three concurrently processed events, each taking more or less time to process, is shown in figure 3. The job crashes during the processing of events #4, #5 and #6, so the processing must be stopped for all three event slots, and the events flagged for later reprocessing.