Key4hep: Progress Report on Integrations

Detector studies for future experiments rely on advanced software tools to estimate performance and optimize their design and technology choices. The Key4hep project provides a flexible turnkey solution for the full experiment life-cycle based on established community tools such as ROOT, Geant4, DD4hep, Gaudi, podio and spack. Members of the CEPC, CLIC, EIC, FCC, and ILC communities have joined to develop this framework and have merged, or are in the process of merging, their respective software environments into the Key4hep stack. These proceedings give an overview of the recent progress in the Key4hep project, covering the developments towards adaptation of state-of-the-art tools for simulation (DD4hep, Gaussino), track and calorimeter reconstruction (ACTS, CLUE), particle flow (PandoraPFA), analysis via RDataFrame, and visualization with Phoenix, as well as tools for testing and validation.


Introduction
Detector studies for future experiments require advanced software tools to optimize their design and technology choices and to estimate their performance. These tools must support full or parameterized detector simulation, the reconstruction of tracks and calorimeter clusters, jet clustering, flavour tagging, and analysis. A combined solution for all these issues should also allow experiments to move seamlessly between different stages of their life cycle: for example, from parameterized detector studies that find a performance envelope of their experiment to full and detailed simulation and reconstruction studies that confirm the validity and feasibility of the assumptions used in the parameterized simulation. The consistency of the framework also allows experiments to extract performance parameterizations from detailed simulation studies and to use them to create large-scale samples that would not be feasible with full simulation for small communities with limited computing resources.
The Key4hep project provides a solution for these use cases by means of a structured software stack, which integrates individual packages into a complete data processing framework for HEP experiments. Sharing common components reduces the overhead that the different communities would otherwise face. Moreover, Key4hep aims to provide an easy-to-use product for the librarians who provide software installations, the developers who create or adapt components, and the end users.
Figure 1 schematically shows the three major ingredients for the project: a processing framework that connects all the pieces; a way to describe the geometry of the experiments and use the information for simulation, reconstruction or analysis; and an event data model to exchange data between the pieces, or for persistency. For Key4hep the processing framework is Gaudi [1], the geometry information is provided via DD4hep [2,3], and the event data model is provided by EDM4hep [4][5][6] and podio [7,8].
The current contributors and users of the Key4hep ingredients are part of the CEPC, CLIC, EIC [9], ILC, FCC, and Muon Collider communities. The source code for all components under direct development of the Key4hep project is hosted on GitHub. Weekly open meetings are held to discuss ongoing developments and issues in Key4hep. Newcomers are always welcome to join the meetings or contribute to the developments.
The goal of the project is not to develop all components itself, but to reuse existing solutions as much as possible. Section 2 therefore describes the current status of the integration of tools for simulation, reconstruction and analysis, Section 3 shows how some of the testing for the project is done, and Section 4 ends with a summary and outlook.

Integrations
One part of the integrations into Key4hep consists of the existing experiment software components from CEPCSW and FCCSW, which are already based on Gaudi. Their integration mostly required adapting them to the EDM4hep event data model [10][11][12]. For the integration of iLCSoft, used by the ILC and CLIC communities, the k4MarlinWrapper [13] was created to integrate processors from the Marlin framework [14] and its corresponding event data model LCIO [15,16] into Gaudi. The idea of the k4MarlinWrapper is shown in Figure 2. To run any processor from Marlin in a Gaudi workflow, the event data is converted in memory from EDM4hep to LCIO before the execution of the processor and back to EDM4hep afterwards.
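The convert, run, convert back pattern described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the `Event` class, the `convert` and `run_wrapped` functions and the `count_hits` processor are hypothetical stand-ins and do not reflect the actual k4MarlinWrapper, EDM4hep or LCIO interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for an in-memory event: a format tag plus named collections."""
    fmt: str
    collections: dict = field(default_factory=dict)


def convert(event, target_fmt):
    """Copy every collection into a new in-memory event tagged with the target format."""
    copied = {name: list(items) for name, items in event.collections.items()}
    return Event(target_fmt, copied)


def run_wrapped(processor, edm_event):
    """Run a processor that only understands 'LCIO' events on an 'EDM4hep' event."""
    lcio_event = convert(edm_event, "LCIO")   # conversion before the execution
    processor(lcio_event)                     # the processor may add output collections
    return convert(lcio_event, "EDM4hep")     # conversion back after the execution


def count_hits(event):
    """Hypothetical toy processor: stores the number of hits as a new collection."""
    if event.fmt != "LCIO":
        raise ValueError("this processor expects an LCIO event")
    event.collections["HitCount"] = [len(event.collections["Hits"])]
```

In this toy model the input event is left untouched and the output of the processor only becomes visible in the converted copy, which mirrors the idea that downstream algorithms keep working on a single event data model.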
In the following sections, the status of the integration for simulation, reconstruction and analysis tools is laid out.

Simulation
For full or parameterized simulation, different tools are already available in Key4hep. The parametric simulation program Delphes [17], together with some utilities to handle the generation of primary events, has been integrated into Gaudi as k4SimDelphes [18]. k4SimDelphes also contains standalone programs for reading different input files, such as HepMC, or for controlling event generators, such as Pythia8. There are two possibilities for full detector simulation with Geant4 [19] and the DD4hep geometry. The first is the ddsim [20] feature of DD4hep, which can produce EDM4hep output files and read a majority of generator output formats. The second is the k4SimGeant4 [21] Gaudi integration, which came out of FCCSW. The framework integration of the full simulation via k4SimGeant4, together with the other algorithms, allows one to run a complete chain from event generation to reconstructed objects in a single program execution.
In the meantime, the Gaussino [22] functionality from LHCb has also become experiment agnostic [23] and will potentially provide a complete replacement of the k4SimGeant4 package.

Tracking
The iLCSoft ecosystem contains tools for track pattern recognition and track fitting, all of which can be used in Key4hep via the k4MarlinWrapper. For the track fitting, DD4hep::rec::Surfaces are attached to any sensitive element [24] and some dead material. These surfaces provide an abstract view of the geometry needed for reconstruction, such as automatically averaged materials (as shown in Figure 3) and measurement directions. In many cases, these surfaces can be pragmatically attached to an existing DD4hep geometry.
The ACTS [25,26] integration into Key4hep is progressing steadily. In recent months, a plugin to support the EDM4hep track format was added to ACTS [27]. Support for DD4hep geometry is under active development as well. It is already possible to load DD4hep geometries that follow a certain hierarchical structure into ACTS. The existing conversion of DD4hep to ACTS geometry is currently heavily used by the ACTS developers to test their algorithms against the generic Open Data Detector, a detector model used for benchmarking tracking and calorimeter reconstruction approaches [28]. However, the general use of DD4hep::rec::Surfaces is still under development and will allow a broader range of detectors to be used directly. Using these surfaces in the link to the ACTS reconstruction as well will enable a direct replacement of the iLCSoft track reconstruction with ACTS. Recently, the development of the Gaudi algorithms needed to use ACTS from Key4hep has intensified. In particular, the focus is on enabling arbitrary track refits using different fitters via Gaudi.

Calorimeter Clustering
An important ingredient for the performance of future Higgs Factory experiments is the particle flow reconstruction for optimal jet energy resolutions. The Pandora particle flow algorithm package (PandoraPFA) [29,30] was developed to study particle flow clustering at linear colliders.
The particle flow clustering with Pandora makes use of the extensions attached to detector geometries [24], such as DD4hep::rec::LayeredCalorimeterData, to provide the properties of the calorimeter, e.g., radiation length, interaction length, and dimensions, to the reconstruction algorithms. To support a larger range of detectors, for example those that foresee a noble liquid calorimeter [31], the necessary information will be obtained in a more dynamic way. This step has been implemented using the DD4hep::MaterialManager, which extracts the necessary information between arbitrary space points.
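Extracting a material property between two space points amounts to averaging over the materials traversed along the connecting line. A minimal sketch of such an average for the radiation length is given below. It assumes that the path has already been split into segments of known length and radiation length; the function name and the segment list are hypothetical and do not reflect the DD4hep interface.

```python
def effective_radiation_length(segments):
    """Path-averaged radiation length for a list of (length, X0) segments.

    All lengths and X0 values must be given in the same unit. The number of
    radiation lengths traversed adds up over the segments, so that
    X0_eff = sum(l_i) / sum(l_i / X0_i).
    """
    total_length = sum(length for length, _ in segments)
    n_radiation_lengths = sum(length / x0 for length, x0 in segments)
    if n_radiation_lengths == 0:
        raise ValueError("no material traversed along the path")
    return total_length / n_radiation_lengths
```

For two illustrative segments of equal length with radiation lengths of 10 and 5 (arbitrary units), `effective_radiation_length([(1.0, 10.0), (1.0, 5.0)])` gives 20/3, which is closer to the denser material than the arithmetic mean would be.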
At least for high granularity calorimeters with large occupancies, the reconstruction time can become dominant. For the HGCAL project of CMS, a GPU-friendly algorithm, CLUE (CLUstering of Energy), was developed [32,33]. An integration of this algorithm in Key4hep is ready to be used [34]. Figure 4 shows simulated hits from many photons in a single event, and how CLUE reconstructs them into clusters.

Listing 1: Example for the usage of EDM4hep objects for analysis with RDataFrame [36].
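Returning to the clustering step, the density-based idea behind CLUE can be illustrated with a short sketch. The code below is neither the Key4hep integration nor Listing 1: it is a simplified, brute-force illustration on a single 2D layer, without the spatial index and parallelism that make the real algorithm GPU friendly. The function name, the parameter names and the treatment of ties are assumptions made for this example.

```python
import math


def clue_like_clusters(hits, dc, rho_c, delta_c):
    """Cluster 2D hits given as (x, y, energy); returns one label per hit, -1 for outliers."""
    n = len(hits)

    def dist(i, j):
        return math.hypot(hits[i][0] - hits[j][0], hits[i][1] - hits[j][1])

    # 1) Local density: summed energy of all hits within dc (including the hit itself).
    rho = [sum(hits[j][2] for j in range(n) if dist(i, j) <= dc) for i in range(n)]

    # 2) Distance to the nearest hit with higher density (ties broken by hit index).
    delta = [math.inf] * n
    nearest_higher = [-1] * n
    for i in range(n):
        for j in range(n):
            is_higher = rho[j] > rho[i] or (rho[j] == rho[i] and j < i)
            if j != i and is_higher and dist(i, j) < delta[i]:
                delta[i] = dist(i, j)
                nearest_higher[i] = j

    # 3) Classify in order of decreasing density so that a follower's parent is labelled first.
    labels = [-1] * n
    next_id = 0
    for i in sorted(range(n), key=lambda k: (-rho[k], k)):
        if delta[i] > delta_c:
            if rho[i] >= rho_c:      # dense and isolated from denser hits: seed
                labels[i] = next_id
                next_id += 1
            # otherwise: sparse and isolated, stays an outlier (-1)
        else:                         # follower: inherits the label of its parent
            labels[i] = labels[nearest_higher[i]]
    return labels
```

With two compact groups of three unit-energy hits and one isolated low-energy hit, a call such as `clue_like_clusters(hits, dc=1.0, rho_c=2.0, delta_c=3.0)` assigns one label per group and marks the isolated hit as an outlier.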

Analysis
The ROOT persistency of the EDM4hep event data model and its columnar storage model allow the use of the RDataFrame [35] feature for analysis. Listing 1 shows an example of how the EDM4hep objects, here ReconstructedParticles, can be used with a dataframe to select events with different criteria. To take full advantage of the EDM4hep data model based on podio, for example to handle relationships between different objects, an RDataSource is being developed.
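The selection pattern that a dataframe offers, namely deriving new columns per event and then filtering on them, can be illustrated without ROOT. The following is a library-free Python sketch; it is not the RDataFrame API and not a reproduction of Listing 1. The column names and the `define` and `filter_events` helpers are hypothetical.

```python
def define(columns, name, func, inputs):
    """Return a copy of the column table with one extra column computed per event."""
    n_events = len(next(iter(columns.values())))
    new_column = [func(*(columns[c][i] for c in inputs)) for i in range(n_events)]
    return {**columns, name: new_column}


def filter_events(columns, predicate, inputs):
    """Return a column table that keeps only the events passing the predicate."""
    n_events = len(next(iter(columns.values())))
    keep = [i for i in range(n_events) if predicate(*(columns[c][i] for c in inputs))]
    return {name: [values[i] for i in keep] for name, values in columns.items()}


# Toy input: one entry per event, each holding per-particle values (made-up numbers).
events = {
    "reco_energy": [[45.1, 44.8], [10.2], [60.0, 30.5, 5.0]],
    "reco_charge": [[1, -1], [0], [1, -1, 0]],
}

with_counts = define(events, "n_charged",
                     lambda charges: sum(1 for q in charges if q != 0),
                     ["reco_charge"])
with_sums = define(with_counts, "e_sum", sum, ["reco_energy"])
selected = filter_events(with_sums,
                         lambda n, e: n >= 2 and e > 80.0,
                         ["n_charged", "e_sum"])
```

Each step returns a new table instead of modifying its input, so several selections with different criteria can branch off from the same intermediate result.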

Visualization
The DD4hep geometry and the EDM4hep event data can both be converted to formats suitable for the Phoenix event display [37]. Figure 5 shows an event in the CLD detector for the FCC-ee [38], which is also shown in Figure 4a using the C-Event display (CED) from iLCSoft. The advantages of Phoenix are that it runs in a web browser and that it is broadly configurable. Phoenix also allows one to centrally host the detectors on the web. The FCC detectors, for example, are hosted on a web server and continuously updated.

Testing
A continuous validation system has been set up for Key4hep. Every night, after the nightly build of the Key4hep stack is finished, detector simulation is run for the detectors preset in the configuration. After that, a complete reconstruction is performed. The results of the reconstruction are then compared to a set of reference samples that were produced in known and reproducible conditions. If the new distributions differ from the reference ones, based on specified metrics, the display of the plots in a webpage points this out (see Figure 6). The plots in the webpage are grouped into categories according to the class of results they belong to, for example plots about tracks in one category and those about jet reconstruction in another. The current system is evolving and several improvements are under development, such as easier configuration, more detectors being tested, and a better reporting system for cases where the new and reference distributions differ.
Besides the physics performance, the CPU performance has to be monitored and controlled as well. For this purpose the Valprod toolkit is under development, which enables the building of comprehensive validation jobs and offers CPU flame graphs, I/O profiling, and an integration with the prmon [39] program.
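The comparison of new and reference distributions described above can be sketched as follows. The metrics used by the Key4hep validation system are not specified in this text, so the chi-square per bin, the threshold value and all function and histogram names below are assumptions chosen purely for illustration.

```python
def chi2_per_bin(new, ref):
    """Chi-square distance per non-empty bin between two equally binned histograms."""
    if len(new) != len(ref):
        raise ValueError("histograms must have the same binning")
    chi2 = 0.0
    used_bins = 0
    for n, r in zip(new, ref):
        if n + r > 0:                      # skip bins that are empty in both histograms
            chi2 += (n - r) ** 2 / (n + r)
            used_bins += 1
    return chi2 / used_bins if used_bins else 0.0


def flag_differences(new_hists, ref_hists, threshold=2.0):
    """Return the sorted names of histograms that are missing or differ from the reference."""
    flagged = []
    for name, ref in ref_hists.items():
        if name not in new_hists or chi2_per_bin(new_hists[name], ref) > threshold:
            flagged.append(name)
    return sorted(flagged)
```

A report page could then highlight every histogram whose name is returned by `flag_differences`, while the remaining plots are shown without a warning.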

Summary & Outlook
The Key4hep project provides a common framework for future Higgs factories and other experiments and has been fully adopted by FCC and CLIC. It is also seeing increasing adoption by the ILC and CEPC communities. Beyond these initial communities, the project has caught the interest of the EIC and Muon Collider communities and of the LUXE experiment [40]. To match the needs of the communities, the software stack is expanding to include state-of-the-art tools such as ACTS, PandoraPFA, CLUE, and Phoenix. Their integrations, outlined in the previous sections, are or will soon be available as part of the Key4hep stack and will allow its users to perform all the tasks needed for detector studies, as shown in Figure 7.

Figure 1: Main ingredients for the Key4hep project: geometry information, event data information, and a processing framework with a large number of algorithms.

Figure 2: Schematic of how Marlin processors are integrated into Gaudi workflows in Key4hep.

Figure 4: Simulated photons and their clusters reconstructed with k4clue.

Figure 5: An event display of the CLD detector using Phoenix.

Figure 6: Differing distributions are highlighted in bright red. "Current" is after a bug was fixed.

Figure 7: Potential data flows via full or fast simulation.