Issue | EPJ Web of Conf., Volume 295, 2024: 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023)
---|---
Article Number | 06011
Number of page(s) | 8
Section | Physics Analysis Tools
DOI | https://doi.org/10.1051/epjconf/202429506011
Published online | 06 May 2024
First implementation and results of the Analysis Grand Challenge with a fully Pythonic RDataFrame
1 CERN, Esplanade des Particules 1, 1211 Geneva 23, Switzerland
2 Princeton University, Princeton, NJ 08544, USA
3 Taras Shevchenko National University of Kyiv, Kyiv, Ukraine
4 Mannheim University of Applied Sciences, Mannheim, Germany
* e-mail: vincenzo.eduardo.padulano@cern.ch
** e-mail: enrico.guiraud@pm.me
*** e-mail: andrii.falko@cern.ch
**** e-mail: elena.gazzarrini@cern.ch
† e-mail: enrique.garcia.garcia@cern.ch
‡ e-mail: domenic.gosein@cern.ch
The growing amount of data generated by the LHC requires a shift in how HEP analysis tasks are approached. Efforts to address this computational challenge have led to the rise of a middle-man software layer, a mixture of simple, effective APIs and fast execution engines underneath. Having common, open and reproducible analysis benchmarks proves beneficial in the development of these modern tools. One such benchmark is provided by the Analysis Grand Challenge (AGC), which represents a specification for realistic analysis pipelines. This contribution presents the first AGC implementation that leverages ROOT RDataFrame, a powerful, modern and scalable execution engine for the HENP use cases. The different steps of the benchmarks are written with a composable, flexible and fully Pythonic API. RDataFrame can then transparently run the computations on all the cores of a machine or on multiple nodes thanks to automatic dataset splitting and transparent workload distribution. The portability of this implementation is shown by running on various resources, from managed facilities to open cloud platforms for research, showing usage of interactive and distributed environments.
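The idea of automatic dataset splitting and workload distribution mentioned above can be illustrated with a minimal sketch. It does not use RDataFrame or any ROOT API: it is a plain-Python stand-in, using only the standard library, for the general pattern of splitting an entry range into chunks, filling a partial histogram per chunk, and merging the partial results. All function names (`split_ranges`, `fill_histogram`, `merge`, `distributed_histogram`) are hypothetical and chosen for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(n_entries, n_chunks):
    """Split [0, n_entries) into contiguous, near-equal (start, stop) ranges."""
    n_chunks = max(1, min(n_chunks, n_entries))
    base, extra = divmod(n_entries, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional entry each.
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def fill_histogram(values, n_bins, lo, hi):
    """Count values falling in [lo, hi) using n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def merge(a, b):
    """Merge two partial histograms with identical binning."""
    return [x + y for x, y in zip(a, b)]


def distributed_histogram(values, n_chunks, n_bins, lo, hi):
    """Fill one partial histogram per chunk in a worker pool, then merge them."""
    tasks = split_ranges(len(values), n_chunks)
    total = [0] * n_bins
    with ThreadPoolExecutor() as pool:
        partials = pool.map(
            lambda r: fill_histogram(values[r[0]:r[1]], n_bins, lo, hi), tasks
        )
        for partial in partials:
            total = merge(total, partial)
    return total


if __name__ == "__main__":
    data = [0.5, 1.5, 1.7, 2.2, 3.9, 4.0, -1.0]
    print(distributed_histogram(data, n_chunks=3, n_bins=4, lo=0.0, hi=4.0))
```

Because merging histograms is associative, the result does not depend on how many chunks are used or on the order in which they complete. This is the property that lets the same analysis description run on one core, on many cores, or on multiple nodes without changes.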
© The Authors, published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.