Issue | EPJ Web of Conf., Volume 295, 2024: 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023)
---|---
Article Number | 06006
Number of page(s) | 8
Section | Physics Analysis Tools
DOI | https://doi.org/10.1051/epjconf/202429506006
Published online | 06 May 2024
Boosting RDataFrame performance with transparent bulk event processing
1 CERN
2 FNAL
* e-mail: enrico.guiraud@cern.ch
RDataFrame is ROOT’s high-level interface for Python and C++ data analysis. Since it first became available, RDataFrame adoption has grown steadily and it is now poised to be a major component of analysis software pipelines for LHC Run 3 and beyond. Thanks to its design inspired by declarative programming principles, RDataFrame enables the development of high-performance, highly parallel analyses without requiring expert knowledge of multi-threading and I/O: user logic is expressed in terms of self-contained, small computation kernels tied together by a high-level API. This design completely decouples analysis logic from its actual execution, and opens several interesting avenues for workflow optimization. In particular, in this work we explore the benefits of moving internal data processing from an event-by-event to a bulk-by-bulk loop. This refactoring dramatically reduces the framework’s runtime overheads; in collaboration with the I/O layer it improves data access patterns; it exposes information that optimizing compilers might use to auto-vectorize the invocation of user-defined computations; finally, while existing user-facing interfaces remain unaffected, it becomes possible to additionally offer interfaces that explicitly expose bulks of events, useful e.g. for the injection of GPU kernels into the analysis workflow. In order to inform similar future R&D, design challenges will be presented, as well as an investigation of the relevant time-memory trade-off backed by novel performance benchmarks.
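The difference between the two loop structures can be illustrated with a minimal sketch in plain Python. It does not use ROOT and is not RDataFrame's actual implementation: the names `run_event_by_event`, `run_bulk_by_bulk`, `bulks`, `pt_squared` and the `bulk_size` parameter are hypothetical, chosen only for illustration. The sketch shows the same per-event user kernel being driven first one event at a time, then over contiguous bulks of entries, with identical results.

```python
from typing import Callable, Iterator, List, Sequence

# Hypothetical stand-in for a column of per-event values read from storage.
Column = Sequence[float]
Kernel = Callable[[float], float]


def run_event_by_event(column: Column, kernel: Kernel) -> List[float]:
    """Read one value and invoke the kernel once per event."""
    results: List[float] = []
    for entry in range(len(column)):
        value = column[entry]          # one read per event
        results.append(kernel(value))  # one dispatch per event
    return results


def bulks(n_entries: int, bulk_size: int) -> Iterator[range]:
    """Yield consecutive entry ranges holding at most bulk_size entries."""
    for start in range(0, n_entries, bulk_size):
        yield range(start, min(start + bulk_size, n_entries))


def run_bulk_by_bulk(column: Column, kernel: Kernel, bulk_size: int = 4) -> List[float]:
    """Read a contiguous bulk of values, then apply the kernel in a tight loop."""
    results: List[float] = []
    for entry_range in bulks(len(column), bulk_size):
        # One read per bulk: the whole slice is buffered in memory.
        values = column[entry_range.start:entry_range.stop]
        # The user kernel is unchanged; only the driving loop differs.
        results.extend([kernel(v) for v in values])
    return results


def pt_squared(pt: float) -> float:
    """Example user-defined computation, written per event."""
    return pt * pt


column = [1.0, 2.0, 3.0, 4.0, 5.0]
assert run_event_by_event(column, pt_squared) == run_bulk_by_bulk(column, pt_squared, bulk_size=2)
```

In this sketch the per-bulk buffer `values` grows with `bulk_size`, which is one simple way to picture the time-memory trade-off mentioned above: larger bulks amortize per-read bookkeeping over more events at the cost of holding more values in memory at once.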
© The Authors, published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.