Issue |
EPJ Web of Conf.
Volume 295, 2024
26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023)
|
|
---|---|---|
Article Number | 11024 | |
Number of page(s) | 5 | |
Section | Heterogeneous Computing and Accelerators | |
DOI | https://doi.org/10.1051/epjconf/202429511024 | |
Published online | 06 May 2024 |
https://doi.org/10.1051/epjconf/202429511024
Vectorization of CMSSW offline software
Fermi National Accelerator Laboratory, Batavia, IL, US
* Corresponding author: gartung@fnal.gov
Published online: 6 May 2024
The CMS experiment has been utilizing vectorization, or SIMD, in parts of its data processing applications for over a decade. On x86 platforms the vectorization level is still SSE3. In the past attempts to use wider vector instruction sets such as AVX or AVX-512 have, in practice, not resulted in improvements in the overall event processing throughput, because the CPUs scale down their frequency when processing AVX instructions. In addition, a notable part of the global pool of CMS resources has been old systems either not supporting AVX, or where the CPU frequency downscaling impacts all cores of the CPU. CMS has nevertheless continued to vectorize more of its application code, and in this work we review profiling methods we have found effective to find out pieces of code that would benefit from vectorization, and techniques to transform those codes such that the GCC compiler is able to auto-vectorize those codes. The build system used for CMSSW, Scram, has also been enhanced to be able to build code for multiple CPU microarchitectures such that the shared libraries of desired microarchitecture level can be loaded based on the CPU of the system. This multi-microarchitecture setup is invisible to the workflow management system, which makes its deployment straightforward. We describe in detail how this multimicroarchitecture build is set up, and measure the impact of using wider vector units than SSE3 on the event processing throughput of CMS applications such as simulation and reconstruction on recent x86 CPUs
© The Authors, published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Current usage metrics show cumulative count of Article Views (full-text article views including HTML views, PDF and ePub downloads, according to the available data) and Abstracts Views on Vision4Press platform.
Data correspond to usage on the plateform after 2015. The current usage metrics is available 48-96 hours after online publication and is updated daily on week days.
Initial download of the metrics may take a while.