| Issue | EPJ Web of Conf. Volume 295, 2024: 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2023) |
|---|---|
| Article Number | 11008 |
| Number of page(s) | 8 |
| Section | Heterogeneous Computing and Accelerators |
| DOI | https://doi.org/10.1051/epjconf/202429511008 |
| Published online | 06 May 2024 |
Evaluating Performance Portability with the CMS Heterogeneous Pixel Reconstruction code
1 Barcelona Supercomputing Center, Spain
2 CERN, Geneva, Switzerland
3 INFN Bologna, Italy
4 Center for Advanced Systems Understanding (CASUS), Görlitz, Germany
5 University of Geneva, Switzerland
6 University of Bologna, Italy
7 Fermi National Accelerator Laboratory, Batavia, IL, USA
8 Institute of Technology and Higher Studies of Monterrey, Mexico
9 University of Milano Bicocca, Italy
10 RWTH Aachen University, Germany
11 Argonne National Laboratory, Lemont, IL, USA
12 Lawrence Berkeley National Laboratory, Berkeley, CA, USA
* e-mail: matti@fnal.gov
In recent years, the landscape of tools for expressing parallel algorithms portably across compute accelerators has continued to evolve significantly. Many available technologies provide portability across CPUs, GPUs from several vendors, and in some cases even FPGAs. These technologies include C++ libraries such as Alpaka and Kokkos, compiler directives such as OpenMP, the SYCL open specification (which can be implemented as a library or in a compiler), and standard C++, where the compiler is solely responsible for the offloading. Given this evolving landscape, users have to choose the technology that best fits their applications and constraints. For example, the CMS experiment's experience so far with heterogeneous reconstruction algorithms suggests that the full application contains a large number of relatively short computational kernels and memory transfer operations. In this work we use a stand-alone version of the CMS heterogeneous pixel reconstruction code as a realistic use case of HEP reconstruction software that can leverage GPUs effectively. We summarize the experience of porting this code base from CUDA to Alpaka, Kokkos, SYCL, std::par, and OpenMP offloading. We compare the event processing throughput achieved by each version on NVIDIA and AMD GPUs as well as on a CPU, and compare these results with the throughput a native version of the code achieves on each platform.
© The Authors, published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.