Enhancing software-hardware co-design for HEP by low-overhead profiling of single- and multi-threaded programs on diverse architectures with Adaptyst

Open Access

Issue		EPJ Web Conf. Volume 337, 2025, 27th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2024)


Article Number		01001
Number of page(s)		8
DOI		https://doi.org/10.1051/epjconf/202533701001
Published online		07 October 2025

