Enabling Data Intensive Science on Supercomputers for High Energy Physics R&D Projects in HL-LHC Era

Alexei Klimentov; Douglas Benjamin; Alessandro Di Girolamo; Kaushik De; Johannes Elmsheuser; Andrej Filipcic; Andrey Kiryanov; Danila Oleynik; Jack C. Wells; Andrey Zarochentsev; Xin Zhao

doi:10.1051/epjconf/202022601007

Issue: EPJ Web Conf., Volume 226 (2020), Mathematical Modeling and Computational Physics 2019 (MMCP 2019)
Article Number: 01007
Number of pages: 8
Section: Plenary and Invited Lectures
DOI: https://doi.org/10.1051/epjconf/202022601007
Published online: 20 January 2020

EPJ Web of Conferences 226, 01007 (2020)
https://doi.org/10.1051/epjconf/202022601007

Alexei Klimentov¹, Douglas Benjamin², Alessandro Di Girolamo³, Kaushik De⁴, Johannes Elmsheuser¹, Andrej Filipcic⁵, Andrey Kiryanov⁶,¹⁰, Danila Oleynik⁷, Jack C. Wells⁸, Andrey Zarochentsev⁹,¹⁰ and Xin Zhao¹ (ATLAS Collaboration)

¹ Brookhaven National Laboratory, NY, USA
² Argonne National Laboratory, IL, USA
³ European Particle Physics Laboratory (CERN), Geneva, Switzerland
⁴ University of Texas at Arlington, TX, USA
⁵ Josef Stefan Institute, Ljubljana, Slovenia
⁶ Petersburg Nuclear Physics Institute NRC “Kurchatov Institute”, Gatchina, Russia
⁷ Joint Institute for Nuclear Research, Dubna, Russia
⁸ Oak Ridge National Laboratory, TN, USA
⁹ Saint-Petersburg State University, St. Petersburg, Russia
¹⁰ Plekhanov Russian University of Economics, Moscow, Russia

Abstract

The ATLAS experiment at CERN’s Large Hadron Collider uses the Worldwide LHC Computing Grid (WLCG) as its distributed computing infrastructure. Through the workload management system PanDA and the distributed data management system Rucio, ATLAS gives thousands of physicists seamless access to hundreds of WLCG grid and cloud-based resources distributed worldwide. PanDA annually processes more than an exabyte of data using an average of 350,000 distributed batch slots, enabling hundreds of new scientific results from ATLAS. However, the resources available to the experiment have been insufficient to meet ATLAS simulation needs over the past few years as the volume of data from the LHC has grown. The problem will be even more severe in the next LHC phases. The High Luminosity LHC will be a multi-exabyte challenge in which the envisaged storage and compute needs exceed the expected technology evolution by a factor of 10 to 100. The High Energy Physics (HEP) community needs to evolve its current computing and data organization models and change the way it uses and manages the infrastructure, focusing on optimizations that improve performance and efficiency while also simplifying operations. In this paper we highlight recent R&D projects in HEP related to a data lake prototype, federated data storage, and a data carousel.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
