EPJ Web Conf.
Volume 245, 2020
24th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2019)
Number of page(s): 6
Section: 3 - Middleware and Distributed Computing
Published online: 16 November 2020
Harnessing the power of supercomputers using the PanDA Pilot 2 in the ATLAS Experiment
1 Brookhaven National Laboratory, Physics Department, United States
2 Budker Institute of Nuclear Physics, Russia
3 Novosibirsk State University, Russia
4 Argonne National Laboratory, United States
5 University of Wisconsin-Madison, Department of Physics, United States
6 CERN, European Laboratory for Particle Physics, Switzerland
7 University of Texas at Arlington, Department of Physics, United States
8 Joint Institute for Nuclear Research, Russia
* Corresponding author: Paul.Nilsson@cern.ch
Copyright 2020 CERN for the benefit of the ATLAS Collaboration. Reproduction of this article or parts of it is allowed as specified in the CC-BY-4.0 license.
The unprecedented computing resource needs of the ATLAS experiment at the LHC have motivated the Collaboration to become a leader in exploiting High Performance Computers (HPCs). To meet the requirements of HPCs, the PanDA system has been equipped with two new components, Pilot 2 and Harvester, that were designed with HPCs in mind. While Harvester is a resource-facing service that provides resource provisioning and workload shaping, Pilot 2 is responsible for payload execution on the resource. This paper focuses on Pilot 2, a complete rewrite of the original PanDA Pilot, which had been used by ATLAS and other experiments for well over a decade. Pilot 2 has a flexible and adaptive design that allows plugins to be defined with streamlined workflows. In particular, it has plugins for specific hardware infrastructures (HPC/GPU clusters) as well as for dedicated workflows defined by the needs of an experiment. Examples of dedicated HPC workflows are discussed, in which the Pilot either performs fine-grained event-level processing under the control of the Harvester service, or acts like an MPI application itself and runs a set of jobs as an ensemble. In addition to describing the technical details of these workflows, results are shown from deployments on Titan (OLCF) and other HPCs used by ATLAS.
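The ensemble workflow mentioned above, where a single MPI launch runs many independent payload jobs, can be illustrated with a minimal sketch. This is not PanDA Pilot code; the function name `assigned_job` and the rank-selection logic are illustrative assumptions. The idea is that each process launched by `mpirun`/`srun` discovers its own rank (common launchers expose it via environment variables such as `OMPI_COMM_WORLD_RANK` or `SLURM_PROCID`) and uses it to pick which job from a shared list it should execute:

```python
import os

def assigned_job(job_list, rank=None):
    """Pick the job this process should run in an MPI-style ensemble.

    If no rank is given, try to read it from launcher-provided
    environment variables (Open MPI and Slurm conventions shown here;
    names vary by launcher). Outside any MPI launch, fall back to 0.
    """
    if rank is None:
        rank = int(os.environ.get("OMPI_COMM_WORLD_RANK",
                   os.environ.get("SLURM_PROCID", "0")))
    # Wrap around so a short job list still serves every rank.
    return job_list[rank % len(job_list)]

if __name__ == "__main__":
    jobs = ["reco-job-001", "reco-job-002", "reco-job-003"]
    print(assigned_job(jobs))
```

Launching this with `mpirun -n 3 python ensemble.py` would make each of the three processes print a different job, while the processes themselves never need to communicate: the rank alone partitions the work.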
© The Authors, published by EDP Sciences, 2020