EPJ Web Conf.
Volume 214, 2019
23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018)
Number of page(s): 8
Section: T7 - Clouds, virtualisation & containers
Published online: 17 September 2019
Evaluating Kubernetes as an orchestrator of the Event Filter computing farm of the Trigger and Data Acquisition system of the ATLAS experiment at the Large Hadron Collider
2 CERN, CH-1211 Geneva, Switzerland (on leave)
3 Department of Physics, University of Michigan, Ann Arbor, MI
* Corresponding author: Giuseppe.Avolio@cern.ch
The ATLAS experiment at the LHC relies on a complex, distributed Trigger and Data Acquisition (TDAQ) system to gather and select particle collision data. The Event Filter (EF) component of the TDAQ system executes advanced selection algorithms, reducing the data rate to a level suitable for recording to permanent storage. The EF functionality is provided by a computing farm made up of thousands of commodity servers, each executing one or more processes. Moving the EF farm management towards a solution based on software containers is one of the main themes of the ATLAS TDAQ Phase-II upgrades in the area of the online software; it would open new possibilities for fault tolerance, reliability and scalability. This paper presents the results of an evaluation of Kubernetes, a system for advanced management of containerized applications in large clusters, as a possible orchestrator of the ATLAS TDAQ EF computing farm. It first highlights some of the technical solutions adopted to run the offline version of today’s EF software in a Docker container. It then focuses on scaling performance measurements executed on a cluster of 1000 CPU cores. In particular, it reports how Kubernetes scales in deploying containers as a function of the cluster size and shows how proper tuning of the Kubernetes Queries per Second (QPS) parameter set can improve the scaling of applications in terms of running replicas. Finally, it assesses the possibility of using Kubernetes as an orchestrator of the EF computing farm in LHC Run 4.
© The Authors, published by EDP Sciences, 2019
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.