Improving the Scheduling Efficiency of a Global Multi-Core HTCondor Pool in CMS

Brian Paul Bockelman; Diego Davila Foyo; Kenyi Hurtado Anampa; Todor Trendafilov Ivanov; Farrukh Aftab Khan; Amjad Kotobi; Krista Larson; James Letts; Marco Mascheroni; David Mason; Antonio Pérez-Calero Yzquierdo

doi:10.1051/epjconf/201921403056

Issue: EPJ Web Conf., Volume 214, 2019. 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018)
Article Number: 03056
Number of pages: 8
Section: T3 - Distributed computing
DOI: https://doi.org/10.1051/epjconf/201921403056
Published online: 17 September 2019

EPJ Web of Conferences 214, 03056 (2019)
https://doi.org/10.1051/epjconf/201921403056

Brian Paul Bockelman¹, Diego Davila Foyo², Kenyi Hurtado Anampa³, Todor Trendafilov Ivanov⁴, Farrukh Aftab Khan⁵, Amjad Kotobi⁶, Krista Larson⁵, James Letts⁷,*, Marco Mascheroni⁷, David Mason⁵ and Antonio Pérez-Calero Yzquierdo⁸,⁹ on behalf of the CMS Collaboration

¹ University of Nebraska-Lincoln, Lincoln, NE, USA
² Benemérita Universidad Autónoma de Puebla, Puebla, México
³ University of Notre Dame, Notre Dame, IN, USA
⁴ University of Sofia, Sofia, Bulgaria
⁵ Fermi National Accelerator Laboratory, Batavia, IL, USA
⁶ University of Malaya, Kuala Lumpur, Malaysia
⁷ University of California San Diego, La Jolla, CA, USA
⁸ Centro de Investigaciones Energéticas Medioambientales y Tecnológicas (CIEMAT), Madrid, Spain
⁹ Port d'Informació Científica (PIC), Barcelona, Spain

* Corresponding author: jletts@ucsd.edu

Published online: 17 September 2019

Abstract

Scheduling multi-core workflows in a global HTCondor pool is a multi-dimensional problem whose solution depends on the requirements of the job payloads, the characteristics of the available resources, and boundary conditions, such as fair share and prioritization, imposed on the matching of jobs to resources. Within the context of a dedicated task force, CMS has significantly increased the scheduling efficiency of workflows in reusable multi-core pilots by addressing limitations of the GlideinWMS pilots and by improving the accuracy of resource requests, the efficiency and speed of the HTCondor infrastructure, and the job matching algorithms.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
