Issue | EPJ Web Conf., Volume 251, 2021: 25th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2021) | |
---|---|---|
Article Number | 02012 | |
Number of page(s) | 8 | |
Section | Distributed Computing, Data Management and Facilities | |
DOI | https://doi.org/10.1051/epjconf/202125102012 | |
Published online | 23 August 2021 |
Archival, anonymization and presentation of HTCondor logs with GlideinMonitor
1 Fermilab, MS120, PO Box 500, Batavia, IL (USA)
2 Valparaiso University, Valparaiso, IN (USA)
3 University of Illinois at Chicago, Chicago, IL (USA)
* Corresponding author: marcom@fnal.gov
GlideinWMS is a pilot framework that provides uniform and reliable HTCondor clusters using heterogeneous resources. The Glideins are pilot jobs that are sent to the selected nodes, test them, set them up as required by the user jobs, and ultimately start an HTCondor startd to join an elastic pool. These Glideins collect information that is very useful for evaluating the health and efficiency of the worker nodes and invaluable for troubleshooting when something goes wrong. This data, including local statistics, the results of all the tests, and the HTCondor log files, is packed and sent to the GlideinWMS Factory. To access this information, developers and troubleshooters must exchange emails with Factory operators and dig manually through files. Furthermore, these files also contain information, such as email addresses, IP addresses, and user IDs, that we want to protect and limit access to. GlideinMonitor is a Web application that makes these logs more accessible and useful: it organizes the logs in an efficient compressed archive; it lets users search, unpack, and inspect them through a convenient and secure Web interface; and, via plugins such as the log anonymizer, it can redact protected information while preserving the parts useful for troubleshooting.
© The Authors, published by EDP Sciences, 2021
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.