| Issue | EPJ Web Conf., Volume 337, 2025, 27th International Conference on Computing in High Energy and Nuclear Physics (CHEP 2024) |
|---|---|
| Article Number | 01073 |
| Number of page(s) | 8 |
| DOI | https://doi.org/10.1051/epjconf/202533701073 |
| Published online | 07 October 2025 |
Archive Metadata for Efficient Data Collocation on Tape
CERN, Esplanade des Particules 1, 1211 Geneva 23, Switzerland
* e-mail: julien.leduc@cern.ch
Due to the increasing volume of physics data being produced, the LHC experiments are making more active use of archival storage. Constraints on available disk storage have driven the move towards the “data carousel” and similar models. Datasets on tape are recalled multiple times for reprocessing and analysis, and this trend is expected to accelerate during the High-Luminosity LHC (HL-LHC) era (LHC Run 4 and beyond).
Currently, storage endpoints are optimised for efficient archival, but it is becoming increasingly important to optimise for efficient retrieval. This problem has two dimensions. To reduce unnecessary tape mounts, the spread of each dataset (the number of tapes holding files that will be recalled together) should be minimised. To reduce seek times, files from the same dataset should be physically collocated on the tape. The Archive Metadata specification is an agreed format that lets experiments provide scheduling and collocation hints to storage endpoints so that they can achieve these goals.
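To make the notion of spread concrete, the sketch below computes it for a toy archive and compares two write policies: writing files in arrival order, and grouping them by a per-file collocation hint before writing. It is an illustrative model under simplifying assumptions, not the CTA implementation or the Archive Metadata format: the `collocation_hint` field and all helper names are hypothetical, tape capacity is counted in files rather than bytes, and the grouping policy assumes every file is known up front, whereas a real endpoint receives files as a stream.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A file is (file_name, collocation_hint). The hint is a hypothetical stand-in
# for whatever dataset identifier an experiment might supply as metadata.
File = Tuple[str, str]


def dataset_spread(file_to_tape: Dict[str, int], files: List[str]) -> int:
    """Spread = number of distinct tapes holding the files of one dataset."""
    return len({file_to_tape[name] for name in files})


def assign_in_arrival_order(files: List[File], files_per_tape: int) -> Dict[str, int]:
    """Write files to tapes in arrival order, ignoring any hint."""
    return {name: i // files_per_tape for i, (name, _) in enumerate(files)}


def assign_collocated(files: List[File], files_per_tape: int) -> Dict[str, int]:
    """Group files by hint before writing, so each dataset fills tapes contiguously.

    sorted() is stable, so arrival order is preserved within each dataset.
    """
    ordered = sorted(files, key=lambda f: f[1])
    return {name: i // files_per_tape for i, (name, _) in enumerate(ordered)}


def spreads(files: List[File], file_to_tape: Dict[str, int]) -> Dict[str, int]:
    """Compute the spread of every dataset (hint) present in the archive."""
    by_hint: Dict[str, List[str]] = defaultdict(list)
    for name, hint in files:
        by_hint[hint].append(name)
    return {hint: dataset_spread(file_to_tape, names) for hint, names in by_hint.items()}


if __name__ == "__main__":
    # Three datasets of four files each, arriving interleaved: A0, B0, C0, A1, ...
    incoming = [(f"{ds}{i}", ds) for i in range(4) for ds in "ABC"]
    print(spreads(incoming, assign_in_arrival_order(incoming, files_per_tape=4)))
    print(spreads(incoming, assign_collocated(incoming, files_per_tape=4)))
```

With three interleaved datasets of four files each and four files per tape, arrival-order writing spreads every dataset across three tapes, while grouping by hint places each dataset on a single tape, so in this toy model recalling one dataset needs one mount instead of three. Grouping also leaves each dataset's files contiguous on its tape, which is the collocation property that reduces seek time.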
This contribution describes the motivation, the review process with the various stakeholders and the constraints that led to the Archive Metadata proposal. We present the implementation and deployment in the CERN Tape Archive (CTA) and our preliminary experience of consuming Archive Metadata at the WLCG Tier-0.
© The Authors, published by EDP Sciences, 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License 4.0, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

