Creating a content delivery network for general science on the internet backbone using XCaches

A general problem faced by opportunistic users computing on the grid is that delivering cycles is simpler than delivering data to those cycles. In this project we show how we integrated XRootD caches placed on the internet backbone to implement a content delivery network for general science workflows. We show that, for some workflows in science domains such as high energy physics and gravitational waves, the combination of data reuse within the workflows and the use of caches increases CPU efficiency while decreasing network bandwidth use.


Introduction
Scientific organizations that have distributed computing resources (sites) and need access to large volumes of data at those resources have either allocated local storage at the sites or, more recently, transported the data on demand over the Wide Area Network (WAN) from its original location to the sites where the computing workflow runs [1]. In this project we show a solution that sits in the middle and does not require (but can profit from) allocated storage at the sites. This last point is of particular importance for individual researchers using opportunistic resources in the Open Science Grid (OSG) [2], but also for scientific organizations such as the International Gravitational Wave Network (IGWN) and DUNE, which, in addition to opportunistic sites, have some compute allocations without storage allocations.
Our solution provides storage access on all sites through a POSIX-like mount, the CERN Virtual Machine File System (CVMFS) [3], to deliver data to jobs without using local storage, pre-placing the data, or overrunning the origin bandwidth. Although CVMFS has been commonly used in the WLCG [4] to distribute scientific software and small files (experiment calibration data of less than 100 MB), the novel approach of distributing "experiment data" (files over 1 GB) was originally introduced in Ref. [5]; the jobs of the IGWN collaboration used to read data remotely from the origin using CVMFS in a transparent way. However, in parameter estimation and other science workflows the same data is read several times, hence the idea of caching. Stashcache [6] was born out of this opportunity to read the data only once from the origins (usually far-away mass storage services) and to serve subsequent reads from a "nearby" cache. In this project we take this idea further and show how we can deploy not only nearby caches (at the site level) but also caches in the internet backbone, operated centrally via Kubernetes. This paper is organized as follows: first we introduce XRootD [7], the software on which Stashcache is based, and the Stashcache architecture. Then we explain how we use Kubernetes to deploy Stashcaches and, as a result, help create a virtualized content delivery network for general science. Finally, we show our current worldwide deployment of Stashcaches and statistics about their usage.

Stashcache Background
The Stashcache service is a file block caching technology based on XRootD caches [8]. XRootD's original architecture is a tree-based structure of servers and redirectors. When a client requests a file from a redirector, the redirector asks the servers below it in the tree whether they have the file. If one of them does, the client is redirected to open a connection with that server. If none of the servers has the file, the redirector asks the redirector above it for the file. This type of tree structure is called a federation; one example is the CMS federation Any Data, Anytime, Anywhere (AAA) [1].
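The lookup described above can be sketched in a few lines of Python. This is a toy model of the redirection logic only, not XRootD code: the class names, the node names, and the file paths are all hypothetical, and the real protocol involves network queries rather than in-memory sets.

```python
class Server:
    """A data server that holds some files (hypothetical model)."""

    def __init__(self, name, files):
        self.name = name
        self.files = set(files)


class Redirector:
    """A tree node that knows its own servers, its child redirectors, and its parent."""

    def __init__(self, name, servers=(), parent=None):
        self.name = name
        self.servers = list(servers)
        self.children = []
        self.parent = parent
        if parent is not None:
            parent.children.append(self)

    def _search_down(self, path, skip=None):
        """Look for the file on servers in this subtree, ignoring the `skip` child."""
        for server in self.servers:
            if path in server.files:
                return server
        for child in self.children:
            if child is not skip:
                found = child._search_down(path)
                if found is not None:
                    return found
        return None

    def locate(self, path):
        """Return the server the client is redirected to, or None if nobody has the file."""
        node, skip = self, None
        while node is not None:
            found = node._search_down(path, skip)
            if found is not None:
                return found
            # Nothing below this redirector: escalate to the one above it.
            node, skip = node.parent, node
        return None


# Hypothetical two-region federation under one global redirector.
root = Redirector("global")
east = Redirector("east", [Server("east-1", ["/store/a.root"])], parent=root)
west = Redirector("west", [Server("west-1", ["/store/b.root"])], parent=root)

print(east.locate("/store/a.root").name)   # east-1 (found below the first redirector)
print(east.locate("/store/b.root").name)   # west-1 (found after escalating upwards)
print(east.locate("/store/missing.root"))  # None
```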
A Stashcache sits in front of a federation of XRootD servers, which from now on we will refer to as origins. Clients contact a Stashcache for a file instead of asking a redirector; the Stashcache then contacts the origin, retrieves the file, serves it to the client from memory, and queues it to be saved on the cache's local disk. If the file is already on local disk, the Stashcache serves it from there instead of querying a redirector. The architecture can be seen in Figure 1. A Stashcache server can be contacted by the client via the XRootD protocol, but also via HTTP(S), which allows transparent access to the files via CVMFS.
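The read path just described (serve from disk on a hit, otherwise fetch from the origin, serve, and store) can be illustrated with a minimal sketch. It makes several simplifying assumptions: whole files are cached rather than blocks, a dictionary stands in for the local disk, and the store happens immediately instead of being queued. The class name and the file path are hypothetical.

```python
class CacheSketch:
    """Toy read-through cache: a dict stands in for the cache's local disk."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self.disk = {}
        self.origin_fetches = 0

    def read(self, path):
        if path in self.disk:
            # Cache hit: serve from local disk, the origin is not contacted.
            return self.disk[path]
        # Cache miss: retrieve from the origin and serve the data to the client.
        data = self.fetch_from_origin(path)
        self.origin_fetches += 1
        # The real service queues this write; here it is done immediately.
        self.disk[path] = data
        return data


# Hypothetical origin holding a single file.
origin_files = {"/igwn/frame-001.gwf": b"strain data"}
cache = CacheSketch(origin_files.__getitem__)

first = cache.read("/igwn/frame-001.gwf")   # miss: goes to the origin
second = cache.read("/igwn/frame-001.gwf")  # hit: served from the cache
print(cache.origin_fetches)  # 1
```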

Difference between Stashcache and frontier squid
There is already an HTTP-based file caching technology accessible via CVMFS that is widely deployed in WLCG: frontier squid [9]. There are two major differences between frontier squid and Stashcache: the lifetime of files in the cache, and the sizes of files in the cache. A file in a squid cache has a limited lifetime: if the file is changed on the stratum server (the squid equivalent of an origin), then after a few minutes the squid cache will pick up the change and serve the new version. If a file in a Stashcache is changed on the origin, that change will not be picked up by the Stashcache, and the cache will serve the old version of the file (until the file is purged from the disk). In other words, the squid model is "write few, read many," whereas the Stashcache model is "write once, read many." The second difference is the size of the files that can be served by the two architectures: squid caches are optimized for handling small files, O(MB), while Stashcache is meant for big files, O(GB). Moreover, CVMFS is configured to cache files at the compute node level (in general, 20 GB of space on each compute node is reserved to locally cache the files read from the squid). This feature is not used by Stashcache, on the assumption that the files it transports are of the same order of magnitude as the local partition, so storing them there would negatively impact the performance of jobs that use files delivered via the squid.
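The difference in file lifetime can be made concrete with two toy caches. This is a sketch of the two freshness policies only, not of how squid or Stashcache are implemented; the class names, the 300-second lifetime, and the explicit `now` argument (used in place of a real clock) are assumptions made for the illustration.

```python
class ExpiringCache:
    """'Write few, read many': entries are refreshed once they are older than ttl."""

    def __init__(self, origin, ttl_seconds):
        self.origin = origin
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, time fetched)

    def get(self, key, now):
        entry = self.store.get(key)
        if entry is None or now - entry[1] >= self.ttl:
            self.store[key] = (self.origin[key], now)
        return self.store[key][0]


class WriteOnceCache:
    """'Write once, read many': an entry is kept unchanged until it is purged."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.origin[key]
        return self.store[key]

    def purge(self, key):
        self.store.pop(key, None)


origin = {"file": "v1"}
expiring = ExpiringCache(origin, ttl_seconds=300)  # hypothetical lifetime
write_once = WriteOnceCache(origin)

expiring.get("file", now=0)
write_once.get("file")
origin["file"] = "v2"  # the file changes at the origin

print(expiring.get("file", now=10))   # v1 (entry is still within its lifetime)
print(expiring.get("file", now=400))  # v2 (entry expired and was refreshed)
print(write_once.get("file"))         # v1 (the change is not picked up)
write_once.purge("file")
print(write_once.get("file"))         # v2 (only visible after a purge)
```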

Stashcache Deployment
The OSG, through its Virtual Organization (VO), provides several million opportunistic CPU hours to its users and to small scientific communities. The computing resources are geographically distributed over the whole country (see Figure 2), and the repositories (origins) that hold the scientific data are dispersed (see Figure 3). This distribution implies that even though the data is reused, it travels through the WAN several times. By placing additional caches in the internet backbone, we reduce the chances that the same data travels through more than one link. Figure 4 shows the deployment map of Internet2, with the locations of the Stashcaches we deployed in the backbone and at certain academic institutions. Some of the scientific organizations that were using the caches (e.g., IGWN and DUNE) had data in the US but a large amount of allocated computing resources in Europe. This led us to deploy additional caches worldwide (Figure 5).
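A small worked example shows why reuse matters for the link between an origin and a cache. The sketch below assumes a cache large enough that nothing is evicted, so each distinct file crosses that link exactly once; the file names, sizes, and read counts are hypothetical and are not measurements from the deployment.

```python
def origin_traffic_gb(reads, file_size_gb, use_cache):
    """Data pulled from the origin for a sequence of file reads.

    reads: file names in the order jobs request them.
    file_size_gb: mapping from file name to size in GB.
    """
    if not use_cache:
        # Every read goes back to the origin over the WAN.
        return sum(file_size_gb[name] for name in reads)
    # With a cache (and no eviction) each distinct file is fetched once.
    return sum(file_size_gb[name] for name in set(reads))


# Hypothetical workload: two files, read six times in total.
sizes = {"frame_a": 2.0, "frame_b": 1.0}
reads = ["frame_a"] * 4 + ["frame_b"] * 2

without_cache = origin_traffic_gb(reads, sizes, use_cache=False)
with_cache = origin_traffic_gb(reads, sizes, use_cache=True)
saved_fraction = 1 - with_cache / without_cache

print(without_cache, with_cache)  # 10.0 3.0
```

In this made-up workload the cache removes 7 of the 10 GB that would otherwise leave the origin; the saving grows with the number of times each file is reused.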

Worldwide deployment via Kubernetes
In order to deploy Stashcaches on heterogeneous hardware owned by geographically distant institutions while maintaining a high quality of service, we opted to use Kubernetes, with a Stashcache container maintained by OSG. By deploying the same container everywhere, OSG performs a single test and optimization procedure, which guarantees that all the caches look and perform alike from the client's point of view. Moreover, it enabled OSG to move towards a Development/Operations (DevOps) model in which versions newly released by the XRootD developers are quickly turned into containers that are deployed in production. Hence a central DevOps team can now turn newly developed features into production-quality edge services in hours as opposed to days, and without the intervention of local system administrators. Kubernetes has several other features that were interesting to us, such as secrets and volume claims. The first allowed us to send the certificates securely to each cache location, and the second allowed us to bridge the gap between the stateless microservices model on which Kubernetes is based and the fully stateful caches on which XRootD is based.
We summarize the results of the deployment of caches on the backbone using Kubernetes in Table 1, which shows how much network traffic the deployment of caches has saved (by means of data reuse and cache placement). However, this cache-based scientific data delivery network also has some positive externalities beyond the saved traffic and the improved I/O performance of the applications. On the one hand, it makes the entire infrastructure more reliable: if one cache is down, CVMFS can pick the next one in geographical order. On the other hand, it prevents the origins from being overloaded by direct access from the jobs.
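To illustrate how secrets and volume claims fit together, the sketch below builds a minimal Kubernetes Deployment manifest as a Python dictionary and prints it as JSON, a format Kubernetes accepts for manifests. It is not the manifest OSG uses: the image name, object names, and mount paths are all placeholders chosen for the example.

```python
import json


def build_cache_deployment(name, image, secret_name, claim_name):
    """Return a Deployment manifest for one cache pod (illustrative values only)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "cache",
                        "image": image,
                        "volumeMounts": [
                            # Certificate delivered through a Kubernetes secret.
                            {"name": "host-cert", "mountPath": "/etc/cache-cert",
                             "readOnly": True},
                            # Persistent disk that keeps cached files across restarts.
                            {"name": "cache-disk", "mountPath": "/cache"},
                        ],
                    }],
                    "volumes": [
                        {"name": "host-cert", "secret": {"secretName": secret_name}},
                        {"name": "cache-disk",
                         "persistentVolumeClaim": {"claimName": claim_name}},
                    ],
                },
            },
        },
    }


# All four arguments are hypothetical placeholders.
manifest = build_cache_deployment(
    name="stashcache-example",
    image="registry.example.org/stashcache:placeholder",
    secret_name="stashcache-example-cert",
    claim_name="stashcache-example-disk",
)
print(json.dumps(manifest, indent=2))
```

The secret keeps the certificate out of the container image, while the volume claim gives the otherwise replaceable pod a disk whose contents survive a restart or an upgrade to a new container version.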
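The failover behaviour can be sketched as follows: order the caches by distance from the client and use the first one that responds. This is an illustration of the idea only and does not reproduce how CVMFS orders or probes its servers; the cache names, the coordinates, and the `is_up` callback are hypothetical.

```python
import math


def great_circle_km(a, b):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def pick_cache(client, caches, is_up):
    """Return the nearest cache that is up, or None if every cache is down."""
    ordered = sorted(caches, key=lambda name: great_circle_km(client, caches[name]))
    for name in ordered:
        if is_up(name):
            return name
    return None


# Hypothetical cache locations as (latitude, longitude).
caches = {
    "chicago": (41.9, -87.6),
    "kansas-city": (39.1, -94.6),
    "amsterdam": (52.4, 4.9),
}
client = (41.9, -87.6)

print(pick_cache(client, caches, is_up=lambda name: True))               # chicago
print(pick_cache(client, caches, is_up=lambda name: name != "chicago"))  # kansas-city
```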

Conclusions and Future Work
We deployed a content delivery network for general scientific purposes based on XRootD caching technology, with Kubernetes as the deployment model. This network has been successful at delivering data all around the world and at saving network link traffic. Future iterations of this work plan to add file-based cache monitoring (reuse metrics at the file level as opposed to the whole namespace) and integration with SciTokens [11] for authenticated access.