Extending CERN computing to volunteers - LHC@home consolidation and outlook

LHC@home has provided computing capacity for simulations under BOINC since 2005. Following the introduction of virtualisation with BOINC to run HEP Linux software in a virtual machine on volunteer desktops, initially piloted on test BOINC projects such as Test4Theory and ATLAS@home, all CERN applications distributed to volunteers have been consolidated under a single LHC@home BOINC project. As part of an effort to unite CERN's batch, cloud, grid and volunteer computing, the BOINC service has been integrated with the other compute services at CERN, notably HTCondor, in terms of job submission and accounting. The paper also addresses contributions to improve the BOINC software and the community effort to evolve BOINC towards a sustainable volunteer computing environment. Furthermore, we discuss future challenges in reducing the effort required of volunteers to run virtual machines for experiment simulations, and improvements to BOINC to widen the appeal of volunteer computing.


Introduction
At CERN, the bulk of computing capacity is devoted to the reconstruction and analysis of data from the LHC and other experiments. To obtain sufficient capacity for simulations of accelerator and detector upgrades, additional computing resources are needed. Part of the simulation capacity comes from in-house servers or cloud-bursting, but the general public can also contribute, thanks to volunteer computing. Volunteer computing allows researchers and organizations to harvest computing capacity from volunteers, i.e. donors among the general public. A number of science projects resort to volunteer computing to handle massive simulations on small data sets. The principal framework for volunteer computing is the Berkeley Open Infrastructure for Network Computing, popularly known as BOINC [1,2]. BOINC originates from ClimatePrediction.net [3] and SETI@home [4] and has been in use since the early 2000s.
Similar to batch systems, BOINC is designed as a client/server application, with a web server to handle scheduling, task and user management. Additionally, there are message boards to communicate with the users. The data for BOINC is stored in a MySQL [5] database back-end. BOINC clients are available for Windows, Mac, Linux and Android. The BOINC server distributes applications and tasks to the computers of volunteers who subscribe to the projects. With BOINC, volunteers are awarded credit for their contributions. Although the credit scheme is purely virtual, for part of the volunteer community, competing for credit is a source of motivation to contribute CPU power.
At CERN, our volunteer computing is hosted by a BOINC project called LHC@home [6]. Since LHC@home was launched in 2005 for the 50th anniversary of CERN, the beam dynamics simulation application SixTrack [7] has accumulated hundreds of CPU years thanks to the contributions of volunteers running on Windows, Linux and Mac. The original LHC@home was joined in 2011 by Test4Theory, which pioneered the use of virtualization for volunteer computing, thanks to the use of CernVM [8] to launch a Linux virtual machine [9] on the volunteers' computers. To run virtualized applications, a local hypervisor, the free Oracle VirtualBox [10], can be installed alongside BOINC. When contacting the BOINC server, the client downloads a wrapper application, "vboxwrapper" [11], that takes care of launching the VM. This advance led CERN IT to launch a renewed effort on volunteer computing [12], and soon more BOINC pilot projects based on the experience with Test4Theory followed, with applications from the LHC experiments: LHCb, ATLAS and later CMS.

BOINC applications at CERN
At CERN, several teams provide applications that run under LHC@home; a brief description of the applications and their infrastructure is provided here.

SixTrack
SixTrack [13] is a particle tracking code for the simulation of particle motion in a high-energy accelerator [14]. SixTrack has been running on LHC@home under BOINC since the start of the project in 2005, and is the only application on LHC@home that is compiled with BOINC for Windows, Mac and Linux as a native BOINC application. Recently there has been a renewed effort on SixTrack development for HL-LHC, and a new build system now produces executables for more computer architectures. Hence an Android test version of SixTrack is now available on LHC@home. During recent simulation campaigns with SixTrack, there have been peaks of 400k tasks in progress.

Theory
The Theory simulations application on LHC@home runs Monte Carlo simulations of physics events in a virtual machine under CernVM. The results are gathered in MCPlots [15], a repository of Monte Carlo data. The jobs running in the VMs dispatched with BOINC are handled via HTCondor [16], and the application software is pulled from the CernVM File System (CVMFS) [17].

ATLAS
The ATLAS application [18] runs Monte Carlo simulation of events in the ATLAS detector. Jobs are submitted to the BOINC server using the ARC CE [19] grid middleware. Recently a native Linux version of the ATLAS application was developed, which removes the need for virtualisation on Linux platforms by using Singularity [20] containers instead. This opened the door to running ATLAS jobs on data centre infrastructure, such as new machines not yet commissioned, or even grid worker nodes, where BOINC can run as a "backfill" when grid jobs are not fully using the CPU. These new resources have hugely increased the amount of work done by the ATLAS application: in 2018 it processed roughly 4.5% of all ATLAS simulation events, of which 85% were processed by the native Linux app.

CMS
Started in the summer of 2014 as a proof of concept [21], the CMS@home application has evolved over the past few years into a production-quality platform [22]. As with the other high-energy physics applications, the CMS software is currently only available for Linux, and hence virtualization is required to reach Windows and Mac users. BOINC is used to obtain a resource and run a virtual machine for a fixed period of time. The virtual machine image is contextualized similarly to how VMs would be configured on a cloud resource. From the volunteer's perspective, the wrapper that runs the virtual machine is just another application; from the project's perspective, the VM is just another member of a resource pool.
For the LHC experiments, the preferred method to run on remote grid or cloud computing resources these days is the "pilot" approach, where a so-called pilot job is submitted to check the environment on the worker node and verify that the experiment software can run correctly. This process then pulls tasks, e.g. simulation or analysis tasks submitted by users in the experiment collaboration, to run on the computing resource. For CMS, the virtual machine launches an agent (pilot) to obtain (pull) work from a central queue of CMS tasks. The agent and central task queue have been implemented using HTCondor, and the output data is uploaded to a databridge, which acts as a sandbox between the volunteer and grid storage. The authentication mechanism used by both HTCondor and the databridge is based on X.509 [23] credentials from a certificate authority set up for the purpose. When the virtual machine starts, an X.509 credential is obtained via the Volunteer Computing Credential Service by authenticating with the BOINC id and authenticator.
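The pull model described above can be illustrated with a minimal sketch. The names here (`central_queue`, `databridge`, `run_pilot`) are illustrative stand-ins, not the real CMS or HTCondor API; the real pilot also handles credentials, monitoring and failure recovery:

```python
import queue

# Hypothetical stand-ins for the central task queue and the databridge
# sandbox; in production these are HTCondor and a dedicated storage service.
central_queue = queue.Queue()
databridge = {}  # volunteer-side sandbox where output is staged

def environment_ok():
    """Pilot step 1: verify the worker can actually run the software.
    A real pilot would probe CVMFS availability, disk space, memory, etc."""
    return True

def run_pilot(max_tasks):
    """Pilot step 2: pull real tasks from the central queue and run them,
    staging each result to the databridge rather than writing directly
    to grid storage. Returns the number of tasks completed."""
    done = 0
    if not environment_ok():
        return done
    while done < max_tasks and not central_queue.empty():
        task = central_queue.get()
        result = f"simulated events for {task}"  # stand-in for the MC job
        databridge[task] = result                # upload to the sandbox
        done += 1
    return done

# Users in the collaboration submit tasks to the central queue...
for name in ("task-1", "task-2", "task-3"):
    central_queue.put(name)

# ...and a pilot on a volunteer VM pulls a bounded amount of work.
completed = run_pilot(max_tasks=2)
```

The key property of the pull model is visible even in this toy version: the volunteer resource never receives work it cannot run, and unconsumed tasks simply remain queued for the next pilot.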

LHCb
The first prototype of the Beauty@Home project was created in 2013 [24] to help address LHCb's growing need for computing power by profiting from volunteer computing resources. This was initially restricted to members of the LHCb Virtual Organisation, as a trusted X.509 credential was required on the resource to contact the DIRAC system [25] for operations such as job matching, status updates and data management. BOINC was used to obtain time on a volunteer resource, and the application started a virtual machine running the CernVM Virtual Software Appliance. This then ran a standard LHCb pilot job, which interacted with the DIRAC system to obtain the real job and upload the output. To remove the constraint of requiring secure authorization and authentication, a DIRAC gateway service called the Workload Management System Secure Gateway (WMSSecureGW) was developed to act as a sandbox towards the outside world. The service provides an interface between untrusted volunteer resources and the DIRAC system, allowing volunteer resources to execute LHCb jobs. The WMSSecureGW service authenticates volunteers using an X.509 credential from a custom certification authority (CA) set up specifically for this purpose. The service receives all job requests from volunteer resources and directs them to the different DIRAC services. The output data produced by the volunteer resources is uploaded to the gateway, where it is checked before being transferred to LHCb's storage resources.

Computing capacity
Statistics about running jobs under BOINC are exported to BOINC statistics sites, and to our CERN monitoring infrastructure, where the data can be displayed on a Grafana dashboard. As can be seen from figure 1, the number of running SixTrack tasks fluctuates strongly, with peaks during simulation campaigns and periods of inactivity. During SixTrack campaigns, peaks of 400k tasks in progress have been observed (fig. 1).

Figure 1. Running tasks on LHC@home over the last 6 months, by application

For the experiment applications, the computing capacity contributed by volunteers compares with that of a Tier-2 WLCG site. For SixTrack, the number of active volunteers is much higher, as there is no need for virtualisation to run the SixTrack executables. During simulation campaigns, a total of more than 80 TFLOPS of contributed computing power has been made available by volunteers (fig. 2). Hence the contributed computing power for SixTrack is far more than what would have been available on CERN's batch facilities, given the load of LHC data processing. We can observe a capacity gap between the number of volunteer PCs that can run the native SixTrack application and the virtualization-enabled PCs that can also run the Theory and experiment applications. Improvements to the BOINC VM wrapper [11] are a step forward, but volunteers still need to install VirtualBox on their PCs. Linux containers provide a more lightweight option and are already used by ATLAS for Linux volunteer hosts, but other operating systems still require virtualisation. While the number of active volunteers contributing to the project has stabilized compared to earlier years, we have seen an increase in contributed CPU. This is partially due to a number of highly dedicated volunteers, and partially due to the increased number of institute desktops and idle server machines that contribute to LHC@home.

LHC@home consolidation
In the past, LHC@home consisted of multiple BOINC projects: the original LHC@home classic, with the classic BOINC application SixTrack, and more experimental VM projects such as Test4Theory (later vLHC@home) and ATLAS@home. A shared Drupal portal was used as a front-end for these projects [12]. The approach of running several BOINC projects for different CERN experiments and communities had advantages in terms of flexibility, but led to some confusion among volunteers in the BOINC community. We conducted a poll among volunteers, who preferred a single LHC@home BOINC project with multiple applications. They also preferred that new applications first be validated by volunteers on a dedicated development project, LHC@home-dev.
Rather than running a separate BOINC project for each experiment on a common platform, we therefore decided to unify LHC@home into a single BOINC project. The experimental "credit by application" feature of BOINC was used to award credit separately for contributions to each experiment. Using the classic LHC@home with SixTrack as a basis, the other projects were merged into its database. Merging BOINC project databases had not been done before, and has some complications, as the user and host IDs differ between projects. However, the email address of a volunteer can be used as a unique identifier across projects. Hence volunteers who had contributed to the virtualisation projects were migrated to the original LHC@home, and their BOINC credit transferred.
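The essence of the merge, matching volunteers by email address and transferring credit, can be sketched as follows. This is a simplified illustration using SQLite in place of the real MySQL back-end; the actual BOINC user table holds many more fields, and host and result rows must be remapped to the new user IDs as well:

```python
import sqlite3

# Toy stand-in for the main LHC@home (SixTrack) project database.
main = sqlite3.connect(":memory:")
main.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT UNIQUE, credit REAL)"
)
main.executemany(
    "INSERT INTO user (email, credit) VALUES (?, ?)",
    [("alice@example.org", 100.0), ("bob@example.org", 50.0)],
)

# Rows exported from a VM project's user table being merged in.
vm_users = [
    ("alice@example.org", 300.0),  # same volunteer, different user id there
    ("carol@example.org", 70.0),   # volunteer unknown to the main project
]

for email, credit in vm_users:
    row = main.execute(
        "SELECT id FROM user WHERE email = ?", (email,)
    ).fetchone()
    if row:
        # Volunteer already exists in the main project: transfer the credit.
        main.execute(
            "UPDATE user SET credit = credit + ? WHERE id = ?", (credit, row[0])
        )
    else:
        # New email address: migrate the account as a new user.
        main.execute(
            "INSERT INTO user (email, credit) VALUES (?, ?)", (email, credit)
        )
main.commit()
```

After the merge, a volunteer who contributed to both projects holds a single account with the combined credit, which is the behaviour volunteers see after migration.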

Server infrastructure
The BOINC server infrastructure for LHC@home runs on a cluster of OpenStack VMs in the CERN private cloud. The virtual machines are managed by Puppet as part of CERN's Agile Infrastructure. To ensure scalability, the service comprises a load-balanced set of BOINC web servers that handle the user-facing BOINC pages and the scheduler, along with work generation for the VM applications. Furthermore, there is another set of load-balanced file servers for upload and download of application data. The storage back-end for the BOINC application data was migrated from NFS to CephFS [26] in spring, allowing for better I/O performance during large simulation campaigns. Notably, I/O for the fairly large files of the ATLAS application scales better on the CephFS back-end than on the former NFS server: the old NFS server became saturated at around 8000 simultaneously running ATLAS jobs, while the CephFS back-end can handle more than 20000 tasks. (A record load of 49k simultaneous ATLAS jobs on LHC@home was reached in January 2019.)

BOINC community development
BOINC development used to be done mainly by a small team of developers at Berkeley, who were also contributors to SETI@home. Occasional contributions from others were not always incorporated into the BOINC code. Following changes to the funding of the Berkeley team a couple of years ago, there has been an on-going transition towards a community-based development and maintenance model for BOINC. At a community workshop, a set of community-based procedures and workflows for BOINC code development and maintenance was agreed, and BOINC has now become a true community project, hosted on GitHub [28]. Contributions and changes are handled via issues and pull requests, as for other Git projects, and release managers have been appointed for the different components of the BOINC software stack. Contributions from CERN include fixes to the BOINC server web pages and remote submission to BOINC with HTCondor, as well as a prototype web-based manager for the client.

Conclusions and outlook
The consolidation of LHC@home into a single BOINC project gives a more solid foundation for volunteer computing at CERN. Volunteers in particular, but also the scientific community behind other BOINC projects, appreciate a more focused effort on BOINC for LHC@home. The BOINC community is healthy and active, and while we may have reached a plateau in the overall number of volunteers due to the stagnating number of home PCs, there is still potential to attract newcomers with mobile devices and to further exploit idle capacity on institute desktops and servers. Volunteer computing has proven to be a valuable contributor to CERN's computing effort, and also allows for a lightweight backfill option for smaller batch clusters and other opportunistic resources. Further integration between our HTCondor batch system and BOINC would also allow a wider range of high-CPU, low-I/O applications to run on LHC@home via a generic volunteer computing HTCondor queue.

Acknowledgments
We would like to express our warm thanks to all the volunteers who contribute computing power to LHC@home. In particular we would like to thank our volunteer helpers and contributors.