Enabling Grid Computing resources within the KM3NeT computing model

KM3NeT is a future European deep-sea research infrastructure hosting a new generation of neutrino detectors that, located at the bottom of the Mediterranean Sea, will open a new window on the Universe and answer fundamental questions in both particle physics and astrophysics. International collaborative scientific experiments such as KM3NeT generate datasets that are increasing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of the 21st century. Most of these experiments adopt computing models consisting of different Tiers, with several computing centres providing a specific set of services for the different steps of data processing, such as detector calibration, simulation, data filtering, reconstruction and analysis. The computing requirements are extremely demanding and usually span from serial to multi-parallel or GPU-optimized jobs. The collaborative nature of these experiments demands very frequent WAN data transfers and data sharing among individuals and groups. To support these demanding computing requirements, we enabled Grid Computing resources, operated by EGI, within the KM3NeT computing model. In this study we describe our first advances in this field and the method by which KM3NeT users can utilize the EGI computing resources in a simulation-driven use case.


Introduction
KM3NeT is a future European deep-sea research infrastructure hosting a new-generation neutrino telescope with a volume of several cubic kilometres, located at the bottom of the Mediterranean Sea. The KM3NeT research infrastructure will comprise a deep-sea neutrino telescope at different sites (ARCA), a neutrino-mass-hierarchy detector (ORCA) and nodes for instrumentation for measurements of the earth and sea science (ESS) communities. ARCA/ORCA, the node(s) for ESS instrumentation, the cable(s) to shore and the shore infrastructure will be constructed and operated by the KM3NeT collaboration.
International collaborative scientific experiments, like KM3NeT, are generating datasets which are increasing exponentially in both complexity and volume, making their analysis, archival, and sharing one of the grand challenges of the 21st century. These experiments, in their majority, adopt computing models consisting of different Tiers with several computing centres providing a specific set of services for the different steps of data processing, such as detector calibration, simulation, data filtering, reconstruction and analysis. The computing requirements are extremely demanding and usually span from serial to multi-parallel or GPU-optimized jobs. The collaborative nature of these experiments demands very frequent WAN data transfers and data sharing among individuals and groups. Typically, such a computing model utilizes several different computing infrastructures: Grids, Clouds, HPCs, data centres and local computing clusters.

KM3NeT computing model
The KM3NeT computing model (data distribution and data processing system) is based on the LHC computing model. The general concept consists of a hierarchical data processing system, commonly referred to as Tier structure (see Fig. 1).
At the first layer (Tier 0), a farm of computers filters the full raw data from the neutrino detectors (and other instruments) in real time (all-data-to-shore strategy). This computer farm is located in the shore station at each detector site. The main requirement is the reduction of the overall data rate by about three orders of magnitude (from 5 GB/s to 5 MB/s for one building block). The output data are temporarily stored on a persistent medium and distributed with fixed latency (typically less than a few hours) to various computing centres. For detector monitoring and alert handling, selected events are reconstructed using fast algorithms with a low latency of the order of minutes (quasi-online reconstruction).
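The rate reduction performed at Tier 0 can be illustrated with a short back-of-the-envelope calculation using the figures quoted above (the variable names are illustrative, not part of the KM3NeT software):

```python
# Back-of-the-envelope data volumes for one KM3NeT building block,
# using the rates quoted above (5 GB/s raw, 5 MB/s filtered).
RAW_RATE_MB_S = 5_000       # raw data rate at the shore station, in MB/s
FILTERED_RATE_MB_S = 5      # rate after the real-time filter, in MB/s

reduction_factor = RAW_RATE_MB_S / FILTERED_RATE_MB_S   # about 10^3

SECONDS_PER_DAY = 86_400
filtered_per_day_gb = FILTERED_RATE_MB_S * SECONDS_PER_DAY / 1_000

print(f"reduction factor: {reduction_factor:.0f}x")            # 1000x
print(f"filtered data per day: {filtered_per_day_gb:.0f} GB")  # 432 GB
```

Even after this thousandfold reduction, a single building block produces several hundred gigabytes of filtered data per day, which motivates the distributed storage and processing described below.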
In the second layer (Tier 1), the filtered data are processed. Various models for hit patterns are fitted to the data (usually referred to as reconstruction) and the results of the fits are stored for further analysis. The overall rate of reconstructed data amounts to less than 1 MB/s. Calibration data are used to determine the time offsets and positions of the photo-multiplier tubes (PMTs). The results of the calibration are stored in a central database system. The typical update frequencies of the PMT positions and time offsets are 10^-2 Hz and 10^-5 Hz, respectively. The corresponding data rate amounts to less than 1 MB/s. These data are necessary input to the reconstruction to achieve an optimal pointing resolution of the detectors. An automated calibration procedure is needed to process the filtered data in a timely manner. Ultimately, the calibration and the reconstruction procedures should operate with fixed and limited latency (a few hours). The results from the reconstruction are made available to all scientists for detailed analyses and dissemination of scientific results.
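The quoted calibration update frequencies translate into concrete update intervals, which can be checked with a two-line calculation (a simple unit conversion, not part of the calibration software):

```python
# Convert the quoted calibration update frequencies into update intervals.
pos_update_hz = 1e-2     # PMT position updates
time_update_hz = 1e-5    # PMT time-offset updates

pos_interval_s = 1 / pos_update_hz            # 100 s
time_interval_h = 1 / time_update_hz / 3600   # ~27.8 hours

print(f"position update every {pos_interval_s:.0f} s")
print(f"time-offset update every {time_interval_h:.1f} h")
```

PMT positions therefore need refreshing roughly every 100 seconds (the detector strings move in the sea current), while time offsets are stable for about a day.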

Very Large Volume Neutrino Telescope (VLVnT-2015)

To assess the detector efficiency and systematics, dedicated Monte Carlo simulations are needed. Due to the changing running conditions of the detector in the deep-sea environment, time- (condition-)dependent simulation data sets (run-by-run simulations) are considered: the data taking is split into sufficiently small time intervals with stable conditions (referred to as runs) and the detector response is simulated individually for these periods, allowing for a direct comparison of experiment and simulation. Since the detector is constructed from independent, nearly identical building blocks, simulations can be performed for the fixed geometry of the two types of building blocks (ARCA and ORCA); the overall detector performance can be deduced directly from these.
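The run-by-run bookkeeping described above can be sketched as follows; the run numbers, durations, condition parameters and the `simulate_run` stub are all illustrative placeholders, not the actual KM3NeT simulation chain:

```python
from dataclasses import dataclass

# Minimal sketch of run-by-run simulation bookkeeping: one simulated data
# set is produced per run, with the conditions measured during that run.

@dataclass
class Run:
    run_id: int
    duration_s: float    # length of the stable-conditions interval
    conditions: dict     # e.g. measured optical background rates

def simulate_run(run: Run) -> str:
    """Simulate the detector response for one run with its own conditions.
    A real implementation would invoke the full simulation chain here."""
    return f"mc_{run.run_id}"   # name of the simulated counterpart

runs = [Run(1, 3600, {"k40_rate_khz": 5.0}),
        Run(2, 3600, {"k40_rate_khz": 5.3})]

# One simulated data set per run allows a direct data/MC comparison per run.
mc_sets = {run.run_id: simulate_run(run) for run in runs}
print(mc_sets)   # {1: 'mc_1', 2: 'mc_2'}
```

The key design point is the one-to-one pairing of each experimental run with a simulated data set generated under the same conditions.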
The analyses of the reconstructed data are primarily processed at the local computing clusters of the partner institutes (Tier 2). The following computing centres and pools provide resources for the KM3NeT experiment:

KM3NeT-EGI flowchart (first use case)
For the steps of job submission and user authentication-authorization, the default EGI [1] procedure is used. For these purposes the collaboration has set up the "KM3NeT.org" Virtual Organization (VO). In Table 2 we show the preliminary KM3NeT.org central services within the EGI platform.
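On gLite-based EGI resources, jobs are described in the Job Description Language (JDL). The sketch below generates a minimal JDL file for a simulation job; the script name, sandbox contents and output file name are illustrative assumptions, not the actual KM3NeT setup:

```python
# Sketch: generate a minimal JDL (Job Description Language) file for a
# simulation job on gLite-based EGI resources. All file names are
# illustrative placeholders.
jdl = """\
Executable          = "run_km3_sim.sh";
StdOutput           = "std.out";
StdError            = "std.err";
InputSandbox        = {"run_km3_sim.sh"};
OutputSandbox       = {"std.out", "std.err"};
VirtualOrganisation = "KM3NeT.org";
"""

with open("km3_sim.jdl", "w") as f:
    f.write(jdl)
# The job would then be submitted with the standard EGI tooling after
# creating a VOMS proxy for the KM3NeT.org VO.
```

The `VirtualOrganisation` attribute ties the job to the KM3NeT.org VO, so that it is matched only to resources that support the collaboration.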
One of the main tasks is the efficient distribution of the data between the different computing centres: CC-IN2P3 and CNAF will act as central storage, i.e. the resulting data of each processing step are transferred to those centres, where the data storage is mirrored. In Fig. 3 the preliminary flowchart between KM3NeT and the resources hosted by EGI is shown. The data output of each process is transferred from the Grid worker node to the Grid storage element at the Napoli ReCaS Grid site and then to CC-IN2P3 at Lyon using the GridFTP protocol, or directly to the user destination (laptop, local computing cluster, CC-Lyon) using the eT-IKAROS [2] utility.
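A transfer along this chain can be driven by the standard GridFTP client `globus-url-copy`. The helper below only builds the command line; both storage endpoints are illustrative placeholders, not real KM3NeT URLs:

```python
# Sketch: build a GridFTP transfer command (globus-url-copy) for moving a
# processing output from a Grid storage element to central storage.
# Both endpoint URLs below are illustrative placeholders.

def gridftp_copy_cmd(src: str, dst: str) -> list:
    """Return the command line for a third-party GridFTP transfer."""
    return ["globus-url-copy", "-p", "4", src, dst]   # 4 parallel streams

cmd = gridftp_copy_cmd(
    "gsiftp://se.example-recas.it/km3net/output/run_0001.root",
    "gsiftp://se.example-cc-in2p3.fr/km3net/output/run_0001.root",
)
print(" ".join(cmd))
```

In practice the command would be executed on a node holding a valid VOMS proxy; parallel streams (`-p`) help saturate the WAN links between the centres.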
Our biggest challenge for the near future is to finalize the VO policy and to develop data management tools and utilities that create a smooth connection between the EGI platform and KM3NeT, in order to fulfil the requirements of the KM3NeT computing model.

Service          Grid Site
Authentication