The INFN scientific computing infrastructure: present status and future evolution

The INFN scientific computing infrastructure is composed of more than 30 sites, ranging from CNAF (the Tier-1 for the LHC and the main data center for nearly 30 other experiments) and nine LHC Tier-2s, to ∼20 smaller sites, including LHC Tier-3s and farms of non-LHC experiments. A comprehensive review of the installed resources, together with plans for the near future, was collected during the second half of 2017; it provides a general view of the infrastructure, its costs and its potential for expansion, and shows the general trends in the software and hardware solutions adopted in a complex reality such as INFN. As of the end of 2017, the total installed CPU power exceeded 800 kHS06 (∼80,000 cores), while the total net storage capacity was over 57 PB on disk and 97 PB on tape; the vast majority of resources (95% of cores and 95% of storage) is concentrated in the 16 largest centers. Future evolutions point towards consolidation into big centers; this has required a rethinking of access policies and protocols in order to enable diverse scientific communities, beyond the LHC, to fruitfully exploit the INFN resources. On top of that, such an infrastructure will be used beyond INFN experiments, and will be part of the Italian national infrastructure, comprising other research institutes, universities and HPC centers.


Introduction
The National Institute for Nuclear Physics (INFN) is the research agency, funded by the Italian government, dedicated to the study of the fundamental constituents of matter and the laws that govern them. It conducts theoretical and experimental research in the fields of sub-nuclear, nuclear and astroparticle physics. Since its foundation in 1951, the activities it fosters have required ever larger computing and storage resources, due to the increasing complexity of the experiments. INFN has also been the seminal institution for the national research network in Italy, now handled by GARR.
INFN has 20 divisions, four national laboratories and two national centers: most of them host at least one computing center, varying in size from the WLCG Tier-1 to small local facilities. A new committee, C3S, was formed two years ago to coordinate scientific computing at INFN and lead its evolution.
As a first step, in order to get a complete picture with which to plan the future evolution and find possible short-term optimizations, the C3S organized a survey of the INFN computing centers. We present, in the next sections, the results of the survey and identify some possible evolutionary scenarios.

Cost, expandability and optimization
The focus of the survey was on resource deployment (figures refer to the end of 2017), infrastructure capabilities and the effort needed for their support. For this reason, a single questionnaire was requested for each physical computing infrastructure, even when it hosts several logical data centers (such as CNAF, with the INFN Tier-1 and a Tier-2 for the LHCb experiment); on the other hand, for other sites, e.g. Napoli, more questionnaires were needed due to the multiple infrastructures present. The most relevant aggregate values for the infrastructure are reported in Tab. 1.
We collected 29 distinct answers, for data centers ranging from a few tens to more than 15,000 hosted CPU cores. In order to ease the analysis of the collected data, we divided the sites into three categories:
• Large size data center: a computing farm with more than 1,000 cores and at least 750 TB-N of installed disk. These values correspond to ∼50% of the resources available at the average INFN Tier-2 in 2017;
• Small size data center: a computing farm with fewer than 200 cores or less than 100 TB-N of disk;
• Medium size data center: all the remaining centers (8 in total).
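The categorization above reduces to two threshold tests applied in order. The sketch below (a hypothetical helper, not part of the survey tooling) encodes the rules as stated:

```python
def classify_site(cores: int, disk_tb: float) -> str:
    """Classify an INFN data center by the 2017 survey thresholds.

    Large:  more than 1,000 cores AND at least 750 TB-N of disk
    Small:  fewer than 200 cores OR less than 100 TB-N of disk
    Medium: everything else
    """
    if cores > 1000 and disk_tb >= 750:
        return "large"
    if cores < 200 or disk_tb < 100:
        return "small"
    return "medium"

print(classify_site(15000, 5000))  # Tier-1-scale center -> large
print(classify_site(50, 20))       # local farm -> small
print(classify_site(500, 300))     # -> medium
```

Note that a site with, say, 1,500 cores but only 200 TB-N of disk fails the "large" test yet is not "small" either, so it falls into the medium category, consistent with its definition as "all the remaining centers".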
As shown in Fig. 1, with the chosen thresholds the three groups are clearly identified and separated. In the end, the large sites correspond to the official WLCG Tier-1 and Tier-2 sites, as depicted in Tab. 2. The division of sites into these three categories helps in understanding the different cost patterns: as probably expected, there is a higher need of support for small installations (when scaled by the deployed resources, see Fig. 2 and Fig. 3). In fact, the cost of support per unit resource is evidently smaller for large sites, which suggests (or better, confirms) that economies of scale are important.
Another important economic aspect is the cost of electric power: for the Tier-2s hosted by a university department, it has so far been included in the rental agreement (€2.4M is the annual "in-kind" contribution, evaluated at current power costs). Although this is a clear advantage for INFN at the moment, it is uncertain whether it will remain true in the future, because of the increase in resources foreseen for the coming years. Concerning the capability for expansion, it was found that a +60% increase in space, power and cooling can be accommodated within the current INFN distributed data centers, without the need for major infrastructure reworking; anything larger requires the deployment of different centers. Still, assuming the validity of Moore's law at a level of +20%/year, this means the current infrastructure could sustain a deployment of resources seven times larger by 2026.
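The seven-fold figure follows from compounding the two effects quoted above. A minimal back-of-the-envelope check, under our reading that the +20%/year technology gain compounds over the eight yearly steps from 2018 to 2026 on top of the one-off +60% infrastructure expansion:

```python
# Back-of-the-envelope check of the "seven times larger by 2026" claim.
# Assumption (ours, not stated explicitly in the text): eight compounding
# yearly steps, 2018 through 2026, on top of the one-off expansion.
infrastructure_growth = 1.60   # +60% headroom in space, power and cooling
moore_per_year = 1.20          # +20%/year capacity per unit of power
years = 2026 - 2018            # 8 yearly steps

total = infrastructure_growth * moore_per_year ** years
print(f"sustainable deployment vs. 2017: x{total:.1f}")  # prints x6.9
```

The product, 1.6 × 1.2⁸ ≈ 6.9, matches the "seven times larger" statement.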

Future directions
The need for scientific computing is expected to increase greatly in the next decade. High Luminosity LHC (HL-LHC) projects a 20x increase in needs by 2027, which cannot be accommodated in the current infrastructure. By the same date, other experiments like CTA [1] and SKA [2] are expected to require nearly the same amount of resources as a typical LHC experiment.
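To quantify why a 20x demand cannot be met by technology evolution alone (our arithmetic, using the +20%/year figure from the previous section):

```python
# Compare HL-LHC's projected 20x resource need by 2027 with what flat-budget
# technology evolution alone could deliver. Illustrative numbers only.
demand_factor = 20.0
moore_per_year = 1.20
years = 2027 - 2018                             # 9 yearly steps

supply_factor = moore_per_year ** years         # ~5.2x at constant cost
required_growth = demand_factor ** (1 / years)  # ~1.39, i.e. ~+39%/year

print(f"flat-budget supply by 2027: x{supply_factor:.1f}")
print(f"annual growth needed for 20x: x{required_growth:.2f} per year")
```

A sustained ∼+39%/year growth would be needed, roughly double the assumed technology trend, hence the need for new infrastructure and new approaches.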
To cope with these challenges, INFN has started several R&D activities towards HL-LHC, mainly through participation in European projects:
• Evolution of computing towards clouds, with the European projects INDIGO-DataCloud [3] for the development of services, HNSciCloud [4] to test the use of commercial clouds for research, and EOSCpilot [5] and EOSC-hub [6] to build the European Open Science Cloud;
• Study of solutions for Exabyte-level storage, including caches and optimized access, through the European project eXtreme Data Cloud (XDC) [7].
Furthermore, INFN is exploring unconventional ways to increase the computing served by its Tier-1 center at CNAF, via cloud and remote resources: tests were performed using commercial cloud providers (Aruba, Azure, T-Systems), academic sites (Bari-ReCaS), and more recently by offloading a large fraction of CNAF computing resources to the PRACE Tier-0 at CINECA [10].
These R&D programs enable INFN to disentangle a variety of issues, such as:
• The need for caching systems to overcome latency and bandwidth limits (solutions like GPFS/AFM [11] and XCache [12] have been tested on production systems);
• The need to cope with different cloud technologies (VMware, Azure, OpenStack, OpenNebula); in this respect, INFN has been the promoter of the Dynamic On Demand Analysis Service (DODAS) [13], which is currently being tested in several CMS centers (e.g. opportunistic Tier-3 sites on commercial cloud providers such as Azure, and regular Tier-2 sites such as Imperial College, IFCA and Sofia);
• The need for high-bandwidth connectivity among remote sites: the experimentation started with the distributed Tier-2 between INFN-Legnaro and INFN-Padova, and a 1.2 Tbit/s link is now operational between CNAF and CINECA via the emerging Data Center Interconnect (DCI) technology [10];
• The need to incorporate external computing entities, like HPC sites, into the INFN infrastructure, either via opportunistic allocations or grants. The handshaking with HPC sites is generally difficult, due to site policies (access, networking, local disk space, ...), and the experimentation allows INFN to gain experience in developing and deploying custom solutions.
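To give a sense of the scale that a DCI link of this class enables, consider an idealized transfer estimate (our assumption: full link utilization, no protocol overhead):

```python
# Rough feasibility of remote offloading over a 1.2 Tbit/s Data Center
# Interconnect such as the CNAF-CINECA link: time to move one petabyte.
# Idealized: full link utilization, no protocol overhead (our assumption).
link_bps = 1.2e12       # 1.2 Tbit/s
petabyte_bits = 8e15    # 1 PB = 8 * 10^15 bits

seconds = petabyte_bits / link_bps
print(f"1 PB transfer time: {seconds / 3600:.1f} hours")  # prints 1.9 hours
```

At under two hours per petabyte in the ideal case, dataset relocation between the two centers becomes an operational option rather than a campaign, which is what makes the offloading model practical.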

Towards the Data Lake
In the framework of XDC, ESCAPE and collaborations with other European centers, INFN is performing tests on possible Data Lake solutions, currently the most popular options for storage handling in the 2020s. INFN has also launched a specific R&D project, the Italian Distributed Data Lake for Science (IDDLS) initiative, in order to test Data-Lake-like solutions by linking three WLCG sites in Italy with a mesh of DCI links.
A great opportunity is offered by the new facility of the European Centre for Medium-Range Weather Forecasts (ECMWF) [14], which will open in Bologna in 2020. This facility will host not only a data center for ECMWF but also a second one for CINECA and CNAF. On the one hand, the co-location of CINECA and CNAF will allow for additional and closer interactions between the INFN HTC and the PRACE HPC infrastructures; on the other hand, the new data center, with up to 10 MW of power dedicated to INFN computing and data resources, will constitute the main component of the future INFN Data Lake.
As mentioned in Sec. 3, the program of the IBiSCo project also foresees the setup of a Data Lake among the main sites in the south of Italy.
The structure of a Data Lake in Italy is still an open question, with at least three variants (see Fig. 4):
• the main data center in Bologna, with the other sites serving as compute nodes, possibly with caches;
• a single logical data center, physically distributed, with co-location of CPUs and storage;
• compute nodes logically distributed per region.
The centralized approach represented in the first and third scenarios (i.e. a single data center hosting the data, see Fig. 4) could help in optimizing resource management and would fit a purely WLCG ecosystem (redundancy would be provided by other centers at the European level); on the other hand, it would not be optimal for other experiments, which could instead require some kind of data redundancy within the INFN infrastructure. Furthermore, the consolidation of the present Tier-2 infrastructure into fewer, larger sites could reduce the total cost of its maintenance in the medium to long term. Thus, a probable scenario for the Data Lake could see a distributed site for data, with the main data center in Bologna and a smaller one in the South for redundancy of the data not replicated elsewhere, plus a few large sites with computing nodes (see the middle picture of Fig. 4).

Conclusions
The facility under construction in Bologna for the new ECMWF data center will also be able to host the CNAF and CINECA data centers.The new CNAF data center will have a dedicated power of up to 10 MW for IT and will constitute the main component of the future INFN Data Lake.
The final deployment of INFN computing for the next decade is not yet decided, but it will most probably involve:
• the consolidation of storage in fewer sites;
• the optimization of the Tier-2 infrastructure into bigger sites;
• the exploitation of the coming Tier-1 data center, and possibly of another one to be constructed in the south of Italy (the follow-up of ReCaS);
• a closer connection of HTC and HPC resources, enhanced by the co-location of CNAF and CINECA;
• the consolidation of small sites into the INFN-wide cloud infrastructure.
Two further activities complete the R&D program described above: the test of the use of GPUs via cloud interfaces with the European project DEEP-Hybrid DataCloud (DEEP) [8], and the study of 'Data Lakes' via the European project ESCAPE (European Science Cluster of Astronomy & Particle Physics ESFRI Research Infrastructures). In addition, a common computing and data infrastructure lake among data centers in the south of Italy is being tested in the framework of the PON project IBiSCo, still under evaluation. The goal of the project is to strengthen the current southern ReCaS [9] computing infrastructure, already funded through INFN ordinary funding, past PON programs and the Distributed High Throughput Computing and Storage in Italy (DHTCS-IT) project, also funded by the Italian Ministry of Education, and to constitute the first real and concrete step towards the Italian Computing and Data Infrastructure: a multi-disciplinary and multi-functional platform able to adapt to the needs of all scientific communities.

Figure 4. Possible Data Lake models for INFN.

Table 2: INFN sites grouped by size