GSDC: A Unique Data Center in Korea for HEP Research

Global Science experimental Data hub Center (GSDC) at Korea Institute of Science and Technology Information (KISTI) is a unique data center in South Korea, established to promote fundamental research fields by supporting them with expertise in Information and Communication Technology (ICT) and with infrastructure for High Performance Computing (HPC), High Throughput Computing (HTC), and networking. GSDC has supported various research fields in South Korea dealing with large-scale data, e.g. the RENO experiment for neutrino research, the LIGO experiment for gravitational-wave detection, a genome sequencing project for biomedical research, and HEP experiments such as CDF at FNAL, Belle at KEK, and STAR at BNL. In particular, GSDC has run a Tier-1 center for the ALICE experiment at the LHC at CERN since 2013. In this talk, we present an overview of the computing infrastructure that GSDC runs for these research fields and discuss the data center infrastructure management system deployed at GSDC.


Introduction
Nowadays, we can access e-mails, files, and documents from anywhere at any time via the web [1] and apps, not only on laptops and desktops but also on mobile devices. This is true when we use commercial services such as Google Docs [2] or Apple iCloud [3], in which our e-mails, files, and documents are always accessible as long as we are connected to the Internet. It is also true in research fields, where IT infrastructure is crucial to store and process the large-scale data produced by instruments and to make the data accessible to stakeholders. All of this is made possible by so-called Data Centers, from which we have benefited for many years, often without knowing of their existence. This paper discusses Data Centers based on the practical operational experience of the Global Science experimental Data hub Center (GSDC) at the Korea Institute of Science and Technology Information (KISTI), in terms of what a Data Center should satisfy to provide highly available infrastructure, and introduces GSDC as an example of highly available infrastructure for IT operations.

Data Center for Big Data
Recently, Data Centers have been built to facilitate the processing of Big Data in business. Trends in business have shifted toward providing services and products tailored to individual customers based upon their personal interests. The information related to each individual's interests remains in his or her Internet activities on mobile devices and personal computers: for example, keywords entered into search engines, web pages or images clicked, and articles read. The amount of this information is too big to be dealt with by a mainframe or a conventional supercomputer, so the Data Center's role has become crucial. Data Mining [4] over this scattered information to extract a few specific keywords or items, and even more complicated pattern-recognition techniques such as machine learning [5] or deep learning [6], are important in the same regard.

e-mail: sahn@kisti.re.kr

ISMD 2016
In research fields, the development of high-resolution instrumentation for better results has led to the production of huge amounts of experimental data. Usually the amount of data produced is far beyond the capability of a researcher's workstation or a moderate cluster in his or her laboratory. For example, the four giant experiments built at the European Organization for Nuclear Research (CERN) [7] to exploit the Large Hadron Collider (LHC) [8] (ALICE [9], ATLAS [10], CMS [11], and LHCb [12]) have produced a hundred petabytes of data since the start of operations in 2009 [13]. A hierarchical model [14] for the distribution of the data produced by the experiments was introduced at the design stage of the LHC, and it has evolved into an intercontinental collaboration of Data Centers around the world called the Worldwide LHC Computing Grid [15].
A Data Center can be defined as a facility used to house computer systems and associated components, such as telecommunications and storage systems. It generally includes redundant or backup power supplies, redundant data communication connections, environmental controls (e.g. air conditioning and fire suppression), and various security devices. Large Data Centers operate on an industrial scale and can consume as much electricity as a small town. The keywords for a Data Center are thus housing computer systems, redundancy and backup, power efficiency, and security.

Highly Available IT Infrastructure
The principal goal of a Data Center in both business and research fields is to safely preserve data, whether users' e-mails, files, and documents or invaluable experimental data produced by giant instruments. For a Data Center to achieve this, a fault-tolerant system architecture (in other words, redundancy) is mandatory among the required design features [16]. Nowadays, IT operations are a crucial aspect of most organizational operations around the world. If the IT infrastructure for a service consists of a single server connected to a single switch or router, and one of them fails, the entire IT service stops. Having a backup to keep the IT service alive helps avoid this worst-case scenario; in other words, business continuity relies on the reliability of the IT infrastructure, which is a key to business success.
There are two aspects to making IT infrastructure highly available: allowing no single point of failure in the system architecture, and building a fast recovery mechanism. No single point of failure can be achieved by having all critical components in pairs. For example, each server can have dual power supplies, which protects it against an accidental power cut from one source. For a cluster, servers can be connected to two or more physical switches in order to have a backup uplink for networking. At the scale of a Data Center, the stored data can be replicated to other sites so that access to the data remains available at all times unless all sites are down.
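The replication idea above can be sketched as a client that transparently fails over between redundant endpoints, so that a single failed component does not interrupt the service. This is only an illustrative model; the endpoint names and the fetch logic are hypothetical, not part of the GSDC setup.

```python
# Minimal failover sketch: try each redundant endpoint in turn.
# REPLICAS and fetch() are hypothetical illustrations.

REPLICAS = ["storage-a.example.org", "storage-b.example.org"]

def fetch(host, key, outages):
    """Pretend to read `key` from `host`; fail if the host is down."""
    if host in outages:
        raise ConnectionError(f"{host} is unreachable")
    return f"{key}@{host}"

def fetch_with_failover(key, outages=()):
    last_error = None
    for host in REPLICAS:
        try:
            return fetch(host, key, outages)
        except ConnectionError as err:
            last_error = err  # this replica is down; try the next one
    raise RuntimeError("all replicas failed") from last_error

# With the primary down, the read still succeeds via the replica.
print(fetch_with_failover("run-2016/event-42",
                          outages={"storage-a.example.org"}))
```

The same pattern applies at every layer mentioned above: dual power feeds, redundant uplinks, and replicated storage sites are all instances of keeping critical components in pairs.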
The recovery cycle for an IT service is carried out as follows: 1) a malfunction or glitch occurs; 2) it is detected by the monitoring system and an alarm is raised; 3) in a critical case, the affected service is rolled back to its original state, which can be achieved by automated provisioning and configuration management; 4) the IT service comes back online. Figure 1 describes this recovery cycle. In this cycle, automated provisioning and configuration management coupled with the alarm system are crucial for fast recovery of IT services. Keeping a backup of the data or metadata of the service is mandatory. Virtualization [18] and High Availability add-ons [19] provide further options for a reliable and agile IT infrastructure.
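The four steps above can be sketched as a simple detect-alarm-reprovision loop. All function and service names here are illustrative assumptions; in production the re-provisioning step is handled by a provisioning and configuration-management stack, not a direct assignment.

```python
# Sketch of the recovery cycle: 1) fault occurs, 2) monitoring
# detects it and raises an alarm, 3) automated provisioning rolls
# the service back to a known-good state, 4) service is back online.

def monitor(service):
    return service["state"] == "ok"

def raise_alarm(service, log):
    log.append(f"ALARM: {service['name']} is {service['state']}")

def reprovision(service, log):
    # Stand-in for automated provisioning and configuration
    # management restoring the service to its defined state.
    service["state"] = "ok"
    log.append(f"RECOVERED: {service['name']} re-provisioned")

def recovery_cycle(service, log):
    if not monitor(service):         # step 2: detection
        raise_alarm(service, log)    # step 2: alarm
        reprovision(service, log)    # step 3: rollback / re-provision
    return service["state"] == "ok"  # step 4: back online

log = []
svc = {"name": "ce-frontend", "state": "failed"}  # step 1: a glitch
assert recovery_cycle(svc, log)
print(log)
```

The speed of this loop is dominated by step 3, which is why keeping every service profile in code (as discussed below for Puppet and Foreman) matters for fast recovery.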

Global Science experimental Data hub Center
GSDC is a government-funded project to promote fundamental research in South Korea by providing IT infrastructure. In the first phase of the project, GSDC focused on collider physics conducted at CERN, FNAL [20], and KEK [21]. Now, in the second phase, GSDC has expanded its services to domestic research fields requiring large amounts of storage and computing power. The final goal is to become the National Data Center for fundamental research. The roadmap of GSDC is described in figure 2. GSDC has officially supported six experiments: ALICE, CMS, Belle II [22], LIGO [23], RENO [24], and the Genome project. The specification of the computing resources provided for each experiment is summarized in table 1. Supporting each experiment requires building essential IT services of a certain complexity: for example, a User Interface (UI) to give users an access point, a Computing Element (CE) to process users' tasks, and a Storage Element (SE) to store and manage data. However, these are only the tip of the iceberg: the services provided to users, i.e. the UI or CE, are only the visible part of the whole IT infrastructure. A complicated system-administration software stack, shown in figure 3, operates behind the scenes, and the computing and storage resources are provided on top of this stack. For example, remote control of hardware can be done via IPMI tools, and installation of the operating system and the packages required for a specific service can be done via a provisioning tool. For provisioning and configuration management, GSDC introduced Foreman [25] and Puppet [26]. Almost all services are profiled in Puppet code. Puppet is coupled with Foreman so that a server can be defined to run a specific service at the provisioning step. The entire infrastructure of GSDC is monitored by various kinds of tools: for example, Elasticsearch coupled with Logstash is used to query service status, visualized with Kibana [27]. The storage system and network are monitored by other tools, for example Splunk [28] and Observium [29]. For Grid systems, perfSONAR [30] services are enabled to periodically monitor the network status among sites. The system architecture of the GSDC network is fully redundant, as shown in figure 4.
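The coupling of provisioning and configuration management described above can be modeled as binding a host to a service profile at provisioning time, after which the profile alone determines what is installed and run on it. The profiles, package names, and hostnames below are hypothetical stand-ins, not GSDC's actual Puppet roles.

```python
# Toy model of provisioning-time service assignment: a host record,
# as a Foreman-like inventory might keep it, is bound to a profile,
# and the profile fixes the packages and services applied to it.
# All names are hypothetical.

PROFILES = {
    "worker":  {"packages": ["htcondor"], "services": ["condor"]},
    "storage": {"packages": ["xrootd"],   "services": ["xrootd"]},
}

def provision(hostname, profile):
    spec = PROFILES[profile]
    # A real provisioning tool would install the OS here, then hand
    # the host to configuration management to enforce this state.
    return {"host": hostname, "profile": profile, **spec}

node = provision("wn-001.example.org", "worker")
print(node["packages"])  # → ['htcondor']
```

Because every host's state is derived from its profile rather than configured by hand, a failed node can be rebuilt identically, which is what makes the fast recovery cycle above practical.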
It is worth pointing out that the entire software stack for system administration is based on open-source projects. Deploying and configuring it to a production level of service requires a high level of skill.

Conclusion
Recently, Data Centers have been built for processing and storing large-scale data. Highly available IT infrastructure is a key to success in both business and fundamental research. GSDC is the sole government-funded Data Center in South Korea dedicated to fundamental research, and it aims to be among the most reliable Data Centers in the world.

Figure 1. Recovery cycle for an IT service when a malfunction or glitch occurs. Automated provisioning and configuration management are crucial in this cycle.

Table 1. Summary of the computing resources provided by GSDC for each experiment.

Figure 2. Roadmap of the GSDC project. It is now in the second phase, and the final goal is to become the National Data Center for fundamental research within 10 years.

Figure 4. The GSDC system architecture for the network, which is fully redundant.