OSiRIS: A Distributed Storage and Networking Project Update

We report on the status of the OSiRIS project (NSF Award #1541335; UM, IU, MSU and WSU) after its fourth year. OSiRIS is delivering a distributed Ceph storage infrastructure coupled with software-defined networking to support multiple science domains across Michigan's three largest research universities as well as the Van Andel Institute. The project's goal is to provide a single scalable, distributed storage infrastructure that allows researchers at each campus to work collaboratively with other researchers across campus or across institutions. The NSF CC*DNI DIBBs program, which funded OSiRIS, seeks solutions to the challenges of multi-institutional collaborations involving large amounts of data, and we are exploring the creative use of Ceph and networking to address those challenges. We present details on the current status of the project and its various science domain users and use-cases, covering the design choices, configuration, tuning, and operational challenges we have encountered in providing a multi-institutional Ceph deployment interconnected by a monitored, programmable network fabric. We conclude with our plans for the final year of the project and its longer-term outlook.


Introduction
The OSiRIS project [1] has successfully connected four campuses and a smaller edge site with a software-defined networking and storage system that allows the seamless sharing of large datasets. By the end of the fourth year of the project, we have an established rapid-deployment infrastructure, automated virtual organization provisioning, self-service user enrollment with delegated approval, and AAA (Authentication, Authorization, and Accounting) infrastructure allowing role-based, fine-grained access to resources. Early in the project we completed an engagement with CTSC to externally evaluate our security model [2]. We manage our own access and usage of project resources using the very same system as our users. Automation and orchestration are fully leveraged for stable and well-monitored services. All of this is linked to 13 PiB of Ceph storage accessible through gateway filesystem mounts or through scalable and universally accessible S3 protocols.

* e-mail: smckee@umich.edu
OSiRIS currently serves approximately 14 science virtual organizations housed across 6 US institutions or labs with collaborators worldwide. The scalable, flexible nature of OSiRIS leaves the door open for more users and more collaboration with other research platforms. The remainder of the paper will cover our accomplishments and future goals in detail.

The OSiRIS Project
OSiRIS (Open Storage Research InfraStructure) is a collaboration of scientists, computer engineers and technicians, network and storage researchers and information science professionals from University of Michigan/ARC-TS (UM), Michigan State University/iCER (MSU), Wayne State University (WSU), and Indiana University (IU) (focusing on SDN and network topology). Recently we have also collaborated with the Van Andel Institute (VAI) in Grand Rapids, MI to extend our Ceph cluster with a small, fast cache at their site which is backed by the much larger amount of storage at the primary sites.
We are one of four NSF "Campus Cyberinfrastructure: Data, Networking, and Innovation: Data Infrastructure Building Blocks" (CC*DNI DIBBs) projects funded in 2015. OSiRIS has been prototyping and evaluating a software-defined storage infrastructure, initially for our primary Michigan research universities, designed to support many science domains. Our goal is to provide transparent, high-performance access to the same storage infrastructure from well-connected locations on any of our campuses. By providing a single data infrastructure that supports computational access "in-place," we can meet many of the data-intensive and collaboration challenges faced by our research communities and enable them to easily undertake research collaborations beyond the borders of their own universities.
A single scalable infrastructure is easier to build and maintain than isolated campus data silos. Data sharing, archiving, security, and life-cycle management can all be implemented under one infrastructure. At the same time, our architecture will allow the configuration for each research domain to be optimized for performance and resiliency.

Ceph in OSiRIS
Ceph is a distributed object storage system that gives us a robust, open-source platform to host scientific data used in multi-institutional collaborations. The core of Ceph is the Reliable Autonomic Distributed Object Store (RADOS), which is self-healing, self-manages replication, and has excellent scalability and performance [3]. On top of RADOS, Ceph supports multiple data interfaces, including a POSIX filesystem, S3-compatible object storage, and kernel block devices. Ceph performs sophisticated allocation mapping using the Controlled Replication Under Scalable Hashing (CRUSH) algorithm, allowing us to customize data placement by use-case and resources [4].
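To illustrate the kind of deterministic, weighted placement CRUSH performs, the following Python sketch mimics its "straw2"-style bucket selection. The OSD names, weights, and the use of MD5 here are purely illustrative assumptions, not the actual CRUSH implementation or OSiRIS configuration; the point is that placement is computed from the object name alone, with no central lookup table.

```python
import hashlib
import math

# Hypothetical OSD layout: names and relative weights (illustrative only).
OSDS = {"osd.0": 1.0, "osd.1": 1.0, "osd.2": 2.0, "osd.3": 1.0}

def straw2_like_choose(obj_name, osds, n_replicas=3):
    """Pick n distinct OSDs for an object, straw2-style: each OSD draws a
    weighted pseudo-random 'straw' from a hash of (object, osd); the longest
    straws win. Deterministic for a given object name, so any client can
    compute the same placement independently."""
    draws = []
    for osd, weight in osds.items():
        h = hashlib.md5(f"{obj_name}:{osd}".encode()).hexdigest()
        u = (int(h, 16) % 10**8) / 10**8 or 1e-9   # uniform value in (0, 1]
        draws.append((math.log(u) / weight, osd))  # higher weight -> longer straw
    draws.sort(reverse=True)
    return [osd for _, osd in draws[:n_replicas]]

# The same object name always maps to the same OSD set.
print(straw2_like_choose("mydata/object-42", OSDS))
```

Because higher-weight OSDs draw statistically longer straws, adding capacity at one site shifts placement probability toward it without remapping most existing objects, which is the property CRUSH exploits.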
Our Ceph deployment is distributed across sites at WSU, MSU, UM, and VAI. We have also experimented with extending the deployment to sites geographically farther away, retaining full functionality with a slight loss of performance [5], and/or augmenting the distributed deployment with local caches [6][7]. Ceph allows us to choose the level of replication among these sites based on the needs of participating science domains. Typically our highest level of data resiliency is provided by having one or more replicas at each site. Ceph also has options for creating erasure-coded (EC) data pools, which provide configurable redundancy similar to RAID. Our most recent storage purchases were specified to increase the overall node count at each site so that we will be better able to leverage EC functionality in Ceph [8].
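The capacity trade-off between full replication and erasure coding can be made concrete with a back-of-the-envelope calculation. The 8+3 EC profile below is an illustrative assumption, not OSiRIS's actual pool configuration; only the 13 PiB raw figure comes from the paper.

```python
def usable_fraction_replicated(n_replicas):
    """Usable fraction of raw capacity with n full replicas of each object."""
    return 1.0 / n_replicas

def usable_fraction_ec(k, m):
    """Usable fraction for an erasure-coded pool with k data chunks and
    m coding chunks; the pool tolerates the loss of any m chunks."""
    return k / (k + m)

raw_pib = 13.0  # total OSiRIS raw Ceph capacity cited in the paper

# 3x replication keeps 1/3 of raw capacity usable (~4.3 PiB here);
# a hypothetical 8+3 EC pool keeps 8/11 usable (~9.5 PiB) while still
# surviving three simultaneous chunk losses.
print(f"3x replication: {raw_pib * usable_fraction_replicated(3):.1f} PiB usable")
print(f"EC 8+3:         {raw_pib * usable_fraction_ec(8, 3):.1f} PiB usable")
```

This is why increasing per-site node count matters: EC pools need enough independent failure domains to spread k+m chunks without placing two on the same node.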

OSiRIS Network Management Abstraction Layer (NMAL)
Another important part of the OSiRIS project is active network monitoring, management, and orchestration via the NMAL. Network topology and perfSONAR Periscope monitoring components are deployed to hosts and switches to ensure that our distributed system can optimize the network for performance and resiliency through SDN (Software Defined Networking) control.
The main components of NMAL are BLiPP, UNIS, and an SDN controller.
• BLiPP - Basic Lightweight Periscope Probe. BLiPP agents may reside both on end hosts (monitoring end-to-end network status) and on dedicated diagnostic hosts inside the network.
• UNIS - Unified Network Information Store. The Periscope UNIS data store exposes a RESTful interface for the information necessary to perform data logistics. The data store can hold measurements from BLiPP or network topology inferred through various agents.
• SDN Controller - Driven by the information collected in UNIS, an SDN controller can dynamically modify network topologies to enable the best path between clients and data and between internal OSiRIS components (e.g., for Ceph replication).

OSiRIS Authentication, Authorization, and Auditing
The OSiRIS approach to authentication is to use identity federations and avoid managing authentication accounts ourselves. Federation participants use their local identity providers to verify an identity and begin a self-service enrollment process, which creates an OSiRIS identity belonging to one or more OSiRIS virtual organizations. We leverage the InCommon [9] and eduGAIN [10] federations, Shibboleth [11], Grouper [12], and COmanage [13] to enable our science domain users to self-enroll, self-organize, and control access to their own storage within OSiRIS. Virtual organizations are known internally to COmanage as 'COUs' (CO organizational units). Once a user is approved by designated VO admins or OSiRIS project admins, their identity is established in COmanage and linked to their institutional identity. From there, access to OSiRIS services is provisioned as shown in Figure 1. Access to service credentials is via the COmanage gateway. Should an individual move between organizations, we can simply link their new organization to their existing OSiRIS identity. Multiple federated identities can be linked to a single OSiRIS identity as well. OSiRIS virtual organizations can self-organize and manage members and roles via OSiRIS services such as COmanage and Grouper. They can further control access to data via service-specific tools such as S3 ACLs, Globus shares, or POSIX permission tools.

COmanage Ceph Provisioner Plugin
COmanage has no built-in capability to provision users to Ceph, but it is designed with a plugin-based architecture. Different identity management events in COmanage trigger calls to configured plugins with information about the event. Events might include a new user identifier, new user groups (new COU membership), a request to reprovision a user, and more. OSiRIS created a new plugin under this architecture to handle provisioning storage for virtual organizations, linking storage pools to CephFS directories or S3 placement targets, creating users, and assigning user capabilities. The plugin is freely available from our GitHub repository, which is linked, along with instructions, on our website [14]. The structure of the plugin is shown in Figure 2.

Globus Gridmap LDAP Callout
OSiRIS provides Globus access to our data storage on CephFS and S3. We configure Globus to use CILogon for authentication, with Gridmap mapping certificate DNs to local S3 or POSIX users. However, we use the voPerson LDAP schema [15] to store certificate DNs, whereas stock Globus relies on a text mapfile. For a brief period we used our own utility to generate the mapfile from LDAP [16], but thanks to the work of an undergraduate student at U-M we now use a Gridmap callout module which looks up the DN-to-username mapping directly in our LDAP directory. Documentation and source code for the module are available from the OSiRIS website [17].
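The difference between the two approaches can be sketched as follows. The mapfile format shown is the classic Globus gridmap form (a quoted certificate DN followed by a local username); the DNs and usernames are hypothetical, and a plain dict stands in for the LDAP directory search that the callout module actually performs against the voPerson attribute.

```python
import re

def parse_gridmap_line(line):
    """Parse one line of a classic Globus gridmap file:
    a quoted certificate DN followed by a local username."""
    m = re.match(r'^"(?P<dn>[^"]+)"\s+(?P<user>\S+)\s*$', line)
    return (m.group("dn"), m.group("user")) if m else None

# Hypothetical mapfile contents (DNs and usernames invented for illustration).
MAPFILE = '''"/DC=org/DC=cilogon/C=US/O=University of Michigan/CN=Jane Doe A123" jdoe
"/DC=org/DC=cilogon/C=US/O=Wayne State University/CN=John Roe B456" jroe
'''

# The old approach: materialize the whole mapping from a static text file.
gridmap = dict(filter(None, (parse_gridmap_line(l) for l in MAPFILE.splitlines())))

def lookup_user(dn, directory):
    """With the LDAP callout, this lookup happens per-connection against the
    directory (conceptually a search on the stored certificate DN attribute)
    rather than against a pre-generated file; a dict stands in here."""
    return directory.get(dn)

print(lookup_user("/DC=org/DC=cilogon/C=US/O=University of Michigan/CN=Jane Doe A123",
                  gridmap))
```

The operational advantage of the callout is that a newly enrolled user's DN is resolvable as soon as it lands in LDAP, with no mapfile regeneration step to schedule or forget.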

OSiRIS Service Monitoring
In the past year we have made improvements to our monitoring and alerting infrastructure. We now consolidate node health, service health, performance metrics, Ceph metrics, and alerts using Prometheus for metric gathering and alerting, with Grafana for visualization. Our architecture is fully deployed and managed by Puppet, including orchestration that defines new monitoring targets automatically. For example, every node we build automatically becomes a Prometheus target providing a range of performance metrics, and is monitored for basic functionality such as availability, ssh access, and access to the services relevant to that type of node. Through the use of Puppet 'exported resources' this configuration occurs without administrative action. A variety of alert rules notify us via email, Slack, or summary Grafana dashboards should any host or service have an issue. Our alert rules and dashboards are collected for reference in our GitHub repository [18].
We've deployed Prometheus (Figure 3) in a redundant configuration, with an instance of the Prometheus server and Alertmanager at each site. Alertmanager is clustered for redundancy and alert deduplication. Prometheus is not well suited to long-term data storage, so we continue to use InfluxDB, via Prometheus's remote read/write capabilities, to store and downsample data for the long term (currently 5 years).
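A minimal sketch of how Prometheus hands samples off to InfluxDB for long-term retention is shown below; the hostname, port, and database name are illustrative assumptions, not the OSiRIS production values (InfluxDB 1.x exposes the Prometheus-compatible endpoints used here).

```yaml
# prometheus.yml (fragment): ship samples to InfluxDB for long-term
# retention and read them back transparently at query time.
remote_write:
  - url: "http://influxdb.example.org:8086/api/v1/prom/write?db=prometheus"
remote_read:
  - url: "http://influxdb.example.org:8086/api/v1/prom/read?db=prometheus"
```

With remote_read configured, Grafana dashboards can query a single Prometheus instance and still reach downsampled history that has aged out of Prometheus's local TSDB.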

Science Domain Engagements
OSiRIS plays an active role for multiple science domains and institutions. Some of these are just beginning to come up to full-scale usage, whereas others have been incorporating OSiRIS since our first year. We also have open discussions with other projects such as the Open Science Network [19] and FABRIC [20]. Some recent engagements are highlighted below:
• Oakland University: Two groups at Oakland University in Rochester, Michigan are leveraging OSiRIS storage for their research. The Battistuzzi research lab focuses on long-term evolutionary patterns of microbial life, and the OU Genomics group aims to use and promote next-generation sequencing and bioinformatics technologies for research and education in the OU Biological Sciences Department.
• Brainlife.io: An online platform to accelerate scientific discovery by automated data management, large-scale analyses, and visualization. Brainlife plans to switch over to OSiRIS as their primary archival storage system before the end of this year.
• Building on the existing collaboration between MSU and the Grand Rapids-based Van Andel Institute, OSiRIS has installed NVMe-based Ceph OSD nodes running S3 instances to enable direct access to bioinformatics research data (Figure 4). OSiRIS at VAI will enable VAI bioinformaticians to work with MSU researchers to better understand Parkinson's disease and cancer. OSiRIS also facilitates data access for VAI researchers to leverage the computational resources at MSU's Institute for Cyber Enabled Research (iCER).

Next Steps
To reach our goals the project faces a number of interesting challenges:
• Working with more scientific domains to leverage the strengths of OSiRIS as a worldwide-accessible object storage platform, especially interfacing with distributed compute/storage platforms like OSG, XSEDE, the Open Storage Network, and more.
• Building up a tool-kit of client interface options spanning from laptop to cluster systems. OSiRIS needs to be simple and easy to use even for those unfamiliar with object storage.
• Increasing the resiliency and scale of our object storage (S3) infrastructure to support IO at scale with no single points of failure. Ceph makes this kind of scaling relatively straightforward in combination with commonly used open-source components; it only requires implementing resilient IP services on our existing proxy-backend architecture.
• Implementing software-defined networking (SDN) orchestration of both science-user and OSiRIS infrastructure network connectivity.
• Enabling science domain specific metrics to track, manage and optimize use of OSiRIS.
• Developing automated data life-cycle meta-data creation for users of OSiRIS.

Conclusions
The goal of the OSiRIS project is to enable scientists to collaborate on data easily and without building their own infrastructure. Scientists should be able to use our infrastructure by leveraging their existing institutional identities for authentication and self-management of resources. We aim not only to provide a scalable shared storage infrastructure, but to enable the most efficient use of that infrastructure with active network management via our NMAL layer. Users of OSiRIS should be able to get science done with their data instead of becoming bogged down in the details of data management and access.