Addressing Scalability with Message Queues: Architecture and Use Cases for DIRAC Interware

The Message Queue (MQ) architecture is an asynchronous communication scheme that provides an attractive solution for certain scenarios in a distributed computing model. Introducing an MQ as an intermediate component between the interacting processes decouples the end-points, making the system more flexible and providing high scalability and redundancy. DIRAC is general-purpose interware for distributed computing systems, which offers a common interface to a number of heterogeneous providers and guarantees transparent and reliable usage of the resources. The DIRAC platform has been adopted by several scientific projects, including High Energy Physics communities such as LHCb, the Linear Collider and Belle2. A generic Message Queue interface has been incorporated into the DIRAC framework to help solve the scalability challenges that must be addressed during LHC Run3, starting in 2021. It allows the MQ scheme to be used for message exchange among DIRAC components or for communication with third-party services. In this contribution we describe the integration of MQ systems with DIRAC and present several use cases. Message Queues are foreseen to be used in the pilot logging system, and as a backbone of the DIRAC component logging system and monitoring.


Introduction
We live in a world of large data streams, which are constantly provided by various sources and need to be processed efficiently. This massive amount of data requires the use of all available processing resources together with an efficient computing model, which is scalable and reliable.
High Energy Physics (HEP) communities face similar challenges, since the data produced by the experiments' detectors and by the Monte Carlo simulation jobs form a significant data stream that must be processed in a coordinated manner [1]. For this purpose, several approaches have been proposed, among them the DIRAC framework [2,3]. DIRAC, the interware, is an open-source software platform that provides the interface between the end user and the underlying resources.
DIRAC implements a flexible distributed agent model that ensures scalable processing over heterogeneous environments. The DIRAC interware has been adopted as a computing solution by HEP experiments such as LHCb and Belle2, and also by many other projects which use it as a platform to perform advanced GRID operations.
Message Queue (MQ) architectures implement asynchronous communication schemes which are well suited to distributed models based on microservices and provide several advantages, including good scalability and performance. This paper is dedicated to the incorporation of existing MQ solutions in DIRAC.
The paper is organized as follows: First, in section 2, the concept of the Message Queue communication model is introduced. Section 3 explains the details of the implementation of the Message Queue module in the DIRAC framework. Section 4 presents several use cases. Finally, a summary is given in section 5.

MQ communication is based on the idea of introducing an intermediate component (a queue) between interacting entities, which are typically called consumer and producer (see Fig. 1). A queue acts as a buffer that stores messages sent by producers. This separation allows the communication to become asynchronous, as the consumer and producer do not need to interact at the same time. This approach has several advantages: it decouples the end-points, making the system more flexible and providing high scalability and redundancy. In some MQ systems additional mechanisms are implemented to ensure, e.g., resilience or guaranteed message delivery. The MQ architecture also introduces flexibility at the technology level, making it possible to interconnect heterogeneous environments.
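The buffering role of the queue can be illustrated with a minimal, single-process sketch. It uses only the Python standard library (queue and threading) rather than a real message broker, and the function names and the None sentinel are illustrative assumptions, not part of DIRAC.

```python
import queue
import threading


def producer(q, n_messages):
    """Put messages on the queue without waiting for any consumer."""
    for i in range(n_messages):
        q.put({"id": i, "payload": f"message {i}"})
    q.put(None)  # sentinel (assumption of this sketch): no more messages follow


def consumer(q, received):
    """Take messages off the queue until the sentinel arrives."""
    while True:
        msg = q.get()
        if msg is None:
            break
        received.append(msg)


buffer = queue.Queue()  # the intermediate component between the two end-points
received = []

# The producer runs to completion before the consumer even starts.
# This is only possible because the queue stores the messages in between.
p = threading.Thread(target=producer, args=(buffer, 3))
p.start()
p.join()

c = threading.Thread(target=consumer, args=(buffer, received))
c.start()
c.join()

print([m["id"] for m in received])
```

Neither function holds a reference to the other; both only know the queue, which is the decoupling described above.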
The MQ paradigm is applicable at very different levels. It may serve as an inter-process communication mechanism acting within one operating system as well as a way of connecting the processes or services in distributed computing models. Various message-oriented middlewares (

Fig. 2 (caption, beginning not recovered): Each section is uniquely identified by the pseudo-url string, e.g., "mardirac3.in2p3.fr::Queue::Q2", which can be provided as an argument to the factory methods createConsumer() and createProducer() responsible for the creation of consumer and producer instances. To improve the performance, the same connections can be reused by several consumers or producers. This functionality is provided by the MQConnectionManager module. The purpose of the MQConnector interfaces is to provide a mechanism that accommodates various communication protocols, e.g. STOMP [8]. More details are given in the text.

Message Queues in DIRAC
A generic MQ interface has been introduced in DIRAC version 6, release 17. It allows DIRAC components to connect to (external) MQ services and to exchange messages with them. Access to the MQ services is realised via logical Queues or Topics [7]. The architecture of the MQ interface is presented in Fig. 2.
The MQCommunication interface provides factory methods to create MQConsumer and MQProducer instances, which can be used to exchange messages with the MQ. The only requirement on the message format is that it must be valid JSON. The configuration settings are loaded from the DIRAC Configuration Service and are identified by a pseudo-url string, e.g., "mardirac3.in2p3.fr::Queue::Q2", which is provided as an argument to the factory methods. MQConnectionManager internally manages the list of open connections and ensures thread-safe access. The same connection can be reused by several consumers and producers to improve performance. Each specialisation of MQConnector then provides an interface tailored to a chosen MQ communication protocol. Currently, a handler implementation for the Simple Text Orientated Messaging Protocol (STOMP) [8] standard is available. Both user-password and SSL certificate authentication mechanisms are supported. The implementation was tested with two message brokers: RabbitMQ [4] and ActiveMQ [5]. The existing scheme can easily be extended by adding a specialized module, e.g., to support more MQ protocol types.
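The interplay of the pseudo-url, the shared connections and the JSON requirement can be sketched in a few lines of self-contained Python. This is not the DIRAC implementation: every class and function name below (parse_mq_uri, FakeConnection, ConnectionManager, Producer) is hypothetical, the connection performs no network I/O, the "/queue/..." and "/topic/..." destination strings are an assumed naming convention, and the topic name "T1" is invented for the example.

```python
import json
import threading


def parse_mq_uri(mq_uri):
    """Split a pseudo-url 'service::Queue::name' into its three parts."""
    parts = mq_uri.split("::")
    if len(parts) != 3 or parts[1] not in ("Queue", "Topic"):
        raise ValueError(f"malformed MQ pseudo-url: {mq_uri!r}")
    service, dest_type, dest_name = parts
    return service, dest_type, dest_name


class FakeConnection:
    """Hypothetical stand-in for a broker connection (no network I/O)."""

    def __init__(self, service):
        self.service = service
        self.sent = []  # list of (destination, body) pairs

    def send(self, destination, body):
        self.sent.append((destination, body))


class ConnectionManager:
    """Keeps one connection per MQ service and hands it out under a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._connections = {}

    def get_connection(self, service):
        with self._lock:
            if service not in self._connections:
                self._connections[service] = FakeConnection(service)
            return self._connections[service]


class Producer:
    """Sends JSON-serialised messages to one queue or topic."""

    def __init__(self, manager, mq_uri):
        service, dest_type, dest_name = parse_mq_uri(mq_uri)
        self._connection = manager.get_connection(service)
        self._destination = f"/{dest_type.lower()}/{dest_name}"  # assumed convention

    def put(self, message):
        # json.dumps raises TypeError if the message is not JSON-serialisable.
        self._connection.send(self._destination, json.dumps(message))


manager = ConnectionManager()
p1 = Producer(manager, "mardirac3.in2p3.fr::Queue::Q2")
p2 = Producer(manager, "mardirac3.in2p3.fr::Topic::T1")  # "T1" is a made-up name
p1.put({"status": "ok"})
p2.put({"status": "done"})

# Both producers target the same service, so they share a single connection.
shared = manager.get_connection("mardirac3.in2p3.fr")
print(shared.sent)
```

Keying the connection table by service name is one simple way to obtain the reuse described above; a protocol-specific connector would replace FakeConnection in a real setup.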

Use Cases
In this section we briefly describe several use cases in which the MQ components are being used. The MQ has been used as part of the perfSONAR-DIRAC bridge architecture, which is used for network performance monitoring, for providing metrics, and for identifying network problems. More details can be found in [9, 10].
The DIRAC system is installed on worker nodes (WN) by distributed agents called pilots [11,12]. A universal and scalable logging system for all pilots is under development and is foreseen to use the MQ. Due to the variability of WN types, it is challenging to provide a generic solution that reports possible failures during, e.g., the installation or configuration phases. The proposed architecture is shown in Fig. 3. The Pilot Loggers are components of the new DIRAC pilot generation and are responsible for sending the logs to a dedicated system. The development is ongoing.
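One way to picture a logger that forwards its records to an MQ is a custom handler for the Python standard logging module. The sketch below is an assumption-laden illustration, not the Pilot Logger code: the class name QueueJSONHandler, the logger name, the record fields and the host name "wn-001" are all hypothetical, and a local queue.Queue stands in for the MQ producer.

```python
import json
import logging
import queue


class QueueJSONHandler(logging.Handler):
    """Serialise each log record to JSON and pass it to a send callable."""

    def __init__(self, send):
        super().__init__()
        self._send = send  # e.g. the put method of an MQ producer

    def emit(self, record):
        try:
            body = json.dumps({
                "logger": record.name,
                "level": record.levelname,
                "message": record.getMessage(),
                "created": record.created,
            })
            self._send(body)
        except Exception:
            # Standard logging convention: never let a logging failure
            # propagate into the application.
            self.handleError(record)


outbox = queue.Queue()  # stand-in for the MQ producer
logger = logging.getLogger("pilot.sketch")
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the example output confined to our handler
logger.addHandler(QueueJSONHandler(outbox.put))

logger.info("installing DIRAC on %s", "wn-001")
entry = json.loads(outbox.get_nowait())
print(entry["level"], entry["message"])
```

Because the handler only depends on a send callable, the same code can be pointed at a different transport without touching the code that emits the log messages.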
MQ is also used as the main buffer for the logging systems of internal DIRAC services. This system is currently in production, used together with the CERN ActiveMQ system. Finally, MQ is implemented as a failover mechanism for Elasticsearch [14] in the DIRAC monitoring services [15]. The monitoring system is dedicated to monitoring various components of DIRAC. It is based on Elasticsearch, a distributed search and NoSQL analytics database. The implemented failover mechanism allows the logs to be redirected to a dedicated MQ server. This solution has been tested with the RabbitMQ server.
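The failover idea can be sketched as follows: try the primary store first and, if it is unavailable, redirect the record to a queue. All names here (PrimaryUnavailable, store_with_failover, drain) and the sample record are hypothetical, the primary store is simulated in memory, and the replay step in drain is an assumption of this sketch rather than something stated in the text.

```python
import json
import queue


class PrimaryUnavailable(Exception):
    """Raised by the primary store when it cannot accept records."""


def store_with_failover(record, primary_store, fallback_send):
    """Try the primary store; on failure redirect the record to the MQ."""
    try:
        primary_store(record)
        return "primary"
    except PrimaryUnavailable:
        fallback_send(json.dumps(record))
        return "fallback"


def drain(fallback_queue, primary_store):
    """Replay buffered records into the primary store (assumed step)."""
    replayed = 0
    while True:
        try:
            body = fallback_queue.get_nowait()
        except queue.Empty:
            return replayed
        primary_store(json.loads(body))
        replayed += 1


# Simulated primary backend whose availability can be toggled.
state = {"up": False}
indexed = []


def primary_store(record):
    if not state["up"]:
        raise PrimaryUnavailable("search backend not reachable")
    indexed.append(record)


mq = queue.Queue()  # stand-in for the dedicated MQ server
route = store_with_failover({"component": "WMS", "cpu": 0.4}, primary_store, mq.put)

state["up"] = True  # the backend comes back
replayed = drain(mq, primary_store)
print(route, replayed, indexed)
```

The record is serialised to JSON before it is queued, which matches the message format requirement of the MQ interface described earlier.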

Summary
The MQ generic interface has been successfully introduced in the DIRAC framework. It is being used as an important part of the DIRAC service logging system, as a failover mechanism for the DIRAC Monitoring System, and it is foreseen to play an important role in the universal pilot logging architecture being developed.