Status and Future of the CMS Tracker DCS

Detector Control Systems (DCS) for modern High-Energy Physics (HEP) experiments are based on complex distributed (and often redundant) hardware and software implementing real-time operational procedures meant to ensure that the detector is always in a safe state, thus maximizing the lifetime of the detector. Display, archival and often analysis of the environmental data are also part of the tasks assigned to DCS systems. The CMS Tracker Control System (TCS) is a resilient system that has been designed to safely operate the silicon tracking detector in the CMS experiment. It has been built on top of an industrial Supervisory Control and Data Acquisition (SCADA) software product WinCC OA extended with a framework developed at CERN, JCOP, along with CMS and Tracker specific components. The TCS is at present undergoing major architecture redesign which is critical to ensure efficient control of the detector and its future upgrades for the next fifteen years period. In this paper, we will present an overview of the Tracker DCS and the architecture of the software components as well as the associated deliverables.
* Corresponding author: wassef.karimeh@cern.ch on behalf of the CMS Collaboration © The Authors, published by EDP Sciences. This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (http://creativecommons.org/licenses/by/4.0/). EPJ Web of Conferences 245, 01005 (2020) CHEP 2019 https://doi.org/10.1051/epjconf/202024501005


The TCS Architecture
CMS is a Large Hadron Collider (LHC) detector consisting of several subdetectors that fill the detector volume in compact cylindrical layers. The CMS Tracker is the largest silicon detector in the world, consisting of multiple layers of silicon pixel and silicon strip modules that cover an active surface of 210 m². A particle hit in the Tracker generates a signal that reveals the hit position accurately. A particle track can then be reconstructed by connecting the hits in the different tracking layers, thus revealing the charge and momentum of the particle. This comes at a cost: the Tracker has many millions of readout channels and dissipates more than 100 kW. It must be actively cooled, and because its operating temperature is below the dewpoint of the cavern, it is installed in a humidity-controlled atmosphere and flushed with dry air to avoid condensation and ice formation.
The Tracker Control System (TCS) is crucial for the supervision of all environmental conditions and powering processes and is designed to ensure stable and safe operation of the detector. The TCS (see Fig. 1) handles all interdependencies between the powering, cooling and environmental control systems in one easily operable Human-Machine Interface (HMI). This paper describes the evolution of the TCS architecture, which started as a service to the CMS Tracker but evolved to provide tooling to other subdetectors, aiming for a scalable DCS architecture.

Power system
The Power Supply System for the CMS Silicon Tracker provides High Voltage (HV) and Low Voltage (LV) power to the control and readout electronics and to the silicon sensors of the detector. The detector power scheme includes about 1000 CAEN [1] Power Supply Modules (PSM) housed in 149 power supply crates, distributed over 31 racks in the experimental cavern. Each rack, containing up to 5 power supply crates, is controlled via a "branch controller". The branch controllers are, in turn, housed in 9 mainframes that provide, among other things, the Ethernet connection used for communicating with the crate controllers and the TCS over the OPC (Open Platform Communications) [2] protocol. The communication between the branch controllers and the individual power supply crates runs over CAN bus [3] and is already implemented by the power supply vendor. The building block of the power supply system is the Power Supply Unit (PSU), two per PSM, providing the two LV sources and two HV sources that form one detector power group. The PSU has two independent HV channels, each powering half of the modules in one power group, to increase flexibility.
The experimental cavern is mostly inaccessible, which makes a remotely controlled powering system indispensable. The possibility of a Single Event Upset (SEU) overwriting memory cells makes the ability to externally interlock the power supplies mandatory.
To prevent possible damage, the detector must be switched on in a safe sequence: control power supplies first, then low voltage, then high voltage. It must be switched off in the opposite sequence. The TCS enables or disables powering of the detector and ensures that the correct sequence is followed. Additionally, the TCS implements software interlocks that act on the relevant channels by evaluating software-defined limits. This allows for a smooth and rapid power ramp-down instead of abrupt power cuts.
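The sequencing rule and the software interlock check can be illustrated with a minimal Python sketch. It is not the TCS implementation: the stage names, function names and the dictionary layout of readings and limits are assumptions made for illustration only.

```python
# Hypothetical stage names; the order follows the text:
# control power first, then low voltage, then high voltage.
POWER_ON_ORDER = ["CONTROL", "LV", "HV"]


def switch_on_sequence():
    """Return the stages in the order they must be switched on."""
    return list(POWER_ON_ORDER)


def switch_off_sequence():
    """Return the stages in the order they must be switched off (reverse)."""
    return list(reversed(POWER_ON_ORDER))


def can_switch_on(stage, powered):
    """A stage may be switched on only if every earlier stage is already on."""
    idx = POWER_ON_ORDER.index(stage)
    return all(s in powered for s in POWER_ON_ORDER[:idx])


def can_switch_off(stage, powered):
    """A stage may be switched off only if every later stage is already off."""
    idx = POWER_ON_ORDER.index(stage)
    return all(s not in powered for s in POWER_ON_ORDER[idx + 1:])


def software_interlock(readings, limits):
    """Return the channels whose reading exceeds its software-defined limit.

    `readings` and `limits` are hypothetical dicts keyed by channel name.
    """
    return sorted(ch for ch, value in readings.items()
                  if ch in limits and value > limits[ch])
```

In such a scheme, the channels returned by `software_interlock` would be the ones to ramp down, and `can_switch_off` would guard the order in which the ramp-down is issued.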
Information from power supplies, Tracker Safety System (TSS) and Detector Control Units (DCUs) [4], transmitted from the experiment Data Acquisition (DAQ) over SOAP (Simple Object Access Protocol) [5], is constantly flushed and evaluated by the TCS.

Environmental sensors
At CERN, Programmable Logic Controller (PLC) [6] systems are the basis for most of the control systems managing the detectors and the accelerators. The environmental sensors needed to ensure a safe operating environment for the Tracker are directly connected to a set of PLC systems which form the core of the autonomous hardware TSS and manage the interlock system for the Tracker power supplies (should the TCS not have already intervened). Three sets of PLC systems with different functionality communicate with the TCS:
• Interlock systems that act on the basis of high temperature limits and global Detector Safety System (DSS) actions. The interlock system parameters are configured, within limits, by the TCS.
• Thermal screen systems that prevent thermal stresses on the Tracker Support Tube, and condensation on the outer surfaces of the Tracker and in the channels where cooling pipes run together with power cables.
• Monitor systems that collect data from a variety of analog sensors and communicate with the TCS for archival and further analysis.

Cooling system
The cooling system is designed to remove the heat generated by the electronics and provide temperatures as low as needed (down to -25 °C) to minimize the increase of leakage current. Sensors for coolant temperature, pressure and level measurements are native to PLC systems external to the Tracker. These parameters are transmitted to DCS over MODBUS and DIP [7]. The cooling PLC systems also communicate with the TSS via hardwired lines that enable the latter to force the cooling units into "warm" operation or even into total stop if needed, bringing the Tracker to a safe state upon error detection.

Dry gas system
The Tracker is a huge object, operating, as explained, at temperatures below the experimental cavern dewpoint. To avoid condensation and ice formation, the Tracker volume and its periphery are permanently flushed with dry air or nitrogen depending on the operating conditions. Monitoring the gas quality in several locations in the detector volume is a very important factor for the Tracker safety. Parameter values from the PLC running the dry air plant and data from all the relevant sensors connected to the TSS are available to the TCS. In addition to interlock actions available to the TSS, detailed analyses are run continuously within the TCS and the resulting plots are examined and validated by human operators at least once daily.

Introduction
The SCADA system used in CMS, in all other LHC detectors and in the accelerator was chosen long before their commissioning. The choice was made after an extensive market survey: PVSS (ETM, Austria) was the only product on the market capable of fulfilling most of the requirements for a CERN-wide SCADA system. Today PVSS has become WinCC OA [8], and ETM belongs to Siemens. WinCC OA provides a powerful toolkit with a live and persistent internal database. It is device oriented: classes are defined as datapoint types, which are instantiated as datapoints. Additionally, it provides a set of tools for data archival, hardware addressing, alert class definition and other important functionality.
CERN hosts hundreds of small and large control system projects that have much in common. The Joint Controls Project (JCOP) [9] was initiated to handle the common aspects: requirements collection, software development, training and support. It is a collaboration between the LHC accelerator and the LHC experiments, and it aims to provide a framework (FW) for common control solutions. The FW hides the complexity of the underlying SCADA layers, defines guidelines and provides tools that ensure the homogeneity of control systems across the LHC experiments. The main deliverables of the JCOP FW are components that include a Finite State Machine (FSM) [10] toolkit, tools and schemas for configuration and condition data archival and retrieval, two communication protocols developed at CERN (DIP [7] and DIM [11]), and frameworks and configuration tools for power supply systems produced by the main vendors in the field.

The TCS software architecture
The TCS computing infrastructure is composed of two quadruplets of servers located in two geographically separated computing farms on the CMS site and connected to the CMS private network. Redundant architectures are applied for most of the CMS DCS projects to minimize the impact of software and hardware failures [12]. Redundancy involves one peer host being active, handling all monitoring and control tasks, while the second peer runs in a minimally functional configuration. Data from the active peer is constantly copied to the passive peer to enable a rapid switchover when needed. The four TCS WinCC OA projects are distributed among these servers as shown in Fig. 2.
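The active/passive scheme described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name, the peer labels and the in-memory dictionaries are hypothetical and do not reflect how WinCC OA implements redundancy.

```python
class RedundantPair:
    """Two peers: one active, one passive mirror (hypothetical model)."""

    def __init__(self):
        self.active = "peer_1"
        self.passive = "peer_2"
        self.data = {"peer_1": {}, "peer_2": {}}

    def write(self, key, value):
        # Values are handled by the active peer and copied to the passive one.
        self.data[self.active][key] = value
        self.data[self.passive][key] = value

    def switchover(self):
        # On failure of the active peer, the passive peer takes over.
        self.active, self.passive = self.passive, self.active

    def read(self, key):
        # Reads are always served by whichever peer is currently active.
        return self.data[self.active][key]
```

Because every write is mirrored, a value stored before `switchover()` remains readable afterwards, which is the property that makes a rapid switchover possible.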
The TCS software projects are interdependent. Each project is a building block intended to cover specific sets of functionalities. Data and event handling is continuously shared between the four projects, resulting in optimal computing load distribution:
• Tracker DCS 1 contains the supervisory layer connecting all the TCS projects and implementing a software detector protection mechanism capable of triggering software interlocks when given conditions are met. To optimize the load distribution between the projects, Tracker DCS 1 also configures and controls the powering system of the Silicon Pixel Tracker.

The Tracker FSM and its role in operating the detector
The Tracker FSM is a powerful tool that assists operators and shifters in their daily job. It groups the power, cooling, dry gas and DCU-based monitoring systems defined in the four TCS projects in one hierarchical tree. The leaves of the state machine tree correspond to the hardware devices while the upper nodes form abstraction layers reflecting the logical partitions of the Tracker. Commands (e.g. turn ON) can be initiated from any logical node. They propagate down the hierarchy, while states are evaluated in the opposite direction.
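The two directions of flow in the hierarchy (commands downwards, states upwards) can be sketched as follows. This is a simplified Python illustration, not the JCOP FSM toolkit: the class `FsmNode`, the rule for summarizing child states and the "MIXED" label are assumptions.

```python
class FsmNode:
    """A node in a hypothetical FSM tree; leaves stand for hardware devices."""

    def __init__(self, name, children=None, state="OFF"):
        self.name = name
        self.children = children or []
        self._state = state  # only meaningful for leaves

    def command(self, action):
        """Propagate a command down the hierarchy to every leaf."""
        if not self.children:
            # Assumption: the device reaches the requested state at once.
            self._state = action
            return
        for child in self.children:
            child.command(action)

    def state(self):
        """Evaluate the state bottom-up from the children."""
        if not self.children:
            return self._state
        states = {child.state() for child in self.children}
        if "ERROR" in states:
            return "ERROR"
        if len(states) == 1:
            return states.pop()
        return "MIXED"  # hypothetical label for a partially switched node
```

A command issued on an intermediate node, for example `partition.command("ON")`, reaches only the leaves below it, while `root.state()` always reflects the whole tree.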
The global state of the detector (e.g. ON) is continuously evaluated and made visible from the root Tracker FSM node, giving critical information to the detector operator (see Fig. 3). For later use in physics analysis, data is tagged according to the HV status of the Tracker partitions; this status is "ON" only when the FSM top node of the partition is in the "ON" state. Due to the redundant design of the Tracker, data quality is good even with a significant number of inoperative channels. To prevent individual power supply channel failures (among the thousands of channels running) from driving an entire partition into the "ERROR" state, a majority mechanism was implemented [13]. This allows a given fraction of power supply channels to be in the "ERROR" or "OFF" states without affecting the HV-related data quality flags.
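A majority rule of this kind can be sketched in Python as below. The function name, the three-state channel model and the `max_bad_fraction` parameter are assumptions; the source does not give the fraction used in the TCS, and the actual mechanism is described in [13].

```python
def partition_state(channel_states, max_bad_fraction):
    """Summarize channel states, tolerating a fraction of bad channels.

    Assumes every channel is in one of "ON", "OFF" or "ERROR".
    `max_bad_fraction` is a hypothetical tunable threshold.
    """
    if not channel_states:
        raise ValueError("a partition needs at least one channel")
    bad = sum(1 for s in channel_states if s in ("ERROR", "OFF"))
    if bad / len(channel_states) <= max_bad_fraction:
        return "ON"  # enough channels are on: the partition counts as ON
    if any(s == "ERROR" for s in channel_states):
        return "ERROR"
    return "OFF"
```

With a 5% tolerance, for instance, a partition of 100 channels stays "ON" with two channels down but goes to "ERROR" when ten channels have failed.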

Archiving
Keeping track of the states and conditions that affect the Tracker main actors is crucial for security and offline operations analysis. These logs also play a significant role in the evaluation of the quality of the physics data and are directly consulted when performing physics analysis. To this end, all data collected by the TCS software projects is archived in a dedicated Oracle database. To reduce the overall Input/Output load, data is archived in blocks and a deadband is applied to avoid the needless archival of noise fluctuations.
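The combination of a deadband and block-wise writing can be illustrated with a short Python sketch. It is an assumption-laden toy: the class `DeadbandArchiver`, its parameters and the in-memory list standing in for the Oracle database are hypothetical, and the real archiving is configured through WinCC OA and JCOP tools.

```python
class DeadbandArchiver:
    """Buffer values that pass a deadband and flush them in blocks."""

    def __init__(self, deadband, block_size):
        self.deadband = deadband      # minimum change worth archiving
        self.block_size = block_size  # values written per flush
        self.last_archived = None
        self.buffer = []
        self.written_blocks = []      # stands in for the database

    def update(self, timestamp, value):
        """Queue the value only if it moved by more than the deadband."""
        if (self.last_archived is not None
                and abs(value - self.last_archived) <= self.deadband):
            return False  # treated as noise, not archived
        self.last_archived = value
        self.buffer.append((timestamp, value))
        if len(self.buffer) >= self.block_size:
            self.flush()
        return True

    def flush(self):
        """Write the buffered values as one block."""
        if self.buffer:
            self.written_blocks.append(list(self.buffer))
            self.buffer.clear()
```

Each new value is compared with the last archived value rather than the previous reading, so a slow drift is still recorded once it accumulates beyond the deadband.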

The TCS components
WinCC OA components are the units of project deployment. They encapsulate installation scripts, runtime files (scripts, libraries and panels) and configurations. Components are intended to facilitate the development, deployment and maintenance of a DCS project. Components can be frameworks that implement an abstraction layer. In this case, they define a set of primitives that can be used to implement a representation of a physical or logical system, and tools to configure this representation to fulfil specific requirements. Other components, instead, create and instantiate the required objects with the help of the abstract frameworks, following the actual system structure.
Due to maintenance requirements and demands for new features, the TCS components have been continuously modified by several developers with different design and programming styles. This led to a major maintenance burden and motivated the decision to re-design and refactor the TCS components during the Long Shutdown 2 (LS2) [14], when the detector was mostly not operated. LS2 also allowed migration to a newer version of WinCC OA (version 3.16), accompanied by an upgrade of the JCOP framework release. This had an additional impact on the code re-design because of numerous backward incompatibilities.
The renovated TCS software consists of 14 distinct components built on top of the WinCC OA toolkit and CERN frameworks (see Fig. 4). A set of architectural principles [15] has been followed during the re-design phase to increase the internal cohesion of the TCS components and lower the coupling between components. The following guidelines have been set and followed:
• The object structure definition has been separated from project specific requirements. For that, the configurations database is extensively used to store the Tracker specific datapoint configurations while keeping datapoint type definitions in the component.
• All the remaining components conform strictly to the principles of high cohesion and low coupling.
The Tracker tools and libraries were refactored and extended, becoming generic enough to be used by CMS subdetectors other than the Tracker. An effort was made to create framework components that provide a level of abstraction and standardization for a set of functionalities. For instance, the heartbeat mechanism, which was originally developed to monitor the connection to the power supplies, was re-engineered into a framework component that provides an interface for verifying the connections to any hardware type. This work also had a significant impact in encouraging all CMS subdetectors to adopt some of the TCS development guidelines and components (which are no longer exclusive to the Tracker) for their future applications, and provided more uniformity and scalability to the CMS DCS.
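The idea of a heartbeat component that is independent of the hardware type can be sketched in Python as follows. The interface `Connection`, the `HeartbeatMonitor` class and the timeout parameter are hypothetical names chosen for illustration; the actual component is written for WinCC OA and its interface is not given in the source.

```python
from abc import ABC, abstractmethod


class Connection(ABC):
    """Hypothetical interface a hardware type implements to be monitored."""

    @abstractmethod
    def last_heartbeat(self):
        """Return the time (in seconds) of the most recent sign of life."""


class HeartbeatMonitor:
    """Flags connections whose heartbeat is older than a timeout."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.connections = {}

    def register(self, name, connection):
        self.connections[name] = connection

    def stale(self, now):
        """Return the names of connections considered lost at time `now`."""
        return sorted(name for name, conn in self.connections.items()
                      if now - conn.last_heartbeat() > self.timeout)


class FixedConnection(Connection):
    """Test double that reports a fixed heartbeat time."""

    def __init__(self, t):
        self.t = t

    def last_heartbeat(self):
        return self.t
```

The monitor depends only on the abstract interface, so a power supply, a PLC or any other device can be supervised by providing its own `last_heartbeat` implementation. This is the low-coupling property the guidelines above aim for.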

TCS software testing
With a total of 48 virtual CPUs, 96 GB of RAM and 2 TB of disk storage, the TCS virtual machine infrastructure was used to test the newly designed TCS components and to create the distributed TCS projects instances.
To ease the installation procedure, the CMS DCS installation tool was used. An installation procedure was defined to be able to recreate the four TCS projects from a predefined list of JCOP, CMS and Tracker components.
Basic hardware simulators were developed in the WinCC OA control language to simulate the behaviour of real hardware in multiple scenarios. They were used together with test scripts to verify the reactions of the TCS software in predefined normal and extreme situations.
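The pairing of a simulator with a scripted scenario can be illustrated as below. The sketch is written in Python rather than the control language used for the real simulators, and the class `SimulatedChannel`, its ramp model and the `run_scenario` helper are assumptions made for illustration.

```python
class SimulatedChannel:
    """Hypothetical stand-in for a power supply channel."""

    def __init__(self, target_voltage, step):
        self.target_voltage = target_voltage
        self.step = step          # voltage change per simulated tick
        self.voltage = 0.0
        self.tripped = False

    def tick(self):
        """Advance the ramp by one step towards the target."""
        if self.tripped:
            self.voltage = 0.0
        elif self.voltage < self.target_voltage:
            self.voltage = min(self.target_voltage, self.voltage + self.step)

    def inject_trip(self):
        """Simulate an extreme situation: the channel trips."""
        self.tripped = True


def run_scenario(channel, ticks, trip_at=None):
    """Drive the simulator and return the voltage after every tick."""
    history = []
    for i in range(ticks):
        if trip_at is not None and i == trip_at:
            channel.inject_trip()
        channel.tick()
        history.append(channel.voltage)
    return history
```

A test script would then compare the recorded history, or the reaction of the control software to it, against the expected behaviour for both the normal ramp and the injected trip.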

TCS software validation
Validation of the TCS software will be carried out in three different phases (see Fig. 5) during LS2: Installation Qualification (IQ), Operational Qualification (OQ) and Performance Qualification (PQ). IQ is the process of validating the TCS software installation in the CMS DCS computing farm. It consists of two consecutive stages: installation of the TCS projects without running hardware drivers, followed, if the installation succeeds, by validation of the hardware connections. The OQ will be carried out in isolation from the rest of CMS. Tracker experts, using predefined test scripts, will trigger automated actions to verify the hardware configuration, access control, FSM, detector protection mechanism and other key functionalities of the TCS software. The PQ will be carried out through the participation of the Tracker in CMS-wide operation, where the TCS hardware, software and integration with the CMS DCS will be validated.

Conclusions and future
The TCS has proven over the years to be a reliable system capable of ensuring high availability of the subdetector without compromising its safety. In this paper, we presented an overview of its main hardware and software components as well as the ongoing LS2 upgrades aiming to fulfil the evolving CMS Tracker requirements. The TCS group, following solid design principles and best practices, strives to define development guidelines and to build common frameworks that can be exported to the current and future CMS subdetectors.