The upgrade and re-validation of the Compact Muon Solenoid Electromagnetic Calorimeter Control and Safety Systems during the Second Long Shutdown of the Large Hadron Collider at CERN

The Electromagnetic Calorimeter (ECAL) is one of the subdetectors of the Compact Muon Solenoid (CMS), a general-purpose particle detector at the CERN Large Hadron Collider (LHC). The CMS ECAL Detector Control System (DCS) and the CMS ECAL Safety System (ESS) have supported the detector operations and ensured the detector's integrity since the CMS commissioning phase, more than 10 years ago. Over this long period, several changes to both systems were necessary to correct issues, extend functionality and keep them in line with current hardware technologies and the evolution of software platforms. Due to the constraints imposed on significant changes to a running system, major hardware and software upgrades were deferred to the second LHC Long Shutdown (LS2). This paper presents the architectures of the CMS ECAL control and safety systems, discusses the ongoing and planned upgrades, details implementation processes and validation methods, and highlights the expectations for the post-LS2 systems.


Introduction
The CMS ECAL detector [1] is composed of a scintillating crystal calorimeter and lead/silicon preshowers. The detector is divided into three major partitions known as the ECAL Barrel (EB), the Endcaps (EE) and the Preshowers (ES). The EB and EE partitions are subdivided into smaller partitions as follows: 36 Supermodules in the Barrel and 4 Dees in the Endcaps. Each of these EE/EB partitions contains a set of four paired thermistors and a water leakage sensor to provide environmental information. These data are used by the ECAL Safety System (ESS) to protect the detector from multiple hazardous situations, e.g. overheating while the detector is powered on. The ESS also monitors a set of digital input signals (DIs) to protect the detector from external conditions such as cooling system faults or changes in the CMS solenoid magnetic field that could damage the crystal photo-sensors. In total, the ESS monitors 352 thermistors, 4 water leakage sensors and 96 DIs. The ESS implements protection mechanisms through Siemens Programmable Logic Controllers (PLC) [2]. The EE/EB partitions are monitored by a single redundant controller, known as the ESS PLC. In its original configuration, the ESS PLC was designed to acquire environmental information through custom intermediate hardware (readout units). These readout units interface with the detector in the CMS experimental cavern and retransmit data to the PLC in the service cavern, which is protected from the experiment's radiation field. This approach requires serial communication interfaces (RS-485) and a custom communication protocol to operate.
After more than 10 years of operations, additional requirements were identified. In addition to the existing features, the new version of the ESS must provide the following: 1) individual alarm thresholds for temperature sensors, 2) a manual interlock interface and 3) a new type of interlock at the Detector Safety System (DSS) level (see figure 1). The developers found two main constraints during the analysis of requirements. On one hand, the legacy software of the PLC was written in a low-level programming language known as Siemens Statement List (STL) [3], which complicated software maintenance. On the other hand, the communication protocol of the readout units did not provide fault-tolerance mechanisms or sufficient diagnostic information, which compromised the migration to newer versions. These design choices made the original system very complicated to extend, motivating a change of the readout method in favour of an industrial solution in line with the rest of the experiment. This paper focuses on the changes made in the software to integrate the new features during the LHC Long Shutdown 2 (LS2).

The ECAL Safety System upgrade
The replacement during LS2 of the previous ESS sensor readout system (with custom readout units) by industrial analogue input (AI) modules is the cornerstone of the CMS ECAL Detector Control System (DCS) upgrade plan [4]. In contrast to the custom readout units, the new AI modules introduce a fully transparent acquisition method for the environmental information. The resulting hardware setup removes the 12 readout units and the RS-485 interfaces, introducing instead a total of 45 AI modules across six rails. The AI modules are standard Siemens-certified peripherals connected through PROFIBUS [5] communication buses. This type of connection provides access to extensive diagnostic information and enables a full readout of the sensors at a much higher sampling rate (approximately every 100 milliseconds, compared to the initial 23 seconds). This hardware modification not only relieves the PLC program of having to support a complex communication protocol, but also provides the PLC with a fresh snapshot of the environmental data at every cycle. Following the update of the readout method, the ESS PLC was reprogrammed with completely new software based on the CMS PLC framework. The CMS PLC framework allowed the developers to re-implement the features of the original system and to introduce the new ones, while reducing the overall system complexity.

The CMS PLC framework
The CMS PLC framework (also known as the CMS Tracker PLC framework) is a collection of tools to develop PLC-based control applications for particle detectors. The framework consists of two main parts: the PLC program and a configuration toolkit to monitor the PLCs from the supervisory layer. The PLC program is written in a high-level language, the Siemens Structured Control Language (SCL) [6]. The PLC framework does not model specific applications directly. It focuses instead on modelling several types of devices that are commonly used across the CMS experiment. The framework provides a set of basic data structures with more than 30 types of device definitions (PLCs, sensors, relays, DIs, etc.). The toolkit can be easily extended, allowing developers to introduce new definitions and to use existing ones as templates.

PLC program
The PLC program is mainly designed to detect hazardous conditions and to execute interlocks. An interlock is a mechanism that makes the PLC output state dependent on the inputs. Interlocks are used in the ESS PLC to stop the cooling for a certain area, to turn off the powering systems or to request an emergency shutdown of the detector from the DSS. The dependencies between inputs and outputs are established through the concept of interlock groups. An interlock group sets the conditions to operate the PLC digital outputs (DO) based on input alarms (e.g. to open a contact and remove power from a certain area when overheating is detected). Interlock groups are, in essence, a data-driven mechanism where conditions and actions form a large action matrix. This matrix can be easily populated and reformulated without altering the program structure. For this reason, the risk of introducing errors in the PLC program is very low, and the programming of the PLC is reduced to a matter of configuration.
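The data-driven idea behind interlock groups can be illustrated with a minimal Python sketch. This is not the SCL code of the framework: the names InterlockGroup, evaluate and ACTION_MATRIX, as well as all alarm and output identifiers, are hypothetical, and the sketch assumes that a group fires as soon as any one of its input alarms is active.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterlockGroup:
    """One row of the action matrix: input alarms -> digital outputs to interlock."""
    name: str
    alarms: frozenset   # identifiers of the input alarms that trigger the group
    outputs: frozenset  # identifiers of the digital outputs (DO) to interlock


def evaluate(groups, active_alarms):
    """Return the set of DOs to interlock for the given active alarms.

    Assumption of this sketch: a group fires when any of its alarms is active.
    """
    interlocked = set()
    for group in groups:
        if group.alarms & active_alarms:
            interlocked |= group.outputs
    return interlocked


# Hypothetical matrix: changing the behaviour means editing this table,
# while evaluate() stays untouched.
ACTION_MATRIX = [
    InterlockGroup("SM01 overheating",
                   frozenset({"SM01_T1_HIGH", "SM01_T2_HIGH"}),
                   frozenset({"SM01_LV_OFF", "SM01_HV_OFF"})),
    InterlockGroup("SM01 water leak",
                   frozenset({"SM01_WLD"}),
                   frozenset({"SM01_LV_OFF", "SM01_HV_OFF", "SM01_COOLING_CLOSE"})),
]

print(sorted(evaluate(ACTION_MATRIX, {"SM01_T2_HIGH"})))
```

Because the evaluation routine never changes, reviewing a new safety configuration amounts to reviewing the table, which is the property the framework relies on.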

Data organization
The PLC framework establishes a set of rules to organize the program's data. Electronic components such as relays, sensors or digital inputs are modelled as objects that combine state, attributes and relations between components. Objects of the same type are grouped together in the form of arrays at specific memory locations, in sections known as data blocks. The data block names are part of the address namespace in the Siemens S7 programming platform, identified by the prefix DB followed by a three-digit sequence (e.g. DB400). Objects are typically declared in two data blocks at the same position, keeping monitoring data apart from configuration data. According to this rule, sensor objects are found in the data blocks DB400 and DB600, DIs in DB402 and DB602, and relay objects in DB404 and DB604.
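The pairing of data blocks can be pictured with a short Python sketch. The classes SensorStatus and SensorConfig and the function sensor_view are hypothetical. The sketch also assumes that DB400 holds the monitoring half and DB600 the configuration half; the text only states that the two kinds of data are kept apart at the same array position.

```python
from dataclasses import dataclass


@dataclass
class SensorStatus:
    """Monitoring part of a sensor object."""
    value: float = 0.0
    in_alarm: bool = False


@dataclass
class SensorConfig:
    """Configuration part of the same sensor object."""
    probe_id: str = ""
    upper_threshold: float = 0.0


# Data blocks modelled as arrays keyed by block name.
# Assumption: DB400 = monitoring data, DB600 = configuration data.
data_blocks = {
    "DB400": [SensorStatus() for _ in range(4)],
    "DB600": [SensorConfig() for _ in range(4)],
}


def sensor_view(index):
    """Join both halves of one sensor, which share the same array position."""
    return data_blocks["DB400"][index], data_blocks["DB600"][index]


# Configure sensor 2, feed it a reading and evaluate its alarm flag.
data_blocks["DB600"][2] = SensorConfig(probe_id="probe-2", upper_threshold=30.0)
data_blocks["DB400"][2].value = 31.5
status, config = sensor_view(2)
status.in_alarm = status.value > config.upper_threshold
print(status, config)
```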

CMS PLC toolkit
The CMS PLC toolkit is a WinCCOA [7] application that encapsulates knowledge about the PLC. Among other things, this toolkit provides a bulk configuration tool to help map large collections of objects into the DCS. The toolkit requires a set of configuration files with the list of input and output channels connected to the PLC. This information is stored in the DCS, using specific data structures known as data points. Each channel is represented by three different data points, identified by a numeric triplet and the following suffixes: "read", "write" and "readback". The numeric triplet refers to the crate, module and channel used in the hardware setup, while the suffixes refer to the data access mode. The toolkit provides a library of device definitions with various types of PLC models, sensors and other devices using the JCOP [8] convention. These definitions contain descriptive and structural metadata to access the memory of the PLC, including properties such as peripheral addresses, alarms and conversion functions to translate the PLC data into standard units of measurement.
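As an illustration of the channel mapping, the Python sketch below expands one hardware channel into its three data point names. Only the triplet (crate, module, channel) and the three suffixes come from the text. The function name, the "ESS" prefix and the exact name layout are hypothetical and do not reproduce the actual naming used by the toolkit.

```python
SUFFIXES = ("read", "write", "readback")


def data_point_names(crate, module, channel, prefix="ESS"):
    """Build the three data point names of one channel.

    The "<prefix>/<crate>_<module>_<channel>.<suffix>" layout is a
    hypothetical format chosen for this sketch.
    """
    triplet = f"{crate}_{module}_{channel}"
    return [f"{prefix}/{triplet}.{suffix}" for suffix in SUFFIXES]


# Expand a small, made-up channel list into data point names.
channel_list = [(1, 2, 3), (1, 2, 4)]
for crate, module, channel in channel_list:
    print(data_point_names(crate, module, channel))
```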

Extensions for ECAL
In close collaboration with the CMS Tracker developers, the CMS ECAL DCS team contributed to the extension and improvement of the framework. Among other things, the team developed a new option for bulk configuration of data point aliases, making it possible to abstract from the underlying hardware definition without affecting the functionality of the upper layers. The toolkit has also been adapted to the specific characteristics of the ESS. In particular, several device definitions were created to accommodate the large number of inputs and outputs governed by the PLC: 165 DOs, 65 DI channels and two copies of the 356 analogue sensors. To support these numbers, the framework now includes new device definitions that expand the capacity of the data blocks. In addition, two new sensor types have been introduced to monitor NTC470 thermistors and water leakage sensors. The new definitions will be included in the framework and made available to the CMS DCS community in future releases.

The ESS PLC Software
The ESS program has been rewritten according to the new requirements. Thanks to the new readout method, sensor data can be processed without any further decoding stage. The processor scans the input modules and transfers the environmental and digital data to the input image table. These data are processed and distributed across different data blocks at the beginning of every PLC cycle, and are then evaluated according to the interlock groups. The results are stored and transferred to the output modules, triggering or resetting interlocks as defined by the ESS action matrix (see figure 2).

ESS action matrix
The action matrix defines the behaviour of the PLC in terms of input and output signals. The ESS action matrix establishes the conditions for firing interlocks that put the detector into a safe state. The ESS interlock system interacts with the detector equipment with a granularity of one Supermodule or Dee, using several DOs for the following purposes: 1) to switch off the detector high voltage (HV) power, 2) to switch off the front-end detector electronics (LV) and 3) to close the cooling valves. In addition, the system implements an emergency interlock to request a full shutdown of the detector from the DSS. In summary, the ESS PLC evaluates more than 800 safety conditions and can execute up to 165 different actions to ensure the safe operation of the detector.

Environmental data
The ESS has been designed to monitor 352 temperature sensors and water leakage sensors distributed across the EE and EB partitions. In past years, four temperature sensors presented readout problems and were disabled. However, during the current LS2, these four sensors appear to have recovered after the removal of the intermediate readout hardware. They will be closely monitored during the re-validation of the ESS and considered for inclusion in the action matrix by the end of LS2. Inside the PLC, temperature sensors are modelled as objects with multiple attributes such as quality flags, probe identifiers and a range of alarm thresholds (upper and lower thresholds). These data structures are, in most cases, sufficient to store the sensor information. However, the ESS specification requires two upper alarm thresholds to implement a multi-layered set of actions on temperature sensors (see figure 3). To fulfil this requirement, the program allocates two instances of the sensor objects with different upper alarm thresholds. A single interlock group configures the DSS shutdown action on critical overheating conditions for all the sensors.

Individual temperature thresholds
One of the main features introduced in this version of the software is the individual temperature thresholds. In the legacy system, alarm thresholds were configured in groups of eight sensors (per Supermodule or Dee) without an adequate calibration. A thorough analysis of the archived data was carried out to extract the nominal values of the air temperatures inside the detector under standard operational conditions. Using these values as a reference, the alarm thresholds were adjusted for each sensor to trigger interlocks at +4°C and +8°C above the nominal temperatures. This adjustment will result in a quicker reaction of the PLC in the event of detector overheating. Calculations based on a real overheating event estimate that the PLC would react after 5 minutes of overheating, instead of the 20 minutes observed in the original system.
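The threshold adjustment can be sketched in a few lines of Python. The +4°C and +8°C offsets are taken from the text. The function name, the sensor names and the sample values are hypothetical, and the use of the median of the archived values as the nominal temperature is an assumption of this sketch, since the text does not state which estimator was used.

```python
from statistics import median

FIRST_OFFSET_C = 4.0   # first upper threshold above nominal (from the text)
SECOND_OFFSET_C = 8.0  # second upper threshold above nominal (from the text)


def individual_thresholds(archived_temperatures):
    """Return (nominal, first_threshold, second_threshold) in degrees Celsius.

    Assumption of this sketch: the nominal temperature is the median of
    the archived values recorded under standard operational conditions.
    """
    nominal = median(archived_temperatures)
    return nominal, nominal + FIRST_OFFSET_C, nominal + SECOND_OFFSET_C


# Made-up archive excerpts for two sensors with different nominal values.
archive = {
    "sensor_a": [18.0, 18.5, 19.0],
    "sensor_b": [21.0, 21.5, 22.0],
}
for name, values in archive.items():
    print(name, individual_thresholds(values))
```

With a shared threshold, the group would have to be tuned to its warmest sensor, so a cooler sensor would need a much larger temperature rise before raising an alarm; per-sensor thresholds remove that margin.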

Digital inputs
Most of the features of the PLC program are based on the standards of the CMS PLC framework. However, some functionalities differ significantly from these standards. This is the case for the DI alarms. In the current version of the software, the DI objects contain a few additional elements to store the following data: 1) flags to register deviations from the normal state (the "in alarm" state), 2) flags to set the DI "in alarm" state (meaning that the state is abnormal) and 3) timers to register how long the DI has been in alarm. When a timer runs out, the DI has been in an abnormal state for longer than the admissible time. Once the timer expires, any interlocks associated with the DI are triggered. As with the acknowledgement mechanism of the sensor alarms, once a DI object reports a timed-out alarm state, interlocks are set and cannot be removed until the DI returns to its normal state. Only then can the interlocks be removed, once an explicit external acknowledgement is received. This mechanism applies to all DIs except those associated with watchdog signals.
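The timed DI alarm with its latch can be sketched as a small state machine in Python. The class DigitalInput, its attribute names and the time values are hypothetical, and the comparison used to decide that the timer has expired (greater than or equal to the admissible time) is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DigitalInput:
    normal_state: bool                      # state considered normal for this DI
    admissible_s: float                     # longest tolerated abnormal time, in seconds
    abnormal_since: Optional[float] = None  # start time of the current deviation
    interlocked: bool = False               # latched once the timer has expired

    def update(self, state, now):
        """Process one sample of the input taken at time `now` (seconds)."""
        if state == self.normal_state:
            self.abnormal_since = None       # deviation is over; the latch is kept
            return
        if self.abnormal_since is None:
            self.abnormal_since = now        # deviation starts: start the timer
        if now - self.abnormal_since >= self.admissible_s:
            self.interlocked = True          # timed-out alarm: fire the interlocks

    def acknowledge(self):
        """Clear the latch, which only works once the input is back to normal.

        Returns True if the interlock is released after the call.
        """
        if self.abnormal_since is None:
            self.interlocked = False
        return not self.interlocked


di = DigitalInput(normal_state=True, admissible_s=5.0)
di.update(False, now=0.0)   # deviation starts
di.update(False, now=6.0)   # admissible time exceeded, interlock latched
print(di.acknowledge())     # still abnormal, acknowledgement refused
di.update(True, now=7.0)    # back to normal, latch still held
print(di.acknowledge())     # acknowledgement accepted
```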

Interlocks
Interlocks are automatically triggered by the PLC when a hazardous condition is detected (e.g. increasing air temperatures inside the detector). When an interlock is triggered, the digital output configuration of the PLC is modified to bring the detector into a safe state (e.g. by removing power from the relevant power supplies). Interlocks are autonomous operations executed exclusively at the safety system level. However, there are multiple situations in which interlocks are requested from the supervisory layer, even when no dangerous conditions are present. This feature is commonly known as manual interlocks. In the legacy system, there was no explicit option to interact with the interlock mechanism. In the current system, interlocks are exposed to the DCS in the form of relay objects, which can be commanded by the operators.

The ESS supervisory software
The ESS supervisory software is part of the CMS ECAL DCS applications. In the CMS experiment, the DCS applications are implemented using the SIMATIC WinCCOA platform and are mostly written in its native programming language, known as control language (CTRL). A new version of the ESS/DCS application was implemented to accommodate the changes in the PLC program's data. Input and output channels are modelled inside the PLC data blocks as objects, which in turn are mapped as data points in the DCS using the configuration toolkit. The new data structures do not by themselves provide the complex functionality required by the CMS ECAL DCS. They need to be grouped into logical units according to the distribution of the hardware within the detector, so that they can be handled as a whole (e.g. a Supermodule is the aggregation of 4 relays, 8 thermistors and 1 water leakage sensor). To achieve this, the ESS/DCS application has been structured in layers, separating the presentation layer (user interfaces, scripts, etc.) from the data access layer (backend to manipulate data). The software backend has been implemented as an application programming interface, so that the ESS data can be used in a coherent way across all the CMS ECAL DCS applications. The ESS data are used in the DCS by a variety of applications. Among other things, the data need to be archived, displayed, processed and monitored at different levels of the architecture. More concretely, the ESS data are used by several applications to execute preventive automatic actions, configure alarms and notifications, interact with the interlock system, and display the status of the detector from the safety point of view.
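The grouping of individual objects into a logical unit can be pictured with the following Python sketch, which is unrelated to the actual CTRL backend. Only the composition of a Supermodule (4 relays, 8 thermistors, 1 water leakage sensor) comes from the text. The class layout, the attribute names, the default values and the name "SM01" are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Supermodule:
    """Logical unit: 4 relays, 8 thermistors and 1 water leakage sensor."""
    name: str
    relays_interlocked: List[bool] = field(default_factory=lambda: [False] * 4)
    temperatures_c: List[float] = field(default_factory=lambda: [18.0] * 8)
    water_leak: bool = False

    def status(self):
        """Summary consumed by the presentation layer (hypothetical fields)."""
        return {
            "name": self.name,
            "max_temperature_c": max(self.temperatures_c),
            "interlocked": any(self.relays_interlocked),
            "water_leak": self.water_leak,
        }


sm = Supermodule("SM01")
sm.temperatures_c[3] = 19.5
sm.relays_interlocked[0] = True
print(sm.status())
```

Panels and scripts would then query such a summary instead of addressing individual data points, which is the separation between presentation and data access that the layered design aims for.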

Interlocks & alarm acknowledgement
Interlocks are retained until an input is received from the supervisory system, provided that the alarms that originated the interlocks are no longer active. This mechanism is commonly known as alarm acknowledgement and prevents the detector from exiting the safe state without an explicit confirmation by the experts. The PLC data structures offer the maximum level of alarm granularity: one alarm per sensor and digital input. This means that the DCS is responsible for determining which alarms must be acknowledged in order to remove the interlocks from a certain area of the detector. Thanks to this granularity, the DCS is capable of detecting the exact reasons for interlocking the detector. The DCS backend is then used to scan, group and acknowledge alarms according to their domain or location within the detector.
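A minimal Python sketch of the scan, group and acknowledge step is given below. The class Alarm, the function acknowledge_partition and the partition names are hypothetical. Treating a partition as all-or-nothing (no alarm is acknowledged while any latched alarm of the partition is still active) is a design choice of this sketch, not a documented property of the DCS backend.

```python
from dataclasses import dataclass


@dataclass
class Alarm:
    source: str      # sensor or digital input that raised the alarm
    partition: str   # location within the detector, e.g. a Supermodule or Dee
    active: bool     # the originating condition is still present
    latched: bool    # the alarm is waiting for an acknowledgement


def acknowledge_partition(alarms, partition):
    """Acknowledge all latched alarms of one partition.

    Returns the acknowledged sources, or an empty list if any latched
    alarm of the partition is still active (interlocks are then retained).
    """
    scoped = [a for a in alarms if a.partition == partition and a.latched]
    if any(a.active for a in scoped):
        return []
    for alarm in scoped:
        alarm.latched = False
    return [a.source for a in scoped]


alarms = [
    Alarm("T1", "SM01", active=False, latched=True),
    Alarm("T2", "SM01", active=True, latched=True),
    Alarm("T3", "SM02", active=False, latched=True),
]
print(acknowledge_partition(alarms, "SM01"))  # refused: T2 is still active
print(acknowledge_partition(alarms, "SM02"))  # accepted
```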

Automatic actions
The automatic actions in the DCS are preventive measures intended to protect the detector before a problem escalates to the safety system. There are various types of automatic actions. Some of the actions combine information from multiple sub-systems, including the ESS. The ESS data are grouped by partitions and distributed across the ECAL Finite State Machine (FSM), forming a hierarchy of objects to monitor and control the detector. Sensor data are used in the FSM to compute the overall status of each individual partition, executing automatic actions on the power supplies if any of the nodes reports an undesired state. The automatic actions in the state machine are tuned dynamically, using the PLC individual alarm thresholds as a reference. At a different level, the interlock information is also used by the DCS to operate the LV distribution system of the detector through the CMS electrical distribution system. Following the ESS interlocks, the DCS is capable of issuing commands to the upper layers of the electrical system to switch off the racks containing power-related equipment.

Fig. 3. Multi-layered set of actions on temperature sensors
In combination with the ESS, the CMS ECAL DCS configures a multilayer set of actions that go from soft preventive actions at the supervisory level, to hardwired interlocks at the DSS level, ensuring the safety of the ECAL detector in a comprehensive and robust way.

User Interface
The CMS ECAL DCS is used by shifters, operators and multiple experts. To enable a smooth migration between software versions, new user interfaces have been designed to access the new data structures while preserving the look and feel of the legacy system. The new interfaces use the same style, colours and graphical layout, but include subtle modifications to complete the interface (e.g. classifying inputs and outputs or including dual naming for hardware and logical names).

Testing & Validation
The ESS upgrade project undergoes multiple levels of testing and validation to ensure its correctness and reliability. Testing and validation are two of the most important steps to confirm that the system meets the specification and to guarantee that it fulfils its intended purpose: to ensure the safe operation of the ECAL detector. Experts designed a complete battery of tests even before the migration was scheduled. Some parts of the testing and validation procedures are explained in the following paragraphs.

Automatic software testing
A test setup was built in the ECAL labs using a mock-up of the CMS ECAL DCS and a test PLC. This setup has been used for the following purposes: 1) automatic testing of part of the PLC functionality, 2) unit testing of the DCS software and 3) integration tests in the CMS ECAL DCS environment. As mentioned before, the programming of the PLC using the CMS PLC framework is reduced to a matter of configuration. This means that one of the main aspects covered by the testing procedures is the correctness of the PLC action matrix. For this purpose, a unit test program was developed to extract and compare data from the PLC, verifying the correct configuration of the PLC. This includes the 702 individual threshold alarms for the thermistors, the probe identifiers and the physical output addresses of the interlock system. Another automatic procedure was designed to evaluate part of the interlock mechanism. The PLC program was partially modified to provide an interface to inject simulated temperature values. Thanks to this mechanism, the DCS could verify the correct configuration of the interlock groups. More than 1000 temperature combinations were simulated, confirming the correct behaviour of the interlock system and of the DCS automatic actions on overheating conditions.
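The extract-and-compare step of such a configuration test can be sketched in Python as follows. The function compare_configuration, the key names and the values are hypothetical. In a real test the read-back dictionary would be filled from the PLC through the DCS, which is outside the scope of this sketch.

```python
def compare_configuration(expected, readback):
    """Return a sorted list of (key, expected_value, actual_value) mismatches.

    Keys missing on either side are reported with None as the absent value.
    """
    mismatches = []
    for key in sorted(set(expected) | set(readback)):
        want, got = expected.get(key), readback.get(key)
        if want != got:
            mismatches.append((key, want, got))
    return mismatches


# Made-up reference configuration and read-back values.
expected = {
    "sensor_a.first_threshold": 22.5,
    "sensor_a.probe_id": "A1",
    "relay_1.output_address": 12,
}
readback = {
    "sensor_a.first_threshold": 22.5,
    "sensor_a.probe_id": "A2",
}
for mismatch in compare_configuration(expected, readback):
    print(mismatch)
```

Reporting every mismatch rather than stopping at the first one gives the full list of deviations in a single run, which matters when several hundred thresholds are checked at once.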

System validation
A complete validation procedure was established by the hardware experts to prove the correct behaviour of the PLC after any change to the program. This procedure focuses mainly on the identification of the different hardware components after the installation of the new readout hardware and on the verification of the interlock system with the new PLC program.

Conclusion
The adoption of industrial equipment for the readout of environmental information not only solves the limitations identified by the developers during the analysis phase, but also brings multiple other benefits such as standardization, provisioning and support from the vendor. The replacement of the PLC program using the CMS PLC framework enables the ECAL detector to comply with all the new requirements, while making the system simpler and easier to maintain. After this migration, the ESS is part of the pool of CMS PLC framework applications, sharing knowledge and experience about the system with the rest of the community. Many relevant details about the system have been disclosed, documented and brought to the supervisory layer, resulting in a more intelligent and self-descriptive system. New DCS software has been designed to provide a better user experience, with a good balance between technical information and detector-oriented constructs, while respecting the concept of the original design. After almost two years of planning and development, and thanks to the collaboration between experts from different teams, the new version of the ESS was deployed in August 2019. During this time, many hardware and software components of the architecture have been replaced by better and more efficient ones. The first part of the validation is currently ongoing. In this context, the DCS has proven to be a useful tool to identify and understand the different configurations of the detector during the interactions with the ESS. A second validation stage will provide sufficient information to confirm the correctness of the system according to the specification and the safety of the CMS ECAL detector for the next operational period.