Automation and Testing for Simplified Software Deployment

André Sailer, Marko Petric (CERN), on behalf of the CLICdp Collaboration
Computing in High Energy Physics, July 9–13, 2018, Sofia, Bulgaria

Listing 2: Installing, configuring, and mounting CVMFS on the Travis-CI virtual machine, and starting the docker build container.

    wget https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
    sudo dpkg -i cvmfs-release-latest_all.deb
    sudo apt-get update
    sudo apt-get install cvmfs cvmfs-config-default
    rm -f cvmfs-release-latest_all.deb
    wget https://lcd-data.web.cern.ch/lcd-data/CernVM/default.local
    sudo mkdir -p /etc/cvmfs
    sudo mv default.local /etc/cvmfs/default.local
    sudo /etc/init.d/autofs stop
    sudo cvmfs_config setup
    sudo mkdir -p /cvmfs/clicdp.cern.ch
    sudo mount -t cvmfs clicdp.cern.ch /cvmfs/clicdp.cern.ch
    ls /cvmfs/clicdp.cern.ch

    docker run -it --name CIcontainer \
      -v $PKGDIR:/Package \
      -e COMPILER=$COMPILER \
      -v /cvmfs/clicdp.cern.ch:/cvmfs/clicdp.cern.ch \
      -d clicdp/slc6-build /bin/bash


Introduction
Sophisticated simulation and reconstruction software is needed to address the detector and physics issues for the future Compact Linear Collider (CLIC), a high-energy electron-positron accelerator [1]. Our software suite is developed in the linear collider community and consists of: the detector geometry description DD4hep [2,3,4], which is based on Geant4 [5,6] and ROOT [7]; the iLCSoft framework for reconstruction and analysis based on Marlin [8]; the event data model and persistency format LCIO [9,10]; and the distributed workload and data management tool iLCDirac [11], based on the DIRAC grid middleware [12]. The simulation and reconstruction software is used to perform detector optimisation [13] and physics studies [14,15].
To preserve our agile and pragmatic approach to software development without sacrificing maintainability and code quality, we rely on a number of tools to automate necessary but tedious tasks as much as possible. Code review is an irreplaceable ingredient of code quality assurance, but many aspects can be automated before someone has to take a look at new developments or improvements. Unit testing, formatting checks, and static analysis can be performed automatically, removing some of the burden from the code reviewer. Further automation is provided for the creation of release notes and for the building and deployment of software releases. In the following sections we describe some of the tools that we have adopted into our workflow and that have significantly improved our process of software development. First we describe tools related to the GitHub environment, which are the most straightforward and free to use for anyone working with open source software. Then we describe tools related to the GitLab instance available at CERN, and finally some plug-ins available for Jenkins.

Continuous Integration with GitHub and Integrated Tools
The commonly developed and maintained software is hosted on GitHub for ease of distribution and developer permission management. Continuous integration (CI) can easily be set up for any public project on GitHub with the various services integrated with that platform. We have added the use of Travis-CI to the projects we maintain or contribute to: the packages that make up iLCSoft and DD4hep, which are based on C++, and DIRAC [12], which is based on Python. Travis-CI can run the configured tasks for each push, for each pull request, or as a weekly cron job.

CI for Projects without Dependencies
The simplest use case occurs for projects without dependencies, or with only easily available ones. An example is the Python-based DIRAC software. Listing 1 shows part of the configuration used to execute the CI tasks. In lines 2-4 the requirements are installed with the standard pip program. This is sufficient to run the unit tests, which are discovered and executed by py.test (line 9). In lines 11-12 the documentation is compiled. If warnings occur during the compilation of the documentation, the build is marked as failed, which, for example, ensures that all files are referenced inside the documentation. Line 14 contains a command to run the pylint static code analysis, which is configured to look only for errors, for example the use of undefined variables or functions.
Lines 15-19 verify that the formatting of the lines modified in a pull request conforms to the DIRAC conventions. Strict enforcement of coding style was only recently introduced, so large parts of the code base do not yet conform to the style conventions. By verifying only the modified lines of the source code, we can enforce the conventions for new files without having to reformat the entire code base, which would touch the revision history of almost every line of code.
The complete Travis-CI configuration and the customised commands it runs can be found in the DIRAC GitHub repository https://github.com/DIRACGrid/DIRAC in the .travis.yml file and the travis-ci.d folder. A similar approach to run continuous integration tasks can be adopted by other projects if the installation is easy and fast and no infrastructure needs to be provided.

CI with CVMFS to Provide Dependencies
For the continuous compilation and testing of the C++ based software, for example DD4hep, we first need to provide its dependencies, which include Geant4 and ROOT and, for our use case, also parts of iLCSoft. To access the dependencies, we use the docker service in Travis-CI and also install CVMFS [16]. This is shown in Listing 2. In lines 1-12 the CVMFS packages are downloaded, installed, and configured on the virtual machine, and finally the clicdp.cern.ch CVMFS repository is mounted. In lines 14-18 the docker container that runs the compilation and unit tests is started.
As we can provide any dependency via CVMFS we can quickly and efficiently compile and test all of our C++ based software packages on GitHub for each pull request.

Static Code Analysis with Coverity
Another useful tool that is available for free to open source projects hosted on GitHub is the Coverity static code analysis [17]. After registering the project with the service, the analysis can be run via a cron job in the Travis-CI system. Listing 3 shows how the analysis result, which is produced in parallel to a compilation in Travis-CI, is submitted to the Coverity system. Figure 1 shows the number of outstanding defects and the cumulative number of fixed defects in DD4hep over time. Whenever a new defect is found, an email is sent to the members of the project, and significant defects are usually fixed quickly, as can be seen from the bump in the outstanding defects in March and April of 2018, shown in Figure 1.

Continuous Integration with GitLab
The continuous integration tasks running on GitHub integrated services can also be run in a similar fashion on the CERN GitLab instance; however, access cannot be obtained as easily as on GitHub. One advantage of using an in-house deployed tool is that tasks requiring privileges can be executed. For our iLCDirac [11] extension of DIRAC we run the same checks as for DIRAC, but in addition we can automatically deploy the documentation as a website and publish the client installation to our CVMFS repository.

Continuous Deployment to CVMFS with GitLab
To deploy our software releases and required dependencies we use an SSH-based gitlab-runner to connect to the CVMFS stratum 0. Many of our projects use the same runner, but as we do not allow jobs to execute in parallel, GitLab-CI is responsible for scheduling the tasks one after the other. If the same project has more than one active pipeline publishing to CVMFS, any redundant pipeline not yet in the deploy stage is cancelled. This mostly happens during the continuous deployment of the HEAD installations of iLCSoft, when multiple packages trigger a new build of the software suite simultaneously.
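The cancellation logic can be sketched as follows; the function and field names are illustrative and operate on already-fetched pipeline data rather than on the actual GitLab API:

```python
def redundant_pipelines(pipelines):
    """Given the active pipelines of a project, ordered newest first,
    return the ids of pipelines to cancel: for each ref, every pipeline
    except the newest one, unless it already reached the deploy stage."""
    newest_seen = set()
    to_cancel = []
    for p in pipelines:
        if p['ref'] not in newest_seen:
            newest_seen.add(p['ref'])   # keep the newest pipeline per ref
        elif p['stage'] != 'deploy':    # older duplicates are redundant,
            to_cancel.append(p['id'])   # unless already deploying
    return to_cancel
```

In a real deployment the returned ids would be passed to the GitLab pipeline cancellation endpoint.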

Code Quality Monitoring with JenkinsCI
While the GitLab-CI is easy to configure and use, and allows pass-fail level checks for pull requests, the JenkinsCI software allows one to install a wide variety of plug-ins that can be used to monitor the code quality over time.
Our goal is to remove all compiler warnings, and once this is achieved for one of our packages it will be enforced with -Werror in CI compilations. Until then, the Warnings plug-in for Jenkins provides an overview of the existing warnings and allows one to browse the source code where they occur. Similar functionality is provided by the clang scan-build plug-in, which monitors the reports produced by the clang scan-build tool. The final plug-in we make use of is the Valgrind plug-in for the valgrind memcheck [18] tool, which finds memory leaks and errors. All three plug-ins offer a graphical representation: Figure 2 shows the trend of the warnings (left), bugs (middle), and issues affecting memory (right). The Valgrind plug-in additionally allows one to set thresholds for the acceptable number of lost bytes; if these thresholds are exceeded, the build is marked as unstable or failed. Running and analysing the valgrind output nightly greatly reduces the burden on the release manager and decreases the danger of releasing faulty software. As can be seen in the right graph of Figure 2, the spike in definitely lost memory was identified and fixed the following day.
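As an illustration of the threshold mechanism, the following sketch (not the plug-in's actual code) parses the LEAK SUMMARY of a valgrind memcheck log and maps the number of definitely lost bytes to a build status; the threshold values are invented for the example:

```python
import re

def definitely_lost_bytes(valgrind_log):
    """Return the 'definitely lost' byte count from a memcheck log,
    e.g. from a line like 'definitely lost: 4,120 bytes in 3 blocks'."""
    m = re.search(r'definitely lost:\s*([\d,]+)\s+bytes', valgrind_log)
    return int(m.group(1).replace(',', '')) if m else 0

def build_status(valgrind_log, unstable_at=1, failed_at=1024):
    """Mimic the plug-in's thresholds: mark the build UNSTABLE or
    FAILED when the lost bytes exceed the (illustrative) limits."""
    lost = definitely_lost_bytes(valgrind_log)
    if lost >= failed_at:
        return 'FAILED'
    if lost >= unstable_at:
        return 'UNSTABLE'
    return 'SUCCESS'
```

A nightly Jenkins job would run valgrind, feed the log through such a check, and set the build result accordingly.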

Leveraging APIs
Both GitHub and GitLab offer programmable access to their functionality. We use the APIs to access the pull requests and their content or comments, and to create new releases.

Release Notes
Release notes should document the new developments in each version of the software. To spread the burden of writing release notes and to simplify the process, we moved their drafting into the comments accompanying each pull request. Each pull request should contain a brief statement explaining its new features or bug fixes. The release notes are written between BeginReleasenotes and EndreleaseNotes markers and parsed by a script accessing the information via the APIs. All pull requests merged since the last tag on a given branch are selected and parsed. Depending on the preferences of the maintainers, the release notes can be sorted, for example by pull request or by sub-system. An example of a script using the GitHub API can be found in the DIRAC repository, and a script using the GitLab API in the iLCDirac repository.
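The marker-based extraction can be sketched as follows (a simplified stand-in for the actual scripts, with a hypothetical function name; the comparison is case-insensitive to tolerate capitalisation variants of the markers):

```python
def extract_release_notes(pr_description):
    """Return the lines between the BeginReleasenotes and EndreleaseNotes
    markers of a pull request description, or '' if there are none."""
    inside = False
    notes = []
    for line in pr_description.splitlines():
        marker = line.strip().lower()
        if marker == 'beginreleasenotes':
            inside = True                 # start collecting
        elif marker == 'endreleasenotes':
            inside = False                # stop collecting
        elif inside:
            notes.append(line)
    return '\n'.join(notes)
```

The collation script applies this to every pull request merged since the last tag and concatenates the results.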
This approach has the advantage of directly asking contributors to add release notes to their pull requests. It also avoids the merge conflicts that would be almost guaranteed if every contributor modified a common release notes file.
The output from the collation of the release notes can be further formatted, manually committed into a release notes file, or used in the meta data of tags made in GitHub or GitLab. It is also possible to automate the manual steps, as described in the next section.

Creation of New Software Releases
For the complex releases of iLCSoft, which contains more than 30 separate packages, we created a tagger script that uses the GitHub API to first create release notes for each package, then commits the release notes directly into the project, and finally makes a tag with an incremented version. If no modifications have been made to a package with respect to the prior version, no new tag is created. While the specific implementation makes use of conventions for iLCSoft, the general idea can be adapted to other projects.
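The core of the tagging logic can be sketched like this; the tag format and helper names are illustrative, not the actual tagger script:

```python
def next_version(tag):
    """Increment the last field of an iLCSoft-style tag, e.g.
    'v02-00-01' -> 'v02-00-02' (format assumed for illustration)."""
    parts = tag.lstrip('v').split('-')
    parts[-1] = '%02d' % (int(parts[-1]) + 1)
    return 'v' + '-'.join(parts)

def tag_decision(packages):
    """For (name, last_tag, has_changes) tuples, decide the new tag per
    package, or None when it is unchanged since the last release."""
    return {name: (next_version(tag) if changed else None)
            for name, tag, changed in packages}
```

In the real workflow the change detection comes from comparing commits against the previous tag via the GitHub API, and a new tag is only pushed for the packages with a non-None entry.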

Conclusions
The combination of continuous integration and continuous deployment has immensely sped up the round-trip time from changes in our detector simulation or improvements in the reconstruction algorithms to large-scale validation samples. The automation of testing, validation, and deployment has considerably increased the reliability of the release procedure and of the released software, while at the same time increasing the productivity of the librarians.

Disclaimer
All product and company names are trademarks or registered trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.