Machine Learning with ROOT/TMVA

ROOT provides, through TMVA, machine learning tools for data analysis at HEP experiments and beyond. We present recently included features in TMVA and the strategy for future developments in the diversified machine learning landscape. Focus is put on fast machine learning inference, which enables analysts to deploy their machine learning models rapidly on large-scale datasets. The new developments are paired with newly designed C++ and Python interfaces supporting modern C++ paradigms and full interoperability with the Python ecosystem. We also present a new deep learning implementation of convolutional neural networks using the cuDNN library for GPUs. We show benchmark results in terms of training time and inference time in comparison with other machine learning libraries such as Keras/TensorFlow.


Interoperability with the machine learning ecosystem
Because of the recent explosion in research on machine learning methods, the pace of novel developments has increased rapidly. This imposes a high workload on library maintainers implementing these methods, as they have to keep up with the fast development cycles. The person power required to stay up to date can only be provided by large technology companies, which differs significantly from the situation before the rise of modern neural network architectures.
This changes the focus of ROOT/TMVA for modern neural network architectures from providing the algorithms themselves to being the glue between third-party machine learning libraries and the software environment of HEP experiments. Interoperability with the machine learning ecosystem is achieved by supporting the common data interface of these packages, namely NumPy arrays in Python [14]. Since the data used in HEP analysis is commonly stored in ROOT files, the crucial feature for interoperability with machine learning packages is the conversion of this on-disk format to in-memory NumPy arrays. The functionality of the machine learning libraries is then fully accessible for the analysis. ROOT implements this feature on top of the RDataFrame [15] infrastructure with the method AsNumpy, which allows the analyst to perform computationally expensive preprocessing of the data in compiled C++ code and to load only the required data into memory. See figure 1 for a code example that loads data from a ROOT file into memory and subsequently passes it to common Python-based data analysis facilities such as Pandas [16]. The feature has been available in ROOT since version 6.18. Moreover, the experimental Python bindings for ROOT provide the factory function MakeNumpyDataFrame, which allows writing data from NumPy arrays to ROOT files. More details can be found in [17].
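The loading step described above could look roughly like the following sketch. It assumes a ROOT installation with Python bindings (6.18 or later) and pandas; the helper name load_columns and the file, tree and column names in the usage comment are hypothetical placeholders, not taken from the paper's figure 1.

```python
def load_columns(path, tree_name, columns, selection):
    """Return selected columns of a ROOT tree as a pandas DataFrame.

    Hypothetical helper for illustration only.
    """
    # Imported lazily so that this module can be imported on machines
    # without ROOT or pandas installed.
    import ROOT
    import pandas

    df = ROOT.RDataFrame(tree_name, path)
    # The selection string is compiled by ROOT and evaluated in C++.
    filtered = df.Filter(selection)
    # AsNumpy runs the event loop and returns a dict that maps each
    # requested column name to a NumPy array.
    arrays = filtered.AsNumpy(columns=columns)
    return pandas.DataFrame(arrays)


# Hypothetical usage; file, tree and column names are placeholders:
# frame = load_columns("events.root", "Events", ["pt", "eta"], "pt > 20")
```

Only the columns passed to AsNumpy are materialized in memory, so the selection and any derived quantities can stay in the compiled part of the workflow.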

Modernization of TMVA
Because the primary design decisions for TMVA were taken around 2005, the package is missing features expected of modern software. First, the interfaces do not seamlessly support common C++ data containers such as std::vector but require the handling of raw pointers to data in memory, which, for example, complicates ownership in modern C++. Second, the API does not ensure thread-safety, which is highly important in today's computing environments and for data analysis with steadily increasing dataset sizes. The new API design for TMVA follows concepts of the sklearn API [18] and strives for elements of functional programming, such as pure functions with no internal state, to support thread-safety and code correctness. Figure 2 shows an example workflow that constructs a BDT from existing model parameters and applies it to data in C++. The new RTensor class serves as a NumPy-like container for multi-dimensional arrays in C++ as long as the C++ standard does not provide a comparable container. The introduction of such a container into the standard is under discussion in study group 19 of the ISO C++ committee [19]. In Python, the API fully supports NumPy arrays as a replacement for the RTensor class to allow seamless interoperability with the machine learning ecosystem. The feature is in an experimental stage and available with the ROOT release 6.20.
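The design goal of pure functions without internal state can be illustrated with a minimal sketch. It does not use the TMVA API: the LinearModel class and the compute function are hypothetical stand-ins that only show why an immutable model plus a stateless inference function can be shared between threads without locking.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class LinearModel:
    """Immutable model parameters (hypothetical stand-in for a trained model)."""
    weights: Tuple[float, ...]
    bias: float


def compute(model: LinearModel, event: Sequence[float]) -> float:
    """Pure function: the result depends only on the arguments."""
    return sum(w * x for w, x in zip(model.weights, event)) + model.bias


model = LinearModel(weights=(0.5, -1.0, 2.0), bias=0.25)
events = [(1.0, 2.0, 3.0), (0.0, 0.0, 0.0), (2.0, 1.0, 0.5)]

# Serial reference evaluation.
serial = [compute(model, event) for event in events]

# The same model object is shared by all worker threads; because nothing
# is mutated during inference, no synchronization is required.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(lambda event: compute(model, event), events))
```

Both evaluation paths give identical results. The threads here only demonstrate that shared use is safe; they are not meant as a performance measurement.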

Fast decision tree inference
Following the change of strategy for TMVA discussed in section 2, new developments focus less on the training of models and more on the integration of machine learning into the data analysis workflow. Besides moving data from ROOT files to a readable format in memory, this includes the application of machine learning models in data analysis workflows provided by ROOT. A crucial property for the inference of such models on large datasets is the runtime performance. While thread-safety is important to efficiently parallelize the full workflow, fast inference is key to speeding up the data analysis further. Therefore, TMVA aims to provide fast inference facilities for commonly used machine learning methods, starting with BDTs.
The TMVA workflow is modularized to allow training of models externally, for example with XGBoost [20] for BDTs, and the subsequent application using the inference implementation of TMVA. For the BDT inference, we provide an inference engine that is thread-safe, zero-copy and fully accessible in C++ and Python. In the implementation, we take care to ensure minimal latency for efficient event-by-event inference, since this is important in online systems like triggers or in branched data analysis workflows, which cannot gain from batch computation. Figure 4 shows this workflow using XGBoost for training and TMVA for the application in C++ and Python. Figure 5 shows the runtime for XGBoost and two different BDT inference backends. Since ROOT comes with the C++ interpreter cling [21], we are able to just-in-time (JIT) compile code at runtime, which is useful for optimizing the inference code with respect to static runtime parameters such as the depth of the trees. TMVA provides two backends, one that compiles the inference code using cling and a second that computes the predictions without JIT compilation. With JIT compilation, TMVA is able to perform the inference up to six times faster than XGBoost. We expect further improvements from parallelizing the batched inference on multiple threads, which is a target of future developments.
Full technical details can be found in [22]. The feature is in an experimental stage and available with the ROOT release 6.20.
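To make event-by-event inference with a fixed tree depth concrete, the following is a minimal sketch and not the TMVA implementation. It assumes that every tree is a complete binary tree of the same depth stored in flat arrays, and that an event goes to the right child when the tested input exceeds the threshold. The Tree class, the function names and all numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

DEPTH = 2  # fixed depth of every tree, known before the event loop starts


@dataclass(frozen=True)
class Tree:
    features: Tuple[int, ...]      # input index tested at each inner node
    thresholds: Tuple[float, ...]  # cut value at each inner node
    leaves: Tuple[float, ...]      # response of each leaf, left to right


def predict_tree(tree: Tree, event: Sequence[float], depth: int) -> float:
    """Walk one complete binary tree; inner nodes are stored level by level."""
    node = 0
    for _ in range(depth):
        go_right = event[tree.features[node]] > tree.thresholds[node]
        # The children of node i sit at positions 2*i + 1 and 2*i + 2.
        node = 2 * node + 1 + int(go_right)
    # After `depth` steps the index points past the inner nodes into the leaves.
    return tree.leaves[node - (2 ** depth - 1)]


def predict(forest: Sequence[Tree], event: Sequence[float], depth: int) -> float:
    """Sum the responses of all trees for a single event."""
    return sum(predict_tree(tree, event, depth) for tree in forest)


forest = (
    Tree(features=(0, 1, 1), thresholds=(0.5, 1.0, 2.0),
         leaves=(-1.0, -0.5, 0.5, 1.0)),
    Tree(features=(1, 0, 0), thresholds=(1.5, 0.0, 1.0),
         leaves=(0.25, 0.5, -0.25, -0.5)),
)

score_a = predict(forest, (0.7, 1.2), DEPTH)
score_b = predict(forest, (0.0, 3.0), DEPTH)
```

Because the depth is fixed, the traversal loop has a known trip count and needs no per-node leaf check. A loop of this shape is the kind of code that a compiler can unroll once the depth is a compile-time constant.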

Fast neural networks
TMVA is investigating the implementation of a fast inference engine for neural networks. We have already shown [23] that a specialized implementation for dense neural networks can outperform commonly used setups like TensorFlow [8] interfaced by Keras [24] in terms of training and application time. Further investigations have been done by benchmarking the implementation of convolutional neural networks in TMVA. Like TensorFlow, the TMVA backend uses the cuDNN library [25], which implements primitives for modern neural network architectures on GPUs; nevertheless, we are able to reduce the time spent on training and application for small batch sizes and architectures. Figure 6 shows that TensorFlow has an overhead for smaller computations and that both implementations converge to the same runtime for larger batches, when the processing time is dominated by cuDNN. We interpret these results to mean that the larger complexity of TensorFlow comes at the cost of an increased overhead for issuing small computations and that TensorFlow is optimized for large machine learning models such as those studied in most of the recent literature.
Figure 5. Benchmark of fast BDT inference in TMVA using a model with 500 trees, a maximum depth of three and ten input variables.
The fast inference of smaller neural network models is interesting in production, for example for real-time applications such as triggers or for applications in data analysis with reduced complexity. In these cases, a minimal-latency implementation such as the one presented above has a large impact on the performance. Further, TMVA can provide full C++ support, and C++ is usually the language of choice for the internals of experiment frameworks. The long-term support for such a feature is currently under investigation.
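One simple way to read the batch-size behaviour described in this section is a linear cost model. This is an illustrative assumption and not a fit to the benchmark: $t_0$ denotes a framework-dependent fixed overhead per call, $t_s$ the per-sample time spent in the shared cuDNN kernels, and $B$ the batch size.

```latex
T_{\mathrm{fw}}(B) = t_0^{\mathrm{fw}} + B\, t_s ,
\qquad
\frac{T_{\mathrm{TF}}(B)}{T_{\mathrm{TMVA}}(B)}
  = \frac{t_0^{\mathrm{TF}} + B\, t_s}{t_0^{\mathrm{TMVA}} + B\, t_s}
  \;\xrightarrow{\; B \to \infty \;}\; 1 .
```

Under this assumption the ratio approaches $t_0^{\mathrm{TF}} / t_0^{\mathrm{TMVA}}$ for small batches, where the fixed overhead dominates, and tends to one for large batches, where the common kernel time $B\, t_s$ dominates.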

Outlook
ROOT/TMVA continues to invest in supporting analysts who apply machine learning in data analysis at HEP experiments and beyond. Due to the diversified landscape in machine learning, we adapt by shifting the focus of future developments towards interoperability with the growing ecosystem outside ROOT and towards fast inference of machine learning models. Interoperability is provided by the ability to efficiently load data from ROOT files into NumPy arrays, which satisfies the data interface of most machine learning software. Moreover, modernized interfaces for TMVA allow seamless interaction with common data containers in Python and C++. New features for fast inference are being studied, and we show a significant increase in performance for models of boosted decision trees and convolutional neural networks in tasks relevant to HEP data analysis.