Topology-based description of the NCA cathode configurational space and an approach to its effective reduction

Modification of existing solid electrolyte and cathode materials is a topic of interest for theoreticians and experimentalists. In particular, it requires elucidation of the influence of dopants on the characteristics of the materials under study. Because the configurational space of doped/deintercalated systems is highly complex, the application of computer modeling approaches is hindered, despite significant advances in computational facilities in recent decades. In this study, we propose a scheme that reduces the set of structures of a modeled configurational space that must subsequently be studied by time-consuming quantum chemistry methods. Application of the proposed approach is exemplified through the study of the configurational space of an approximant of the commercial LiNi0.8Co0.15Al0.05O2 (NCA) cathode material.


Introduction
Li-ion batteries (LIBs) are now of great importance in the production of electronic devices, energy storage systems, electric vehicles and many other industrial sectors [1,2]. In LIBs, the properties of the cathode material are largely responsible for the most relevant battery characteristics (energy density, output voltage, cycling stability, operational safety, etc.). LIB performance can be improved either by using new cathode compounds or by improving existing cathode materials. The latter can be done by doping the pristine lattices of existing cathode materials. For example, this modification route has produced a whole family of layered cathode materials derived from lithium nickel oxide LiNiO2 (LNO) by substituting Ni atoms with Mn, Co, Al, etc. These are the NMC (Mn, Co), NCA (Co, Al) and LNMO (Mn) materials with different amounts of dopants, which are already used in LIB manufacturing. Understanding the structural changes caused by the dopant atoms is paramount in the pursuit of new cathode materials with advanced characteristics. Modeling of the structure and physical properties of crystals is currently affordable owing to the development of various quantum-mechanical algorithms, among which the density functional theory (DFT) approach is the most popular because of its high transferability. Recently, DFT calculations have been employed to study ion transport processes, structural changes upon delithiation and the effect of doping on the characteristics of cathode materials [3-6]. However, the direct application of modeling to a comprehensive study of cathode material features remains a complicated task because of the high complexity of the configurational space of a disordered system.
This obstacle manifests itself in the need to carry out quantum-mechanical calculations for many thousands (up to millions) of configurations belonging to the configurational space of a modeled crystal structure [7,8]. The purpose of the current study is to propose and verify a scheme that makes it possible to perform quantum-mechanical calculations only for reduced subsets of the complete configurational space of a modeled crystal structure. Application of the proposed scheme is exemplified through the study of the NCA cathode material configurational space, which was explored using DFT modeling in our previous study [8].

Sampling of the NCA configurational space, DFT calculations and structural descriptors
The configurational space used in this study and the results of the DFT modeling are taken from the previous study [8]. In total, 20760 independent doped configurations were generated from the 2×2×1 supercell of the parent LNO compound (Fig. 1). The composition of the generated NCA configurations corresponds to Li1-βNi0.75Co0.167Al0.083O2, which is quite close to the composition LiNi0.8Co0.15Al0.05O2 of the commercial NCA material. To build a regression model, we described each entry of the NCA configurational space with a set of structural descriptors [8], which serve as the independent variables of the model.
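The stated stoichiometry can be checked with a short calculation. The sketch below assumes that the 2×2×1 supercell contains 12 transition-metal sites occupied by 9 Ni, 2 Co and 1 Al atoms; these integer counts are inferred from the fractions 0.75, 0.167 and 0.083 and are not given explicitly in the text. The raw number of dopant arrangements it prints ignores symmetry equivalence and Li vacancies, so it is not meant to reproduce the 20760 configurations reported above.

```python
from fractions import Fraction
from math import factorial

# Assumption: 12 transition-metal (TM) sites in the 2x2x1 supercell,
# inferred from the stoichiometry; the text does not state this number.
N_TM_SITES = 12
occupancy = {"Ni": 9, "Co": 2, "Al": 1}  # assumed integer site counts

assert sum(occupancy.values()) == N_TM_SITES

# Site fractions, to compare with Ni0.75 Co0.167 Al0.083.
site_fractions = {el: Fraction(n, N_TM_SITES) for el, n in occupancy.items()}
for element, frac in site_fractions.items():
    print(f"{element}: {float(frac):.3f}")

# Multinomial count of ways to distribute Ni/Co/Al over the TM sites,
# before any symmetry reduction and without considering Li vacancies.
raw_arrangements = factorial(N_TM_SITES)
for n in occupancy.values():
    raw_arrangements //= factorial(n)
print("raw TM arrangements:", raw_arrangements)
```

Under these assumptions the site fractions round to 0.750, 0.167 and 0.083, matching the composition quoted above.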

Ridge regression algorithm
Ridge regression as implemented in the scikit-learn library [10] was employed to evaluate the energy per modeled cell of NCA configurations and structural characteristics such as the unit cell parameter c or Ni atomic charges. For each delithiation level, an individual regression model was built using the abovementioned structural descriptors as regressors. The size of the training set was varied: 20%, 50% and 80% of the configurations at each delithiation level were used. The performance of the regression models in predicting the aforementioned target variables was assessed by evaluating the mean absolute error (MAE) and coefficient of determination (R²) estimators.
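The training and evaluation procedure described above can be sketched as follows. This is a minimal illustration and not the code used in the study: the data are synthetic (random descriptors with a linear target plus noise), and the regularization strength alpha=1.0 and the random seeds are assumptions, since the text does not report them.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the NCA dataset): 1000 "configurations",
# 10 hypothetical structural descriptors, linear target plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
true_weights = rng.normal(size=10)
y = X @ true_weights + rng.normal(scale=0.1, size=1000)

results = {}
for train_fraction in (0.2, 0.5, 0.8):
    # Hold out the configurations not used for training.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=train_fraction, random_state=0
    )
    model = Ridge(alpha=1.0)  # alpha is an assumed value, not from the text
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[train_fraction] = {
        "mae": mean_absolute_error(y_test, y_pred),
        "r2": r2_score(y_test, y_pred),
    }

for fraction, scores in results.items():
    print(f"train={fraction:.0%}  MAE={scores['mae']:.3f}  R2={scores['r2']:.3f}")
```

In the setting of the study, one such loop would be run separately for each delithiation level, with the DFT energies (or another target property) in place of the synthetic y.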

Prediction of NCA configurations energy
As expected, the quality of the predicted configuration energies depends on the number of configurations used to train the regression models. The estimators of the regression models are plotted in Figure 2 (a, b). As can be seen from the dependencies, the predictions of the models trained on the 50% and 80% sets of configurations differ negligibly for all delithiation levels. Further reduction of the training set down to 20% leads to a further decline in the prediction quality, which is clearly demonstrated by the changes in the R² values. Another point to be stressed is that the errors in the prediction of the high-energy structures are generally larger, as is visible in the plots of the E_predicted vs. E_DFT dependencies in Figure 2 (c, d).
Fig. 2. (a, b) Estimators of the quality of the configuration energy (eV/cell) prediction made by the ridge regression models trained on 20% (red), 50% (yellow) and 80% (green) of configurations; (c, d) scatter plots of E_predicted vs. E_DFT for the set of configurations that correspond to the delithiation β=0.5, obtained for different training set sizes.
The deviation of the points from the line corresponding to equality of the E_predicted and E_DFT values becomes more pronounced as the configuration energy increases. Moreover, for the configurations of the highest energy, the difference E_predicted - E_DFT is always negative. This can be attributed to deficiencies of the set of structural descriptors used. Possibly, some additional descriptors that are able to capture the structural peculiarities of the high-energy structures are needed.
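A systematic bias of this kind can be quantified by averaging the signed residual E_predicted - E_DFT within bins of increasing DFT energy. The sketch below is one possible way to do this; the function name signed_residuals_by_energy and the numerical values are hypothetical and serve only as an illustration, not as data from the study.

```python
from statistics import mean

def signed_residuals_by_energy(e_dft, e_pred, n_bins=4):
    """Mean signed residual (E_pred - E_DFT) in equal-population bins,
    ordered from the lowest to the highest DFT energy."""
    pairs = sorted(zip(e_dft, e_pred))  # sort by DFT energy
    bin_size = len(pairs) // n_bins
    if bin_size == 0:
        raise ValueError("n_bins exceeds the number of configurations")
    summary = []
    for i in range(n_bins):
        start = i * bin_size
        # The last bin absorbs any leftover points.
        stop = (i + 1) * bin_size if i < n_bins - 1 else len(pairs)
        summary.append(mean(pred - dft for dft, pred in pairs[start:stop]))
    return summary

# Made-up illustrative energies (eV/cell), not taken from the paper.
e_dft = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
e_pred = [0.01, 0.09, 0.21, 0.29, 0.38, 0.47, 0.52, 0.60]

bias = signed_residuals_by_energy(e_dft, e_pred, n_bins=4)
for i, value in enumerate(bias, start=1):
    print(f"energy bin {i}: mean(E_pred - E_DFT) = {value:+.3f} eV/cell")
```

A mean residual that becomes increasingly negative towards the last bin would correspond to the underestimation of high-energy configurations discussed above.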

Reduction of the modeled set
In the previous sections, we have shown that the ridge regression models are able to predict a variety of characteristics of NCA configurations fairly well. In addition, we have demonstrated that the quality of the predictions depends on the size of the training set used to obtain the model, and that it declines progressively as the training set shrinks. Taking that into account, it is possible to carry out the DFT calculations only for a subset of NCA configurations and then build a regression model to predict the characteristics of any given NCA configuration, such as formation energy, c axis length, partial atomic charges, etc. The overall scheme proposed for reducing the set of DFT calculations is summarized in the block diagram depicted in Figure 4.
EPJ Web of Conferences 177, 02005 (2018), https://doi.org/10.1051/epjconf/201817702005, AYSS-2017

Conclusions
Machine-learning techniques have already been applied to similar tasks that require processing large sets of structural information on Li superionic conductors [11] or single- and binary-component solids [12]. The scheme presented in this study could be of particular interest when modeling large configurational spaces. An almost 8-fold reduction of the DFT calculation set (2639 vs. 20760 structures, i.e. about 12.7% of the full set) can be achieved at the expense of moderate errors in the predicted configuration characteristics. Thus, the presented combination of topological analysis, DFT calculations and machine learning algorithms seems to be a promising tool for comprehensive investigations of existing materials for electrochemical applications and for predicting new ones.