Comparison of the Asynchronous Differential Evolution and JADE Minimization Algorithms

Thanks to its simple structure, Differential Evolution (DE) [1] is a widely used method for finding the global minimum f* = f(x*). It has few control parameters, but some of them, the population size Np and the crossover rate Cr, drastically change the performance of the algorithm. Moreover, incompatible settings are efficient for different classes of problems, e.g. Cr = 0 for separable problems and Cr ≈ 1 for non-separable ones. Therefore, recent studies have focused on modifications of DE that automatically adapt the control parameters during minimization [2]. We compare the adaptive JADE algorithm [3] with the Asynchronous Differential Evolution with Adaptive Correlation Matrix [4] and disentangle the contributions due to differences between the algorithms.


Introduction
Global minimization of a real-valued function f, defined in the continuous parameter space Ω of dimension D, is a common mathematical problem. Thanks to its simple structure, Differential Evolution (DE) [1] is a widely used method for finding the global minimum f* = f(x*). It has few control parameters, but some of them, the population size Np and the crossover rate Cr, drastically change the performance of the algorithm. Moreover, incompatible settings are efficient for different classes of problems, e.g. Cr = 0 for separable problems and Cr ≈ 1 for non-separable ones. Therefore, recent studies have focused on modifications of DE that automatically adapt the control parameters during minimization [2]. We compare the adaptive JADE algorithm [3] with the Asynchronous Differential Evolution with Adaptive Correlation Matrix [4] and disentangle the contributions due to differences between the algorithms.

Asynchronous Differential Evolution
Asynchronous Differential Evolution (ADE) [5] is a steady-state variant of the DE method; its general scheme is shown in Fig. 1. DE uses a population P of Np vectors x_i to represent candidate solutions in the search domain. The initial population is formed by uniform random sampling of each coordinate x_i,j within the requested initial boundaries [x_min, x_max].
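The initialization step can be sketched as follows. This is an illustrative snippet, not code from the paper; the function name `init_population` and the argument layout are assumptions.

```python
import numpy as np

def init_population(n_p, x_min, x_max, rng):
    """Uniformly sample each coordinate x_{i,j} within [x_min, x_max]."""
    x_min = np.asarray(x_min, dtype=float)
    x_max = np.asarray(x_max, dtype=float)
    return rng.uniform(x_min, x_max, size=(n_p, x_min.size))

rng = np.random.default_rng(0)
P = init_population(20, [-5.0] * 3, [5.0] * 3, rng)  # Np = 20, D = 3
assert P.shape == (20, 3)
assert np.all(P >= -5.0) and np.all(P <= 5.0)
```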
The standard DE is a generational algorithm: mutation and crossover operations are performed for all population members, and then, as a result of selection, DE switches to the next generation. In a steady-state algorithm, by contrast, the evolutionary operations are applied to a single selected member of the population at each iteration.

Choice of a Target Vector
The choice of a target vector is a feature that emerges as soon as we switch from a generational algorithm to a steady-state variant. In this article we pick a random member of the population as the target vector x_i. A faster convergence rate can be attempted by choosing one of the worst population members, thus enforcing the replacement of poor candidates. While this choice is profitable for some optimization problems, we found that it reduces the dispersion within the population and usually leads to a lower probability of convergence to the global minimum. In the case of restart strategies (see Sec. 2.4), the overall performance of ADE depends only weakly on the particular choice of the target vector.
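The two target-selection policies contrasted above amount to the following. The variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
fvals = np.array([3.2, 0.5, 7.1, 1.8])  # objective values of the population

# random target (the choice used in this work): preserves diversity
i_rand = int(rng.integers(len(fvals)))

# worst-member target: replaces poor candidates faster, but tends to
# shrink the population spread and hurt global convergence
i_worst = int(np.argmax(fvals))

assert 0 <= i_rand < 4
assert i_worst == 2
```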

Mutation
In DE a mutation vector v_i is constructed by adding to a selected population member a scaled difference vector, formed as a simple difference between randomly picked vectors from the population. In this article we analyze the following strategies:

'rand' [1]:

    v_i = x_r + F_i (x_s − x_q),    (2)

'current-to-pbest' [3]:

    v_i = x_i + F_i (x_p − x_i) + F_i (x_r − x_q).    (3)

Here r and s denote random members of the population. The index q corresponds to a randomly selected x_q ∈ P ∪ A, where the so-called archive A stores Np former population members recently discarded by selection. The index p is a random index within the 0.1 Np best candidates. All vectors on the right-hand sides of Eqs. (2)-(3) are enforced to be distinct.

The scale factor F_i is sampled for each mutation according to a Cauchy distribution with location parameter μ_F and scale parameter σ_F = 0.1 [3]. If a trial vector u_i is selected to replace the target vector x_i (see Sec. 2.4), the location parameter is updated as

    μ_F = (1 − c_F) μ_F + c_F L_2({F}),

where c_F = 0.01 is the learning rate for updating the location parameter and L_2({F}) is the contraharmonic (Lehmer) mean of the set of all scale factors associated with the current population.
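The 'current-to-pbest' mutation, the Cauchy sampling of F_i, and the Lehmer-mean update of μ_F can be sketched as below. This is an assumed illustration: the function names, the resampling of non-positive Cauchy draws, and the truncation of F at 1 follow common JADE practice rather than this paper's text.

```python
import numpy as np

def sample_F(mu_F, rng, sigma_F=0.1):
    """Cauchy-distributed scale factor; non-positive draws are resampled
    and values above 1 are truncated to 1 (common JADE convention)."""
    while True:
        F = mu_F + sigma_F * rng.standard_cauchy()
        if F > 0.0:
            return min(F, 1.0)

def current_to_pbest(P, A, i, F, fvals, rng, p_frac=0.1):
    """v_i = x_i + F (x_p - x_i) + F (x_r - x_q):
    p among the 0.1*Np best members, q drawn from P union A."""
    n = len(P)
    best = np.argsort(fvals)[: max(1, int(p_frac * n))]
    p = best[rng.integers(len(best))]
    r = rng.integers(n)
    pool = np.vstack([P, A]) if len(A) else P
    q = rng.integers(len(pool))
    return P[i] + F * (P[p] - P[i]) + F * (P[r] - pool[q])

def lehmer_mean(Fs):
    """Contraharmonic (Lehmer L2) mean: sum(F^2) / sum(F)."""
    Fs = np.asarray(Fs, dtype=float)
    return float(np.sum(Fs ** 2) / np.sum(Fs))

def update_mu_F(mu_F, Fs, c_F=0.01):
    """mu_F <- (1 - c_F) mu_F + c_F * L2({F}) after a successful step."""
    return (1.0 - c_F) * mu_F + c_F * lehmer_mean(Fs)

rng = np.random.default_rng(4)
P = rng.uniform(-5, 5, size=(10, 3))
fvals = np.array([float(np.sum(x * x)) for x in P])
v = current_to_pbest(P, [], 0, sample_F(0.5, rng), fvals, rng)
assert v.shape == (3,)
# 0.99 * 0.5 + 0.01 * L2({1, 1}) = 0.505
assert abs(update_mu_F(0.5, [1.0, 1.0]) - 0.505) < 1e-12
```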

Crossover
In DE the coordinates of the trial vector u_i,j are picked either from the mutant vector v_i or from the target vector x_i, the so-called crossover operation. We compare the uniform crossover with crossover rate C_r, adapted by the JADE scheme [3], to a crossover based on an adaptive correlation matrix (ACM) [4].

The coordinates of the trial vector u_i after uniform crossover are

    u_i,j = v_i,j  if rand_j ≤ C_r,i or j = j_rand,
    u_i,j = x_i,j  otherwise.

Here C_r,i ∈ [0, 1] indicates the average proportion of coordinates selected from the mutant vector into the trial vector. To ensure distinct trial and target vectors, at least one coordinate j_rand is taken from the mutant vector. In JADE the rate C_r,i is generated for each crossover according to a normal distribution with mean μ_c and standard deviation 0.1. The mean μ_c is updated after successful iterations as

    μ_c = (1 − c_c) μ_c + c_c mean({C_r}),

where c_c = 0.01 is a learning factor and mean({C_r}) is the mean over all crossover rates associated with the current population.

While the above JADE scheme treats all coordinates uniformly, ADE with the Adaptive Correlation Matrix [4] uses information on pairwise correlations between parameters. The current population is used to calculate a sample correlation matrix S, whose elements S_jk are the sample correlation coefficients between coordinates j and k over the population. Successful steps are used to cumulatively update an estimate of the correlation matrix, the adaptive correlation matrix C:

    C = (1 − c) C + c S,

where the coefficient c = 0.01 is the learning rate for updating the correlation matrix. From the adaptive correlation matrix the algorithm identifies a group of variables correlated with a selected variable m, i.e. those with sufficiently large entries |C_mj|. This set of correlated variables {I_m} defines a subspace Ω_m in the search domain. All components of the mutant vector v_i whose indices are in the set {I_m} are propagated into the trial vector u_i, while the other components are taken from the target vector x_i.
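The two crossover operators can be sketched as follows. This is an illustrative assumption: in particular, the threshold criterion `|C[m, j]| >= thr` used to pick the correlated set {I_m} is a placeholder, since the paper's exact selection rule is given in [4].

```python
import numpy as np

def uniform_crossover(x, v, Cr, rng):
    """u_j = v_j if rand_j <= Cr or j == j_rand, else x_j."""
    mask = rng.random(x.size) <= Cr
    mask[rng.integers(x.size)] = True        # j_rand: guarantee u != x
    return np.where(mask, v, x)

def update_C(C, S, c=0.01):
    """C <- (1 - c) C + c S, cumulative correlation-matrix estimate."""
    return (1.0 - c) * C + c * S

def acm_crossover(x, v, C, m, thr):
    """Propagate mutant components whose variables are correlated with m.
    The criterion |C[m, j]| >= thr is a placeholder for illustration."""
    I_m = np.abs(C[m]) >= thr
    I_m[m] = True
    return np.where(I_m, v, x)

x = np.zeros(4)
v = np.ones(4)
rng = np.random.default_rng(5)
u = uniform_crossover(x, v, 0.5, rng)
assert u.sum() >= 1.0                        # at least j_rand comes from v
assert np.allclose(update_C(np.eye(4), np.eye(4)), np.eye(4))
# with C = I, only the coordinate m itself is propagated from the mutant
assert acm_crossover(x, v, np.eye(4), 0, 0.5).tolist() == [1.0, 0.0, 0.0, 0.0]
```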

Selection and Restart
Differential Evolution uses a greedy algorithm for the selection: a trial vector u_i replaces the target vector x_i iff there is an improvement in the corresponding objective function values. During successive iterations the algorithm analyzes the spreads of the population members in each coordinate, Δx_j, and in the function values, Δf, to avoid stagnation. If at least one of the spreads is too small, an independent restart is initiated [6]. In this work we analyze two restart strategies. One, named '5D', uses a constant population size Np = 5D. The other selects Np_min = 10 as the initial population size and increases the population by a factor of 2 at each restart; if after this inflation the population size exceeds 20D, an independent restart with the size Np_min is enforced.
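The restart logic can be sketched as follows. The function names and the spread tolerances `eps_x`, `eps_f` are assumptions for illustration; the paper does not specify the tolerance values.

```python
import numpy as np

def need_restart(P, fvals, eps_x=1e-12, eps_f=1e-12):
    """Stagnation guard: restart when the spread in some coordinate
    (delta x_j) or in the function values (delta f) is too small."""
    dx = P.max(axis=0) - P.min(axis=0)
    df = fvals.max() - fvals.min()
    return bool(np.any(dx < eps_x) or df < eps_f)

def next_pop_size(np_cur, D, np_min=10, max_factor=20):
    """Doubling schedule: double Np at each restart; if the inflated
    size exceeds 20*D, fall back to the minimal size Np_min."""
    np_next = 2 * np_cur
    return np_min if np_next > max_factor * D else np_next

assert next_pop_size(10, D=5) == 20
assert next_pop_size(80, D=5) == 10      # 160 > 100, so reset to Np_min
assert need_restart(np.zeros((4, 2)), np.zeros(4))
```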

Numerical Tests
The performance of several variants of ADE-ACM and JADE (Tab. 1) is compared on the set of real-parameter black-box optimization problems BBOB-2015 [7], a widely used testbed on which more than 100 articles and algorithms have been benchmarked. The testbed contains 24 functions: separable, non-separable, weakly structured, unimodal, multimodal, and/or ill-conditioned, for dimensions 2, 3, 5, 10, 20 and 40. The performance is measured by the number of successful trials #succ, in which an algorithm has reached function values below f* + 10^-8, and by the expected running time ERT(Δf*) to reach values better than f* + Δf*, calculated as the ratio of the total number of function evaluations spent before the target value is reached to the number of successful trials. The maximal number of function evaluations is limited to 10^6 D.
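The ERT measure defined above can be computed as follows; the function name and the toy trial data are illustrative, not results from the paper.

```python
def expected_running_time(evals, successes):
    """ERT(delta_f*): total evaluations spent over all trials divided by
    the number of successful trials; unsuccessful trials contribute
    their full budget to the numerator."""
    n_succ = sum(successes)
    return float('inf') if n_succ == 0 else sum(evals) / n_succ

# hypothetical trials: two reach the target, one exhausts its budget
ert = expected_running_time([4000, 6000, 20000], [True, True, False])
assert ert == 15000.0   # (4000 + 6000 + 20000) / 2
```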
The graphical representation of ERT for all 4 variants is shown in Fig. 2. As the dimension of the problem increases, ADE-ACM usually performs better than JADE. To exclude differences due to restart procedures, we cite results for the ADE-ACM variant '5D' with the fixed population size, which differs from JADE only in the crossover operator. The better performance of ADE-ACM is mainly due to the new crossover, which takes into account the correlations between variables. Convergence rates are presented in Table 2 for dimension D = 20. The ADE-ACM algorithm solves 17 of the 24 functions with probability higher than 0.5 within the allocated number of function evaluations. If restarts are used, both the 'current-to-pbest' and 'rand' strategies show similar performance. For two multimodal functions, Büche-Rastrigin (f4) and Schwefel (f20), ADE-ACM outperforms previously tested algorithms.

Conclusions
In this work we have compared the performance of the recently proposed minimization algorithm, Asynchronous Differential Evolution with Adaptive Correlation Matrix, to the widely used JADE method. By using additional information about linear correlations between variables, learned during successful iterations, the new algorithm shows better convergence probabilities and faster convergence rates once the dimension D of the problem exceeds 10. ADE-ACM can competitively solve a wide range of global optimization problems, whether separable, non-separable, or partially separable, thanks to the new type of crossover based on an estimate of the correlation matrix. The new algorithm has a simple structure and is quasi parameter-free from the user's point of view.