Effects of Numerical Precision on Simulation of Massive Spatial Data on GPU Based on Modified Planar Rotator Model

The present research builds on a recently proposed spatial prediction method for gridded data, based on a suitably modified planar rotator (MPR) spin model from statistical physics. This approach maps the measured data onto interacting spins and, exploiting spatial correlations between them, which are similar to those present in geostatistical data, predicts the data at unmeasured locations. Due to the short-range nature of the spin pair interactions in the MPR model, parallel implementation of the prediction algorithm on graphical processing units (GPUs), can achieve impressive speedups, compared to the CPU implementation. In this work we study the effects of reduced computing precision as well as GPU-based hardware intrinsic functions on the speedup and accuracy of the MPR-based prediction and explore which aspects of the simulation can potentially benefit the most from the reduced precision. It is found that, particularly for massive data sets, a thoughtful precision setting of the GPU implementation can significantly increase the computational effciency, while incurring little to no degradation in the prediction accuracy.


Introduction
Enormous amounts of Earth observation data collected by remote sensing techniques, such as geographical, natural resources, land use, and environmental remote sensing images, require highly efficient (preferably real-time) processing [1]. Such processing often involves the reconstruction of data which is missing due to various reasons, e.g. equipment malfunctions or gaps in the coverage of the targeted area that appear as a result of restricted satellite paths or bad weather conditions [2]. These massive data sets however, cannot be efficiently handled by the standard geostatistical methods, such as kriging [3]. The main drawbacks of such methods are high computational complexity, difficulties to automatize the algorithms to work without subjective user inputs and in the case of non-Gaussian data, preprocessing requirements [4].
An alternative approach inspired by spin models in statistical physics has been offered in [5], based on the so-called modified planar rotator (MPR) model. This approach maps the measured data onto spins situated on a grid and exploits spatial correlations between them, which turned out to be similar to those present in the geostatistical data, to predict the missing data points. The prediction procedure then consists of conditional Monte Carlo simulations with a single parameter to be inferred, called reduced temperature. The MPR method has been shown to be particularly competitive and effective for non-Gaussian data [5].
Furthermore, considering the fact that the spins in the MPR model only interact with their nearest neighbors, this method is suitable for parallelization and can be implemented on graphical processing units (GPUs), which posses far superior theoretical peak performance due to their inherently parallel architecture. In our very recent work [6], up to 500× speedup compared to the CPU implementation has been achieved.

Model and Methods
The Hamiltonian of the standard planar rotator described by Eq. 1 where S i , S j are neighboring spins at the sites i and j represented by unit vectors in the XY plane, φ i , φ j ∈ [0, 2π] are their turn angles and J is the exchange interaction parameter, can by modified to account for the spatial correlations typical for geostatistical data, simply by introducing the parameter q ∈ (0, 1/2] into the Hamiltonian, so it becomes It has been shown [7], that a spin system described by this modified Hamiltonian displays temperaturecontrolled short-ranged correlations, which are typical for geostatistical data. The temperature is the only simulation parameter and needs to be inferred from the incomplete data. Then, a conditional Monte Carlo simulation, in which only the spins at the prediction locations are allowed to change their states, is performed using the so-called hybrid algorithm [5]. By averaging over multiple configurations in equilibrium, one can obtain predictions of the missing data. The simulation codes in this study were written in the C++ language and parallelized on NVIDIA GPU GTX 980 using the Compute Unified Device Architecture (CUDA) framework. Details of the GPU implementation can be found in [6]. A large part of the achieved speedup in [6] is due to the fact, that the lowering of the arithmetic precision has far greater potential impact on the GPUs, compared to the CPUs. Below, we look into the effects of different precision setups on the efficiency and accuracy of the computations.
There are two main areas, which can potentially benefit from the lowering of the arithmetic precision. The spin variables and the system specific energy, defined as e = H /L 2 , where L is the linear size of the square grid, can both be represented as either 64-bit double precision (DP) or 32-bit single precision (SP) variables. Consequently, SP variables only require half of the memory needed by DP variables, which can considerably improve the speed of all read and write memory operations. The second area is the precision of the functions used during the course of simulations. In C++, it is possible to use precision-specific versions of common functions, such as trigonometric, exponential etc. SP functions sacrifice accuracy for decreased computation time. In addition, CUDA offers the so-called hardware-intrinsic versions of these functions, which utilize mapping to fewer hardware instructions, increasing speed at the cost of accuracy and different special case handling. These hardware intrinsic functions were used in all cases of the SP computations of functions. The above mentioned considerations give us eight possible configurations, from full DP, through mixed precision, where one variable is SP, the other DP and functions are either SP or DP, to full SP calculations.

Results and Conclusions
The simulations were performed on the synthetic Gaussian data Z ∼ N(m = 50, σ = 10) with exponential covariance given by where h is the Euclidean two-point distance, σ 2 is the variance, κ is the inverse autocorrelation length (set to κ = 0.2), on square grids with the side length L. The exponential covariance model is suitable for modeling of rough spatial processes, such as soil data [8]. 90 % of the data were removed at random points and stored for later comparison to the predicted values. Figure 1 shows, that lowering the arithmetic precision has a pronounced effect on the speed of the simulations. We record a relative speedup R X defined as the ratio of the GPU-time required by the precision setting X and DP, i.e., R X = t GPU X /t GPU DP . The speedup increases as larger portions of the simulations are ported to SP and it is further amplified by the increasing number of processed data points, reaching 6x speedup compared to the full DP for the largest considered grid size of L = 4096.
To evaluate the accuracy of the MPR-based prediction, we compare the predicted values, to the true ones and compute the average absolute error (AAE), defined as where ǫ(r p ) = Z(r p ) −Ẑ(r p ) is the difference between the true value Z(r p ) and the predicted valuê Z(r p ) at the site r p and P = 0.9L 2 is the number of prediction sites. For each complete data set we To evaluate, the behavior of the prediction error with the lowering of numerical precision, we compute the relative change of MAAE compared to its value obtained from full DP simulations in %, as ∆MAAE X = (MAAE X /MAAE DP − 1)100%, where X = 1, . . . , 7, and plot the results in Figure 2. The bars are sorted by the increasing speedup from Figure 1 with the slowest on the left and the fastest on the right. We note, that the increase of the prediction error is only of the order of 10 −4 % and that there is no trend of increasing the error with lowering precision. Thus, we conclude there is no practical reason to use any degree of mixed precision, or full DP, since full SP with hardware intrinsic functions provides far superior simulation performance with negligible error increase. We intend to extend this assessment by further lowering of the arithmetic precision to the 16-bit half precision variables and functions, which are available on the latest NVIDIA GPUs, in expectation of further considerable increase of the computational speedup over the CPU implementation.