Modified Depth-Map Inpainting Method Using a Neural Network

This paper proposes a method for reconstructing a depth map obtained from a stereo image pair. The proposed approach is based on a geometric model for patch synthesis. The image is first divided into blocks of different sizes, where large blocks are used to restore homogeneous areas and small blocks are used to restore details of the image structure. Lost pixels are recovered by copying pixel values from the source region according to a similarity criterion, and a trained neural network is used to select the best-matching patch. Experimental results show that the proposed method outperforms other modern methods in both subjective and objective measures of depth-map reconstruction quality.


Introduction
Assessing the true values of image pixels is necessary in most digital-image-processing tasks. The problem is especially relevant for the automatic processing of images produced by the photosensitive sensors of digital photo and video cameras and by machine-vision systems. Machine vision is the part of information technology directly related to robotics, a field that is becoming increasingly important: today, robots can be found in almost all aspects of daily life. When robots are used in industry, a relevant task is determining the distance from the end effector to the surfaces it processes; to move without crashing into obstacles, the robot must perceive its environment. Robotics therefore requires methods that determine the distance from the robot to an object with acceptable accuracy in minimal time.

One such approach is to build a depth map. A depth map can be obtained with a special depth camera or from a stereo pair of images. If high accuracy is not required, or a compromise between price and quality is acceptable, the depth map is constructed from a pair of stereo images. This repeats the principle of binocular vision in humans and eliminates the need for numerous expensive sensors, radars, or lidars. A depth map can be built using two cameras that capture a stereo pair. Such a system consists of a pair of cameras placed on one line parallel to the image plane, with a known distance between the cameras and a known focal length. Methods of constructing a depth map differ in accuracy, quality, and map-creation time. In most cases, the constructed depth map contains lost areas that make it difficult to distinguish one object from another. This problem can be solved with image-reconstruction methods that restore the lost areas of the depth map and make it more accurate.
Simplified methods for reconstructing lost pixel values can be divided into the following groups:
1) methods based on the solution of partial differential equations;
2) methods based on orthogonal transformations;
3) methods based on texture synthesis;
4) methods based on neural networks.

An analysis of the existing methods shows that, when only a limited amount of information about the processed data is available, their applicability is severely restricted. Methods based on the solution of partial differential equations blur sharp changes in brightness and contours and require a priori information to select the parameters of the methods and to minimize the functional. Their inability to restore image texture and curved contours limits their use mainly to removing scratches and small defects in the image structure. Methods based on orthogonal transformations require a priori information to select a threshold value, an orthogonal basis, and the block size of the spectral representation; they blur texture and structure when recovering large areas of lost pixels, and the large number of iterations incurs significant computational cost. Methods based on texture synthesis require a priori information about the size and shape of the restoration region and about the geometric properties of the image.

This paper presents a method for reconstructing a depth map using a neural network. The main stages of the algorithm are presented in Fig. 1. The idea underlying depth-map construction from a stereo pair is that, for each point in one image, a corresponding point is sought in the other. For such a pair of points, triangulation determines the coordinates of the pre-image in three-dimensional space; knowing these coordinates, the depth is computed as the distance to the camera plane.
The search for paired points occurs along the epipolar line. Accordingly, to simplify the search, the images are rectified so that all epipolar lines are parallel to the sides of both images.
For each pixel of the left image with coordinates (x0, y0), the corresponding pixel is sought in the right image. It is assumed that this pixel has coordinates (x0 − d, y0), where d is a quantity called the disparity. The corresponding pixel is found by maximizing a response function, which can be, for example, the correlation of pixel neighborhoods. The result is a disparity map. The depth values are inversely proportional to the pixel offset: using the notation of the left half of Fig. 2, with focal length f and baseline B, the relationship between disparity and depth Z can be expressed as

Z = f · B / d.

Because of this inverse relationship, the resolution of stereo-vision systems based on this method is better at close distances and worse at far distances. The resulting depth map is shown in Fig. 3.
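The pipeline above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it uses a sum-of-absolute-differences block cost instead of the neighborhood correlation the paper mentions, and the block size and search range are arbitrary choices.

```python
import numpy as np

def disparity_map(left, right, max_disp=16, block=5):
    """Naive block matching: for each pixel of the left image, search along
    the same row of the right image (the epipolar line) for the shift d
    that minimizes the sum of absolute differences of the blocks."""
    h, w = left.shape
    half = block // 2
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            best_d, best_cost = 0, np.inf
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                cost = np.abs(patch - cand).sum()
                if cost < best_cost:
                    best_cost, best_d = cost, d
            disp[y, x] = best_d
    return disp

def depth_from_disparity(disp, focal_px, baseline_m):
    """Z = f * B / d; zero disparity yields an invalid (infinite) depth."""
    return np.where(disp > 0, focal_px * baseline_m / np.maximum(disp, 1e-9),
                    np.inf)
```

Production stereo pipelines replace the exhaustive loop with optimized matchers, but the inverse depth-disparity relation is the same.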

Fig. 3. Depth map with defects.
Next, the resulting depth map is processed by the reconstruction method using a neural network.

Reconstruction method
At the first step, for each pixel of the boundary δS of the lost region, the shape of the similarity-search domain is determined adaptively by the inversion method, combining two adjacent homogeneous subregions in the direction of the maximum gradient [2,5,7].
At the second step, the priority value P(δS) is calculated for each boundary pixel as the product of two factors [3,9]. The image is divided into blocks of different sizes, where large blocks are used to reconstruct homogeneous areas and small blocks are used to reconstruct structural details of the image (Fig. 4). The pixel values in the region η adjacent to the pixel p with the highest priority are reconstructed by averaging the corresponding pixels from the areas ψ(q_h) chosen in the region S of available pixels using a neural network, in particular a multilayer perceptron [10].
The confidence coefficient C of the restored pixels is assigned the current confidence value of the pixel p. After that, the procedure of priority correction and search for similar areas with subsequent replacement is repeated.
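One iteration of this fill loop can be sketched as follows. This is a heavily simplified, assumption-laden version: the priority uses only the confidence term (the second factor from [3,9] is omitted), the block size is fixed instead of adaptive, the hole is assumed to lie away from the image border, and the neural-network patch selection is replaced by a plain best-MSE search; `inpaint_step` and its signature are illustrative names, not the authors' code.

```python
import numpy as np

def inpaint_step(img, mask, conf, block=5):
    """One filling iteration: pick the boundary pixel with the highest
    priority, find the most similar fully-known patch, copy its lost
    pixels, and propagate the confidence C(q) = C(p)."""
    half = block // 2
    h, w = img.shape
    # Boundary of the lost region: lost pixels with at least one known neighbor.
    pad = np.pad(mask, 1, constant_values=False)
    known_nb = (~pad[:-2, 1:-1] | ~pad[2:, 1:-1] |
                ~pad[1:-1, :-2] | ~pad[1:-1, 2:])
    ys, xs = np.nonzero(mask & known_nb)
    # Priority here = mean confidence of the patch around the boundary pixel.
    y, x = max(zip(ys, xs),
               key=lambda p: conf[p[0]-half:p[0]+half+1,
                                  p[1]-half:p[1]+half+1].mean())
    tgt = img[y-half:y+half+1, x-half:x+half+1]
    tmask = mask[y-half:y+half+1, x-half:x+half+1]
    # Search the known area for the most similar patch (MSE over known pixels).
    best_cost, src = np.inf, None
    for sy in range(half, h - half):
        for sx in range(half, w - half):
            if mask[sy-half:sy+half+1, sx-half:sx+half+1].any():
                continue  # candidate must be fully known
            cand = img[sy-half:sy+half+1, sx-half:sx+half+1]
            cost = ((cand - tgt)[~tmask] ** 2).mean()
            if cost < best_cost:
                best_cost, src = cost, cand
    # Copy the lost pixels and assign them the confidence of the known part.
    tgt[tmask] = src[tmask]
    cpatch = conf[y-half:y+half+1, x-half:x+half+1]
    cpatch[tmask] = cpatch[~tmask].mean()
    mask[y-half:y+half+1, x-half:x+half+1] = False
    return img, mask, conf
```

In the paper's method, the exhaustive MSE search is replaced by the trained network's choice of the "best like" patch, and the search domain is shaped adaptively along the maximum gradient.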

Neural network
In this work, a feed-forward neural network trained with the error back-propagation algorithm was developed.
The activation function used in this network is a sigmoidal nonlinearity, namely the hyperbolic tangent function (5) [6,10]. At the training stage, pre-prepared data were fed to the input: blocks of sizes 3×3, 5×5, 7×7, ..., 21×21 were extracted from the image at random coordinates, the central pixel of each block was removed, and the five most similar blocks were found in the whole image by comparing them with the MSE criterion.
This procedure was repeated on 35 images, yielding 100,000 blocks that were used as the training sequence for the network.
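The data-preparation step can be sketched as follows. This is an illustrative reading of the description above with a fixed 5×5 block (the paper varies the size from 3×3 to 21×21), a small sample count, and hypothetical function and variable names; the MSE is computed over all block pixels except the removed central one.

```python
import numpy as np

def make_training_samples(img, n_samples=100, block=5, n_similar=5):
    """Cut a block at random coordinates, hide its central pixel, find the
    n_similar most similar blocks in the image by MSE, and record their
    central pixels and MSEs as inputs with the true pixel as the target."""
    rng = np.random.default_rng(0)
    half = block // 2
    h, w = img.shape
    center = (half, half)
    known = np.ones((block, block), dtype=bool)
    known[center] = False            # the central pixel is "removed"
    samples = []
    for _ in range(n_samples):
        y = rng.integers(half, h - half)
        x = rng.integers(half, w - half)
        ref = img[y-half:y+half+1, x-half:x+half+1]
        target = ref[center]
        scores = []
        for sy in range(half, h - half):
            for sx in range(half, w - half):
                if sy == y and sx == x:
                    continue          # skip the reference block itself
                cand = img[sy-half:sy+half+1, sx-half:sx+half+1]
                mse = ((cand - ref)[known] ** 2).mean()
                scores.append((mse, cand[center]))
        scores.sort(key=lambda t: t[0])
        top = scores[:n_similar]
        # Network input: 5 central pixels + 5 MSEs; output: true central pixel.
        feats = np.concatenate([np.array([c for _, c in top]),
                                np.array([m for m, _ in top])])
        samples.append((feats, target))
    return samples
```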
This network contains three layers [10]: the first layer contains 20 neurons, the second layer also contains 20 neurons, and the third layer contains ten neurons (Fig. 5). The network has ten inputs: five of them are fed the central pixels of the found blocks, and the other five are fed the MSE values of those blocks. The output is the central pixel of the original block. Figure 6 shows an example of depth-map reconstruction by the proposed method ((a) images obtained from the left and right cameras, (b) depth map, (c) reconstructed depth map). The proposed method restores borders correctly, without blur. Analysis of the processing results shows that the proposed method correctly restores both the details and the background of the image without introducing artifacts, and it does not blur texture or structure when restoring large areas of lost pixels.
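A network with the stated layer sizes and tanh activations can be sketched in plain NumPy. Assumptions are flagged explicitly: the paper does not spell out the output layer, so a single linear output unit for the reconstructed pixel is assumed here, and the initialization scheme, learning rate, and `MLP` class name are illustrative, not taken from the paper.

```python
import numpy as np

class MLP:
    """Feed-forward net with hyperbolic-tangent activations and hidden
    layers of 20, 20, and 10 neurons on 10 inputs, as described above.
    The single linear output unit is an assumption."""
    def __init__(self, sizes=(10, 20, 20, 10, 1), seed=0):
        rng = np.random.default_rng(seed)
        self.W = [rng.normal(0.0, 1.0 / np.sqrt(a), (a, b))
                  for a, b in zip(sizes[:-1], sizes[1:])]
        self.b = [np.zeros(b) for b in sizes[1:]]

    def forward(self, x):
        for W, b in zip(self.W[:-1], self.b[:-1]):
            x = np.tanh(x @ W + b)
        return x @ self.W[-1] + self.b[-1]   # linear output

    def train_step(self, x, y, lr=0.01):
        """One gradient-descent step via back-propagation (MSE loss)."""
        acts = [x]
        for W, b in zip(self.W[:-1], self.b[:-1]):
            acts.append(np.tanh(acts[-1] @ W + b))
        out = acts[-1] @ self.W[-1] + self.b[-1]
        delta = 2.0 * (out - y) / y.size     # dLoss/dOut
        gW, gb = [], []
        for i in range(len(self.W) - 1, -1, -1):
            gW.append(acts[i].T @ delta)
            gb.append(delta.sum(axis=0))
            if i > 0:                        # tanh'(a) = 1 - tanh(a)^2
                delta = (delta @ self.W[i].T) * (1.0 - acts[i] ** 2)
        for W, dW, b, db in zip(self.W, reversed(gW), self.b, reversed(gb)):
            W -= lr * dW
            b -= lr * db
        return ((out - y) ** 2).mean()
```

Modern frameworks would handle the back-propagation automatically; the explicit version is shown only to mirror the training algorithm named in the text.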