Abstract
The Inertial Navigation System (INS) is one of the most reliable navigation systems. It determines position at any moment without any telecommunication link, using only measurements of the vehicle's acceleration and angular velocity. Its drawback is an error that grows over time due to accelerometer bias and gyroscope drift; over long periods this error causes significant deviation in the estimated trajectory of the moving object. Auxiliary systems are therefore used to correct the INS solution. A vision-based navigation system is one such option: it is inexpensive, small, and practical, and it localizes the vehicle using images taken by a camera mounted on it. Consequently, to increase the accuracy and reliability of the system while keeping the required hardware inexpensive, the combined use of both types of systems is recommended.

Two general approaches to fusing the information are considered. In the first approach, the information is fused by a Kalman filter. During motion, the position of the vehicle is propagated from the inertial data; at the data-collection frequency defined for the camera, an image of the environment within its field of view is captured. This image and the image from the previous capture step are fed to the SIFT or SURF algorithm, which reports the points common to the two images (the resulting matches are screened by the RANSAC algorithm). These points enter the measurement equation of the Kalman filter, and through the filter relations the IMU and image information are fused and the state variables are corrected. In the second approach, the camera and inertial-sensor information are fused by deep neural networks: the image is fed to a convolutional neural network (CNN) to extract image features.
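The measurement-update step of the filter-based approach can be sketched as follows. This is a minimal, illustrative numpy example: a one-axis constant-velocity state model corrected by a position measurement standing in for the vision pipeline's output. The matrix values and noise levels are assumptions for illustration, not the tuned parameters of the thesis.

```python
import numpy as np

dt = 0.1  # INS propagation interval [s] (assumed)

# State: [position, velocity] along one axis (minimal illustration)
F = np.array([[1.0, dt],
              [0.0, 1.0]])       # constant-velocity transition model
H = np.array([[1.0, 0.0]])       # camera pipeline yields a position measurement
Q = np.diag([1e-4, 1e-3])        # process-noise covariance (assumed)
R = np.array([[0.05]])           # vision measurement-noise covariance (assumed)

x = np.array([[0.0], [1.0]])     # initial state estimate
P = np.eye(2)                    # initial error covariance

def kalman_step(x, P, z):
    """One predict + update cycle of the linear Kalman filter."""
    # Predict with the inertial (process) model
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update with the camera-derived position measurement z
    y = z - H @ x_pred                   # innovation
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(2) - K @ H) @ P_pred
    return x_new, P_new

# Simulated run: true position advances at 1 m/s; the "camera" reports
# it with noise, and each update pulls the INS estimate back on track.
rng = np.random.default_rng(0)
for k in range(1, 50):
    z = np.array([[k * dt + rng.normal(0.0, 0.05)]])
    x, P = kalman_step(x, P, z)
```

In the thesis's setting the measurement would come from the matched SIFT/SURF points after RANSAC screening rather than a direct position reading, but the predict/update structure is the same.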
The temporal relationship between image frames is then extracted using a Long Short-Term Memory (LSTM) network. Various visual-inertial methods have been proposed. Traditional methods are not robust to dynamic environments and changing lighting, and their manual tuning takes a long time. Learning-based methods address the problem with deep neural networks, but they require a dataset with a large amount of data so that the training and test data have similar distributions. For this purpose, a dataset was designed and generated in the Gazebo simulation environment. Due to the poor performance of the FlowNet network in pose estimation, a proposed network based on VGG16 was introduced. The proposed network is competitive with state-of-the-art methods in terms of simplicity, the number of learnable and non-learnable parameters, and the amount of training data required. Compared with the state-of-the-art method, the proposed method improved the mean squared error from 4.99 to 0.63.
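A single recurrent step of the learning-based fusion can be illustrated as follows: a CNN-derived image feature vector is concatenated with an IMU sample and passed through an LSTM cell, whose hidden state feeds a linear head regressing a 6-DoF pose increment. This is a minimal numpy sketch with random weights; the dimensions, gate layout, and weights are assumptions for illustration, not the proposed VGG16-based architecture.

```python
import numpy as np

rng = np.random.default_rng(42)

F_IMG, F_IMU, HID, POSE = 128, 6, 64, 6  # assumed feature/hidden/pose sizes
D = F_IMG + F_IMU                        # LSTM input: concatenated features

# Random LSTM weights (gates stacked: input, forget, cell, output)
W = rng.normal(0.0, 0.1, (4 * HID, D + HID))
b = np.zeros(4 * HID)
W_pose = rng.normal(0.0, 0.1, (POSE, HID))  # linear regression head

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h, c):
    """One LSTM cell step on input x with recurrent state (h, c)."""
    z = W @ np.concatenate([x, h]) + b
    i, f, g, o = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

# One fusion step: CNN image features + raw IMU sample -> pose increment
img_feat = rng.normal(size=F_IMG)  # stand-in for the CNN's output
imu = rng.normal(size=F_IMU)       # 3-axis accelerometer + 3-axis gyroscope
h, c = np.zeros(HID), np.zeros(HID)
h, c = lstm_step(np.concatenate([img_feat, imu]), h, c)
pose_delta = W_pose @ h            # 6-DoF increment (translation + rotation)
```

In training, the LSTM is unrolled over consecutive frames so that the temporal dependence between them shapes the pose estimate; here only one step is shown for clarity.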