Abstract (in English)
In recent years, interest in autonomous vehicles has increased significantly due to their potential to improve driving safety and comfort. The perception system of an autonomous vehicle plays a critical role in accurately understanding the objects and events in its environment, and it provides the basis for the prediction, planning, and decision-making that ensure safe navigation in various driving scenarios. Since the driving environment is three-dimensional, understanding it in 3D is essential for an intelligent system; 3D object detection is therefore an important component of the perception system. Using LiDAR, which provides rich spatial and 3D information, the size, position, orientation, and class of the objects surrounding the autonomous vehicle can be detected. However, as objects move away from the sensor, LiDAR point clouds become sparse, making detection difficult. To address this issue, 2D images, which offer higher information density, are used as a complementary data source. However, differences in data distribution, operating frequency, and sensor placement result in misalignment between the camera and the LiDAR, posing challenges for data fusion. To overcome this challenge, several approaches have been proposed for fusing the data from the two sensors. In this research, we introduce the DVDFNet model, which builds on the success of generating virtual point clouds from the dense depth maps produced by a depth completion network. Virtual point clouds, however, are often noisy. To suppress this noise and extract local object features from both semantic and geometric perspectives, we propose the DLFE module. This module represents virtual point clouds by extending the receptive field into the 2D image space, which groups noisy points with their image-space neighbors and makes denoising considerably easier. In addition, to improve the detection of distant and small objects, we introduce the DCMAF module, which applies an attention mechanism in a cross-modality discrimination scheme and uses the information in the bird's-eye-view feature map to fuse the two modalities effectively. Finally, through experiments on the KITTI dataset, we evaluate the impact of the proposed modules on the detection accuracy for distant and small objects in comparison with other models and approaches. We achieve accuracies of 76.16% and 71.54% on the AP_BEV and AP_3D metrics, respectively, at the moderate difficulty level, averaged over the car, pedestrian, and cyclist classes.
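As a concrete illustration of the virtual-point idea mentioned above, the following minimal sketch back-projects a dense depth map into a 3D point cloud in the camera frame using the standard pinhole model. The function name depth_to_virtual_points and the toy intrinsics are hypothetical and chosen only for illustration; this is not the DVDFNet implementation.

import numpy as np

def depth_to_virtual_points(depth, K):
    """Back-project a dense depth map (H x W, in metres) into a 3D point
    cloud in the camera frame using the pinhole intrinsics K (3 x 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))        # pixel coordinate grid
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    z = depth                                             # depth along the optical axis
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)  # one virtual point per pixel
    return points[points[:, 2] > 0]                       # drop pixels with no valid depth

# Toy example: a 4 x 4 depth map with illustrative intrinsics.
K = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
depth = np.full((4, 4), 10.0)
print(depth_to_virtual_points(depth, K).shape)  # (16, 3)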
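The DCMAF module is described above only at a high level. Purely as a hedged illustration of what attention-based fusion of camera and LiDAR features in the bird's-eye view can look like, the sketch below lets LiDAR BEV features attend to camera BEV features with a simple cross-attention layer; the class name BEVCrossAttentionFusion and its structure are assumptions for illustration and do not reproduce the actual DCMAF design.

import torch
import torch.nn as nn

class BEVCrossAttentionFusion(nn.Module):
    """Sketch: LiDAR BEV features query camera BEV features via cross-attention,
    and the attended result is fused back by concatenation."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)   # queries from LiDAR BEV
        self.k = nn.Conv2d(channels, channels, 1)   # keys from camera BEV
        self.v = nn.Conv2d(channels, channels, 1)   # values from camera BEV
        self.out = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, lidar_bev, cam_bev):
        b, c, h, w = lidar_bev.shape
        q = self.q(lidar_bev).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.k(cam_bev).flatten(2)                     # (B, C, HW)
        v = self.v(cam_bev).flatten(2).transpose(1, 2)     # (B, HW, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)     # (B, HW, HW) attention weights
        fused = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return self.out(torch.cat([lidar_bev, fused], dim=1))

# Toy example with small feature maps.
fusion = BEVCrossAttentionFusion(channels=16)
lidar = torch.randn(1, 16, 8, 8)
cam = torch.randn(1, 16, 8, 8)
print(fusion(lidar, cam).shape)  # torch.Size([1, 16, 8, 8])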