Abstract (in English)
In recent years, interest in autonomous vehicles has increased significantly due to their potential to improve driving safety and comfort. The perception system of an autonomous vehicle plays a critical role in accurately understanding the objects and events in its environment, and it provides the basis for the prediction, planning, and decision-making that ensure safe navigation in various driving scenarios. Since the driving environment is three-dimensional, understanding it in 3D is essential for an intelligent system; 3D object detection is therefore an important component of the perception system. Using LiDAR, which provides rich spatial and 3D information, the size, position, orientation, and class of the objects surrounding the autonomous vehicle can be detected. However, as objects move away from the sensor, LiDAR point clouds become sparse, making detection difficult. To address this issue, 2D images, which offer higher information density, are used as a complementary data source. However, differences in data distribution, operating frequency, and sensor placement result in misalignment between the camera and the LiDAR, posing challenges for data fusion. To overcome this challenge, several approaches have been proposed for fusing the data from the two sensors. In this research, we introduce the DVDFNet model, which builds on the success of generating virtual point clouds from the dense depth maps produced by a depth completion network. Virtual point clouds, however, are often noisy. To suppress this noise and extract local object features from both semantic and geometric perspectives, we propose the DLFE module. This module represents virtual point clouds by extending the receptive field into the 2D image space, which groups noisy points with their image-space neighbors and makes denoising considerably easier. In addition, to improve the detection of distant and small objects, we introduce the DCMAF module, which applies an attention mechanism in a cross-modality discrimination scheme and uses the information in the bird's-eye-view feature map to fuse the two modalities effectively. Finally, through experiments on the KITTI dataset, we evaluate the impact of the proposed modules on the detection accuracy for distant and small objects in comparison with other models and approaches. We achieve accuracies of 76.16% and 71.54% on the AP_BEV and AP_3D metrics, respectively, at the moderate difficulty level, averaged over the car, pedestrian, and cyclist classes.
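As a concrete illustration of the virtual-point idea mentioned above, the following minimal sketch back-projects a dense depth map into a 3D point cloud in the camera frame using the standard pinhole model. The function name depth_to_virtual_points and the toy intrinsics are hypothetical and chosen only for illustration; this is not the DVDFNet implementation.

import numpy as np

def depth_to_virtual_points(depth, K):
    """Back-project a dense depth map (H x W, in metres) into a 3D point
    cloud in the camera frame using the pinhole intrinsics K (3 x 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))        # pixel coordinate grid
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    z = depth                                             # depth along the optical axis
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)  # one virtual point per pixel
    return points[points[:, 2] > 0]                       # drop pixels with no valid depth

# Toy example: a 4 x 4 depth map with illustrative intrinsics.
K = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 2.0],
              [0.0, 0.0, 1.0]])
depth = np.full((4, 4), 10.0)
print(depth_to_virtual_points(depth, K).shape)  # (16, 3)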
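The DCMAF module is described above only at a high level. Purely as a hedged illustration of what attention-based fusion of camera and LiDAR features in the bird's-eye view can look like, the sketch below lets LiDAR BEV features attend to camera BEV features with a simple cross-attention layer; the class name BEVCrossAttentionFusion and its structure are assumptions for illustration and do not reproduce the actual DCMAF design.

import torch
import torch.nn as nn

class BEVCrossAttentionFusion(nn.Module):
    """Sketch: LiDAR BEV features query camera BEV features via cross-attention,
    and the attended result is fused back by concatenation."""
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)   # queries from LiDAR BEV
        self.k = nn.Conv2d(channels, channels, 1)   # keys from camera BEV
        self.v = nn.Conv2d(channels, channels, 1)   # values from camera BEV
        self.out = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, lidar_bev, cam_bev):
        b, c, h, w = lidar_bev.shape
        q = self.q(lidar_bev).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.k(cam_bev).flatten(2)                     # (B, C, HW)
        v = self.v(cam_bev).flatten(2).transpose(1, 2)     # (B, HW, C)
        attn = torch.softmax(q @ k / c ** 0.5, dim=-1)     # (B, HW, HW) attention weights
        fused = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return self.out(torch.cat([lidar_bev, fused], dim=1))

# Toy example with small feature maps.
fusion = BEVCrossAttentionFusion(channels=16)
lidar = torch.randn(1, 16, 8, 8)
cam = torch.randn(1, 16, 8, 8)
print(fusion(lidar, cam).shape)  # torch.Size([1, 16, 8, 8])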