فاطمه مهدوي گلميشه

Title

Control of Heterogeneous Cooperative Robots in an Unknown Environment with Collision-Avoidance Constraints Based on Safe Reinforcement Learning

Degree level

Doctorate (PhD)

Field of study

Electrical Engineering - Control

Academic year

1397

Defense date

1403/9/28

Supervisor

سعيد شمقدري

Advisor

Faculty

Electrical Engineering

Abstract

The aim of this thesis is to design safe and robust formation control for heterogeneous cooperative robots in an unknown environment, framed as a multi-agent system. Because agents need to observe the outcome of their actions to achieve better performance, a multi-agent reinforcement learning algorithm is proposed. The agent dynamics are assumed to be nonlinear and affine in the input. In addition, so that the optimal controller can be designed without precise knowledge of the system dynamics, a model-free reinforcement learning algorithm is used, and the optimal state feedback is designed from collected data alone. In the first step, assuming the environment is known to the agents, local barrier functions are merged with the multi-agent reinforcement learning algorithm, converting the constrained optimization problem into an unconstrained one. The control inputs of the follower agents are thereby obtained simultaneously using the proposed safe multi-agent reinforcement learning algorithm. Stability analysis and safety guarantees are established at this stage of the thesis. In the second step, building on the results for the known environment, a multilayer approach is proposed to solve the problem in an unknown two-dimensional environment with convex obstacles. The approach consists of three main layers, each of which carries out its task using information received from the preceding layers. The first layer uses the barrier function concept to design a safe path for the leader. The second layer designs the optimal parameters for a safe formation. The third layer uses the multi-agent reinforcement learning algorithm to design the optimal controller for the follower agents. In the final step, the layered approach of the second step is extended to systems with disturbances and to operation in a three-dimensional environment. An inverse reinforcement learning algorithm is also used to improve tracking performance in the third layer.
In this way, the problem of optimal, safe, and robust formation control for nonlinear, heterogeneous cooperative robots is solved in a three-dimensional environment without requiring knowledge of the dynamic parameters of any agent, and without requiring knowledge of the environment or the cost function for the follower agents. To evaluate the proposed approaches and examine their numerical results, multi-agent systems consisting of aerial robots and surface vessels are used as simulation examples.

Date of data entry

1404/08/01

Title in English

Heterogeneous Cooperative Robots Control in an Unknown Environment with Collision Avoidance Constraints based on Safe Reinforcement Learning

Date of availability

12/19/2025 12:00:00 AM

Student who entered the data

فاطمه مهدوي گلميشه

Abstract in English

This thesis aims to design safe and robust formation control for heterogeneous cooperative robots operating as a multi-agent system in an unknown environment. Because agents need to observe the results of their actions to achieve better performance, multi-agent reinforcement learning is proposed. The agent dynamics are assumed to be nonlinear and affine in the input. In addition, model-free reinforcement learning is used so that detailed knowledge of the system dynamics is not needed to design the optimal controller; the optimal state feedback is designed from collected data alone. In the first step, assuming the environment is known to the agents, local barrier functions are integrated with the multi-agent reinforcement learning algorithm, which converts the constrained optimization problem into an unconstrained one. The control inputs of the follower agents are then obtained simultaneously using the proposed safe multi-agent reinforcement learning algorithm. Stability analysis and safety guarantees are provided for this part of the thesis. In the second step, building on the results for the known environment, a multilayer approach is proposed to tackle the problem in an unknown two-dimensional environment with convex obstacles. The approach comprises three main layers, each of which carries out its task using information received from the preceding layers. In the first layer, the barrier function concept is used to design a safe path for the leader. The second layer designs the optimal parameters for a safe formation. In the third layer, the multi-agent reinforcement learning algorithm is used to design the optimal controller for the follower agents. In the final step, the layered approach of the second step is extended to systems with disturbances and to operation in a three-dimensional environment.
In addition, an inverse reinforcement learning algorithm is used to improve tracking performance in the third layer. As a result, the problem of optimal, safe, and robust formation control for nonlinear, heterogeneous cooperative robots is solved in a three-dimensional environment without requiring knowledge of the dynamic parameters of any agent, and without requiring knowledge of the environment or the cost function for the followers. Multi-agent systems consisting of UAVs and USVs are used as simulation examples to evaluate the proposed approaches and examine their numerical results.
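
As a rough illustration of the first step described in the abstract, the sketch below shows one common way a barrier function can fold a safety constraint into the cost of an input-affine agent, so that the constrained problem becomes an unconstrained one. All symbols (f_i, g_i, h_i, Q_i, R_i, mu_i, e_i) and the particular logarithmic barrier are assumptions chosen for illustration; the record does not state the formulation actually used in the thesis.

```latex
% Assumed input-affine dynamics of follower agent i
\dot{x}_i = f_i(x_i) + g_i(x_i)\,u_i

% Assumed safe set, described by a smooth function h_i
\mathcal{S}_i = \{\, x_i : h_i(x_i) > 0 \,\}

% One possible barrier: positive inside the safe set,
% unbounded as h_i -> 0^+, and vanishing as h_i -> \infty
B_i(x_i) = -\ln\!\left( \frac{h_i(x_i)}{1 + h_i(x_i)} \right)

% Unconstrained cost: tracking error e_i, control effort,
% and a barrier penalty weighted by \mu_i > 0
J_i = \int_0^{\infty} \left( e_i^{\top} Q_i\, e_i + u_i^{\top} R_i\, u_i
      + \mu_i\, B_i(x_i) \right) \mathrm{d}t
```

Because the barrier term grows without bound near the boundary of the safe set, any policy with finite cost keeps the agent inside the set; this is the general idea behind replacing an explicit collision-avoidance constraint with a penalty in the objective.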

Persian keywords

Safe reinforcement learning, barrier function, nonlinear and heterogeneous multi-agent system, optimal control, inverse reinforcement learning, 𝐻∞ control, unknown environment

English keywords

Barrier function, convex obstacles, formation, 𝐻∞ control, inverse optimal control, inverse reinforcement learning, model-free reinforcement learning, nonlinear and heterogeneous multi-agent system, optimal control, robotics, robust reinforcement learning, safe reinforcement learning, unknown environment

Author

Fateme Mahdavi

Supervisor

Dr. Shamaghdari

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=33853&Field=0&DTC=6