مليكا عطاالهي

Title

Optimal path planning for ground wheeled robots in unknown environments using reinforcement learning and the potential field method

Degree level

Master of Science (M.Sc.)

Field of study

Control Engineering

Academic years

1397-99

Defense date

1399/11/06

Supervisor

Dr. محمد فرخي

Faculty

Electrical Engineering

Abstract

In this thesis, an approach is presented for path planning, one of the most important challenges for wheeled robots. The operating environment of the robots is unknown. Therefore, at each step the path planning unit must be able to plan, in real time, the shortest path to the goal based on the local information of the robot and of the other group members; accordingly, the optimal path means the shortest path to the goal. The potential field method has properties such as an optimization-based problem-solving structure and the ability to find the shortest path, but its fundamental problem is the existence of local minima. Q-learning can learn solely from received rewards and punishments, which simplifies the problem-solving structure, and it can also learn and make decisions in real time. Therefore, given the characteristics of each method, Q-learning is used for obstacle avoidance and the potential field method for orienting the robot and moving it toward the goal. To overcome the slowness of the Q-learning process, Dyna-Q learning is used. The most important causes of collisions with obstacles are the lack of training at the start of the learning process, the existence of undefined states, and the independence of states from one another. To resolve these problems, the proposed method first checks the suitability of the action selected by Q-learning; if the selected action is wrong, the potential field method, acting as a supervisor, selects the appropriate action, which is then taught to Q-learning and executed. To approach the conditions that guarantee the convergence of Q-learning, a matrix called the feature matrix is introduced, which makes it possible to learn states that have not been experienced from those that have. To solve the local-minimum problem of the potential field method, the proposed method uses a concept called the virtual obstacle, which modifies the potential at the robot's position so that a suitable path and direction for escaping the local minimum always exists. Thus, a method called improved reinforcement learning supervised by an adaptive potential field method is proposed for robot path planning. In addition, two fuzzy systems are used in each robot to smooth its motion.
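Dyna-Q, mentioned above as the remedy for slow Q-learning, can be sketched in tabular form. This follows the standard Dyna-Q scheme: one direct Q-learning update from each real transition, then `n_planning` extra updates replayed from a learned deterministic model. The function name `dyna_q_step`, the parameter values, and the state and action encodings are assumptions for illustration only; this is not the thesis's implementation, which also involves the feature matrix and the supervisor, neither of which is modeled here.

```python
import random
from collections import defaultdict


def dyna_q_step(Q, model, s, a, r, s_next, actions,
                alpha=0.1, gamma=0.9, n_planning=5, rng=random):
    """One Dyna-Q iteration on a tabular Q function (illustrative sketch).

    Q     : defaultdict(float) keyed by (state, action)
    model : dict mapping (state, action) -> (reward, next_state)
    """
    def q_update(s_, a_, r_, sn_):
        # Standard Q-learning target: r + gamma * max_a' Q(s', a')
        best_next = max(Q[(sn_, b)] for b in actions)
        Q[(s_, a_)] += alpha * (r_ + gamma * best_next - Q[(s_, a_)])

    # 1. Direct reinforcement learning from the real transition.
    q_update(s, a, r, s_next)
    # 2. Model learning: remember the (assumed deterministic) outcome.
    model[(s, a)] = (r, s_next)
    # 3. Planning: replay randomly chosen remembered transitions.
    seen = list(model.keys())
    for _ in range(n_planning):
        ps, pa = rng.choice(seen)
        pr, psn = model[(ps, pa)]
        q_update(ps, pa, pr, psn)
```

The planning loop is what speeds up learning: each real step is reused several times, so values propagate through the Q table with fewer interactions with the environment.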
The results obtained for the adaptive potential field method indicate that the proposed method can escape local minima in environments with static and dynamic goals and obstacles. The supervised improved reinforcement learning shows no sensitivity to undefined states, and no collision with obstacles occurs during training (from the first to the last episode). The number of filled entries in the Q table is significantly higher than when Q-learning is used without the feature matrix. Consequently, using the feature matrix was found to increase the convergence rate and to bring reinforcement learning closer to the conditions that guarantee convergence.

Date of data entry

1399/11/18

English title

Optimal Path Planning for Ground Wheeled Robots in Unknown Environments Using Reinforcement Learning and Potential Field Method

Release date

1/25/2021

Student who entered the data

مليكا عطاالهي


English abstract

In this thesis, a method is introduced for path planning, which is one of the most important challenges for wheeled robots. The operating environment of the robots is assumed to be unknown. Therefore, at each step the path planning unit must be able to plan the path in real time, based on the local information of the robot and of the other members of the group. Here, the optimal path means the shortest path to the goal. The potential field method has useful properties, such as an optimization-based problem-solving structure and the ability to find the shortest path; however, its main problem is the existence of local minima. Q-learning can learn solely from received rewards and punishments, which simplifies the problem-solving structure, and it can also learn and make decisions in real time. Therefore, given the characteristics of each method, Q-learning is used for obstacle avoidance, while the potential field method is used to orient the robot and move it toward the goal. The Dyna-Q learning method is used to improve the convergence rate of Q-learning. The most important causes of collisions with obstacles are the lack of training at the beginning of the learning process, the existence of undefined states, and the independence of the defined states from one another. To solve these problems, the proposed method first checks the suitability of the action selected by Q-learning. If the selected action is wrong, the potential field method, acting as a supervisor, selects the appropriate action; this action is then taught to Q-learning and executed. To approach the conditions that guarantee the convergence of Q-learning, a matrix called the “feature matrix” is introduced, which allows states that have not yet been experienced to be learned from states that have.
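The supervisor check described above can be sketched as a small action-selection wrapper. The safety test and the supervisor are passed in as callables (in the thesis, the supervisor is the potential field method). The name `supervised_action` and, in particular, the "teaching" rule that raises the corrected action's Q-value just above the current best are hypothetical, since the abstract does not state how the correct action is taught to Q-learning.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def supervised_action(
    Q: QTable,
    state: Hashable,
    actions: Sequence[Hashable],
    is_safe: Callable[[Hashable, Hashable], bool],
    supervisor: Callable[[Hashable], Hashable],
    margin: float = 1.0,
) -> Tuple[Hashable, bool]:
    """Return (action, overridden) for one decision step (illustrative sketch).

    1. Q-learning proposes its greedy action (unseen pairs default to 0.0).
    2. A safety check judges whether that action is acceptable.
    3. If not, the supervisor (e.g. a potential-field planner) picks the
       action, and the Q table is "taught" by lifting that action's value
       `margin` above the current best (hypothetical teaching rule).
    """
    proposed = max(actions, key=lambda a: Q.get((state, a), 0.0))
    if is_safe(state, proposed):
        return proposed, False

    corrected = supervisor(state)
    best = max(Q.get((state, a), 0.0) for a in actions)
    Q[(state, corrected)] = best + margin
    return corrected, True
```

Because the corrected action now has the highest value in that state, the next greedy choice in the same state follows the supervisor without another override.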
In the proposed method, to solve the local-minimum problem of the potential field method, a concept called the “virtual obstacle” is used to change the potential at the robot's position so that there is always a suitable path and direction for escaping the local minimum. Thus, a method called “improved reinforcement learning supervised by an adaptive potential field” is proposed for robot path planning. In addition, two fuzzy systems are used in each robot to smooth its motion. Using a PD controller, the performance of the path planning unit was investigated in an unknown environment with static and dynamic obstacles. The results obtained for the adaptive potential field show that the proposed method is able to escape local minima in environments with static and dynamic goals and obstacles. The improved reinforcement learning method with a supervisor shows no sensitivity to undefined states, and no collision with obstacles occurs during training (from the first to the last episode). Moreover, the number of Q-table entries that are updated is significantly higher than when Q-learning is used without the feature matrix. As a result, using the feature matrix was found to increase the convergence rate and to bring reinforcement learning closer to the conditions that guarantee convergence.
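The virtual-obstacle idea for escaping local minima can be illustrated with a small sketch. It assumes a textbook potential field (an attractive force proportional to the distance to the goal, plus Khatib-style repulsion inside an influence radius `d0`) and a hypothetical placement rule that drops one virtual obstacle a fixed `offset` to the right of the goal direction whenever the net force vanishes away from the goal. The thesis's adaptive potential field and its actual placement strategy are not described in this record; the names `apf_force` and `add_virtual_obstacle` are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def apf_force(pos: Point, goal: Point, obstacles: List[Point],
              k_att: float = 1.0, k_rep: float = 1.0, d0: float = 1.0) -> Point:
    """Attractive force (linear in distance) plus Khatib-style repulsion."""
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < d0:  # only obstacles inside the influence radius repel
            mag = k_rep * (1.0 / d - 1.0 / d0) / d**2
            fx += mag * dx / d
            fy += mag * dy / d
    return fx, fy


def add_virtual_obstacle(pos: Point, goal: Point, obstacles: List[Point],
                         eps: float = 1e-6, goal_tol: float = 0.1,
                         offset: float = 0.3, **gains) -> List[Point]:
    """If the net force vanishes away from the goal (a local minimum),
    return the obstacle list with one virtual obstacle added beside the
    robot; otherwise return an unchanged copy (hypothetical rule)."""
    fx, fy = apf_force(pos, goal, obstacles, **gains)
    gdx, gdy = goal[0] - pos[0], goal[1] - pos[1]
    dist_goal = math.hypot(gdx, gdy)
    if math.hypot(fx, fy) > eps or dist_goal < goal_tol:
        return list(obstacles)
    ux, uy = gdx / dist_goal, gdy / dist_goal
    # Place the virtual obstacle to the right of the goal heading, so its
    # repulsion pushes the robot sideways (to the left) out of the minimum.
    virtual = (pos[0] + offset * uy, pos[1] - offset * ux)
    return list(obstacles) + [virtual]
```

In the collinear robot-obstacle-goal case (robot at the origin, obstacle at `(0.5, 0)`, goal at `(5, 0)`, `k_rep=1.25`), the attractive and repulsive forces cancel exactly; after the virtual obstacle is added, the net force gains a sideways component, giving the robot a direction out of the minimum.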

Persian keywords

Ground wheeled robot group, Optimal path, Path planning, Unknown environments, Potential field method, Reinforcement learning

English keywords

Ground Wheeled Robots Group, Optimal Path, Path Planning, Unknown Environments, Artificial Potential Field, Reinforcement Learning

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=23290&Field=0&DTC=6