Seyyed Mahdi Mousavi

Title

Vibration control and nonlinear optimal trajectory tracking of a flexible manipulator using reinforcement learning

Degree level

Master's

Field of study

Electrical Engineering

Year of study

1397

Defense date

1401/6/30

Supervisor

Dr. Seyed Majid Esmaeilzadeh

Advisor

Dr. Seyed Majid Esmaeilzadeh

School

Electrical Engineering

Abstract

In recent years, advanced adaptive control has been at the research frontier of robotics and artificial intelligence, and deep learning serves as a bridge between optimal control and adaptive control. Flexible robots have attracted the attention of many researchers in recent years because of applications such as the space industry, medicine, biomedicine, and rescue operations, and because of features such as high operating speed combined with low weight and low energy consumption. The primary goal in controlling a flexible manipulator is to damp the oscillations of the flexible link; these vibrations arise from the elasticity of the links of flexible robots. The other goal in controlling a two-link flexible manipulator is accurate positioning. The main challenge in implementing a controller on a two-link flexible manipulator is the complexity of the system together with the uncertainties in the robot's mathematical model. This thesis presents an online, Q-learning-based approach to desired-trajectory tracking and vibration suppression for a two-link flexible manipulator, a nonlinear system with highly complex dynamics. The thesis focuses on developing an online adaptive machine learning algorithm with a Q-learning structure that achieves these control objectives for a two-link flexible manipulator with unknown dynamics. The Q-learning method can find the solution of the Hamilton-Jacobi-Bellman optimization equation for the nonlinear system online and in real time, without a system model and using only data measured from the environment. To avoid the need for a model, a neural network called the critic (policymaker) neural network is used to estimate the Q-function. The weights of the critic neural network are computed by recursive least squares from the data received from the system. The control policy is determined from the approximated Q-function and applied as the input to the two-link flexible manipulator so that it follows the desired trajectory. At each step, the Q-function and the control policy are computed by policy iteration and the off-policy method until the neural network weights converge.
Specifically, the robot starts moving along the path with an initial stabilizing control policy under persistent excitation, and it follows the desired trajectory under these conditions until the weights converge. After convergence, the computed control policy replaces the initial control policy, and the robot follows the rest of the path using the control policy obtained from Q-learning. Closed-loop stability during parameter learning is guaranteed through Lyapunov design methods. The proposed method is compared with two other controllers, a conventional reinforcement-learning-based controller and a PID controller, and the simulation results show that the Q-learning-based controller outperforms both in the convergence time of the neural network weights, the control torque, and the tracking error.
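
The recursive least squares step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: it assumes a critic that is linear in its weights, a hypothetical scalar state `x` and input `u`, a hypothetical quadratic feature vector, and a synthetic noise-free target in place of the Bellman target that Q-learning would supply. Random sampling stands in for the persistent-excitation condition.

```python
import random


def rls_update(weights, cov, features, target, forgetting=1.0):
    """One recursive-least-squares step for a linear-in-weights critic.

    cov is the (symmetric) covariance matrix of the estimate, stored as a
    list of rows; forgetting=1.0 means no exponential forgetting.
    """
    n = len(weights)
    # cov @ features
    cov_phi = [sum(cov[i][j] * features[j] for j in range(n)) for i in range(n)]
    denom = forgetting + sum(features[i] * cov_phi[i] for i in range(n))
    gain = [v / denom for v in cov_phi]
    # Prediction error of the current critic on this sample.
    error = target - sum(weights[i] * features[i] for i in range(n))
    new_weights = [weights[i] + gain[i] * error for i in range(n)]
    new_cov = [
        [(cov[i][j] - gain[i] * cov_phi[j]) / forgetting for j in range(n)]
        for i in range(n)
    ]
    return new_weights, new_cov


def quadratic_features(x, u):
    """Hypothetical feature vector for a scalar state x and scalar input u."""
    return [x * x, x * u, u * u]


def fit_demo(true_weights=(2.0, -1.0, 0.5), samples=200, seed=0):
    """Recover known weights from synthetic data (assumed, illustrative values)."""
    rng = random.Random(seed)
    n = len(true_weights)
    weights = [0.0] * n
    # Large initial covariance: little confidence in the initial weights.
    cov = [[1e6 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(samples):
        x, u = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        phi = quadratic_features(x, u)
        target = sum(w * f for w, f in zip(true_weights, phi))
        weights, cov = rls_update(weights, cov, phi, target)
    return weights


if __name__ == "__main__":
    print(fit_demo())
```

Because each update costs only a few matrix-vector products, an estimator of this kind can run sample by sample, which is consistent with the online setting the abstract describes.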

Data entry date

1402/02/04

Title in English

Vibration Control of a Flexible Two-Link Manipulator Based on Reinforcement Learning

Release date

9/21/2023 12:00:00 AM

Student who entered the data

Seyyed Mahdi Mousavi

Author: Seyyed Mahdi Mousavi

Abstract in English

In recent years, advanced adaptive control has been at the frontier of robotics and artificial intelligence, and deep learning is a bridge between optimal control and adaptive control. Flexible robots have attracted the attention of many researchers in recent years because of applications in space, medicine, biomedicine, and rescue operations, and because of features such as high operating speed combined with low weight and low energy consumption. The main goal in controlling a flexible arm robot is to suppress the vibrations of the flexible arm, which are caused by the elasticity of the arm. The other goal in controlling a flexible arm robot with two arms is precise positioning. The main challenge in implementing a controller on a flexible arm robot with two arms is the complexity of the system and the uncertainties in the robot's mathematical model. This thesis presents an online, Q-learning-based solution for tracking the desired path and suppressing vibration in a flexible arm robot with two arms, a nonlinear system with highly complex dynamics. The thesis focuses on developing an online adaptive machine learning algorithm with a Q-learning structure to achieve these control goals for a flexible arm robot with two arms and uncertain dynamics. The Q-learning method can find the solution of the Hamilton-Jacobi-Bellman optimization equation for the nonlinear system online and in real time, without a system model and using only data measured from the environment. To avoid the need for a model, we use a neural network called the critic neural network (policymaker) to estimate the Q function. The weights of the critic neural network are calculated with the recursive least squares method from the data received from the system. The control policy is determined from the approximated Q function and applied as the input to the flexible arm robot with two arms so that it follows the desired path.
In each step, the Q function and the control policy are calculated by policy iteration and the off-policy method until the weights of the neural network converge. Specifically, the robot starts moving along the path with a stabilizing initial control policy under persistent excitation, and it follows the desired path under these conditions until the weights converge. After convergence, the calculated control policy replaces the initial control policy, and the robot follows the rest of the path using the control policy extracted from the Q-learning method. Closed-loop stability during parameter learning is ensured through Lyapunov design methods. The proposed method is compared with two other controllers, a conventional reinforcement-learning-based controller and a PID controller, and the simulation results show that the Q-learning-based controller performs better than both in the convergence time of the neural network weights, the control torque, and the tracking error.
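
The step in which the control policy is determined from the approximated Q function can be sketched for the simplest possible case. The abstract does not give the form of the critic, so the following assumes a hypothetical scalar state `x`, scalar input `u`, and a quadratic Q function with illustrative weights; the function names are not from the thesis. For such a Q function the minimizing input has a closed form.

```python
def q_value(weights, x, u):
    """Assumed quadratic Q function: w_xx*x^2 + w_xu*x*u + w_uu*u^2."""
    w_xx, w_xu, w_uu = weights
    return w_xx * x * x + w_xu * x * u + w_uu * u * u


def greedy_action(weights, x):
    """Input that minimizes the assumed quadratic Q function at state x.

    Setting dQ/du = w_xu*x + 2*w_uu*u = 0 gives u = -(w_xu / (2*w_uu)) * x,
    which is a minimum only when w_uu > 0.
    """
    _, w_xu, w_uu = weights
    if w_uu <= 0:
        raise ValueError("Q must be strictly convex in u (w_uu > 0)")
    return -w_xu / (2.0 * w_uu) * x


if __name__ == "__main__":
    example_weights = (2.0, -1.0, 0.5)  # illustrative values only
    x = 3.0
    u_star = greedy_action(example_weights, x)
    print(u_star, q_value(example_weights, x, u_star))
```

The resulting policy is a linear state feedback whose gain depends only on the learned weights, so no model of the plant is needed to evaluate it.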

Persian keywords

two-link flexible manipulator, Q-learning-based controller, desired trajectory tracking, vibration suppression

English keywords

flexible arm robot with two arms, Q-learning-based controller, optimal path tracking, vibration suppression

Author

Seyyed Mahdi Mousavi

Supervisor

Seyed Majid Esmaeilzadeh

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=28292&Field=0&DTC=6