حسين قلي زاده ساعي قره چبق

Title

Control of constrained linear systems using lifelong reinforcement learning

Degree level

Master's

Field of study

Electrical Engineering, Control

Academic year

1400

Defense date

1400/8/9

Supervisor

سعيد شمقدري

Faculty

Electrical Engineering

Abstract

Solving constrained problems is one of the biggest challenges for reinforcement learning algorithms. When the constraint is on the system states, the constrained problem is also called a safety problem. Most methods that guarantee optimality of the control law make no claim about its safety, and vice versa. This thesis presents an algorithm for designing a safe and optimal controller for a continuous nonlinear system. In this method, the policy iteration algorithm is turned into a safe policy iteration algorithm by means of a barrier function. Solving the problem requires a region that is safe and has guaranteed stability in the presence of the problem constraints. In addition, a second iterative algorithm is presented that yields the largest safe region with guaranteed stability. It is guaranteed that the state trajectories of the system will not leave this region. The policy iteration algorithm uses sum-of-squares programming, an effective method for solving optimization problems with polynomial constraints. Finally, the effectiveness of the proposed controller is demonstrated through simulation.

Date of data entry

1400/10/11

Title in English

Control of constrained linear systems using lifelong reinforcement learning

Date of use

1/1/1900 12:00:00 AM

Data entered by (student)

حسين قلي زاده ساعي قره چبق

Name: حسين قلي زاده ساعي قره چبق
Author: حسين قلي زاده ساعي قره چبق

Abstract in English

Handling constraints is one of the most important and fundamental challenges for iterative optimal-control algorithms such as reinforcement learning. Most methods that guarantee optimality do not guarantee safety, and those that address safety lose optimality. This research presents an algorithm for designing a safe and optimal controller for a continuous nonlinear system subject to state constraints. In this method, policy iteration is combined with a barrier function to obtain a safe policy iteration. Solving this problem requires a region that is both safe and guaranteed to be stable in the presence of the problem constraints. In addition, another iterative algorithm is presented that provides the largest such safe and stable region. It is guaranteed that the system states will not leave this region. Sum-of-squares optimization is used to solve the problem. The performance of the controller is demonstrated through simulation.
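
The policy iteration loop that the abstract builds on can be illustrated with a deliberately simplified sketch. It is not the thesis's algorithm: it assumes a scalar linear system dx/dt = a*x + b*u with quadratic cost q*x^2 + r*u^2, and it omits the barrier function and the sum-of-squares program. All names (`policy_iteration`, `riccati_solution`, `peak_state`) and numeric values are hypothetical, chosen only for illustration.

```python
import math


def policy_iteration(a, b, q, r, k0, iters=20):
    """Policy iteration for dx/dt = a*x + b*u with feedback u = -k*x.

    Assumed cost: integral of q*x^2 + r*u^2. The initial gain k0 must
    stabilize the closed loop (a - b*k0 < 0).
    """
    k = k0
    if a - b * k >= 0:
        raise ValueError("initial gain must be stabilizing")
    p = None
    for _ in range(iters):
        # Policy evaluation: V(x) = p*x^2 satisfies the scalar Lyapunov
        # equation 2*(a - b*k)*p + q + r*k^2 = 0.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: the gain that minimizes the Hamiltonian.
        k = b * p / r
    return p, k


def riccati_solution(a, b, q, r):
    """Positive root of 2*a*p - (b^2/r)*p^2 + q = 0, used as a reference."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def peak_state(x0, a, b, k, horizon=5.0, steps=50):
    """Largest |x(t)| on a time grid, using x(t) = x0*exp((a - b*k)*t)."""
    times = [horizon * i / steps for i in range(steps + 1)]
    return max(abs(x0 * math.exp((a - b * k) * t)) for t in times)


if __name__ == "__main__":
    p, k = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=3.0)
    print("value coefficient p:", p)
    print("feedback gain k:", k)
    print("peak |x| from x0 = 0.8:", peak_state(0.8, 1.0, 1.0, k))
```

In this scalar case every interval |x| <= c is invariant under a stabilizing gain, so a state bound is satisfied whenever the initial state satisfies it. For the nonlinear, multi-state setting described in the abstract, such a region has to be computed, which is where the barrier function and sum-of-squares programming come in.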

Persian keywords

safe reinforcement learning, safe policy iteration, barrier function

English keywords

safe reinforcement learning, safe policy iteration, barrier function

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=25793&Field=0&DTC=6