Sajad Roshanravan

Title

Optimal integrated fault-tolerant control design for affine nonlinear systems using reinforcement learning

Degree level

PhD

Field of study

Electrical Engineering, Control

Year of study

1398

Defense date

1403/3/26

Supervisor

Saeed Shamaghdari

Advisor

Faculty

Electrical Engineering

Abstract

This thesis addresses the design of an optimal fault-tolerant tracking controller for affine nonlinear systems subject to actuator and process faults, both with and without input and state constraints. The proposed optimal fault-tolerant controller is based on reinforcement learning and is designed in an integrated manner without prior knowledge of the system dynamics; that is, fault detection and controller redesign are handled simultaneously. To solve the Hamilton-Jacobi-Bellman (HJB) equation online without knowledge of the system dynamics or the fault magnitude, an approximation structure with two neural networks, an identifier and a critic, is used. In this structure, the identifier network estimates the system dynamics and the critic network estimates the optimal cost function, and the two are trained simultaneously. The identifier is an adaptive neural network that estimates the system in terms of filtered basis functions. Using Lyapunov theory, it is shown that, in addition to the convergence of the identifier output to the system states, the identifier network weights converge to their true values, which is a necessary condition for the training to converge to the optimal control law in this structure. The identifier weight update law uses experience replay with a variable forgetting factor, which increases the convergence rate and the robustness to measurement noise and reduces the estimation error. The optimal fault-tolerant tracking control problem for the system in the constrained and unconstrained cases is then shown to be equivalent to solving two unconstrained optimal stabilization problems for an augmented system comprising the tracking-error dynamics and the reference trajectory. In the constrained case, boundedness of the control input is guaranteed by choosing a suitable cost function on the input signal, and safety of the states by defining suitable control barrier functions. On the other hand, when a fault occurs, the system operates under an unsuitable controller until the fault is detected, which does not necessarily guarantee stability. Therefore, so that training can start from the pre-fault controller, a stabilizing term is used in the critic network update law.
In this structure, fault detection requires no bank of models and follows a plausibility-test approach in which the occurrence of a new fault is detected solely from the instantaneous value of the residual of the HJB equation. The uniform ultimate boundedness of the identifier and critic network weights and, consequently, the convergence of the control law to the optimal solution are proved using Lyapunov theory, and simulation results demonstrate the correct performance of the method.
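The residual-based detection idea described above can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a scalar linear plant dx/dt = a*x + b*u with quadratic cost q*x^2 + rho*u^2, an exactly known value function V(x) = p*x^2, and an identifier that has already tracked the post-fault input gain. The function names (`riccati_gain`, `hjb_residual`, `fault_detected`) and the threshold value are hypothetical.

```python
import math


def riccati_gain(a, b, q, rho):
    """Positive root p of 2*a*p - (b*p)**2 / rho + q = 0 (scalar LQR)."""
    return rho * (a + math.sqrt(a * a + b * b * q / rho)) / (b * b)


def hjb_residual(x, p, a_hat, b_hat, k, q, rho):
    """Instantaneous HJB residual for V(x) = p*x**2 under u = -k*x.

    a_hat and b_hat stand in for the identifier's current estimates
    (an assumption of this sketch).
    """
    u = -k * x
    stage_cost = q * x * x + rho * u * u
    value_rate = 2.0 * p * x * (a_hat * x + b_hat * u)  # dV/dx * dx/dt
    return stage_cost + value_rate


def fault_detected(residual, threshold):
    """Flag a fault when the residual magnitude exceeds the threshold."""
    return abs(residual) > threshold


if __name__ == "__main__":
    a, b, q, rho = -1.0, 1.0, 1.0, 1.0   # hypothetical nominal plant and cost
    p = riccati_gain(a, b, q, rho)
    k = b * p / rho                      # pre-fault optimal feedback gain
    nominal = hjb_residual(1.0, p, a, b, k, q, rho)
    faulty = hjb_residual(1.0, p, a, 0.5 * b, k, q, rho)  # 50% actuator loss
    print(fault_detected(nominal, 1e-3), fault_detected(faulty, 1e-3))
```

With the nominal parameters the residual vanishes up to rounding, because the pre-fault controller is optimal for the pre-fault plant. Once the input gain changes, the same value function and controller no longer satisfy the HJB equation, and the residual becomes nonzero.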

Data entry date

1403/06/17

Title in English

Optimal integrated fault-tolerant control design for affine nonlinear systems using reinforcement learning

Availability date

6/15/2025 12:00:00 AM

Student who entered the record

Sajad Roshanravan

Name: Sajad Roshanravan
Author: Sajad Roshanravan

Abstract in English

In this thesis, the problem of integrated fault-tolerant control design for affine nonlinear systems subject to component and actuator faults is investigated. The proposed integrated method covers both the detection and the compensation of faults, with and without input and state constraints, and guarantees tracking of the reference states while minimizing a designer-specified cost function. To eliminate the need for knowledge of the system dynamics and for estimation of the fault magnitudes, the proposed optimal method is based on reinforcement learning with a dual neural network (NN) approximation structure consisting of an identifier and a critic. The identifier is a single-layer adaptive NN that estimates the system in terms of filtered basis functions. Using Lyapunov stability theory, it is shown that, in addition to the convergence of the identifier outputs to the system states, the identifier NN weights converge to their true values, which is a necessary condition for the training process to converge to the optimal control policy in this structure. The identifier NN weight update law uses experience replay with a variable forgetting factor, which increases the convergence rate and the robustness to measurement noise and reduces the estimation error. In this method, solving the optimal fault-tolerant tracking control problem for the original system in both the constrained and unconstrained cases is equivalent to solving unconstrained optimal stabilization problems for an augmented system that consists of the tracking-error dynamics and the reference trajectory.
The cost function in the stabilization problems is taken in discounted form; boundedness of the control input is guaranteed by selecting an appropriate cost function on the input signal, and safety of the system states by suitable control barrier functions on the states. The critic NN approximates the cost function and is trained simultaneously with the identifier NN. When a fault occurs, the system operates under an unsuitable controller until the fault is detected, which does not necessarily guarantee stability. Therefore, to allow training to start from the pre-fault controller, a stabilizing term is included in the critic NN update law. In this structure, fault detection is performed solely on the basis of the instantaneous value of the residual error of the Hamilton-Jacobi-Bellman (HJB) equation, without the need for any bank of models. The uniform ultimate boundedness (UUB) of the identifier and critic NN weight errors and, as a result, the convergence of the control input to a neighborhood of the optimal solution are proved by Lyapunov theory. Simulation results are given to validate the effectiveness of the developed methods.
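For orientation, the discounted, input-constrained formulation mentioned in the abstract is commonly written as below in the reinforcement-learning control literature. This is a sketch in generic notation, not the thesis's own equations: the augmented state $z$, drift $F$, input map $G$, weights $Q$ and diagonal $R$, discount rate $\gamma$, input bound $\lambda$, critic basis $\phi$ and critic weights $\hat W$ are all assumed symbols, and the barrier-function terms for the state constraints are omitted.

```latex
\begin{align}
\dot z &= F(z) + G(z)\,u, \qquad z = \begin{bmatrix} e \\ x_d \end{bmatrix}, \\
V\bigl(z(t)\bigr) &= \int_t^{\infty} e^{-\gamma(\tau - t)}
   \bigl[\, z^{\top} Q z + U(u) \,\bigr]\, d\tau, \\
U(u) &= 2 \int_0^{u} \lambda \,\tanh^{-1}\!\left(\frac{v}{\lambda}\right)^{\!\top} R \; dv, \\
0 &= z^{\top} Q z + U(u^{*}) - \gamma V^{*}(z)
   + \nabla V^{*}(z)^{\top} \bigl( F(z) + G(z)\,u^{*} \bigr), \\
u^{*} &= -\lambda \tanh\!\left( \frac{1}{2\lambda} R^{-1} G(z)^{\top} \nabla V^{*}(z) \right), \\
\delta &= z^{\top} Q z + U(\hat u) - \gamma\, \hat W^{\top} \phi(z)
   + \hat W^{\top} \nabla \phi(z) \bigl( \hat F(z) + \hat G(z)\,\hat u \bigr).
\end{align}
```

Here $e$ is the tracking error and $x_d$ the reference state. The nonquadratic penalty $U(u)$ keeps each input component within $|u_i| \le \lambda$, and $\delta$ is the instantaneous HJB residual obtained by substituting the critic approximation $\hat V = \hat W^{\top}\phi$ and the identifier estimates $\hat F$, $\hat G$ into the HJB equation.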

Persian keywords

Integrated fault-tolerant control , Actuator and process faults , Affine nonlinear systems , Safe reinforcement learning , System identification

English keywords

Integrated fault-tolerant control , Actuator and component faults , Affine nonlinear systems , Safe reinforcement learning , System identification

Author

Sajad Roshanravan

Supervisor

Dr. Saeed Shamaghdari

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=31151&Field=0&DTC=6