Mohammad Heydary

Title

Hybrid optimization of metro train scheduling and composition using deep reinforcement learning and the proximal policy optimization algorithm

Degree level

Master's

Field of study

Railway Transportation Engineering

Year of enrollment

1401

Defense date

1404/7/27

Supervisor

Dr. Masoud Yaghini

Advisor

Dr. Masoud Yaghini

School

Railway

Abstract

Operational optimization in urban rail transit systems, particularly metro lines, is one of the central challenges in managing dynamic demand and energy consumption. This research presents an intelligent model for adaptive train scheduling and car composition, built on a deep reinforcement learning algorithm that uses proximal policy optimization (PPO). The train scheduling and composition problem is formulated as a Markov decision process (MDP) in which the learning agent, given the current system state (passenger demand level, train capacity, and headways), selects two key variables: the headway and the number of cars. The reward function combines passenger waiting cost, energy consumption, and the cost of changing train composition, so as to balance service quality against operating cost. Simulation results show that, compared with traditional methods, the proposed model reduces waiting time, increases the service rate, and improves energy efficiency. By offering a new framework for applying reinforcement learning algorithms to metro operations, this research takes an effective step toward smarter urban transportation systems.
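For reference, the clipped surrogate objective that PPO maximizes can be written in its standard form as below. This is the general textbook objective, not a formula taken from the thesis; the abstract does not state which PPO variant or hyperparameters were used. In this setting, the action $a_t$ would be the (headway, number of cars) pair and $s_t$ the system state.

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\left[
      \min\left(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t
      \right)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

Here $\hat{A}_t$ is an advantage estimate and $\epsilon$ is the clipping range, which keeps each policy update close to the previous policy.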

Data entry date

1404/08/19

Title in English

Hybrid optimization of metro train scheduling and composition using deep reinforcement learning and the proximal policy optimization algorithm

Availability date

10/19/2026 12:00:00 AM

Student who entered the data

Mohammad Heydary

Name: Mohammad Heydary
Author: Mohammad Heydary

Abstract in English

Optimizing operations in urban rail transportation systems, especially metro lines, is one of the fundamental challenges in managing dynamic demand and energy consumption. This research presents an intelligent model for adaptive train scheduling and wagon composition, developed on a deep reinforcement learning algorithm with the proximal policy optimization (PPO) method. In this model, the train scheduling and composition problem is formulated as a Markov decision process (MDP), in which the learning agent, considering the current state of the system (passenger demand level, train capacity, and headways), chooses two key variables: the headway and the number of wagons (train composition). The reward function is designed as a combination of passenger waiting costs, energy consumption, and train composition changes, to balance service quality against operating costs. Simulation results show that, compared with traditional methods, the proposed model reduces waiting time, increases the service rate, and improves energy efficiency. By providing a new framework for using reinforcement learning algorithms to optimize metro operations, this research takes an effective step towards smarter urban transportation systems.
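The MDP described in the abstract can be illustrated with a minimal single-station sketch in Python. Everything specific here is an assumption made for illustration and does not come from the thesis: the `State` fields, the candidate headways and wagon counts, the wagon capacity, the cost formulas, and the reward weights. The only parts taken from the abstract are the action (headway and number of wagons) and a reward that combines waiting cost, energy consumption, and composition changes.

```python
from dataclasses import dataclass

# Illustrative assumptions only; none of these values come from the thesis.
HEADWAYS_MIN = (2, 4, 6, 8)   # candidate headways, in minutes
WAGON_COUNTS = (4, 6, 8)      # candidate train lengths, in wagons
WAGON_CAPACITY = 200          # passengers per wagon


@dataclass
class State:
    waiting: float       # passengers already waiting on the platform
    arrival_rate: float  # passengers arriving per minute
    wagons: int          # wagons on the previously dispatched train


def step(state, headway, wagons, w_wait=1.0, w_energy=0.5, w_change=20.0):
    """Dispatch one train after `headway` minutes; return (next_state, reward)."""
    arrivals = state.arrival_rate * headway
    # Passengers already present wait the full headway;
    # new arrivals wait half of it on average.
    wait_cost = state.waiting * headway + arrivals * headway / 2.0
    boarded = min(state.waiting + arrivals, wagons * WAGON_CAPACITY)
    # Assumed: energy per departure is proportional to train length.
    energy_cost = float(wagons)
    # Assumed: a fixed penalty whenever the composition changes.
    change_cost = 1.0 if wagons != state.wagons else 0.0
    reward = -(w_wait * wait_cost + w_energy * energy_cost + w_change * change_cost)
    next_state = State(state.waiting + arrivals - boarded, state.arrival_rate, wagons)
    return next_state, reward


start = State(waiting=100.0, arrival_rate=50.0, wagons=6)
next_state, reward = step(start, headway=4, wagons=6)
print(reward)  # -803.0
```

A PPO agent would interact with a function like `step` repeatedly and learn a policy over the `HEADWAYS_MIN` x `WAGON_COUNTS` action grid; the weights `w_wait`, `w_energy`, and `w_change` set the trade-off between service quality and operating cost.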

Persian keywords

Deep reinforcement learning, proximal policy optimization (PPO), adaptive metro scheduling, train composition, intelligent operation, dynamic demand

English keywords

Deep reinforcement learning, proximal policy optimization (PPO), adaptive metro scheduling, train composition, smart operation, dynamic demand

Author

Mohammad Heydary

Supervisor

Masoud Yaghini

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=33967&Field=0&DTC=6