Mohammad Amini

Record number
31346
Creator
Mohammad Amini
Title
Dynamic bandwidth allocation in distributed multi-agent deep reinforcement learning, taking into account the bandwidth constraints of the communication channel
Degree level
Master's
Field of study
Electrical Engineering, Communication Systems
Year of study
1400 (Solar Hijri)
Defense date
1403/7/02 (Solar Hijri)
Supervisor
Dr. Shahrokh Farahmand
Advisor
-
Faculty
Electrical Engineering
Abstract
Multi-agent reinforcement learning (MARL) is an effective machine learning approach that has attracted a great deal of attention in many fields. MARL becomes important when several agents need to cooperate and coordinate with one another to reach their goal. Under such conditions, effective communication among the agents is essential. Existing papers in this area have mostly taken an idealized view of the communication channel between agents, or between the agents and a coordination center. In other words, the limitations of the communication channel are not considered in these papers, so the resulting algorithms may not be implementable in practice. To address these limitations, this thesis presents two new algorithms that use bandwidth in a near-optimal manner and, as far as possible, refrain from sending more messages than necessary. The proposed algorithms aim to minimize bandwidth consumption while preserving the efficiency and accuracy of the learning algorithm, by communicating only the most important messages. The proposed algorithms are simulated for two different applications: a predator-prey game, and maximization of the V2I communication rate in vehicular networks. The results show that these algorithms perform well under the dynamic, changing conditions of vehicular networks. The simulation results also show that the proposed algorithms are effective not only in theory but also in practice, and can serve as effective solutions for improving the performance of communication networks in complex, realistic scenarios.
Data entry date
1403/07/28 (Solar Hijri)
Title in English
Dynamic bandwidth allocation in distributed multi-agent deep reinforcement learning under total bandwidth constraints
Release date
9/23/2025 12:00:00 AM
Student who entered the data
Mohammad Amini
Abstract in English
Multi-Agent Reinforcement Learning (MARL) provides a general framework for RL agents to cooperate and coordinate with each other in order to reach their shared or individual goals. Many existing MARL algorithms require agents to transmit some amount of information at each time step to enable this coordination. However, most of these works do not take the band-limited nature of the wireless communication channels between agents into account. This leads to inefficient bandwidth utilization, since even uninformative messages are transmitted, and wastes transmit power and scarce spectrum. Recently, attempts have been made to mitigate this shortcoming by limiting the communicated messages. These approaches are either static, allocating a fixed, equal bandwidth to all agents, or semi-dynamic, in the sense that they allocate varying bandwidth to different agents but the allocation is not completely flexible and remains partially static. To address these limitations, we propose two novel MARL algorithms, which we refer to as Dynamic Network (DyNet) 1 and 2. DyNet 1 uses a deep neural network (DNN) per agent that generates a scalar weight representing the importance of that agent's observation at every time instant. All agents' weights are transmitted to the scheduler, which is another DNN that dynamically allocates portions of the fixed total bandwidth to the different agents. The same architecture is used in DyNet 2, but the total utilized bandwidth is assumed to be time-varying. To enforce efficient communication in DyNet 2, large message lengths are penalized in the reward, forcing the agents to perform well on the task at hand while avoiding unnecessary transmissions. An agent sends a message only when the improvement it brings to the task reward outweighs the message overhead penalty.
Simulations are carried out under two different scenarios: a predator-prey grid world, and simultaneous Vehicle-to-Vehicle (V2V) fast payload delivery and Vehicle-to-Infrastructure (V2I) link optimization. Simulation results reveal that both proposed algorithms perform very well while either limiting or minimizing the needed communication. Indeed, they outperform existing counterparts given similar (average) bandwidth limitations. As a result, both DyNets demonstrate great potential for deployment in practical applications with severely bandwidth-constrained channels, such as those encountered in certain Internet of Things (IoT) applications.
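
The two mechanisms described in the abstract (importance weights mapped to bandwidth shares, and a reward penalty on message length) can be illustrated with a minimal sketch. This is not the thesis implementation: in the thesis the scheduler is a trained DNN, whereas the softmax rule, the linear penalty, and every name below (`allocate_bandwidth`, `penalized_reward`, `should_send`, `penalty_per_unit`) are assumptions chosen only to make the idea concrete.

```python
import math


def allocate_bandwidth(weights, total_bandwidth):
    """Split a fixed total bandwidth across agents in proportion to the
    softmax of their scalar importance weights.

    Assumption: a softmax stands in for the scheduler DNN of DyNet 1.
    """
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    z = sum(exps)
    return [total_bandwidth * e / z for e in exps]


def penalized_reward(task_reward, message_lengths, penalty_per_unit):
    """Task reward minus a penalty on the total message length.

    Assumption: the penalty is linear in message length; the abstract only
    states that large message lengths are penalized in the reward.
    """
    return task_reward - penalty_per_unit * sum(message_lengths)


def should_send(reward_with_message, reward_without_message,
                message_length, penalty_per_unit):
    """Send only if the reward gain outweighs the message overhead penalty."""
    gain = reward_with_message - reward_without_message
    return gain > penalty_per_unit * message_length


if __name__ == "__main__":
    # Three hypothetical agents; the third reports the most important observation.
    shares = allocate_bandwidth([1.0, 2.0, 3.0], total_bandwidth=12.0)
    print("bandwidth shares:", shares)
    print("penalized reward:", penalized_reward(10.0, [2, 3], 0.5))
    print("send long message?", should_send(5.0, 4.0, 4, 0.5))
    print("send short message?", should_send(5.0, 4.0, 1, 0.5))
```

In this sketch, agents with larger weights receive larger shares and the shares always sum to the total bandwidth, which mirrors the fixed-budget setting of DyNet 1. The `should_send` rule mirrors the DyNet 2 behaviour stated in the abstract, where a message is worth sending only if its benefit exceeds its cost.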
Persian keywords
Reinforcement learning , Multi-agent reinforcement learning , Bandwidth , Vehicular networks , Machine learning
English keywords
Reinforcement learning , Multi-agent reinforcement learning , Machine learning , Vehicular networks , Bandwidth
Author
Mohammad Amini
Supervisor
Dr. Shahrokh Farahmand
Link to this record:
http://dl.iust.ac.ir/dL/search/default.aspx?Term=31346&Field=0&DTC=6
