-
Record number
31346
-
Author
Mohammad Amini
-
Title
Dynamic bandwidth allocation in distributed multi-agent deep reinforcement learning, taking into account the bandwidth constraints of the communication channel
-
Degree level
Master's
-
Field of study
Electrical Engineering, Communication Systems
-
Year of study
1400
-
Defense date
1403/7/02
-
Supervisor
Dr. Shahrokh Farahmand
-
Advisor
-
-
Faculty
Electrical Engineering
-
Abstract
Multi-agent reinforcement learning (MARL) is one of the effective machine learning approaches and has attracted a great deal of attention in many fields. MARL becomes important when a number of agents need to cooperate and coordinate with one another to reach their goals. Under such conditions, effective communication among the agents is essential. Existing papers in this area have mostly taken an idealized view of the communication channel between agents, or between agents and a coordination center. This means that the limitations of the communication channel are not considered in these papers. As a result, the algorithms they produce may not be implementable in practice. To address these limitations, this thesis presents two new algorithms that use bandwidth in a near-optimal manner and avoid, as far as possible, sending more messages than necessary. The proposed algorithms try to minimize bandwidth consumption by communicating only highly important messages, while preserving the efficiency and accuracy of the learning algorithm. Simulations of the proposed algorithms are carried out for two different applications: the predator-prey game and maximization of the V2I communication rate in vehicular networks. The results show that these algorithms perform well under the dynamic and changing conditions of vehicular networks. The simulation results also show that the proposed algorithms are effective not only in theory but also in practice, and can serve as effective solutions for improving the performance of communication networks in complex, real-world scenarios.
-
Data entry date
1403/07/28
-
Title in English
Dynamic bandwidth allocation in distributed multi-agent deep reinforcement learning under total bandwidth constraints
-
Release date
9/23/2025 12:00:00 AM
-
Student who entered the data
Mohammad Amini
-
Abstract in English
Multi-Agent Reinforcement Learning (MARL) provides a general framework for RL agents to cooperate and
coordinate with each other in order to reach their shared or individual goals. Many existing MARL algorithms
require that agents transmit some amount of information at each time step, to enable the ensuing coordination.
However, most of these works do not take the band-limited nature of wireless communication channels between
agents into account. This leads to inefficient bandwidth utilization, as even uninformative messages are transmitted,
wasting transmit power and scarce spectrum. Recently, attempts have been made to mitigate this
shortcoming by limiting the communicated messages. These approaches are either static, allocating
a fixed, equal bandwidth to all agents, or semi-dynamic, in the sense that they allocate varying bandwidth to
different agents but the allocation is not completely flexible and remains partially static. To address these limitations,
we propose two novel MARL algorithms which we refer to as Dynamic Network (DyNet) 1 and 2. DyNet 1 utilizes
a deep neural network (DNN) per agent that generates a scalar weight to represent the importance of that particular
agent’s observation at every time instant. All agents’ weights are transmitted to a scheduler, which is another
DNN that dynamically allocates portions of the fixed total bandwidth to the agents. The same architecture is
used in DyNet 2, but the total utilized bandwidth is assumed to be time-varying. To enforce efficient communication
in DyNet 2, large message lengths are penalized in the reward, forcing the agents to perform well on the task at hand
while avoiding any unnecessary transmissions. An agent sends a message only when the improvement it brings to
the task accomplishment reward outweighs the message overhead penalty. Simulations are carried out under two
different scenarios: a predator-prey grid world, and simultaneous Vehicle-to-Vehicle (V2V) fast payload delivery and
Vehicle-to-Infrastructure (V2I) link optimization. Simulation results reveal that both proposed algorithms
perform well while either limiting or minimizing the required communication. Indeed, they outperform existing
counterparts given similar (average) bandwidth limitations. As a result, both DyNets demonstrate great potential
for deployment in practical applications with severely bandwidth-constrained channels, such as those encountered
in certain Internet of Things (IoT) applications.
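
The two mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. In the thesis the scheduler is a DNN; here a softmax over the agents' scalar weights stands in for it, and the linear message-length penalty is likewise an assumption. All function and parameter names (allocate_bandwidth, penalized_reward, should_send, penalty_per_unit) are hypothetical.

```python
import math


def allocate_bandwidth(weights, total_bandwidth):
    """Split a fixed bandwidth budget across agents (DyNet 1 idea).

    ASSUMPTION: a softmax over the agents' scalar importance weights
    replaces the learned scheduler DNN described in the abstract.
    """
    peak = max(weights)  # subtract the maximum for numerical stability
    exps = [math.exp(w - peak) for w in weights]
    norm = sum(exps)
    return [total_bandwidth * e / norm for e in exps]


def penalized_reward(task_reward, message_lengths, penalty_per_unit):
    """Task reward minus a message-length penalty (DyNet 2 idea).

    ASSUMPTION: the penalty is linear in the total message length.
    """
    return task_reward - penalty_per_unit * sum(message_lengths)


def should_send(reward_gain, message_length, penalty_per_unit):
    """Send only if the expected reward gain outweighs the overhead penalty."""
    return reward_gain > penalty_per_unit * message_length


# Hypothetical usage: three agents sharing 10 bandwidth units.
shares = allocate_bandwidth([2.0, 0.5, -1.0], total_bandwidth=10.0)
```

In this sketch the shares always sum to the fixed budget, and an agent with a larger weight receives a larger share. A trained scheduler could learn a different mapping from weights to shares.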
-
Persian keywords
Reinforcement learning , Multi-agent reinforcement learning , Bandwidth , Vehicular networks , Machine learning
-
English keywords
Reinforcement learning , Multi-agent reinforcement learning , Machine learning , Vehicular networks , Bandwidth
-
Author
Mohammad Amini
-
Supervisor
Dr. Shahrokh Farahmand