Mohammad Ghazanfari

Title

A Suitable Reinforcement Learning Mechanism for Use in 2D Soccer Simulation

Degree

Master's (M.Sc.)

Field of study

Artificial Intelligence and Robotics

Defense date

Esfand 1394 (February-March 2016)

Supervisors

Dr. Naser Mozayani - Dr. Mohammad Reza Jahed Motlagh

School

Computer

Abstract

Success in multi-agent environments has become one of the challenges of artificial intelligence. Because of the dynamics introduced by several autonomous agents, such environments are far more complex than single-agent ones. 2D soccer simulation, a highly complex multi-agent environment, has attracted the attention of machine learning researchers. Succeeding in this environment is difficult because it is dynamic, continuous, uncertain, noisy, partially observable, and very high-dimensional. Reinforcement learning (RL) is learning based on an agent's continual interaction with its environment, using trial and error guided by the environment's feedback (reward or punishment). The agent performs actions in the environment and receives positive or negative feedback (reward) as a result. From the reward received for each action in each state, the agent learns the optimal policy for that environment. Despite RL's notable successes on a variety of problems, the method faces several serious challenges, including poor scalability in high-dimensional environments (the curse of dimensionality). This makes RL hard to apply in large, complex environments such as soccer simulation. In addition, many problems call for several robots to cooperate on a shared task. Unfortunately, as the number of factories and laboratories producing these robots grows, it will not always be possible to coordinate such a team of agents in advance; the ability to cooperate with unknown agents is therefore very useful for an intelligent agent. In this research we propose a two-layer reinforcement learning mechanism with which a 2D soccer simulation agent learns to cooperate with an unknown teammate in order to succeed at the Half Field Offense problem. In one layer the agent learns to cooperate with a variety of teammates; in a higher layer it then learns to adapt quickly to an unknown agent, building on what was learned in the first layer. The main challenge in this work is adapting reinforcement learning to an environment as large and complex as soccer simulation.
By defining the environment's features appropriately and abstracting information, the proposed mechanism overcomes the environment's high dimensionality, and its results show a marked improvement over previous work in this area.
Keywords: Reinforcement Learning, 2D Soccer Simulation, Multi-Agent Environment, Cooperation with an Unknown Agent, Information Abstraction, Q-Learning, Half Field Offense

Date of data entry

1395/12/04

Date put into use

1/1/1900 12:00:00 AM

Student who entered the data

Azam Sadeghi

Name: Azam Sadeghi
Author: Mohammad Ghazanfari

Abstract (English)

Today, succeeding in multi-agent environments is one of the main challenges in artificial intelligence. Multi-agent environments are considerably more complex than single-agent ones because of the dynamics created by interactions among autonomous agents. The 2D soccer simulation platform, a complex multi-agent environment, has long served as an important tool for machine learning researchers. Several properties make it a realistic and difficult environment for learning methods to succeed in: it is dynamic, continuous, uncertain, noisy, and partially observable. Reinforcement learning (RL) is a machine learning method based on continual interaction between an agent and its environment. It uses guided trial and error, driven by feedback from the environment (rewards or punishments). The agent performs an action in a given state of the environment and receives positive or negative feedback (reward) based on the result of that action. From the rewards received for each action chosen in each state, the agent learns the optimal policy. Despite the successes of RL on a range of problems and environments, the method has serious difficulties in high-dimensional environments (the curse of dimensionality). This makes it challenging to use RL in large, complex environments such as 2D soccer simulation. Furthermore, many problems require several robots to cooperate in order to achieve a shared goal. Unfortunately, as the number of manufacturers and laboratories producing different robots increases, it becomes harder to coordinate such robots in advance; it would therefore be very useful for a robot to be able to cooperate with unknown robots.
In this research, a two-layer reinforcement learning method is proposed with which a 2D soccer simulation agent learns to cooperate with an unknown teammate in order to succeed in the Half Field Offense problem. In one layer the agent learns how to cooperate with each of several known teammates; in an upper layer it then learns to adapt quickly to an unknown agent, based on what was learned in the first layer. The main challenge in this work is adapting reinforcement learning to the large and complex environment of 2D soccer simulation. The proposed mechanism overcomes the high dimensionality of the 2D soccer simulation environment by defining appropriate features for the environment's states and by abstracting data. The results show a considerable improvement over previously proposed methods.
Keywords: Reinforcement Learning, RoboCup 2D Soccer Simulation, Multi-Agent Environment, Cooperation with Unknown Agent, Abstraction, Q-Learning, Half Field Offense
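For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of tabular Q-learning combined with a simple form of state abstraction (binning a continuous value into a coarse discrete state). It is not the thesis's mechanism: the one-dimensional corridor task, the action names, the reward values, and all function and parameter names are hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")   # hypothetical two-action set
GOAL = 5.0                    # hypothetical goal position on a 1-D corridor


def discretize(position, bin_width=1.0):
    """Abstract a continuous position into a coarse integer bin."""
    return int(position // bin_width)


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) toward the bootstrapped target."""
    best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])


def greedy(q, state):
    """Return the action with the highest current Q-value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def train(episodes=2000, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning on the toy corridor (assumed task, not HFO)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        pos = 0.0
        for _ in range(50):  # cap on steps per episode
            state = discretize(pos)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)      # explore
            else:
                action = greedy(q, state)         # exploit
            step = 1.0 if action == "right" else -1.0
            pos = min(max(pos + step, 0.0), GOAL)
            done = pos >= GOAL
            reward = 1.0 if done else -0.01       # small cost per step
            q_update(q, state, action, reward, discretize(pos), done)
            if done:
                break
    return q


if __name__ == "__main__":
    q = train()
    print({s: greedy(q, s) for s in range(5)})
```

In a real soccer simulation the state would be built from many features (ball and player positions, distances, angles), which is why the abstract stresses choosing and summarizing features so that a table or function approximator of this kind remains tractable.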

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=16751&Field=0&DTC=6