Milad Haj Fathali

Title

Implementing a reinforcement learning method for dynamic pricing in an online transportation platform: the AloPeyk case study

Degree level

Master's

Field of study

Macro Socio-Economic Systems

Defense date

1399/7/19

Supervisor

Dr. Mehdi Ghazanfari

School

Industrial Engineering

Abstract

In the business environment, new tools have emerged for understanding market needs, among them dynamic pricing methods. With the rise of electronic transactions, dynamic pricing has gained a special position, to the point that in some small and medium-sized online businesses the prices of certain products and services now change automatically and in step with the market. In most markets, demand and supply fluctuate, creating a constantly changing environment. Predicting every possible future condition of such a market is impossible, and the available information is limited. As a result, a substantial stream of papers on dynamic pricing has emerged from the computer science and artificial intelligence communities. These models enable companies to incorporate existing data into their outlook and adjust their pricing strategy to best fit the market environment. A review and classification of the related literature showed that most studies on dynamic pricing in online businesses that use machine learning approaches focus on goods-oriented markets such as online retail stores. The aim of this study is to apply reinforcement learning to implement dynamic pricing in the AloPeyk online transportation platform, focusing on the city of Tehran and on motorcycle courier delivery. The Q-learning algorithm is used to solve the pricing problem. Because the state space is large, deep Q-networks (DQN) are used to approximate the Q-values. The DQN is implemented in Python using the Keras and Keras-RL packages. In developing the model, Tehran is first partitioned into zones, and the automated pricing agent then assigns a multiplier to each zone in each time interval. These multipliers are multiplied by AloPeyk's base price to give the final price.
Simulating the market with real AloPeyk data, implementing the reinforcement learning method in this environment, and comparing its results with the real environment shows a 20 percent improvement in the share of completed orders, a 35 percent improvement in waiting time, a 30 percent improvement in the order expiration rate, a 20 percent improvement in the rate at which a single order is received by different couriers, and close to a 20 percent improvement in the ratio of demand changes to supply. It can therefore be concluded that the proposed approach performs well in balancing the components of online transportation systems through dynamic pricing.
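The Q-learning step named in the abstract can be illustrated with a minimal tabular sketch. This is not the thesis implementation: the thesis approximates Q-values with a DQN built on Keras and Keras-RL, whereas the table below stands in for that network. The state labels, the candidate multipliers, the reward, and the values of `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration.

```python
import random

# Hypothetical discretisation; none of these values come from the thesis.
MULTIPLIERS = [0.8, 1.0, 1.2, 1.5]    # candidate price multipliers (actions)
STATES = ["low", "balanced", "high"]  # demand-to-supply level in one zone


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    best_next = max(q[next_state])          # greedy value of the next state
    td_target = reward + gamma * best_next  # one-step bootstrapped target
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice of a multiplier index for the given state."""
    if rng.random() < epsilon:
        return rng.randrange(len(MULTIPLIERS))  # explore
    row = q[state]
    return max(range(len(row)), key=row.__getitem__)  # exploit


# One illustrative transition: a "high" demand zone priced with multiplier 1.5
# yields an assumed reward of 1.0 and moves to the "balanced" state.
q_table = {s: [0.0] * len(MULTIPLIERS) for s in STATES}
new_value = q_update(q_table, "high", 3, reward=1.0, next_state="balanced")
greedy_index = choose_action(q_table, "high", epsilon=0.0, rng=random.Random(0))
```

In a DQN, the lookup `q[state]` would be replaced by a forward pass of a neural network over a much larger state description, which is the reason the abstract gives for moving beyond a table.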

Date of data entry

1399/09/05

Title in English

Implementing the reinforcement learning method for dynamic pricing in a ride-hailing platform: AloPeyk case study

Release date

10/10/2020 12:00:00 AM

Student who entered the data

Milad Haj Fathali

Author: Milad Haj Fathali

Abstract in English

In the business environment, new tools have been developed to understand market needs, including dynamic pricing methods. With the advent of e-commerce, dynamic pricing has found a special place, so that in some small and medium-sized Internet businesses the prices of some products and services now change automatically as the market changes. In most markets, demand and supply fluctuate, creating a constantly changing market environment. It is impossible to predict all possible future conditions of such a market, and the available information is limited. As a result, a substantial body of work on dynamic pricing has emerged from the computer science and artificial intelligence communities. These models enable companies to incorporate existing data into their outlook and change their pricing strategy to best adapt to the market environment. A review and classification of the related literature found that most research on dynamic pricing in online businesses using machine learning approaches focuses on goods-oriented markets such as online retail stores. The aim of this study is to use reinforcement learning to implement dynamic pricing in the AloPeyk on-demand delivery system. The Q-learning algorithm is used to solve the pricing problem. Because the state space is large, deep Q-networks (DQN) are used to approximate the Q-values. After the city of Tehran is divided into zones, the pricing agent assigns a surge multiplier to each zone in each time period. These multipliers are multiplied by the AloPeyk base price to calculate the final price.
In a market simulation built on real AloPeyk data, with the reinforcement learning method implemented in that environment and its results compared with the real environment, the proposed approach performs well in balancing the components of on-demand delivery systems through dynamic pricing.
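The final-price rule described in the abstract (base price multiplied by the multiplier assigned to a zone in a time period) can be sketched as follows. The zone names, time-slot labels, multiplier values, the base price of 20000, and the fallback of 1.0 for zones without an assigned multiplier are all hypothetical and serve only to make the arithmetic concrete.

```python
def final_price(base_price, multipliers, zone, slot):
    """Scale the base price by the multiplier for (zone, slot).

    Falls back to 1.0 (no change) when no multiplier is set; this fallback
    is an assumption of the sketch, not something stated in the abstract.
    """
    return base_price * multipliers.get((zone, slot), 1.0)


# Hypothetical multipliers, keyed by (zone, time slot).
multipliers = {
    ("zone_1", "08:00"): 1.5,  # assumed high demand relative to supply
    ("zone_2", "08:00"): 0.8,  # assumed low demand relative to supply
}

surge_price = final_price(20000, multipliers, "zone_1", "08:00")
default_price = final_price(20000, multipliers, "zone_3", "08:00")
```

Keying the table by zone and time slot mirrors the abstract's description of one multiplier per zone per time period; how the zones and periods are actually defined is not given in this record.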

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=22568&Field=0&DTC=6