Seyyed Ali Velayati Pakrow

Title

Optimizing insurance policy pricing settings using the VQQL reinforcement learning algorithm: examining the effects on the customer and the insurance company

Degree

Master's (M.Sc.)

Field of study

Mathematical Statistics

Year of study

1402

Defense date

1404/11/07

Supervisor

Rahman Farnosh

Advisor

Rahman Farnosh

Faculty

Mathematics and Computer Science

Abstract

The main problem addressed in this thesis is setting the renewal price for a company's customers. This problem has two conflicting objectives: increasing revenue and retaining customers. Raising the price may cause customers to leave, whereas lowering it may reduce revenue. The problem is modeled either as a Markov decision process (MDP) or as a constrained MDP (CMDP). The first approach targets revenue improvement only. The second optimizes revenue subject to the customer retention level not falling below a specified threshold. The goal is to find an optimal policy that yields higher profit for insurance companies. To solve these models, a model-free, discretized reinforcement learning (RL) algorithm called VQQL is used. The method is applied to French third-party motor insurance data comprising 680,000 records. The algorithm discretizes the continuous RL state space with the K-Means clustering algorithm and then learns optimal policies. We examine and compare two optimization approaches: (1) revenue optimization, based on an analysis of the customer acceptance probability model, and (2) revenue optimization subject to the customer retention rate not falling below a specified threshold (CMDP). The results show that the policies optimized by VQQL provide an effective decision-making method for setting the renewal price of each insurance policy. These policies enable a dynamic balance between the conflicting interests. They also analyze and optimize, at the same time, the direct effect of price adjustments on both the insurer's profitability and the customer's willingness to renew. Notably, this research also includes a novel contribution based on data envelopment analysis and proposes Bayesian reinforcement learning as a direction for future research.
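VQQL (Q-learning with vector quantization) combines two steps, as summarized above. First, K-Means is used to vector-quantize the continuous state. Second, tabular Q-learning is run on the resulting cluster index. The following is a minimal toy sketch of that combination in Python, and it rests on several assumptions. It uses a two-feature state (normalized premium and normalized claim history), four hypothetical relative price changes, and a made-up linear renewal-acceptance model. The names `ACTIONS`, `nearest`, `kmeans`, `accept_prob`, `step`, and `vqql` are hypothetical illustrations. They are not taken from the thesis or from the French motor dataset.

```python
import random

# Hypothetical toy setup, NOT the thesis's data or model:
# a state is (normalized premium, normalized claim history), both in [0, 1];
# an action is a relative price change applied at renewal.
ACTIONS = [-0.05, 0.0, 0.05, 0.10]


def nearest(point, centroids):
    """Index of the closest centroid (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids (the VQ codebook)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def accept_prob(state, action):
    """Hypothetical renewal model: larger increases and more claims lower acceptance."""
    _, claims = state
    return max(0.0, min(1.0, 0.9 - 3.0 * action - 0.2 * claims))


def step(state, action, rng):
    """One renewal decision. Reward is the new premium (capped at 1) if the
    customer renews; a lapse gives reward 0 and ends the episode."""
    premium, claims = state
    new_premium = min(1.0, premium * (1 + action))
    if rng.random() < accept_prob(state, action):
        new_claims = min(1.0, max(0.0, claims + rng.uniform(-0.1, 0.1)))
        return new_premium, (new_premium, new_claims), False
    return 0.0, None, True


def vqql(k=8, episodes=2000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    # 1) Vector quantization: map sampled continuous states to k discrete states.
    samples = [(rng.random(), rng.random()) for _ in range(500)]
    centroids = kmeans(samples, k, seed=seed)
    # 2) Tabular Q-learning on the discrete state index.
    q = [[0.0] * len(ACTIONS) for _ in range(k)]
    for _ in range(episodes):
        state, done, t = rng.choice(samples), False, 0
        while not done and t < 10:  # at most 10 renewals per customer
            s = nearest(state, centroids)
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            reward, next_state, done = step(state, ACTIONS[a], rng)
            target = reward if done else reward + gamma * max(q[nearest(next_state, centroids)])
            q[s][a] += alpha * (target - q[s][a])
            state, t = next_state, t + 1
    # Greedy policy: one price adjustment per cluster.
    policy = [ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])] for s in range(k)]
    return centroids, q, policy
```

For example, call `centroids, q, policy = vqql()` and then `policy[nearest((0.6, 0.2), centroids)]`. This returns the learned price change for a customer in that hypothetical state. A real application would replace the toy `step` function with a fitted acceptance model and actual policy features.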

Data entry date

1404/11/26

English title

Optimizing insurance policy pricing settings using reinforcement learning VQQL algorithm: Examining the effects on the customer and the insurance company

Availability date

2/10/2026

Data entered by (student)

Seyyed Ali Velayati Pakrow

Name: Seyyed Ali Velayati Pakrow
Author: Seyyed Ali Velayati Pakrow

English abstract

The main problem in this thesis is determining the renewal price for the company's customers. This problem has two conflicting objectives: increasing revenue and retaining customers. Increasing the price may lead to losing customers, while decreasing it may reduce revenue. The problem is modeled as a Markov decision process (MDP) or a constrained MDP (CMDP). In the first approach, only revenue improvement is sought. In the second, revenue is optimized so that the customer retention level does not fall below a given threshold. The goal is to find an optimal policy that maximizes the insurance company's profit. To solve these models, a model-free, discretized reinforcement learning (RL) algorithm called VQQL is used. The method is applied to French third-party motor insurance data comprising 680,000 records. The algorithm discretizes the continuous RL state space with the K-Means clustering algorithm and learns optimal policies. We examine and compare two optimization approaches: (1) revenue optimization, based on an analysis of the customer acceptance probability model, and (2) revenue optimization subject to the customer retention rate not falling below a given threshold (CMDP). The results show that the policies optimized by VQQL provide an effective decision-making method for setting the renewal price of each insurance policy. These policies allow a dynamic balance between the conflicting interests. They simultaneously analyze and optimize the direct effect of price adjustments on both the insurer's profitability and the customer's willingness to renew. The research also includes a novel data envelopment analysis (DEA) component and proposes Bayesian reinforcement learning as a direction for future work.
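The two approaches compared in the abstract differ only in how the retention requirement enters the objective. The following is a hedged, single-period illustration; it is not the thesis's CMDP solver. It scores each price change by expected revenue, defined as new premium × P(renew). It handles the retention threshold by Lagrangian relaxation, which is an assumed technique here: it maximizes revenue + λ·P(renew) and bisects on λ until the chosen action meets the threshold. The logistic acceptance coefficients, the action grid, and the names `accept_prob`, `expected_revenue`, `best_action`, and `solve_constrained` are all hypothetical. In a full CMDP, the same shaped reward, r + λ·1[renew], would be fed into a Q-learning loop such as VQQL.

```python
import math

# Hypothetical single-period example, NOT the thesis's fitted model:
# candidate relative price changes at renewal.
ACTIONS = [-0.05, 0.0, 0.05, 0.10, 0.20]


def accept_prob(action):
    """Hypothetical logistic acceptance model P(renew | price change)."""
    return 1.0 / (1.0 + math.exp(-(3.0 - 10.0 * action)))


def expected_revenue(action, premium=1.0):
    """Premium after the change, collected only if the customer renews."""
    return premium * (1.0 + action) * accept_prob(action)


def best_action(lam):
    """Maximize the Lagrangian: expected revenue + lam * retention probability."""
    return max(ACTIONS, key=lambda a: expected_revenue(a) + lam * accept_prob(a))


def solve_constrained(threshold, lam_hi=10.0, iters=50):
    """Bisection for the smallest multiplier whose best response keeps
    retention >= threshold (that retention is nondecreasing in lam)."""
    if accept_prob(best_action(0.0)) >= threshold:
        return best_action(0.0), 0.0  # constraint is not binding
    if accept_prob(best_action(lam_hi)) < threshold:
        raise ValueError("threshold not reachable with lam <= lam_hi")
    lo, hi = 0.0, lam_hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if accept_prob(best_action(mid)) >= threshold:
            hi = mid  # feasible: try a smaller multiplier
        else:
            lo = mid
    return best_action(hi), hi
```

With these made-up coefficients, the unconstrained choice is a 5% increase, which gives retention of about 0.92. Calling `solve_constrained(0.95)` moves the choice to no price change, with λ ≈ 0.63. This illustrates how a binding retention constraint trades some revenue for retention.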

Persian keywords (translated)

Pricing optimization, Reinforcement learning, Q-Learning algorithm, VQQL algorithm, Markov decision process, K-means algorithm, Bayesian reinforcement learning

English keywords

Pricing Optimization, Reinforcement Learning, Q-Learning Algorithm, VQQL Algorithm, Markov Decision Process, K-Means Algorithm, Bayesian RL

Author

Seyyed Ali Velayati Pakrow

Supervisor

Rahman Farnosh

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=34526&Field=0&DTC=6