Sajjad Alizadeh Fard

Title

Financial fraud detection using ensemble learning methods

Degree level

Master's

Field of study

Computer Engineering - Software

Academic year

1399

Defense date

1402/4/28

Supervisor

Dr. Hossein Rahmani

Faculty

Computer Engineering

Abstract

Fraud in financial data has always been a serious concern for business organizations and individuals. Manual checks for detecting fraud are time-consuming and costly. Proper fraud detection lets investigators act in time and prevent further fraud and financial losses. One of the main steps in the fraud detection process is feature selection, which has an important effect on the accuracy and run time of the models. Credit card transactions typically have a large number of features. Some features may not be meaningful to the classifiers or may lead to overfitting. In addition, having fewer redundant features leads to a better understanding of the classifier's decision. Feature selection can also increase the "speed of classifiers" by reducing the size of the feature set, and "their performance" by preventing overfitting. One of the main challenges when using complex models in fraud detection is the lack of interpretability regarding how the models work and why they make their decisions. In particular, when working with sensitive data in security domains, providing effective explanations to system users is of great importance and has become an ethical and legal requirement in many application areas. In this research, by employing the SHAP and LIME interpretability algorithms, we present "an interpretable feature selection framework" with an ensemble approach. The proposed framework was applied to different combinations of the best models from previous works, and its results were compared quantitatively and qualitatively with other feature selection algorithms. Quantitative evaluation of the "X-SHAoLIM" framework on different combinations of the models selected from previous works showed that using the proposed framework in the feature selection step increases, on average, the accuracy of the models in terms of precision (+5.6), recall (+1.5), F-measure (+3.5), and AUC (+6.75), and yields the best performance compared with other feature selection algorithms. Besides increasing model accuracy, the proposed framework, owing to its use of the SHAP and LIME algorithms, offers greater capability for interpretability and for analyzing the "type of effect of the features", and makes it possible to provide effective explanations to system users.

Data entry date

1402/06/16

Title in English

Fraud detection in financial data using ensemble learning methods

Availability date

7/18/2024 12:00:00 AM

Student who entered the data

Sajjad Alizadeh Fard

Name: Sajjad Alizadeh Fard
Author: Sajjad Alizadeh Fard

Abstract in English

Fraud in financial data is always a serious concern for business organizations and individuals. Applying manual checks to detect fraud is time-consuming and expensive. Proper fraud detection allows investigators to take timely action and prevent further fraud and financial losses. One of the main steps in the fraud detection process is feature selection, which has an important impact on the accuracy and execution time of the models. Credit card transactions typically have many features. Some features may not be meaningful to the classifiers or may lead to overfitting. In addition, having fewer redundant features leads to a better understanding of the classifier's decision. Feature selection can also increase the "speed of classifiers" by reducing the size of the feature set, and "classifier performance" by avoiding overfitting. One of the main challenges when using complex models in fraud detection is the lack of "explainability" about how the models work and why they make their decisions. In particular, when working with sensitive data in security domains, providing effective explanations to system users is of great importance and has become an ethical and legal requirement in many applied fields. In this research, we present "an explainable feature selection framework" based on an ensemble approach. We applied the proposed framework to different combinations of the best models from previous works and compared its results with other feature selection algorithms quantitatively and qualitatively. Quantitative evaluation of the "X-SHAoLIM" framework on these combinations showed that using the proposed framework in the feature selection step increases, on average, the accuracy of the models in terms of precision (+5.6), recall (+1.5), F-score (+3.5), and AUC (+6.75), and that it gives the best performance compared with other feature selection algorithms. In addition to increasing the accuracy of the models, the proposed framework, owing to its use of explainability algorithms such as SHAP and LIME, offers greater capability for interpretability and for analyzing the importance of features to the model predictions, and provides effective explanations to system users.
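The abstract does not describe how X-SHAoLIM combines the SHAP and LIME outputs, so the following is only an illustrative sketch of one possible ensemble-style selection step: rank features by absolute attribution under each explainer, average the two ranks, and keep the top k. The function names, the rank-averaging rule, and the toy scores are assumptions made for illustration and are not taken from the thesis; in practice the scores would come from the SHAP and LIME libraries rather than hand-written dictionaries.

```python
from typing import Dict, List


def rank_features(scores: Dict[str, float]) -> Dict[str, int]:
    """Map each feature to its rank (1 = largest absolute attribution)."""
    # Ties are broken alphabetically so the ranking is deterministic.
    ordered = sorted(scores, key=lambda name: (-abs(scores[name]), name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def select_top_k(
    shap_scores: Dict[str, float],
    lime_scores: Dict[str, float],
    k: int,
) -> List[str]:
    """Keep the k features with the best mean rank across both explainers.

    Hypothetical aggregation rule (assumption): simple rank averaging.
    """
    shared = set(shap_scores) & set(lime_scores)
    shap_rank = rank_features({f: shap_scores[f] for f in shared})
    lime_rank = rank_features({f: lime_scores[f] for f in shared})
    mean_rank = {f: (shap_rank[f] + lime_rank[f]) / 2 for f in shared}
    return sorted(shared, key=lambda f: (mean_rank[f], f))[:k]


# Toy attribution scores (made up for illustration, not results from the thesis).
shap_scores = {"amount": 0.42, "hour": 0.05, "merchant_risk": 0.31, "card_age": 0.02}
lime_scores = {"amount": 0.35, "hour": -0.01, "merchant_risk": -0.40, "card_age": 0.03}

selected = select_top_k(shap_scores, lime_scores, k=2)
print(selected)  # ['amount', 'merchant_risk']
```

Averaging ranks rather than raw scores avoids having to put SHAP and LIME attributions on a common scale, which is one reason a sketch like this might prefer it; the sign of each attribution is discarded here, so analyzing the direction of a feature's effect would need the raw values.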

Persian keywords

fraud detection, machine learning, feature selection, ensemble learning, interpretability, data mining

English keywords

fraud detection, machine learning, feature selection, ensemble learning, explainability, data mining

Author

Sajjad Alizadeh Fard

Supervisor

Hossein Rahmani

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=28680&Field=0&DTC=6