Fatemeh Hosseinpourgraily

Title

Presenting a new approach based on machine learning methods to detect fraud in the lending process of banks

Degree level

Master's

Field of study

Macro Systems

Year of study

1399

Defense date

1402/07/01

Supervisor

Alireza Aliahmadi Jeshfaghani

Faculty

Industrial Engineering

Abstract

Today, predicting fraud in the lending process and identifying high-risk loan applicants is one of the important and challenging problems in the banking industry. This has led to the emergence of fraud detection systems for the lending process of banks. Various fraud detection systems have been introduced so far, and they usually rely on machine learning methods to detect fraud. In fraud detection systems based on machine learning algorithms, classification models are trained on relatively large datasets that usually contain a large number of features. Because some of these features are redundant, training a classification model on all of them not only fails to increase its accuracy but also increases the model's complexity and execution time and reduces its efficiency. Therefore, as studies in this field also show, selecting the important features and identifying and removing the redundant ones from high-dimensional datasets has a strong effect on the performance of classification models: it reduces the model's learning time and increases its classification accuracy. Feature selection is an important and influential pre-processing technique for classification problems. Its main goal is to select an optimal subset of the dataset's features that achieves the highest classification accuracy with the smallest number of features. Accordingly, this research proposes a new approach to fraud detection in the lending process based on feature selection and ensemble learning algorithms. The proposed approach consists of four main steps. In the first step, the dataset is loaded and initial pre-processing is performed. In the second step, the important features are selected from the full feature set by a meta-heuristic optimization algorithm called Improved-BNNA-SA. In the third step, classification models such as support vector machines (SVM), artificial neural networks (ANNs), and decision trees are trained on the features selected in the previous step. Finally, fraud detection and identification of high-risk loan applicants are carried out by the trained classification models.
In the combined classification model, a voting method aggregates the predictions of the three classifiers mentioned above. The performance of the proposed Improved-BNNA-SA algorithm for feature selection is first evaluated on a limited number of datasets from the UCI repository, and its results are compared with those of other similar feature selection algorithms. After the better performance of the proposed algorithm was confirmed, it was applied for feature selection to the banking dataset, which is the main problem of this research. The features selected by the algorithm were then used to train the classification models, and the results indicate that the proposed system performs better at detecting fraud in the lending process.
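The voting step described above can be illustrated with a minimal sketch. It assumes hard (majority) voting over class labels; the abstract does not say whether the thesis uses hard or soft voting. The function name `majority_vote` and the sample predictions are hypothetical and are not taken from the thesis.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by hard voting.

    `predictions` is a list with one entry per model; each entry is that
    model's list of predicted labels, in the same sample order.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    for sample_votes in zip(*predictions):
        # Pick the label that the largest number of models voted for.
        label, _count = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical outputs of the three classifiers (1 = high-risk, 0 = normal).
svm_pred = [1, 0, 1, 0]
ann_pred = [1, 1, 0, 0]
tree_pred = [0, 1, 1, 0]

ensemble_pred = majority_vote([svm_pred, ann_pred, tree_pred])
print(ensemble_pred)
```

With three binary classifiers a tie cannot occur, which is one practical reason to combine an odd number of models.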

Record entry date

1402/10/01

Title in English

Presenting a new approach based on machine learning methods to detect fraud in the lending process of banks

Availability date

9/22/2024 12:00:00 AM

Student who entered the record

Fatemeh Hosseinpourgraily

Name: Fatemeh Hosseinpourgraily
Author: Fatemeh Hosseinpourgraily

Abstract in English

Abstract: Today, predicting fraud in the lending process and identifying high-risk loan applicants is one of the most important and challenging problems in the banking industry. This has led to the emergence of fraud detection systems for the lending process of banks. Various types of fraud detection systems have been introduced so far, most of which use machine learning methods to detect fraud. Fraud detection systems based on machine learning algorithms train classification models on relatively large datasets, which usually contain a large number of features. Since some of these features are redundant, using all of them to train a classification model not only fails to increase its accuracy but also increases the model's complexity and execution time and reduces its efficiency. Therefore, as studies in this field also show, selecting the important features and identifying and removing the redundant ones from high-dimensional datasets strongly affects the performance of classification models: it reduces the model's learning time and increases its classification accuracy. Feature selection is an important and effective pre-processing technique for classification problems. Its main purpose is to select an optimal subset of the dataset's features that yields the highest classification accuracy with the fewest features. Therefore, this research proposes a new approach to fraud detection in the lending process, based on feature selection and ensemble learning algorithms. The proposed approach consists of four main steps. In the first step, the dataset is loaded and pre-processed. In the second step, the important features are selected from the entire feature set by a meta-heuristic optimization algorithm called Improved-BNNA-SA.
Then, in the third step, classification models such as support vector machines (SVM), artificial neural networks (ANNs), and decision trees are trained on the features selected in the previous step. Finally, fraud detection and identification of high-risk loan applicants are performed by the trained classification models. In the combined classification model, a voting method aggregates the predictions of these three classifiers. The performance of the proposed Improved-BNNA-SA algorithm for feature selection was first evaluated on a limited number of datasets from the UCI repository, and the results were compared with those of other similar feature selection algorithms. After the better performance of the proposed algorithm was confirmed, it was applied for feature selection to the bank dataset, which is the main problem of this research. The features selected by the algorithm were used to train the classification models, and the results indicate that the proposed system performs better at detecting fraud in the lending process.
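The feature-selection goal stated in the abstract (highest classification accuracy with the fewest features) is often expressed as a single score that a meta-heuristic minimizes. The sketch below shows one common weighted form. It is an assumption for illustration only and is not the fitness function of Improved-BNNA-SA, which the abstract does not specify. The name `subset_fitness`, the weight `alpha`, and the sample numbers are hypothetical.

```python
def subset_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Score a candidate feature subset; lower is better.

    error_rate: classification error (1 - accuracy) of a model trained on the subset.
    n_selected: number of features kept in the subset.
    n_total:    number of features in the full dataset.
    alpha:      assumed weight on error versus subset size (hypothetical default).
    """
    if n_total <= 0 or not 0 <= n_selected <= n_total:
        raise ValueError("need 0 <= n_selected <= n_total and n_total > 0")
    if n_selected == 0:
        # An empty subset cannot train a classifier, so treat it as infeasible.
        return float("inf")
    size_ratio = n_selected / n_total
    return alpha * error_rate + (1 - alpha) * size_ratio


# Hypothetical comparison: same error, fewer features gives a better (lower) score.
full_set = subset_fitness(error_rate=0.10, n_selected=30, n_total=30)
reduced_set = subset_fitness(error_rate=0.10, n_selected=12, n_total=30)
print(reduced_set < full_set)
```

Keeping `alpha` close to 1 makes accuracy dominate, so subset size mainly decides between candidates whose error rates are similar.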

Persian keywords

Machine learning, big data, data mining, feature selection, meta-heuristic algorithms, fraud detection, deep learning

English keywords

Machine learning, big data, data mining, feature selection, meta-heuristic algorithms, fraud detection, deep learning

Author

Fatemeh Hosseinpourgraily

Supervisor

Alireza Aliahmadi

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=30253&Field=0&DTC=6