Mohammad Hassan Alizadeh Sarshory

Title

Fraud Detection in Healthcare Systems Using Machine Learning Algorithms (Case Study: Social Security Organization)

Degree level

Master's degree (non-continuous)

Field of study

Industrial Engineering - Engineering Management

Year of study

1400

Defense date

1403/07/14

Supervisor

Dr. Mehdi Gazanfari

Advisor

Dr. Mohammadreza Rasouli

Faculty

Noor Branch

Abstract

The health system of a country is one of the most important sectors that can set it apart from other countries, and for countries with high-quality medical systems it is also a tourist attraction. The medical industry and its related industries have complexities that distinguish them from other industries. Its two main stakeholders, the government and the public, allocate large budgets and spending to medical matters every year, which makes the industry attractive to fraudsters and scammers. People must spend part of their income on prevention, diagnosis, and treatment, and these costs are higher for the elderly, people with various illnesses, and people with disabilities than for younger people. Fraud in this system raises treatment costs for the public and fuels corruption in both the public and private sectors. Technological progress has made fraud detection possible, driven by the arrival of the information age, the emergence of fields such as data science and artificial intelligence, and the development of databases. This research reviews the types of fraud found in most countries, including Iran. It also covers the supervised, unsupervised, semi-supervised, and hybrid algorithms used in research on fraud detection in health systems. In addition, it examines the data required and the evaluation metrics for machine learning models, and describes the process, related topics, and open challenges in this field.
This research addresses two research gaps: the limited use of newer feature selection methods, and the small number of studies focusing on fraud committed by the public. The data are first prepared and cleaned with a range of data mining techniques. Because the data are unlabeled, unsupervised methods must be used. The implementation and innovation of this study proceed as follows:

- Features are selected from the dataset using Isolation Forest, PCA, FSFS, and ElasticNet.
- The unsupervised algorithms K-means, K-modes, and DBSCAN are applied to find the suspicious cases.
- The performance of these algorithms is compared using the Davies-Bouldin Index, Silhouette Score, and Calinski-Harabasz index.
- A list of suspicious behaviors, extracted with the help of experts at the Social Security Organization, is compared with the resulting clusterings.

The ultimate goal is to examine and compare expert systems and clustering in order to achieve better fraud detection performance.
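The clustering-and-evaluation step can be illustrated with the sketch below. It runs plain K-means on numeric feature vectors and computes the three internal indices from their standard definitions:

- Silhouette (higher is better)
- Davies-Bouldin (lower is better)
- Calinski-Harabasz (higher is better)

It is not the thesis's implementation. It assumes features have already been selected and scaled, omits K-modes and DBSCAN, and uses hypothetical toy data. The helper names (`kmeans`, `silhouette`, `davies_bouldin`, `calinski_harabasz`) are illustrative. In practice, scikit-learn provides `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score` in `sklearn.metrics`.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def kmeans(points: List[Point], k: int, iters: int = 100, seed: int = 0) -> List[int]:
    """Plain Lloyd's algorithm: assign to nearest centre, recompute centres, repeat."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties out
                centers[c] = centroid(members)
    return labels


def _groups(points, labels) -> Dict[int, List[Point]]:
    """Group points by their cluster label."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return groups


def silhouette(points, labels) -> float:
    """Mean of (b - a) / max(a, b): a = mean intra-cluster distance, b = nearest other cluster."""
    groups = _groups(points, labels)
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:  # singleton clusters score 0 by convention
            scores.append(0.0)
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)  # dist(p, p) = 0 adds nothing
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for lab2, other in groups.items() if lab2 != lab)
        m = max(a, b)
        scores.append((b - a) / m if m > 0 else 0.0)
    return sum(scores) / len(scores)


def davies_bouldin(points, labels) -> float:
    """Average over clusters of the worst (S_i + S_j) / d(c_i, c_j) ratio; lower is better."""
    groups = _groups(points, labels)
    cents = {lab: centroid(g) for lab, g in groups.items()}
    spread = {lab: sum(math.dist(p, cents[lab]) for p in g) / len(g) for lab, g in groups.items()}
    total = 0.0
    for i in groups:
        total += max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                     for j in groups if j != i)
    return total / len(groups)


def calinski_harabasz(points, labels) -> float:
    """(between-cluster dispersion / (k - 1)) / (within-cluster dispersion / (n - k)); higher is better."""
    groups = _groups(points, labels)
    n, k = len(points), len(groups)
    overall = centroid(points)
    between = sum(len(g) * math.dist(centroid(g), overall) ** 2 for g in groups.values())
    within = sum(math.dist(p, centroid(g)) ** 2 for g in groups.values() for p in g)
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of 2-D feature vectors.
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    labels = kmeans(pts, k=2)
    print(labels, silhouette(pts, labels), davies_bouldin(pts, labels), calinski_harabasz(pts, labels))
```

These indices use only the data and the cluster assignments, so they can be computed without fraud labels. That is why they suit an unlabeled setting like this one.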

Data entry date

1403/09/26

English title

Fraud Detection in Healthcare Systems Using Machine Learning Algorithms (Case Study: Social Security Organization)

Student who entered the data

Mohammad Hassan Alizadeh Sarshory

English abstract

The health system of a country is among its most critical sectors and can distinguish a nation from others; for countries with high-quality medical systems, it also serves as a tourist attraction. The medical industry and its related sectors have complexities that set them apart from other industries. The two primary stakeholders, the government and the public, allocate substantial budgets to healthcare every year, which makes the industry attractive to fraudsters and scammers. Individuals must spend part of their income on prevention, diagnosis, and treatment, and these costs are higher for the elderly, people with various illnesses, and people with disabilities than for younger populations. Fraud within these systems increases treatment costs for the public and heightens corruption in both the public and private sectors. Fraud detection has become more feasible through advances in technology, the advent of the information age, the emergence of fields such as data science and artificial intelligence, and the development of databases. This study examines the types of fraud found in most countries, including Iran. It reviews the supervised, unsupervised, semi-supervised, and hybrid algorithms used in healthcare fraud detection research. It also covers the required data, evaluation metrics for machine learning models, and the related processes, topics, and challenges in this domain. This research addresses two gaps: the limited use of novel feature selection methods, and the limited number of studies focusing on fraud committed by individuals. To do so, it first prepares and cleans the data using a range of data mining techniques. Because the data are unlabeled, unsupervised methods are used.
The proposed implementation and innovation of this study proceed as follows:

- Features are selected from the dataset using Isolation Forest, PCA, FSFS, and ElasticNet.
- The unsupervised algorithms K-means, K-modes, and DBSCAN are applied to identify suspicious cases.
- The performance of these algorithms is evaluated and compared using the Davies-Bouldin Index, Silhouette Score, and Calinski-Harabasz index.
- A list of suspicious behaviors, extracted with the help of experts from the Social Security Organization, is compared with the clustering results.

The ultimate goal is to evaluate and compare expert systems and clustering methods to improve fraud detection performance.
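The abstract does not state how the expert list and the clusters are compared. One simple possibility is sketched below under stated assumptions:

- Records falling in clusters judged suspicious are treated as flagged. Here, hypothetically, these are the points DBSCAN labels as noise, `-1`.
- Set overlap is then measured between those flags and the expert-derived list.

The record IDs, the `flagged_ids` and `agreement` helpers, and the choice of overlap measures are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, Set


def flagged_ids(labels: Dict[Hashable, int], suspicious_labels: Iterable[int]) -> Set[Hashable]:
    """Record IDs whose cluster label is one of the labels judged suspicious."""
    wanted = set(suspicious_labels)
    return {rid for rid, lab in labels.items() if lab in wanted}


def agreement(cluster_flags: Set[Hashable], expert_flags: Set[Hashable]) -> Dict[str, float]:
    """Simple set-overlap statistics between clustering output and the expert list."""
    both = cluster_flags & expert_flags
    union = cluster_flags | expert_flags
    return {
        "overlap": float(len(both)),
        # Fraction of clustering-flagged records that the experts also list.
        "confirmed_by_experts": len(both) / len(cluster_flags) if cluster_flags else 0.0,
        # Fraction of expert-listed records that the clustering also flags.
        "expert_cases_found": len(both) / len(expert_flags) if expert_flags else 0.0,
        # Size of the intersection relative to the union of both lists.
        "jaccard": len(both) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical claim IDs and cluster labels; -1 marks DBSCAN-style noise points.
    cluster_labels = {"r1": 0, "r2": -1, "r3": -1, "r4": 1, "r5": -1}
    expert_list = {"r2", "r3", "r4"}  # hypothetical expert-derived suspicious records
    flags = flagged_ids(cluster_labels, suspicious_labels=[-1])
    print(sorted(flags), agreement(flags, expert_list))
```

Because the data are unlabeled, these figures measure agreement with expert judgment rather than true detection accuracy.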

Persian keywords

Fraud detection, Machine learning, Health system, Feature selection, Unsupervised algorithms

English keywords

Fraud Detection, Machine Learning, Healthcare System, Feature Selection, Unsupervised Algorithms

Author

Mohammad Hassan Alizadeh Sarshory

Supervisor

Prof. Mehdi Gazanfari

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=31728&Field=0&DTC=6