Abstract
A country's health system is among its most critical sectors and can set it apart from other nations; in countries with high-quality medical care, it also attracts medical tourists. The healthcare industry and its related sectors are characterized by complexities that distinguish them from other industries. Because the two primary stakeholders, the government and the public, allocate substantial budgets to healthcare every year, the industry is an attractive target for fraudsters and scammers. Individuals must spend part of their income on prevention, diagnosis, and treatment, and these costs tend to be higher for the elderly, people with various illnesses, and people with disabilities than for younger populations. Fraud within these systems raises treatment costs for the public and increases corruption in both the public and private sectors. Advances in technology, the advent of the information age, the emergence of fields such as data science and artificial intelligence, and the development of databases have made fraud detection more feasible. This study examines the types of fraud found in most countries, including Iran. It also reviews the supervised, unsupervised, semi-supervised, and hybrid algorithms used in health-sector fraud detection research, along with the required datasets, the evaluation metrics for machine learning models, and the related processes, challenges, and topics in this domain. Novel feature selection methods have rarely been applied in this area, and few studies focus on fraud committed by individuals; this research addresses both gaps, applying data preparation and cleaning techniques. Because labeled data are lacking, unsupervised methods are used.
The proposed implementation and contribution of this study is to select features across datasets using methods such as Isolation Forest, PCA, FSFS, and ElasticNet. Unsupervised algorithms, including K-means, K-modes, and DBSCAN, are then applied to analyze and identify suspicious cases. The performance of these algorithms is evaluated and compared using the Davies-Bouldin index, the silhouette score, and the Calinski-Harabasz index. Based on the extracted list of suspicious behaviors, experts from the Social Security Organization compare these findings with the clustering results. The ultimate goal is to evaluate and compare expert systems and clustering methods in order to improve the performance of fraud detection systems.
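As an illustration of how two of the named cluster-validity metrics compare alternative clusterings, the following is a minimal sketch rather than the study's implementation. The toy points, the two labelings, and all function names are hypothetical, and the metrics follow their standard Euclidean definitions (lower Davies-Bouldin is better, higher Calinski-Harabasz is better).

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return clusters


def davies_bouldin(points, labels):
    """Average, over clusters, of the worst scatter-to-separation ratio."""
    clusters = group_by_label(points, labels)
    cents = {k: centroid(v) for k, v in clusters.items()}
    # Scatter: mean distance of a cluster's members to its centroid.
    scatter = {k: sum(dist(p, cents[k]) for p in v) / len(v)
               for k, v in clusters.items()}
    # Assumes at least two clusters with distinct centroids.
    worst = [max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
                 for j in clusters if j != i)
             for i in clusters]
    return sum(worst) / len(clusters)


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion."""
    clusters = group_by_label(points, labels)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = sum(len(v) * dist(centroid(v), overall) ** 2
                  for v in clusters.values())
    within = sum(dist(p, centroid(v)) ** 2
                 for v in clusters.values() for p in v)
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: two well-separated groups of records.
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1),
          (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

for name, labels in (("good", good_labels), ("bad", bad_labels)):
    print(name,
          round(davies_bouldin(points, labels), 3),
          round(calinski_harabasz(points, labels), 3))
```

On this toy data the labeling that respects the two groups scores lower on Davies-Bouldin and higher on Calinski-Harabasz than the mixed labeling, which is the sense in which such metrics can rank the outputs of different clustering algorithms when no labels are available.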