Abstract
In recent years, the rapid growth of e-commerce has dramatically increased the use of credit cards. Unfortunately, credit cards have also become an attractive target for fraudsters, which poses challenges for banks and other financial institutions that issue them. Fraud detection methods are therefore very important in this area. At the same time, the expansion of electronic networks and the growing number of transactions make it very difficult for humans to process the data manually. Data mining methods ease this task by discovering knowledge in the behavior patterns of users and customers. Several data mining techniques can be applied to fraud detection; in terms of the type of learning, they fall into two general groups: supervised (classification) and unsupervised (clustering). Building on previous studies, this research combines both groups in a single fraud detection model. In addition to drawing on the advantages of both, the combined model allows the methods to be layered in stages as the model is developed. Another issue in credit card fraud detection is the cost of misclassifying transactions (normal versus suspected fraud). This research therefore also presents a cost-sensitive measure that evaluates the fraud detection model in terms of the cost of incorrect detection.
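As an illustration of the idea behind a cost-sensitive measure, the following minimal sketch scores a classifier by the share of misclassification cost it avoids relative to a baseline that flags nothing. It is not the measure defined in this research, whose formula is not given here; the function names and the cost values `fn_cost` and `fp_cost` are assumptions chosen only for the example.

```python
def total_cost(y_true, y_pred, fn_cost, fp_cost):
    """Sum the cost of every misclassified transaction.

    Labels: 1 = suspected fraud, 0 = normal.
    """
    cost = 0.0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 0:
            cost += fn_cost  # missed fraud (false negative)
        elif actual == 0 and predicted == 1:
            cost += fp_cost  # false alarm (false positive)
    return cost


def cost_savings(y_true, y_pred, fn_cost=100.0, fp_cost=1.0):
    """Fraction of the baseline cost avoided by the model.

    The baseline predicts "normal" for every transaction. The default
    costs are hypothetical placeholders, not values from the study.
    """
    baseline = total_cost(y_true, [0] * len(y_true), fn_cost, fp_cost)
    if baseline == 0:
        raise ValueError("no fraud cases: baseline cost is zero")
    return 1.0 - total_cost(y_true, y_pred, fn_cost, fp_cost) / baseline


# Toy labels: one false alarm and one missed fraud out of six transactions.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]
savings = cost_savings(y_true, y_pred)
print(f"cost savings: {savings:.4f}")
```

Under these assumed costs a missed fraud weighs a hundred times more than a false alarm, so two classifiers with the same accuracy can receive very different scores.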
The data mining process followed the CRISP-DM standard. In the modeling step, the proposed hybrid model was applied to 19,806 transactions on cards issued by several Iranian banks. Modeling comprises three stages: the first and second are based on unsupervised learning, and the third on supervised learning. In the first stage, cardholders were clustered with the TwoStep, k-means, and Kohonen methods; Kohonen, with an optimal number of 10 clusters and a silhouette index of 0.6, was selected as the superior clustering method. In the second stage, peer group analysis was applied to the transactions of each Kohonen cluster to determine the data labels, and a total of 329 transactions in the data set were labeled as suspect. Finally, the labeled data set was used to train and test five classifiers: neural network, Bayesian network, decision tree, random forest, and support vector machine. Among them, the random forest achieved the highest performance, with 99.93% accuracy and a cost-based success rate of 99.89%, and was therefore chosen as the best classifier.
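The labeling stage can be sketched roughly as follows. This is a simplified stand-in for peer group analysis, assuming each transaction already carries a cluster id from the first stage and that a transaction is suspect when its amount deviates strongly from its cluster's mean. The statistic, the `threshold` value, and the function name are assumptions made for illustration; the procedure actually used in this research is not specified here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def flag_suspects(transactions, threshold=2.5):
    """Label each (cluster_id, amount) pair: 1 = suspect, 0 = normal.

    A transaction is flagged when its amount lies more than `threshold`
    population standard deviations from the mean of its own cluster.
    The threshold is a hypothetical choice, not taken from the study.
    """
    amounts_by_cluster = defaultdict(list)
    for cluster_id, amount in transactions:
        amounts_by_cluster[cluster_id].append(amount)

    # Per-cluster mean and population standard deviation.
    stats = {
        cluster_id: (mean(amounts), pstdev(amounts))
        for cluster_id, amounts in amounts_by_cluster.items()
    }

    labels = []
    for cluster_id, amount in transactions:
        mu, sigma = stats[cluster_id]
        z = 0.0 if sigma == 0 else abs(amount - mu) / sigma
        labels.append(1 if z > threshold else 0)
    return labels


# Toy data: cluster 0 contains one unusually large amount; cluster 1 does not.
cluster_0 = [(0, a) for a in [10, 12, 11, 9, 10, 11, 10, 12, 9, 11, 500]]
cluster_1 = [(1, a) for a in [200, 210, 190, 205, 195]]
labels = flag_suspects(cluster_0 + cluster_1)
print(sum(labels), "transaction(s) flagged as suspect")
```

The resulting labels could then serve as the target column for training the supervised classifiers in the third stage.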