Abstract in English
As a key industry, insurance plays an influential role in a country's economic life and in its people's quality of life. Every year, a considerable percentage of a country's current assets is allocated to this industry to compensate for people's financial damages and loss of life. A variety of individuals, most notably the injured party, are involved in the process of establishing insurance damages. Moreover, the dynamic nature of insurance fraud approaches facilitates committing fraud in this industry. Thus, insurance companies are constantly at risk of fraud, and disregarding this risk can even lead to their bankruptcy. In recent years, the development of data storage and information management systems pertinent to each case of damage has provided the opportunity to use artificial intelligence and data mining-based approaches to detect various types of fraud. Although researchers have proposed disparate approaches in this regard, several neglected methods remain to be investigated and compared. In light of that, this research investigates and assesses several ensemble and non-ensemble classification methods for detecting fraud in the auto insurance industry. To that end, the single methods of decision tree, support vector machine (SVM), logistic regression, and K-nearest neighbor, the ensemble methods of AdaBoost and LogitBoost, and a multilayer perceptron network-based deep learning method were examined. To assess these methods, labeled auto insurance damage data from Razi Insurance Company, Iran, were used. Because the extracted data were class-imbalanced, the SMOTE method was used to deal with this issue, and each classification method was assessed once with and once without it. Among the non-ensemble algorithms, the SVM was the most accurate method.
Among the ensemble learning algorithms, random forest, AdaBoost, and RUSBoost achieved equal accuracy. Concerning the identification criterion, SVM performed better than logistic regression and K-nearest neighbor among the non-ensemble algorithms. Among the ensemble learning algorithms, random forest, AdaBoost, and RUSBoost demonstrated equal performance on this criterion. The deep learning algorithm demonstrated poor performance in comparison to the ensemble learning and SVM methods. Random forest, AdaBoost, and RUSBoost performed better concerning the F1-Score and G-Mean criteria. However, the deep neural network algorithm had a better performance than other algorithms. Its performance was similar to that of the ensemble learning methods concerning the F1-Score and G-Mean criteria. Moreover, it performed better in identification than other methods.
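The F1-Score and G-Mean criteria named above can be illustrated with a minimal sketch. It assumes the usual textbook definitions (F1 as the harmonic mean of precision and recall, G-Mean as the geometric mean of recall and specificity), which the abstract itself does not spell out. The function names and the toy labels are hypothetical and are not taken from the Razi Insurance data.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem; `positive` marks the fraud class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f1_and_g_mean(y_true, y_pred, positive=1):
    """Return (F1, G-Mean) under the assumed standard definitions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    # Guard every ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    g_mean = math.sqrt(recall * specificity)
    return f1, g_mean


# Hypothetical toy labels: 3 fraudulent claims (1) among 10.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
f1, g_mean = f1_and_g_mean(y_true, y_pred)

# A classifier that never flags fraud is correct on 7 of 10 claims,
# yet both criteria fall to zero.
never_flag = [0] * len(y_true)
f1_trivial, g_mean_trivial = f1_and_g_mean(y_true, never_flag)
```

Both criteria drop to zero for a classifier that never flags fraud, even though it labels most claims correctly. This is one reason such criteria are commonly reported alongside accuracy on imbalanced data.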