Abstract
This dissertation aims to predict customer churn in bike-sharing systems using data mining techniques, in order to improve shared transportation, reduce traffic and environmental costs, and help businesses that provide shared-transportation services. Bike-sharing systems have been launched in large Iranian cities in recent years, and metropolitan areas face problems such as air pollution, traffic, and high transportation costs, so predicting customer churn is an important issue. First, the stages of the research were defined following the CRISP-DM methodology, and the requirements were collected from expert opinion and a review of the literature. After the data set was cleaned in both nominal and numerical forms, a logistic regression model was developed to identify the variables with the greatest effect on customer churn. The effect of each variable was tested with the Wald statistic.
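For reference, the Wald test for a single logistic regression coefficient is usually written as below. This is the standard textbook form, not a formula quoted from the dissertation; the symbols $\hat{\beta}_j$ (estimated coefficient of variable $j$) and $\mathrm{SE}(\hat{\beta}_j)$ (its standard error) are notation assumed here.

```latex
z_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)},
\qquad
W_j = z_j^{2} \;\sim\; \chi^{2}_{1}
\quad \text{asymptotically, under } H_0 : \beta_j = 0
```

A large $W_j$ (small p-value) indicates that variable $j$ contributes to the model beyond what chance would explain.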
Next, five machine learning algorithms (Neural Network, Decision Tree, Random Forest, Naive Bayes, and Support Vector Machine) were used to predict customer churn, and the performance of each was evaluated using a confusion matrix. In the validation stage, cross-validation was applied, and a t-test was used to test the differences in accuracy between the models. In the next phase, the studied urban area was segmented and, based on the resulting segments, the geographical area in which each user made most trips was identified; the previous models and steps were then run again.
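The comparison of model accuracies can be illustrated with a minimal sketch. It assumes a paired t-test on per-fold cross-validation accuracies, which is one common reading of the procedure; the abstract does not state which t-test variant was used. The function name `paired_t_statistic` and the fold accuracies are hypothetical, made up for illustration, and are not results from the study.

```python
import math
import statistics


def paired_t_statistic(acc_a, acc_b):
    """Return (t, degrees_of_freedom) for two models' per-fold accuracies.

    Both lists must come from the same cross-validation folds, in the same order.
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 folds")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical fold accuracies for two models (illustration only).
model_a = [0.90, 0.92, 0.91, 0.93, 0.94]
model_b = [0.85, 0.88, 0.86, 0.87, 0.89]

t, df = paired_t_statistic(model_a, model_b)
print(f"t = {t:.3f}, df = {df}")
```

The resulting statistic is compared with the critical value of Student's t distribution for the given degrees of freedom; a value beyond the critical value indicates a significant difference in accuracy between the two models.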
The results show that the number of successful trips, age, and duration of the active account are the three factors that most affect customer churn. The Neural Network model has the highest accuracy, followed by Decision Tree, Random Forest, Naive Bayes, and SVM, and the t-test shows that the differences in accuracy between all pairs of models are significant except between Decision Tree and Random Forest. In addition, adding geographic features increased the accuracy of the models, which suggests that geographical areas should be taken into account when predicting customer churn in bike-sharing systems. As in the first phase, the Neural Network outperforms the other models, and the difference in accuracy between models is again significant except between Decision Tree and Random Forest.