Abstract
Passenger train delays are among the most important challenges for rail systems around the world. They impose substantial costs on passengers and operators and contribute to inefficient train operations. The aim of this research is to predict passenger train delays in Iranian Railways using data mining techniques. The results of this project are used to design train timetables. The project follows the CRISP-DM data mining methodology.
The data used in this study is a database of passenger train delays from 2013 to 2017, containing 319,081 records.
The data preparation process involves integrating the train delay data, correcting errors in the database fields, adding new features to the database, and removing outliers. The independent variables of the prediction model are year, month, day, day of the week, departure time, axis, train type, car type, origin and destination of the train, and train owner.
To model train delays, two kinds of prediction, numerical and classification, are applied to the entire database in SPSS Modeler 18.0. Neural network and C5.0 methods are used for classification; the TwoStep clustering method is used to divide the delay field into three labels. Regression, CHAID, and neural network methods are used for numerical prediction. To evaluate the predictions, the passenger train delay data set is divided into a training set (75% of the data) and a test set (25%). The results show that the neural network is the most accurate method for numerical prediction and C5.0 is the most accurate for classification; therefore, these two techniques were used to predict train delays for 2018. Numerical prediction is also performed after grouping the data by some database fields. The results show that prediction by grouping is more accurate than prediction on the entire database.
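The 75%/25% evaluation protocol described above can be illustrated with a minimal Python sketch. This is not the SPSS Modeler workflow used in the study: the function names, the random seed, the toy delay values, and the mean-delay baseline are all assumptions introduced only to show how a hold-out split and the two kinds of error measure (numerical and classification) fit together.

```python
import random


def train_test_split(records, train_fraction=0.75, seed=0):
    """Shuffle a copy of the records and split it into training and test subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def mean_absolute_error(actual, predicted):
    """Error measure for numerical prediction (for example, delay in minutes)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def accuracy(actual, predicted):
    """Share of correctly predicted labels for classification prediction."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy delays in minutes; the real study uses 319,081 records.
delays = [float(i % 40) for i in range(1000)]
train, test = train_test_split(delays)

# Assumed baseline model: always predict the mean delay of the training set.
baseline = sum(train) / len(train)
baseline_mae = mean_absolute_error(test, [baseline] * len(test))
print(f"train={len(train)} test={len(test)} baseline MAE={baseline_mae:.2f}")
```

Any candidate model is fitted on the training subset only and scored on the held-out test subset, so the reported accuracy reflects records the model has not seen.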