Abstract
In recent years, the growing importance of data-driven decision-making and its impact on industry have changed how data are collected and stored in the railway industry, as in other industries. Today, every accident is recorded as a report in an accident database containing 83 columns. These columns include numerical information, such as the train's weight, its speed at the time of the accident, the duration of the line blockage, and the number of cars. They also include categorical information, such as the train type, the accident type, and the factors contributing to the accident. In addition, the database contains an unstructured text column with a free-text description of each accident, which had not previously been studied. The present study applies text mining techniques with the aim of improving the safety of rail transportation and reducing railway accidents in the Islamic Republic of Iran. It seeks to answer whether text analysis can provide a broader view of accidents and increase the accuracy of predicting their severity. To this end, data on railway accidents in Iran from 2008 to 2018 were first preprocessed and explored using statistical methods. The most important columns were then selected and prepared for modeling, with the aim of predicting accident severity. The accident severity prediction algorithms were modeled and implemented following the CRISP-DM data mining methodology, using the Python programming language. The results indicate that including the text column and analyzing it increases the accuracy of accident severity prediction.