Abstract
Cancer is currently one of the most prevalent diseases, affecting a large number of people every year and often leading to death. It is sometimes caused by factors that go unrecognized, and the consequence is lasting pain and suffering for patients. Prevention, early diagnosis, cost reduction, savings in time and energy, and prediction of a patient's future status from their current condition are crucial concerns in this field. Combining machine learning tools with the concept of quality of life, which spans many dimensions of everyday life, can address these objectives in the cancer domain.
In this research, the effect of 113 quality-of-life features from the Golestan Cohort Study, comprising 50,059 samples, on cancer incidence was investigated using machine learning tools in the Python and R programming languages, within software environments such as Spyder, Jupyter, Google Colab, and RStudio. After exploratory data analysis, feature selection and dimensionality reduction methods such as stepwise selection and principal component analysis were applied. Then 14 machine learning methods, including logistic regression, decision trees, random forests, support vector machines, and neural networks, were evaluated using four performance metrics: accuracy, precision, sensitivity, and specificity.
Finally, the random forest method, with an accuracy of 89% ± 0.005, precision of 94.1% ± 0.007, sensitivity of 84% ± 0.009, and specificity of 94.4% ± 0.006, performed best among the models and met the objectives of the study. Analysis of selected regression coefficients identified various physical activities, a higher welfare level, education and training, reduced consumption of fried foods, and increased consumption of fresh and local fruits and vegetables as important factors that reduce the risk of cancer. Aging was also identified as a significant and influential factor in cancer incidence.
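For readers less familiar with the four performance metrics named above, the following minimal sketch shows how each is conventionally computed from the counts of a binary confusion matrix. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not taken from the Golestan Cohort Study data or from the code used in this research.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute four standard metrics from binary confusion-matrix counts.

    tp, fp, tn, fn are the numbers of true positives, false positives,
    true negatives, and false negatives, respectively.
    """
    if min(tp + fp, tp + fn, tn + fp) == 0:
        raise ValueError("a metric denominator is zero for these counts")
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "precision": tp / (tp + fp),       # share of predicted positives that are real
        "sensitivity": tp / (tp + fn),     # share of real positives that are detected
        "specificity": tn / (tn + fp),     # share of real negatives that are detected
    }


# Hypothetical counts for illustration only (not results from the study).
metrics = binary_metrics(tp=84, fp=5, tn=95, fn=16)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters in a setting like cancer incidence, where the two classes may be imbalanced and accuracy alone can hide poor detection of the positive class.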