English Abstract
Abstract:
Accurate and early diagnosis is one of the most pressing needs of the medical community, because correct and timely treatment depends on it. The need is greater still when different illnesses present with similar symptoms, since each requires its own diagnosis and the treatment appropriate to the patient's actual condition. Thyroid disease is one such illness: its symptoms resemble those of several other diseases, including cardiovascular disease. Data mining and machine learning techniques are reliable and valuable methods that can enhance physicians' ability to diagnose and treat illness correctly. The main goals of this research are to extract rules for thyroid disease, to create new features, and to compare filter-based, wrapper-based, and genetic-algorithm feature selection in order to identify the features most effective for disease identification and to select the best approach in combination with data balancing. The analysis was also carried out using decision tree models and the random forest, bagging, boosting, and stacking methods to diagnose the disease classes, namely hypothyroidism and hyperthyroidism, and to improve the precision of their classification. Performance was evaluated with four metrics: accuracy, precision, recall, and F-measure. The research was conducted on data from the University of California, Irvine (UCI) Machine Learning Repository, comprising 7200 records with 21 features. Experimental results showed that the genetic algorithm performed best in feature selection, and that the boosted tree with the created features achieved the highest F-measure among the classifiers.