Abstract:
In the Big Data era, the volume of data available for classification tasks has grown at a rapid rate. As a result, there is a strong need for algorithms that can classify huge datasets. One possible solution is parallelization, which reduces the time spent on training. Some ensemble methods can be parallelized in the training phase, which makes them good tools for handling Big Data. Ensemble learning is a machine learning approach in which multiple learners are trained to solve a particular problem.
Random Forest is an ensemble learning algorithm that comprises numerous decision trees and selects a class through majority voting for classification, and through averaging for regression. Prior research affirms that the learning time of the Random Forest algorithm increases linearly with the number of trees in the forest. A large number of decision trees can cause certain challenges: first, it enlarges the model's complexity; second, it negatively affects efficiency on large-scale datasets. Hence, ensemble pruning methods (e.g., clustering-based ones) have been devised to select a subset of the decision trees in the forest. The main challenge is that prior clustering-based models require the number of clusters as an input. To solve this problem, we devise an automatic clustering-based pruning model (Auto BC) for Random Forest that can automatically find the proper number of clusters. Our proposed model is able to obtain an optimal subset of trees that provides the same or even better effectiveness than the original set.
Auto BC has two components: clustering and selection. First, our algorithm utilizes a new clustering technique to group homogeneous trees. In the selection part, it considers both the accuracy and the diversity of the trees within each cluster to choose the best tree.
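The general clustering-then-selection scheme described above can be sketched in Python. This is not the Auto BC algorithm itself; it is an illustrative stand-in that clusters trees by the similarity of their validation-set predictions, picks the number of clusters automatically with the silhouette score (a hypothetical stand-in for the paper's criterion), and keeps the most accurate tree per cluster.

```python
# Illustrative sketch of clustering-based forest pruning (not the authors' code).
# Assumptions: KMeans over tree prediction vectors, silhouette score as the
# automatic cluster-count criterion, per-cluster selection by validation accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)

# Represent each tree by its prediction vector on a held-out validation set.
P = np.array([t.predict(X_val) for t in forest.estimators_])

# Clustering: choose the number of clusters automatically.
best_k, best_s = 2, -1.0
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(P)
    s = silhouette_score(P, labels)
    if s > best_s:
        best_k, best_s = k, s

labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(P)

# Selection: keep the most accurate tree from each cluster of homogeneous trees.
pruned = []
for c in range(best_k):
    idx = np.where(labels == c)[0]
    accs = [(P[i] == y_val).mean() for i in idx]
    pruned.append(forest.estimators_[idx[int(np.argmax(accs))]])

print(f"kept {len(pruned)} of {len(forest.estimators_)} trees")
```

The pruned ensemble then classifies by majority vote over the retained trees, exactly as the full forest does, but with far fewer members.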
Extensive experiments are conducted on five datasets. The results show that the output of our pruning algorithm performs the classification task more effectively than the state-of-the-art rival.
Keywords: machine learning, ensemble models, random forest, pruning method, ensemble
pruning