بابك نوري مقدم

Title

Improving the ensemble learning method for object classification and applying it in the big data domain

Degree level

PhD

Field of study

Information Technology Engineering - Electronic Commerce

Year of study

1393

Defense date

1400/04/19

Supervisor

مهدي غضنفري

Advisor

محمد فتحيان

Faculty

Industrial Engineering

Abstract

Advances in the digital world in the age of information technology have fundamentally changed everyday human life. Today, enormous volumes of many kinds of data are generated and stored at a staggering speed. Modern data, such as DNA microarray data, have characteristics such as class imbalance, a very large number of features, and so on. The question of how to manage such data has given rise to the topic known as big data. Among data analysis methods, the ensemble learning model has been well received by researchers because of properties such as high generalization ability and a flexible structure, and many studies are under way in this area. In this dissertation, in order to improve the performance of ensemble learning for data classification in a way that can also be implemented on a big data platform, the literature was first reviewed and its gaps were identified. Identifying these gaps provided a suitable basis for studying the various methods of building ensemble learning models. A step-by-step development approach was then adopted for the proposed ensemble learning model. The results of each step were evaluated with a variety of criteria and used to develop the next step. In the first step, creating diversity among the base classifiers and reducing dimensionality were taken as the challenge, and to address it a wrapper approach based on a multi-objective forest metaheuristic algorithm was proposed. To improve the search process in the wrapper method, a new multi-objective forest metaheuristic algorithm was presented that incorporates the concepts of an archive, region-based selection, chaos theory, and others. It can compete with other similar methods on the objectives of dimensionality reduction and improved classification performance, and it also has lower time complexity. Next, to deal with high-dimensional data, a hybrid approach based on a multi-filter and a multi-objective wrapper was presented. To reduce the search space of the wrapper method, a new multi-filter approach that combines univariate and multivariate methods was presented. By reducing the search space, it allows the wrapper method to select prominent and important features. In addition to solving the feature selection problem, the multi-objective wrapper method also optimizes the parameters of the classification model.
In the third step, to build the ensemble learning model, an innovative approach was presented for selecting the members of the ensemble from among the Pareto front solutions and for combining the members' results to produce the final output. In the final step, the proposed ensemble learning model was implemented on the Hadoop ecosystem to increase its scalability and its ability to handle large volumes of data. Presenting an ensemble learning model based on feature selection and dimensionality reduction on a big data platform is among the most important contributions of this research. The experimental results confirm the advantages of the proposed approach in its use of parallel processing and in improved classification quality in terms of classification accuracy and precision.

Date of data entry

1400/06/22

Title in English

Improving ensemble learning methods for Classification of objects and their application in Big Data Domain

Date of use

1/1/1900 12:00:00 AM

Student who entered the information

بابك نوري مقدم

Name: بابك نوري مقدم
Author: بابك نوري مقدم

Abstract in English

Advances in the digital world and the age of information technology have brought about fundamental changes in daily human life. Today, large volumes of data are generated and stored at very high speed. Modern data, such as DNA microarray data, are characterized by class imbalance, a large number of features, and so on. Managing these types of data has given rise to the field of big data. Ensemble learning is one data analysis method that is popular among researchers today because of its high generalizability, flexible structure, and other properties, and many studies are being conducted in this field. In this dissertation, to improve the efficiency of ensemble learning in data classification in a way that can also be implemented on a big data platform, the literature was reviewed and its gaps were identified. Identifying these gaps provided a good basis for studying the various methods of constructing ensemble learning models. A step-by-step approach was then adopted to develop the proposed ensemble learning model. The results of each step were measured against various criteria and used to develop the next step. Creating diversity in the base classifiers and reducing dimensionality was considered the challenge in the first step. To address this challenge, a wrapper method based on a multi-objective metaheuristic forest algorithm was proposed. The proposed new metaheuristic algorithm can compete with other similar methods in terms of dimension reduction and classification efficiency, and it has lower time complexity. In the next step, to deal with high-dimensional data, a hybrid solution based on a multi-filter and a multi-objective wrapper was presented. Using the multi-filter as a preprocessing step reduces the search space of the wrapper method, which results in the selection of the most prominent features.
In addition to solving the feature selection problem, the multi-objective wrapper method optimizes the parameters of the classification model. In the third step, a new approach was proposed for selecting the base classifiers of the ensemble model from the Pareto front. Furthermore, a meta-learner scheme was introduced to combine the results of the base classifiers and provide the final output of the ensemble model. In the final step, the proposed ensemble learning model was implemented on the Hadoop ecosystem to increase its scalability and its ability to deal with large amounts of data. The results of the experiments confirm the advantages of the proposed approach in using parallel processing and in improving the quality of classification in terms of accuracy and precision.

Persian keywords

Ensemble learning model , big data , classification model , feature selection , multi-objective metaheuristic algorithm , multi-filter , wrapper method

English keywords

Ensemble Learning Model , big data , classification model , Feature selection , Multi-objective metaheuristic algorithms , Multi-filter , Wrapper Methods

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=25210&Field=0&DTC=6