Shakiba Shahbandegan

Title

An Evolutionary Algorithm for Feature Selection in Imbalanced Large-Scale Datasets with Few Samples

Degree level

Master's

Field of study

Electrical Engineering - Electronics - Digital

Academic year

1399-1400 (Solar Hijri)

Defense date

1400/04/13 (Solar Hijri)

Supervisor

Hadi Shahriar Shahhoseini

School

School of Electrical Engineering

Abstract

Feature selection is one of the most important data preprocessing steps in data mining and machine learning; its goal is to remove irrelevant and potentially redundant features from a dataset. It is treated as a multi-objective optimization problem whose objectives are to reduce the number of selected features and to increase classification accuracy. Feature selection becomes more challenging in datasets that have many features and few samples. Because deterministic methods for solving the feature selection problem have high computational complexity, many researchers have turned to evolutionary algorithms. This thesis presents an evolutionary algorithm that performs feature selection with high accuracy on imbalanced, high-dimensional data with few samples. The algorithm combines the Salp Swarm Algorithm and Differential Evolution, and includes components such as an elite initial population and an exponential transfer function. Its performance was tested on 8 microarray datasets, and the results indicate that it performs favorably compared with other evolutionary algorithms, including the Genetic Algorithm, the Salp Swarm Algorithm, Harris Hawks Optimization, and the Grey Wolf Optimizer. More precisely, the proposed algorithm reduces the dimensionality of the tested datasets by between 84.95% (Colon) and 99.13% (Ovarian), while classification accuracy ranges from 89.76% (Colon) to 99.26% (SRBCT). The thesis also introduces a Python feature selection tool with which a variety of evolutionary algorithms for feature selection can be designed. In addition to using the tool's built-in default algorithms, users can design new algorithms for individual components, including initial population generation, the binarization method, and the search strategy, and then observe the resulting accuracy and dimensionality reduction.

Date of data entry

1400/05/03 (Solar Hijri)

Title in English

An Evolutionary Algorithm for Feature Selection in Imbalanced Large-Scale Sparse Data Sets

Student who entered the data

Shakiba Shahbandegan

Abstract in English

Feature selection is an important preprocessing step in data mining and machine learning that aims to eliminate irrelevant and redundant features from a dataset. It is considered a multi-objective optimization problem whose main objectives are reducing the number of selected features and increasing classification accuracy. In high-dimensional datasets with a large number of features and a small number of samples, feature selection becomes a bigger challenge. Since deterministic methods for solving the feature selection problem have high computational cost, many researchers employ evolutionary algorithms instead. In this thesis, an evolutionary algorithm called SSADE is proposed to solve the feature selection problem in imbalanced datasets with a large number of features and a small number of samples. SSADE combines two well-known evolutionary algorithms, the Salp Swarm Algorithm (SSA) and Differential Evolution (DE), and includes important components such as an elite initial population and a quadratic transfer function. The proposed method is tested on 8 microarray datasets, and the results are compared with those of other competitive evolutionary algorithms such as the Genetic Algorithm, the Grey Wolf Optimizer, the Salp Swarm Algorithm, and Harris Hawks Optimization. Experiments show that SSADE reduces the dimensionality of the datasets by between 84.95% (Colon) and 99.13% (Ovarian), while classification accuracy ranges from 89.76% (Colon) to 99.26% (SRBCT). In addition, this thesis proposes a feature selection toolbox in Python, called EvoFS, with which researchers can design evolutionary algorithms to solve the feature selection problem. With EvoFS, different parts of an evolutionary algorithm, such as the initial population, the transfer function, and the search strategy, can be designed, and the resulting algorithm can be evaluated in terms of classification accuracy and dimensionality reduction.
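
The abstract refers to two generic building blocks of evolutionary feature selection: a transfer function that turns a continuous search position into a binary feature mask, and a fitness that trades classification error against the number of selected features. The sketch below illustrates both under stated assumptions. It is not the SSADE algorithm and does not use the EvoFS API: the function names, the sigmoid transfer function (swapped in for the thesis's own transfer function, which is not specified here), and the weight `alpha` are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid_transfer(position):
    """S-shaped transfer function: maps real values to probabilities in (0, 1).

    Assumption: a plain sigmoid stands in for the thesis's transfer function.
    """
    return 1.0 / (1.0 + np.exp(-position))


def binarize(position, rng):
    """Convert a continuous position vector into a 0/1 feature mask.

    Feature i is selected when a uniform random draw falls below the
    probability given by the transfer function.
    """
    prob = sigmoid_transfer(position)
    return (rng.random(position.shape) < prob).astype(int)


def fitness(error_rate, mask, alpha=0.99):
    """Weighted sum of the two objectives (lower is better).

    Assumption: alpha weights classification error against the fraction
    of selected features; 0.99 is an illustrative value only.
    """
    selected_ratio = mask.sum() / mask.size
    return alpha * error_rate + (1.0 - alpha) * selected_ratio


def reduction_percent(mask):
    """Percentage of features removed by the mask."""
    return 100.0 * (1.0 - mask.sum() / mask.size)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    position = rng.normal(size=10)   # one candidate solution
    mask = binarize(position, rng)   # which features it selects
    # error_rate would come from a classifier trained on the selected features;
    # 0.1 is a placeholder value here.
    print(mask, fitness(0.1, mask), reduction_percent(mask))
```

In a full wrapper method, `error_rate` would be obtained by cross-validating a classifier on the columns where `mask` equals 1, and the search strategy would update `position` between evaluations.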

Persian keywords

data dimensionality reduction, microarray data, machine learning, metaheuristic algorithm

English keywords

dimension reduction, microarray datasets, machine learning, metaheuristic algorithm

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=25063&Field=0&DTC=6