Seyed Alireza Mousavian Anaraki

Title

Development and implementation of data mining methods by creating strong, balanced, and interpretable clusters to adopt appropriate policies

Degree level

Master's

Field of study

Industrial Engineering - Engineering Management

Academic year (Solar Hijri calendar)

1398

Defense date (Solar Hijri calendar)

1400/8/22

Supervisor

Abdorrahman Haeri

School

Industrial Engineering

Abstract

Data mining is a systematic process and a powerful tool for analyzing data and extracting hidden information from it in order to solve business problems. Classification and clustering are among the most important and widely used data mining methods. K-means is one of the best-known clustering methods; it is highly sensitive to the initialization of centers and to the choice of the number of clusters, and it also has difficulty producing a balanced cluster structure (particularly for high-dimensional datasets). Because clustering methods and dimensionality reduction methods are closely related, dimensionality reduction leads to selecting the initial centers in a smaller space. For this reason, the present study proposes a hybrid, two-way model of K-means and PCA that produces highly interpretable clusters. In addition, because balanced clustering (equal size for every cluster) plays an important role in generating random subspaces for ensemble classification methods, it has so far been improved in terms of size, variance, and density, but no attention has been paid to creating strong clusters (an equal number of data points from each class in every cluster) or to balancing from a qualitative perspective (as a new concept). Therefore, based on a clustering approach with balance and strength constraints, balanced and strong clusters are created to improve the performance of ensemble classification and ensemble deep learning in terms of accuracy and diversity. By taking the class of each data point into account, the approach creates balanced and strong clusters and prevents the results from deviating and becoming biased toward a particular class. From the perspective of qualitative balancing, two methods are also proposed: hard qualitative balanced clustering (balance-constrained) and hybrid soft (balance-driven) and hard qualitative balanced clustering. By defining a value criterion, these methods create clusters that either have the highest value with the smallest size or have a value equal to that of the other clusters.
Finally, alongside the development of data mining methods, combining the K-means and PAM clustering methods has opened a new path toward automatic labeling of numerical data and image processing of the resulting clusters using SVD. Implementing the developed and hybrid methods within the CRISP-DM framework, in addition to achieving the data mining objectives (improving quantitative and qualitative measures, labeling, and interpretability in clustering, and accuracy and diversity in ensemble classification and ensemble deep learning compared with previous methods), has also taken into account business objectives in various domains (particularly human resources and energy).

Record entry date (Solar Hijri calendar)

1400/09/27

Title in English

Development and implementation of data mining methods by creating strong, balanced, and interpretable clusters to adopt appropriate policies

Date available

11/13/2022 12:00:00 AM

Record entered by (student)

Seyed Alireza Mousavian Anaraki

Name: Seyed Alireza Mousavian Anaraki
Author: Seyed Alireza Mousavian Anaraki

Abstract in English

Data mining is a systematic process and a powerful tool for analyzing data and extracting latent information, patterns, and useful knowledge from large amounts of raw data in order to solve business problems. Classification and clustering are the main data mining techniques. The K-means algorithm is a popular clustering method, but it is sensitive to the initialization of cluster centers and to the choice of the number of clusters. It also consistently fails to produce a balanced cluster structure, and its performance degrades considerably on high-dimensional datasets. Principal component analysis (PCA) is a linear dimensionality reduction method that is closely related to the K-means algorithm. Dimensionality reduction allows the initial centers to be selected in a smaller space, which mitigates the initialization problem. The present study investigates the reciprocal relationship between K-means and PCA and adopts an innovative approach of creating sub-datasets and applying step-by-step labeling. The clusters obtained from this approach are highly interpretable. Another application of clustering, generating random subspaces, has improved the accuracy and diversity of ensemble classification methods. If clusters are not balanced (clusters of unequal size) and not strong (an unequal number of data points from each class in each cluster), the results will deviate toward the classes with more samples in each cluster and will therefore be biased. Although balancing has been refined in terms of cardinality, variance, and density because of its importance in different fields, it has never been viewed from both the strong and the qualitative viewpoints. Therefore, the present study takes a new look at cluster balancing by presenting: 1. a novel strong balance-constrained clustering (SBCC), or hard-strong clustering (HSC), method; 2. soft and hard hybrid qualitative balanced clustering (SHHQBC); and 3. an innovative hard balanced (balance-constrained) clustering method that establishes clusters with the highest value (the balancing criterion) and the least cardinality.
Finally, the automatic labeling of numerical data by a hybrid of the K-means and partitioning around medoids (PAM) clustering algorithms, with image processing of cluster plots by singular value decomposition (SVD), is presented as an approach that can revolutionize clustering. The research process is implemented following the CRISP-DM methodology to underline that both the business objectives (especially in human resources and energy) and the data mining objectives have been achieved.
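
The two cluster properties defined in the abstract, balanced (equal cluster sizes) and strong (an equal number of data points from each class in each cluster), can be made concrete with a small check. The sketch below is illustrative only and is not the SBCC/HSC method of the thesis. The helper names `is_balanced` and `is_strong` are hypothetical, and reading "strong" as equal per-class counts inside every cluster is an assumption based on the wording of the abstract.

```python
from collections import Counter


def is_balanced(assignments):
    """Return True when every cluster contains the same number of points."""
    sizes = Counter(assignments)
    return len(set(sizes.values())) == 1


def is_strong(assignments, classes):
    """Return True when, inside every cluster, each class has the same count.

    Assumption: "strong" is read as equal per-class counts within a cluster.
    """
    all_classes = set(classes)
    per_cluster = {}
    for cluster, cls in zip(assignments, classes):
        per_cluster.setdefault(cluster, Counter())[cls] += 1
    for counts in per_cluster.values():
        # A class missing from a cluster counts as zero occurrences.
        if len({counts[c] for c in all_classes}) != 1:
            return False
    return True


# Toy data: eight points, two clusters, two classes (all values hypothetical).
assignments = [0, 0, 0, 0, 1, 1, 1, 1]
even_classes = ["a", "a", "b", "b", "a", "a", "b", "b"]
skewed_classes = ["a", "a", "a", "b", "a", "b", "b", "b"]

balanced = is_balanced(assignments)
strong_even = is_strong(assignments, even_classes)
strong_skewed = is_strong(assignments, skewed_classes)
```

In this toy data both labelings give clusters of equal size, but only the first spreads the classes evenly inside each cluster. The second is balanced without being strong, which is the situation the abstract describes as biasing results toward the majority class of a cluster.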

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=25712&Field=0&DTC=6