Hadi Mohammadzadeh Abachi (هادي محمدزاده عباچي)

Title

Discretization of Continuous Attributes in Big Data and the Apache Spark Platform

Degree level

Master's (M.Sc.)

Field of study

Artificial Intelligence

Defense date

1396/12/14

Supervisor

Dr. Mohammadreza Kangavari (محمدرضا كنگاوري)

School

Computer

Abstract

Discretization of continuous data is one of the important preprocessing steps in knowledge discovery and data mining. A discretization model converts continuous values into discrete intervals. Unlike unsupervised methods, which rely on elementary rules for discretization, supervised algorithms use the class label during discretization and often achieve high accuracy. Supervised algorithms face two major challenges. First, noisy class labels affect the accuracy of the discretization process. Second, the heavy computation of supervised discretization algorithms on high-dimensional data reduces speed. Moreover, the most important issue is that discretization on large-scale data and in complex environments, including the Big Data domain, faces challenges such as accuracy and speed, which are in tension with each other. In this thesis, to address these challenges, we use an unsupervised algorithm with linear complexity based on Kolmogorov-Smirnov statistical analysis. The goal of the designed discretization model is to approach the high accuracy of supervised discretization approaches with lower complexity.

Data entry date

1397/03/08

Release date

5/29/2018 12:00:00 AM

Student who entered the information

Hadi Mohammadzadeh Abachi (هادي محمدزاده عباچي)

Name: هادي محمدزاده عباچي
Author: هادي محمدزاده عباچي

Abstract (English)

Discretization of numerical data is one of the most effective data preprocessing tasks in both knowledge discovery and data mining. Discretization models convert numerical values into categorical intervals. Unlike unsupervised methods, which use simple rules to discretize continuous attributes, supervised discretization algorithms take the class labels into consideration to achieve high accuracy. Supervised discretization of continuous features encounters two significant challenges. First, noisy class labels reduce the effectiveness of discretization. Second, the high computational cost of supervised algorithms on large-scale datasets and in Big Data environments decreases efficiency. To address these challenges, we devise a statistical unsupervised method named SUFDA. Empirically, SUFDA achieves a low time complexity. On the one hand, it maintains the trade-off between accuracy and efficiency. On the other hand, our discretization model targets the high accuracy of supervised approaches. We conduct a comprehensive performance evaluation on multiple publicly available datasets. The results show that our unsupervised system is more effective than other discretization baselines. At the same time, we demonstrate that our model has a better time complexity than the supervised approaches.
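As a rough illustration of what "converting numerical values into categorical intervals" means, the sketch below applies plain equal-width binning, one of the simple unsupervised rules the abstract alludes to. It is not SUFDA and does not use the Kolmogorov-Smirnov analysis described in the thesis. The function names `equal_width_cut_points` and `discretize` and the sample values are hypothetical, chosen only for this example.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points of an equal-width split.

    Illustrative baseline only; this is NOT the thesis's SUFDA method.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(values, cut_points):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(cut_points, v) for v in values]


# Hypothetical sample of one continuous attribute.
ages = [18.0, 22.5, 31.0, 40.0, 58.0]
cuts = equal_width_cut_points(ages, 4)
bins = discretize(ages, cuts)
```

Here each continuous value is replaced by an interval index, so a downstream learner sees a categorical attribute. A supervised discretizer would instead choose the cut points using the class labels, which is the source of both the accuracy gain and the extra cost discussed in the abstract.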

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=18930&Field=0&DTC=6