Seyedeh Ghazal Makki

Title

Improving a Clustering Algorithm for Big Data Based on MapReduce

Degree level

Master's

Field of study

Software

Defense date

Bahman 1395 (January/February 2017)

Supervisor

Dr. Einollah Khanjari

Faculty

Computer Engineering

Abstract

Data clustering is an important data mining technique that plays a key role in numerous scientific applications. With the ever-increasing growth of data, however, clustering itself has become a major challenge. Meanwhile, MapReduce is a parallel programming platform that is widely used across many areas of data processing. In this work, we design an efficient clustering algorithm using MapReduce. In the traditional K-means clustering algorithm, choosing the number of clusters K is difficult, and the initial cluster centers are selected randomly, which makes the clustering results highly unstable. The algorithm is also sensitive to noise points. To address these problems, the traditional K-means algorithm is improved. In the improved method, the space is divided into equal grid cells; each data point is assigned to the corresponding cell according to its attribute values, and the number of data points in each cell is counted. We parallelize the improved K-means algorithm using the MapReduce framework. Theoretical analysis and experimental results show that, compared with the traditional K-means clustering algorithm, the improved algorithm yields higher-quality results, fewer iterations, and good stability. The results also show that the algorithms studied achieve efficient speedup and scalability.

Keywords: DBSCAN, K-means, MapReduce, parallel system, cluster analysis, grid

Date of record entry

1395/12/14

Date of use

1/1/1900 12:00:00 AM

Student who entered the record

Seyedeh Ghazal Makki

Author: Seyedeh Ghazal Makki

Abstract (English)

Data clustering is an important data mining technology that plays a crucial role in numerous scientific applications. However, it is challenging because real-world datasets have been growing rapidly to very large scale. Meanwhile, MapReduce is a parallel programming platform that is widely applied in many data processing fields. Here, we propose an efficient clustering algorithm based on the MapReduce paradigm. We adopt a quick partitioning strategy for large-scale non-indexed data. In the traditional K-means clustering algorithm, the number of clusters K is difficult to initialize, and the initial cluster centers are selected randomly, which makes the clustering results very unstable. The algorithm is also susceptible to noise points. To solve these problems, the traditional K-means algorithm is improved. The improved method divides the space into equal grid cells, assigns each data point to the corresponding cell according to its attribute values, and counts the number of data points in each cell. We parallelize the improved K-means algorithm and combine it with the MapReduce framework. Theoretical analysis and experimental results show that, compared with the traditional K-means clustering algorithm, the improved algorithm produces higher-quality results, needs fewer iterations, and has good stability. The results also show that the algorithms achieve efficient speedup and scaleup.

Keywords: DBSCAN; MapReduce; parallel system; cluster analysis; K-means; grid
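
The grid-counting step described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names, the `cell_size` and `min_count` parameters, and the rule of seeding K-means from the centroids of the K densest cells while skipping sparse cells as likely noise are all assumptions made for illustration. The map and reduce steps are plain functions here rather than jobs on a real MapReduce cluster.

```python
from math import floor


def map_point(point, cell_size):
    """Map step: emit (grid cell index, (point, 1)) for one data point."""
    cell = tuple(floor(x / cell_size) for x in point)
    return cell, (point, 1)


def reduce_cells(pairs):
    """Reduce step: per cell, sum the coordinates and count the points."""
    sums = {}
    for cell, (point, count) in pairs:
        if cell not in sums:
            sums[cell] = ([0.0] * len(point), 0)
        total, n = sums[cell]
        sums[cell] = ([t + x for t, x in zip(total, point)], n + count)
    return sums


def initial_centers(points, cell_size, k, min_count=2):
    """Assumed seeding rule: centroids of the k densest cells.

    Cells holding fewer than min_count points are treated as noise
    and never become seeds (a hypothetical threshold).
    """
    cells = reduce_cells(map_point(p, cell_size) for p in points)
    dense = [(n, cell, total) for cell, (total, n) in cells.items()
             if n >= min_count]
    # Densest cells first; ties are broken by cell index for determinism.
    dense.sort(key=lambda item: (-item[0], item[1]))
    return [tuple(t / n for t in total) for n, _, total in dense[:k]]


if __name__ == "__main__":
    data = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.3),
            (5.1, 5.2), (5.3, 5.4), (9.9, 0.1)]
    print(initial_centers(data, cell_size=1.0, k=2))
```

Because the seeds come from cell counts rather than random draws, repeated runs on the same data start from the same centers, which is one plausible reading of the stability claim in the abstract.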

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=16865&Field=0&DTC=6