Boshra Pishgoo

Title

Presenting a Hybrid Method for Batch-Stream Data Processing

Degree level

Doctorate (PhD)

Field of study

Computer Engineering - Artificial Intelligence and Robotics

Academic year

1392

Defense date

1401/8/21

Supervisor

Ahmad Akbari Azirani

Advisor

Bijan Rahemi

Faculty

Computer Engineering

Abstract

On the one hand, big data processing requires discovering behavioral patterns in data from a large volume of historical data; on the other hand, it must be adaptive and operate in real time. Pattern discovery is possible through batch learning methods, but because of their long training time these methods cannot identify new patterns in real time or behave adaptively. In contrast, stream methods examine only a limited history of past patterns but can identify patterns in a short time. Therefore, by intelligently combining batch and stream processing in the form of hybrid processing, the advantages of both methods can be brought together to achieve real-time, accurate processing over large volumes of data. Most research on adapting machine learning algorithms to hybrid processing is limited to designing suitable infrastructure for realizing the hybrid architecture and using it in various applications. The algorithmic aspects of batch-stream hybrid processing have not been addressed, such as the specifications of batch and stream learning algorithms compatible with this kind of processing, the communication models between batch and stream processing units, and the rules for combining the results of different processing layers. To address this challenge, in this thesis we present a hybrid, distributed solution compatible with machine learning algorithms, called HDBS, and emphasize the algorithmic aspects of hybrid processing. We then examine the challenge of feature selection compatible with batch-stream hybrid processing. Because hybrid processing generally deals with data streams, batch feature selection methods are not practical for it. On the other hand, although stream feature selection methods can be applied in hybrid processing, none of the existing methods exploits the capacity of hybrid processing for feature selection. Therefore, in the second part of the thesis, by using a dynamic feature selection method compatible with hybrid processing, together with intelligent selection of base models for combination, we propose EHDBS as an extended version of HDBS.
The evaluations show that the proposed solution is effective in increasing the accuracy and speed of hybrid data processing compared with standalone batch and stream processing, as well as with hybrid processing that does not use dynamic feature selection.

Record entry date

1401/10/21

Title in English

A Hybrid Solution for Batch-Stream Data Processing

Date of use

1/1/1900 12:00:00 AM

Student who entered the record

Boshra Pishgoo

Name: Boshra Pishgoo
Author: Boshra Pishgoo

Abstract in English

Big data analysis has two requirements: it must discover behavioral patterns in data from a large volume of historical data and, at the same time, be adaptive and operate in real time. Pattern discovery is possible through batch learning techniques, but these techniques cannot adapt or identify new patterns in real time. In contrast, stream processing operates in real time, and its learning techniques are fast and incremental. Stream learning techniques can respond to recent data, but their accuracy is usually lower than that of batch learning methods. Therefore, by combining batch and stream processing in the form of hybrid processing, the advantages of both methods can be brought together to achieve real-time, high-speed computation on large volumes of data. Most research on adapting machine learning algorithms to hybrid processing is limited to designing suitable infrastructure for realizing the hybrid architecture and using it in various applications. The algorithmic aspects of hybrid processing have not been addressed, such as the specifications of batch and stream learning algorithms compatible with this kind of processing, the communication models between batch and stream processing units, and the rules for combining the results of different processing layers. To address these challenges, in this thesis we present a hybrid, distributed solution compatible with machine learning algorithms, called HDBS, and focus on the algorithmic aspects of hybrid processing. We then turn to the challenge of feature selection compatible with hybrid processing.
Because hybrid processing generally deals with data streams, batch feature selection techniques are not practical for it. On the other hand, although stream feature selection techniques can be used in hybrid processing, none of the existing methods uses the capacity of hybrid processing to select appropriate features, so they do not fit the nature of hybrid processing. Therefore, in the second part of the thesis, we propose EHDBS as an enhanced version of HDBS, using a dynamic feature selection method compatible with hybrid processing as well as intelligent selection of base models for combination. The evaluations show that the proposed solution is effective in increasing the accuracy and speed of hybrid processing compared with standalone batch and stream processing, as well as with hybrid processing that does not use dynamic feature selection.

Persian keywords

Batch-stream hybrid processing, dynamic feature selection, big data analysis

English keywords

Batch-Stream Hybrid Processing, Dynamic Feature Selection, Big Data Analysis

Author

Boshra Pishgoo

Supervisor

Dr. Akbari

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=27698&Field=0&DTC=6