Mohammadreza Farrokh

Title

A Resource-Aware Mechanism for Managing the Elasticity of Distributed Data Stream Processing Systems

Degree level

Master's

Field of study

Computer

Academic year

1397

Defense date

1400/08/30

Supervisor

Dr. Mohsen Sharifi

Faculty

Computer

Abstract

Information technology systems face a growing volume of data that is continuously generated by digital applications and media, much of it in the form of continuous data streams. Stream processing allows large volumes of transient data to be processed as they arrive, and stream processing systems have been developed to process such streams in real time and extract valuable insights from them. The processing performed by operators can be modeled as a directed acyclic graph (DAG) and parallelized, with the parallel parts executed by different computing nodes of a cluster made up of several computers. To ensure high throughput and low response time under large data volumes, stream processing systems need efficient mechanisms for scheduling operators, and implementing such mechanisms is challenging: existing schedulers for stream processing perform poorly on clusters of heterogeneous computers. There are several ways to deal with this problem. The first is to compute an optimal schedule that takes into account the computing capacities of the nodes, the processing demands of the operators, and the amount of communication between them; the second is to rescale, that is, to exploit elasticity. With the second approach, if the initial schedule is poor, elasticity (scaling) is used excessively, which degrades system performance, increases response time, and raises the error rate because of reconfiguration. This thesis therefore presents a resource-aware scheduling mechanism based on ant colony optimization (ACO), with the goal of minimizing the response time of stream processing systems. The mechanism has three stages. In the first stage, to reduce communication between computing nodes and to shorten the convergence time of ACO, a bin-packing algorithm schedules the subset of operators that communicate most heavily with one another.
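The abstract does not say which bin-packing variant or which communication threshold the first stage uses. The Python sketch below is therefore only a hypothetical illustration of the idea: it walks the operator edges in decreasing order of traffic and uses first-fit to co-locate the endpoints of the `top_k` heaviest edges on one node, leaving every other operator for the later stage. The names `Node`, `pack_heavy_pairs`, and `top_k`, the CPU-demand model, and the word-count numbers are assumptions for illustration, not values from the thesis.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A worker node with a hypothetical CPU capacity (heterogeneous across nodes)."""
    name: str
    capacity: float
    used: float = 0.0

    def fits(self, need: float) -> bool:
        return self.used + need <= self.capacity


def pack_heavy_pairs(demand, traffic, nodes, top_k):
    """Greedy first-fit packing of the endpoints of the top_k heaviest edges.

    demand:  operator -> CPU demand (assumed known in advance)
    traffic: (src, dst) -> tuples/s on that edge (assumed measured or profiled)
    Returns ({operator: node name}, [operators left for the next stage]).
    """
    placement = {}
    heavy = sorted(traffic.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    for (u, v), _ in heavy:
        group = [op for op in (u, v) if op not in placement]
        if not group:
            continue
        need = sum(demand[op] for op in group)
        # Prefer the node of an already-placed endpoint, then first fit in list order.
        candidates = [placement[op] for op in (u, v) if op in placement] + nodes
        for node in candidates:
            if node.fits(need):
                for op in group:
                    placement[op] = node
                node.used += need
                break
    leftover = [op for op in demand if op not in placement]
    return {op: node.name for op, node in placement.items()}, leftover


# Hypothetical word-count topology on two unequal nodes.
demand = {"spout": 1.0, "split": 2.0, "count": 2.0, "report": 0.5}
traffic = {("spout", "split"): 1000, ("split", "count"): 5000, ("count", "report"): 200}
nodes = [Node("n1", capacity=4.0), Node("n2", capacity=2.0)]
placement, leftover = pack_heavy_pairs(demand, traffic, nodes, top_k=1)
print(placement, leftover)  # {'split': 'n1', 'count': 'n1'} ['spout', 'report']
```

Restricting the packing pass to the heaviest edges keeps it cheap and shrinks the search space left for the iterative stage, which matches the stated aim of shortening ACO convergence time.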
In the next stage, the remaining operators, which were not scheduled by the bin-packing algorithm, are scheduled by the ACO algorithm. Because ACO is inherently iterative, this stage is run repeatedly and evolves its schedules toward faster response times. In the final stage, the algorithm stops once it has converged on a faster operator schedule. The proposed mechanism is implemented on Apache Storm 2.1.0. Experiments with the standard word-count application show that, compared with Storm's default scheduler and its built-in resource-aware scheduler, the proposed mechanism improves response time by about 50%.
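As a hedged illustration of the second and third stages, the sketch below implements a basic ant colony optimization loop in Python. It assumes a simple, hypothetical cost model (the busiest node's load divided by its speed, plus a penalty for traffic that crosses nodes) as a stand-in for response time, keeps the first-stage placements fixed, and treats "no improvement for `patience` iterations" as convergence. The thesis's actual cost model, pheromone update rule, and parameter values are not given in the abstract; `estimated_latency`, `aco_schedule`, and every parameter name here are illustrative.

```python
import random


def estimated_latency(assign, demand, speed, traffic, link_cost):
    """Hypothetical response-time proxy: slowest node's processing time plus a
    penalty proportional to the traffic sent between different nodes."""
    load = {n: 0.0 for n in speed}
    for op, n in assign.items():
        load[n] += demand[op]
    compute = max(load[n] / speed[n] for n in speed)
    remote = sum(t for (u, v), t in traffic.items() if assign[u] != assign[v])
    return compute + link_cost * remote


def aco_schedule(ops, fixed, demand, speed, traffic, *, ants=10, alpha=1.0, beta=2.0,
                 rho=0.1, link_cost=1e-4, patience=15, max_iters=200, seed=0):
    rng = random.Random(seed)
    nodes = list(speed)
    tau = {(op, n): 1.0 for op in ops for n in nodes}  # pheromone per (operator, node)
    best, best_cost, stale = None, float("inf"), 0
    for _ in range(max_iters):
        improved = False
        for _ in range(ants):
            assign = dict(fixed)  # first-stage placements stay fixed
            for op in ops:
                # Probability ~ pheromone^alpha * heuristic^beta; heuristic = node speed.
                weights = [tau[op, n] ** alpha * speed[n] ** beta for n in nodes]
                assign[op] = rng.choices(nodes, weights=weights)[0]
            cost = estimated_latency(assign, demand, speed, traffic, link_cost)
            if cost < best_cost:
                best, best_cost, improved = assign, cost, True
        for key in tau:  # evaporation
            tau[key] *= 1.0 - rho
        for op in ops:  # deposit on the best-so-far schedule
            tau[op, best[op]] += 1.0 / best_cost
        stale = 0 if improved else stale + 1
        if stale >= patience:  # no improvement for a while: treat as converged
            break
    return best, best_cost


demand = {"spout": 1.0, "split": 2.0, "count": 2.0, "report": 0.5}
speed = {"n1": 2.0, "n2": 1.0}
traffic = {("spout", "split"): 1000, ("split", "count"): 5000, ("count", "report"): 200}
best, cost = aco_schedule(["spout", "report"], {"split": "n1", "count": "n1"},
                          demand, speed, traffic)
print(best, round(cost, 2))
```

Passing the first-stage placements in `fixed` means the ants only search over the leftover operators, which is the reason for running bin-packing first.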

Date of data entry

1400/12/23

Title (English)

A Resource-Aware Mechanism for Managing the Elasticity of Distributed Data Stream Processing Systems

Availability date

11/21/2022

Data entered by (student)

Mohammadreza Farrokh


Abstract (English)

Information technology systems face an increasing amount of data that is continuously generated by digital applications and media, much of it as continuous data streams. Data stream processing allows large volumes of transient data to be processed in real time, and stream processing systems have been developed to process these streams as they arrive and obtain valuable insights. The processing performed by operators can be modeled as a directed acyclic graph (DAG), and parallel sections can be executed by different computational nodes in a cluster consisting of several computers. To ensure high throughput and low response time with large volumes of data, stream processing systems need efficient mechanisms for scheduling operators, and implementing such mechanisms is challenging. Existing scheduling mechanisms for data stream processing perform poorly in clusters of heterogeneous computers. There are several ways to deal with this issue. The first is to obtain an optimal schedule based on the computational capacities of the nodes, the processing needs of the operators, and the degree of communication between them; the second is to rescale, that is, to use elasticity. With the second method, if the initial schedule is not appropriate, elasticity (scaling) is used excessively, which reduces system performance, increases response time, and raises the error rate due to reconfiguration.
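To make the DAG model concrete, the hypothetical Python sketch below (Python 3.9+ for `graphlib`) encodes a word-count topology as an operator graph, checks that it is acyclic, and expands each operator into as many parallel tasks as its assumed parallelism. It assumes every edge routes from any upstream task to any downstream task, a reasonable task-level view of shuffle- or fields-style routing; the operator names, parallelism values, and the `expand` helper are illustrative only.

```python
from graphlib import TopologicalSorter
from itertools import product

# Hypothetical word-count topology: operator -> downstream operators.
topology = {"spout": ["split"], "split": ["count"], "count": ["report"], "report": []}
parallelism = {"spout": 1, "split": 2, "count": 2, "report": 1}


def expand(topology, parallelism):
    """Expand the operator DAG into task instances and task-level edges,
    assuming all-to-all routing between connected operators."""
    tasks = {op: [f"{op}#{i}" for i in range(parallelism[op])] for op in topology}
    edges = [(a, b)
             for op, outs in topology.items()
             for dst in outs
             for a, b in product(tasks[op], tasks[dst])]
    return tasks, edges


# TopologicalSorter takes node -> predecessors, so invert the adjacency list.
preds = {op: [] for op in topology}
for op, outs in topology.items():
    for dst in outs:
        preds[dst].append(op)
order = list(TopologicalSorter(preds).static_order())  # raises CycleError on a cycle

tasks, edges = expand(topology, parallelism)
print(order)       # ['spout', 'split', 'count', 'report']
print(len(edges))  # 8
```

Placing such task instances on nodes is what turns communication into a scheduling concern: every task-level edge whose endpoints land on different nodes becomes network traffic.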
Therefore, with the aim of minimizing the response time of data stream processing systems, this dissertation presents a resource-aware scheduling mechanism based on ant colony optimization. The mechanism has three stages. In the first stage, to reduce communication between computational nodes and shorten the convergence time of the ant colony optimization algorithm, a bin-packing algorithm is used to schedule the subset of operators that communicate most with each other. In the second stage, the remaining operators, which were not scheduled by the bin-packing algorithm, are scheduled by the ant colony algorithm. Because ant colony optimization is iterative by nature, this stage is performed repeatedly and evolutionarily adjusts its schedules to achieve faster response times. In the final stage, the algorithm stops once it converges on a faster operator schedule. The proposed mechanism is implemented on Apache Storm 2.1.0. Experiments with the standard word-count application show that the proposed scheduling mechanism improves response time by about 50% compared with the default and resource-aware schedulers available in Storm.

Persian keywords

Data stream processing, Scheduling, Response time, Resource awareness, Heterogeneous computing clusters

English keywords

Data Stream Processing, Scheduling, Elasticity, Response Time, Resource-Awareness, Resource Heterogeneity

Author

Mohammadreza Farrokh

Supervisor

Dr. Mohsen Sharifi

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=26278&Field=0&DTC=6