Hamid Hadian

Title

Presenting a QoS-Aware Scheduling Mechanism for Online Operator Placement in Distributed Data Stream Processing Systems

Degree level

Master's (M.Sc.)

Field of study

Software

Academic year

1397

Defense date

1399/11/7

Supervisor

Dr. Mohsen Sharifi

Faculty

Computer

Abstract

Data stream processing systems have been developed to process streaming data in real time. To achieve faster response times, the data and their processing by operators can be modeled as directed acyclic graphs and parallelized, and the parallel parts can be executed by different computing nodes in a cluster composed of several computers. Because the data stream rate varies at runtime, online and dynamic scheduling by changing the degree of parallelism of operators has a significant impact on the performance of data stream processing systems, especially when the system faces high data stream rates and, consequently, an increased workload on some of the computing nodes responsible for executing operators. Existing schedulers for online data stream processing perform poorly in computing clusters composed of heterogeneous computers. To improve the response time of data stream processing systems, this thesis presents an elastic online scheduling mechanism aimed at increasing quality of service (QoS), with response time considered as a key QoS metric. Because the computing resources are heterogeneous and the processing power of nodes must be estimated, the proposed mechanism is also resource-aware. To reduce communication between computing nodes at runtime, a bin-packing algorithm is first used to schedule operators. Under heavy workload, nodes with high CPU utilization (bottlenecks) are first identified, and operators are then relocated by applying a three-step policy of replication, selection, and relocation instead of the conventional replicate-and-relocate policy. Rather than relocating the bottleneck operator itself, operators whose combined CPU usage is equivalent to that of the bottleneck operator and that have less communication are selected and moved to other nodes. Next, to choose a suitable computing node to host the operators, a three-step policy of discovery, learning, and correction is used. The proposed scheduling mechanism is implemented using reinforcement learning so that, by applying these policies at runtime, it can evolutionarily adjust its step-by-step scheduling decisions toward faster data processing response times.
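The initial bin-packing placement mentioned above can be illustrated with a minimal sketch. It assumes first-fit decreasing as the bin-packing variant, per-operator CPU demands and per-node capacities in hypothetical units, and hypothetical operator and node names; the abstract does not specify which bin-packing variant the thesis uses. Packing operators onto as few nodes as possible keeps connected operators co-located, which is how bin packing reduces inter-node communication.

```python
from typing import Dict, List, Tuple


def first_fit_decreasing(
    demands: Dict[str, float],     # operator -> estimated CPU demand (hypothetical units)
    capacities: Dict[str, float],  # worker node -> CPU capacity (nodes may differ)
) -> Dict[str, str]:
    """Assign each operator to the first node with enough residual capacity,
    visiting operators from the largest to the smallest demand."""
    residual = dict(capacities)
    # Visit larger nodes first so that operators are packed onto fewer nodes.
    nodes = sorted(capacities, key=capacities.get, reverse=True)
    placement: Dict[str, str] = {}
    for op in sorted(demands, key=demands.get, reverse=True):
        for node in nodes:
            if residual[node] >= demands[op]:
                placement[op] = node
                residual[node] -= demands[op]
                break
        else:  # loop finished without break: the operator fits nowhere
            raise ValueError(f"no worker node can host operator {op!r}")
    return placement


def cross_node_edges(edges: List[Tuple[str, str]], placement: Dict[str, str]) -> int:
    """Count DAG edges whose endpoints run on different nodes (remote transfers)."""
    return sum(1 for src, dst in edges if placement[src] != placement[dst])


if __name__ == "__main__":
    topology = [("spout", "split"), ("split", "count"), ("count", "sink")]
    placement = first_fit_decreasing(
        {"spout": 20, "split": 30, "count": 40, "sink": 10}, {"w1": 80, "w2": 60}
    )
    print(placement)                              # spout lands on w2, the rest on w1
    print(cross_node_edges(topology, placement))  # 1
```

In this example only one of the three DAG edges crosses a node boundary, because the three heaviest-fitting operators share the larger node.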
Another mechanism, called the benchmark task, is also proposed to reduce the convergence time of the reinforcement learning method. Benchmark tasks run periodically on the computing nodes and measure the processing power available on each node. This information helps the scheduler determine which nodes in a heterogeneous cluster have enough processing power to host additional operators. In addition, a Naïve Bayes classifier is used to prevent excessive use of elasticity (scaling), which degrades system performance and increases errors due to reconfiguration. The proposed mechanism has been implemented on version 2.1.0 of the open-source Apache Storm. Running standard benchmark applications shows that, under high data input rates, the proposed scheduling mechanism improves response time by about 70 percent compared with the default scheduler and the resource-aware scheduler of Storm.
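The Naïve Bayes gate on scaling decisions described above can be sketched as follows. This is a minimal Gaussian Naïve Bayes written from scratch; the two features (node CPU utilization and relative growth of the input rate), the "scale"/"hold" labels, the `should_scale` helper, and the training history are all hypothetical illustrations, not the thesis's actual feature set or data.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features are assumed independent given the
    class, and each feature is modeled as a normal distribution per class."""

    def __init__(self, var_smoothing: float = 1e-6) -> None:
        self.var_smoothing = var_smoothing  # keeps variances strictly positive
        self.priors: Dict[Hashable, float] = {}
        self.stats: Dict[Hashable, Tuple[List[float], List[float]]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[Hashable]) -> "GaussianNaiveBayes":
        groups: Dict[Hashable, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        for label, rows in groups.items():
            self.priors[label] = len(rows) / len(y)
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + self.var_smoothing
                for col, m in zip(columns, means)
            ]
            self.stats[label] = (means, variances)
        return self

    def _log_posterior(self, x: Sequence[float], label: Hashable) -> float:
        means, variances = self.stats[label]
        score = math.log(self.priors[label])
        for xi, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
        return score  # unnormalized log P(label | x)

    def predict(self, x: Sequence[float]) -> Hashable:
        return max(self.stats, key=lambda label: self._log_posterior(x, label))


def should_scale(clf: GaussianNaiveBayes, cpu_util: float, rate_growth: float) -> bool:
    """Hypothetical gate: trigger a reconfiguration only when the classifier says 'scale'."""
    return clf.predict([cpu_util, rate_growth]) == "scale"


if __name__ == "__main__":
    # Made-up history: [node CPU utilization, relative growth of the input rate]
    X = [[0.4, 0.0], [0.5, 0.05], [0.6, -0.05], [0.7, 0.02],
         [0.9, 0.3], [0.95, 0.25], [0.85, 0.35], [0.92, 0.4]]
    y = ["hold"] * 4 + ["scale"] * 4
    clf = GaussianNaiveBayes().fit(X, y)
    print(should_scale(clf, 0.93, 0.30))  # True
    print(should_scale(clf, 0.50, 0.00))  # False
```

Routing every scaling trigger through a classifier trained on past outcomes, rather than a single CPU threshold, is one way to suppress reconfigurations that would not have paid off.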

Date of data entry

1399/11/21

English title

A QoS-Aware Scheduling Mechanism for Online Operator Placement in Distributed Data Stream Processing Systems

Availability date

1/27/2022

Student who entered the data

Hamid Hadian

English abstract

Data stream processing (DSP) systems are developed to process huge amounts of streaming data in real time. In most popular distributed DSP systems, user applications are modeled as directed acyclic graphs that can be parallelized and mapped onto the computing nodes of a distributed cluster to decrease response time. In the face of varying workloads that affect overall throughput, online scheduling enables DSP systems to increase the degree of parallelism of operators and transfer them to other worker nodes. Adapting application parallelism at runtime is challenging, and it affects both overall throughput and user application response time. Most schedulers of existing DSP systems perform poorly on clusters of heterogeneous nodes. To mitigate this problem, we present an online scheduling mechanism that includes a novel scheduler for elastic and resource-aware scheduling on clusters, especially heterogeneous ones, to minimize mean response time. Operators are initially scheduled on computing (worker) nodes using a bin-packing algorithm so as to reduce their communication latencies. Upon detecting a bottleneck operator that overutilizes a worker node, the proposed scheduler adopts a new 3-step policy of replication, selection, and relocation: a group of operators running on that node that have the least communication and together consume computational resources equal to those of the bottleneck operator is selected and relocated to other candidate worker nodes, while the bottleneck operator is simultaneously replicated on the same worker node. This effectively scales out operators to absorb the input workload.
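The bottleneck detection and the selection step of the replication, selection, and relocation policy can be sketched as below. Assumptions, all hypothetical: a node counts as a bottleneck above a fixed CPU-utilization threshold of 0.8; "resources equal to the bottleneck operator" is read as "at least that much CPU, as close as possible"; each operator carries a single communication-volume number; and the helper names are illustrative. Exhaustive subset search is only reasonable for the few operators that share one node.

```python
from itertools import combinations
from typing import Dict, List, Optional, Union


def find_bottleneck_nodes(node_cpu: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Nodes whose CPU utilization exceeds a (hypothetical) threshold."""
    return sorted(node for node, util in node_cpu.items() if util > threshold)


def select_operators(
    cpu: Dict[str, float],   # operator on the bottleneck node -> CPU usage
    comm: Dict[str, float],  # operator -> communication volume with co-located operators
    bottleneck: str,
) -> Optional[List[str]]:
    """Pick the group of non-bottleneck operators whose combined CPU usage covers
    the bottleneck operator's usage with the least communication; ties go to the
    group whose CPU usage is closest to the target."""
    candidates = sorted(op for op in cpu if op != bottleneck)
    target = cpu[bottleneck]
    best: Optional[List[str]] = None
    best_key = None
    for k in range(1, len(candidates) + 1):
        for group in combinations(candidates, k):
            load = sum(cpu[op] for op in group)
            if load < target:
                continue  # would not free enough CPU for the bottleneck replica
            key = (sum(comm[op] for op in group), load - target)
            if best_key is None or key < best_key:
                best, best_key = list(group), key
    return best


def plan_rebalance(
    cpu: Dict[str, float], comm: Dict[str, float], bottleneck: str, target_node: str
) -> Optional[Dict[str, Union[str, Dict[str, str]]]]:
    """Replicate the bottleneck in place and move the selected group elsewhere."""
    group = select_operators(cpu, comm, bottleneck)
    if group is None:
        return None
    return {"replicate": bottleneck, "move": {op: target_node for op in group}}


if __name__ == "__main__":
    print(find_bottleneck_nodes({"w1": 0.95, "w2": 0.40, "w3": 0.85}))  # ['w1', 'w3']
    cpu = {"A": 30, "B": 10, "C": 20, "D": 15, "E": 15}  # A is the bottleneck operator
    comm = {"B": 5, "C": 50, "D": 8, "E": 12}
    print(plan_rebalance(cpu, comm, "A", "w2"))  # replicate A, move D and E to w2
```

Here C alone would be the obvious CPU-heavy candidate, but it communicates heavily, so the sketch prefers moving D and E, whose combined usage matches A with far less traffic.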
To find a worker node in the cluster to host operators, a 3-phase policy of Discovering, Learning, and Correcting scheduling assignments on worker nodes is adopted; it uses reinforcement learning so that the scheduler can correct its decisions through evolutionary steps and achieve the minimum response time. To reduce the otherwise high convergence time, a benchmarking mechanism is devised that runs periodically on all worker nodes and gathers their latest load information, providing a near-real-time estimate for selecting candidate worker nodes. A Naïve Bayes classifier is used to avoid overly frequent scaling decisions, which can cause excessive reconfiguration downtime and degrade system performance. The proposed mechanism is implemented by extending Apache Storm 2.1.0. Running the standard Storm benchmarks shows that, under heavy workloads, the proposed scheduler reduces response time by at least 70% compared with the default scheduler and the existing resource-aware scheduler of Storm.
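The Discovering, Learning, and Correcting policy, together with benchmark-seeded estimates, can be approximated by a simple epsilon-greedy value learner over worker nodes. This is a sketch under loud assumptions: the mapping of the three phases onto exploration, incremental value updates, and greedy re-selection is our interpretation; the `NodeSelector` class, its parameters (`epsilon`, `alpha`), and the latency figures are hypothetical; and the thesis may use a different reinforcement learning formulation.

```python
import random
from typing import Dict, Optional


class NodeSelector:
    """Epsilon-greedy value estimates over worker nodes (a bandit-style sketch).
    Reward = negative observed response time, so a higher value is better."""

    def __init__(self, benchmark_latency_ms: Dict[str, float], epsilon: float = 0.1,
                 alpha: float = 0.3, seed: Optional[int] = None) -> None:
        # Seed the estimates from benchmark tasks instead of starting from zero,
        # which is meant to shorten the initial exploration (convergence) period.
        self.q: Dict[str, float] = {n: -lat for n, lat in benchmark_latency_ms.items()}
        self.epsilon = epsilon  # probability of trying a random node
        self.alpha = alpha      # step size of the incremental update
        self.rng = random.Random(seed)

    def choose(self) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(sorted(self.q))  # discover: try any node
        return max(self.q, key=self.q.get)         # correct: follow the current best estimate

    def update(self, node: str, response_time_ms: float) -> None:
        """Learn: move the node's estimate toward the latest observed response time."""
        reward = -response_time_ms
        old = self.q.get(node, reward)  # an unseen node starts at its first observation
        self.q[node] = old + self.alpha * (reward - old)

    def refresh_from_benchmark(self, latency_ms: Dict[str, float]) -> None:
        """Fold a new round of periodic benchmark measurements into the estimates."""
        for node, lat in latency_ms.items():
            self.update(node, lat)


if __name__ == "__main__":
    sel = NodeSelector({"w1": 20.0, "w2": 35.0}, epsilon=0.0)  # epsilon=0: no exploration
    print(sel.choose())          # w1, the fastest node according to the benchmark
    sel.update("w1", 100.0)      # w1 became overloaded after hosting operators
    print(sel.choose())          # w2, the decision is corrected
```

Seeding the estimates from benchmark measurements means the learner starts from a plausible ranking of heterogeneous nodes rather than from uniform guesses, which is the intuition behind using benchmark tasks to lower convergence time.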

كليدواژه هاي فارسي

Data Stream Processing, Online Scheduling, Elasticity, Quality of Service, Heterogeneous Computing Clusters

كليدواژه هاي لاتين

Data Stream Processing, Online Scheduling, Elasticity, QoS, Resource Heterogeneity

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=23180&Field=0&DTC=6