Zahra Hayati

Title

A new approach to automatic test suite generation for improving statistical fault localization based on causal-statistical analysis

Degree

Master's

Field of study

Software

Years of study

1396-1398

Defense date

1399/03/26

Supervisor

Dr. Saeed Parsa

Faculty

Computer Engineering

Abstract

In fault localization, the presence of a statement in failing executions may be caused entirely by a fault in another statement. A statistical correlation between a statement and the program output may indicate a cause-effect relationship between them, so fault localization can be viewed from a causal perspective. Previous studies on causal-statistical analysis build a linear regression model for each program statement to compute its suspiciousness score; these methods have high computational and memory overhead, which makes them unscalable. To reduce these costs, causal-statistical inference can be performed on only a small subset of the program statements. In the method presented in this thesis, we first delimit the range of suspicious statements in the form of an execution branch. To this end, we negate the conditions along the faulty execution path from end to beginning and use the Z3 solver to generate test data for the resulting path. We then re-run the program with the obtained test data using dynamic symbolic (concolic) execution. Depending on whether that execution passes or fails, we determine which branch is suspicious. In this way, we reduce the range of statements to which the causal-statistical method is applied to the minimum possible. In fact, for the first time, we have decomposed automatic fault localization into three sub-steps: finding the faulty execution path, finding the suspicious branch, and finally finding the suspicious statements within the faulty branch. Test data in this method are generated purposefully and in the minimum possible quantity, first to determine the faulty branch and then to identify the suspicious statements within it. In this way we have been able, for the first time, to eliminate the problem that statistical methods are affected by the test data. The proposed method was evaluated on the Defects4J benchmark. To assess the improvement, we used common metrics such as the number of statements examined and fault-localization accuracy. The results show that the proposed method performs better on the selected evaluation metrics than other related approaches.
Finally, we showed that using causal-statistical analysis together with the proposed method increases the accuracy of fault localization.

Record entry date

1399/10/09

Title in English

A new approach to automatic test data generation, targeted at improvement of fault localization based on causal statistical analysis

Release date

6/15/2020 12:00:00 AM

Student who entered the record

Zahra Hayati

Name: Zahra Hayati
Author: Zahra Hayati

Abstract in English

In fault localization, the presence of a statement in failing executions may be caused entirely by a fault in another statement. Fault localization should therefore be investigated from a causal point of view, because a statistical correlation between a statement and incorrect output can indicate a causal relationship between the two. Previous work on causal-statistical analysis builds a regression model for each program statement to estimate its fault suspiciousness. These methods incur high computational and memory overhead, which makes them unscalable. To reduce the cost of analysis, causal-statistical inference can be performed on only a small subset of the program statements. In this thesis, we first narrow the range of fault-suspicious statements to a single execution branch. For this purpose, we negate the conditions of the faulty execution path from end to beginning and generate test data for the resulting path using the Z3 solver. We then execute the program again concolically with the generated test data. Depending on whether this execution passes or fails, we determine which branch is suspicious. In this way, we minimize the range of statements to which the causal-statistical approach is applied. In fact, for the first time, we have decomposed automatic fault localization into three sub-steps: finding the faulty execution path, finding the faulty branch, and finally finding the fault-suspicious statements within the faulty branch. In this approach, test data are generated purposefully and in the smallest possible quantity, first to determine the faulty branch and then to identify the suspicious statements within it. In this way, we were able to eliminate, for the first time, the dependence of statistical approaches on the test data. The proposed approach, TD-CAFL, was evaluated on the Defects4J benchmark.
To quantify the improvement of the approach, we used common criteria such as the number of statements examined and the accuracy of fault localization. The results show that the proposed approach performs better on the selected evaluation criteria than other related strategies. Finally, we showed that using causal-statistical analysis together with the proposed approach increases the accuracy of fault localization.
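
The branch-narrowing step described in the abstract can be illustrated with a minimal sketch. It is not the TD-CAFL implementation. It assumes one plausible reading of the abstract: conditions on the failing path are negated one at a time, from the last to the first, and the first negation that yields a passing run marks the abandoned branch as suspicious. The program `buggy_program`, its `oracle`, the condition ids `c1` and `c2`, and the helper names are all hypothetical. A brute-force search over a small integer domain stands in for the Z3 solver, and the conditions are evaluated concretely instead of through a concolic engine.

```python
from itertools import product

# Branch predicates of a hypothetical program under test, keyed by condition id.
CONDS = {
    "c1": lambda x, y: x > 0,
    "c2": lambda x, y: y > x,
}

def buggy_program(x, y, trace):
    """Hypothetical program; records each (condition id, outcome) in trace."""
    def branch(cid):
        outcome = CONDS[cid](x, y)
        trace.append((cid, outcome))
        return outcome

    if branch("c1"):
        if branch("c2"):
            return y - x
        return x + y  # seeded fault: the intended result is x - y
    return 0

def oracle(x, y):
    """Expected output, used only to label a run as passing or failing."""
    if x > 0:
        return y - x if y > x else x - y
    return 0

def run(x, y):
    """Execute the program and return its branch trace and pass/fail label."""
    trace = []
    passed = buggy_program(x, y, trace) == oracle(x, y)
    return trace, passed

def solve(constraints, domain=range(-3, 4)):
    """Brute-force stand-in for an SMT solver such as Z3 (assumption)."""
    for x, y in product(domain, repeat=2):
        if all(CONDS[cid](x, y) == want for cid, want in constraints):
            return (x, y)
    return None  # no input in the small domain satisfies the constraints

def locate_suspicious_branch(failing_input):
    path, passed = run(*failing_input)
    assert not passed, "expected a failing input"
    # Walk the failing path from its last condition back to its first.
    for i in reversed(range(len(path))):
        cid, outcome = path[i]
        # Keep the prefix, flip condition i, and look for a matching input.
        new_input = solve(path[:i] + [(cid, not outcome)])
        if new_input is None:
            continue
        _, passed = run(*new_input)
        if passed:
            # Leaving this branch makes the run pass, so the branch is suspect.
            return (cid, outcome), new_input
    return None

print(locate_suspicious_branch((2, 1)))
```

For the failing input `(2, 1)`, the sketch flips the last condition `c2`, finds the passing input `(1, 2)`, and reports the `False` side of `c2` as the suspicious branch. Only the statements in that branch would then need to be passed to the causal-statistical analysis.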

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=22923&Field=0&DTC=6