Masood Ekhtiyarzade

Title

Method and class name recommendation based on source code similarity

Degree level

Master's

Field of study

Computer Engineering - Software

Year of study

1398

Defense date

1401/7/18

Supervisor

Saeed Parsa

Faculty

Iran University of Science and Technology - Noor Branch

Abstract

Names used in software source code, especially class and method names, play a major role in its readability. The names of classes and methods in a program's source code should convey the concept and purpose of that class or method. Method naming is one of the most important factors in program comprehension and directly affects quality attributes such as readability, testability, and understandability. Developer-defined identifiers make up 70% of a program. Among the various identifiers, method names need more attention because methods are the smallest meaningful and reusable units of code. Helping developers choose appropriate method names automatically therefore reduces development cost and time. This thesis proposes using software metrics as features that represent the functionality and structure of the methods and classes to be compared. We use source code metrics to vectorize the methods and classes in a large corpus of high-quality projects, and we also compute textual similarity using the TF-IDF algorithm. The vectors then serve as the basis for suggesting suitable names for a given method or class. To this end, the K-Nearest Neighbor algorithm is used, in an unsupervised manner, to find the k methods or classes most similar to the target method or class. We present a formula that incorporates the similarity of methods within their respective classes as a factor in the final similarity score. Experiments on 800 Java projects with nearly 4,000,000 methods confirm that the proposed approach is more effective and efficient than the state-of-the-art approaches Code2Vec and Code2seq. In manual evaluations, based on expert opinions, the precision and recall of our suggested method names improve by about 8.33% and 8.18%, respectively. In our automated evaluations, precision and recall improve by 4.25% and 12.08%, respectively, and for class name recommendation, tested on 76,037 classes, the F-score improves by 2.93%. Finally, the total running time of the proposed model for the 420,852 methods in our test set is reduced by 13,186 minutes.

Data entry date

1401/07/25

Title in English

Method and class name recommendation based on source code similarity

Date made available

10/10/2023 12:00:00 AM

Student who entered the data

Masood Ekhtiyarzade

Abstract in English

The names used in software source code, especially class and method names, play a significant role in its readability. Class and method names should reflect the concept and purpose of the classes and methods they denote. Method naming is one of the most important factors in program comprehension and directly affects quality attributes such as readability, testability, and comprehensibility. Identifiers chosen by developers make up 70% of a program. Among the different identifiers, method names need more attention because methods are the smallest meaningful and reusable pieces of code. Helping developers choose appropriate method names automatically therefore reduces development cost and time. This thesis proposes the use of software metrics as features that represent the functionality and structure of the methods and classes to be compared. We use source code metrics to vectorize methods and classes in a large collection of high-quality projects and also calculate textual similarity using the TF-IDF algorithm. The vectors are then used as a measure for suggesting suitable names for a given method or class. For this purpose, the K-Nearest Neighbor algorithm is used, in an unsupervised manner, to determine the k methods or classes most similar to the method or class in question. We present a formula that incorporates the similarities of the methods in their respective classes as a factor influencing the final similarity score. Experiments on 800 Java projects with nearly 4,000,000 methods confirm that our proposed approach is more effective and efficient than the state-of-the-art Code2Vec and Code2seq approaches. In manual evaluations, based on experts' opinions, the precision and recall of our suggested method names show an improvement of about 8.33% and 8.18%, respectively. In our automated evaluations, precision and recall improved by 4.25% and 12.08%, respectively, and for class name recommendation, tested on 76,037 classes, the F-score improved by 2.93%. Finally, the total runtime of the proposed model for the 420,852 methods in our test set is reduced by 13,186 minutes.
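
The pipeline outlined in the abstract (metric vectors, TF-IDF textual similarity, and a k-nearest-neighbor lookup) can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the metric names (loc, params, branches), the token lists, the weight alpha, and the linear blend of the two similarities are all assumptions made for illustration, and the thesis's class-level similarity formula is not reproduced because the abstract does not state it.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(key, 0.0) for key, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_idf(token_lists):
    """Smoothed inverse document frequency for every token in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter(tok for toks in token_lists for tok in set(toks))
    return {tok: math.log((1 + n_docs) / (1 + df)) + 1.0
            for tok, df in doc_freq.items()}


def tfidf(tokens, idf):
    """TF-IDF vector of one token list; tokens unseen in the corpus are dropped."""
    counts = Counter(tok for tok in tokens if tok in idf)
    total = len(tokens) or 1
    return {tok: (cnt / total) * idf[tok] for tok, cnt in counts.items()}


def suggest_names(query, corpus, k=3, alpha=0.5):
    """Rank the names of the k corpus methods most similar to the query.

    alpha is a hypothetical weight blending metric similarity with
    textual similarity; it is an assumption, not the thesis's formula.
    """
    idf = fit_idf([m["tokens"] for m in corpus])
    query_text = tfidf(query["tokens"], idf)
    scored = []
    for method in corpus:
        metric_sim = cosine(query["metrics"], method["metrics"])
        text_sim = cosine(query_text, tfidf(method["tokens"], idf))
        scored.append((alpha * metric_sim + (1 - alpha) * text_sim,
                       method["name"]))
    nearest = sorted(scored, reverse=True)[:k]
    votes = Counter()
    for score, name in nearest:
        votes[name] += score  # methods sharing a name pool their scores
    return votes.most_common()


# Hypothetical toy corpus: tokenized bodies plus a few made-up metrics.
corpus = [
    {"name": "getName", "tokens": ["return", "this", "name"],
     "metrics": {"loc": 1, "params": 0, "branches": 0}},
    {"name": "setName", "tokens": ["this", "name", "assign", "name"],
     "metrics": {"loc": 1, "params": 1, "branches": 0}},
    {"name": "findMax",
     "tokens": ["for", "item", "items", "if", "item", "greater", "max",
                "max", "assign", "item", "return", "max"],
     "metrics": {"loc": 5, "params": 1, "branches": 2}},
    {"name": "findMin",
     "tokens": ["for", "item", "items", "if", "item", "less", "min",
                "min", "assign", "item", "return", "min"],
     "metrics": {"loc": 5, "params": 1, "branches": 2}},
]
query = {
    "tokens": ["for", "value", "values", "if", "value", "greater", "best",
               "best", "assign", "value", "return", "best"],
    "metrics": {"loc": 5, "params": 1, "branches": 2},
}
print(suggest_names(query, corpus, k=2))
```

Cosine similarity on raw metric values is dominated by the largest metric, so a real system would likely normalize each metric first; the sketch skips that step to stay short.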

Persian keywords

Method naming, class naming, recommendation, source code similarity, source code metrics

English keywords

Method naming, class naming, recommendation, code similarity, source code metrics

Author

Masood Ekhtiyarzade

Supervisor

Dr. Saeed Parsa

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=27151&Field=0&DTC=6