Taha Khani Alamooti

Title

A Framework for Automating Code Security Assessment Using Large Language Models

Degree level

Master's

Field of study

Computer Engineering - Software

Year of enrollment

1401

Defense date

1404/10/17

Supervisor

Mohammad Abdollahi Azgomi

Advisor

Faculty

School of Computer Engineering

Abstract

The emergence of large language models (LLMs) has brought code security analysis into a new phase. Nevertheless, despite these models' inherent ability to understand code structure, their performance in detecting vulnerabilities accurately and comprehensively remains limited. These limitations stem from the models' dependence on static knowledge acquired during training and their lack of access to structured, specialized knowledge of code security, particularly regarding the fine-grained classifications of the Common Weakness Enumeration (CWE). The result is an inability to distinguish semantic subtleties among types of security weaknesses and a low rate of agreement with standard CWE identifiers. The goal of this research is to improve the accuracy and reliability of LLMs in identifying and precisely classifying code vulnerabilities at the CWE level, and to narrow the gap between the potential of LLMs and the accuracy expected of security assessment systems. The main motivation is to enrich the model's knowledge context in order to improve analysis and raise the effectiveness of models with lower computational capacity, so that they can be used in practice for security analysis. To that end, this thesis presents a framework based on retrieval-augmented generation (RAG) combined with hierarchical structural analysis of CWE. In this framework, a knowledge base containing the official CWE definitions, hierarchical relationships, and vulnerable code samples is built and indexed, and a re-evaluation mechanism is applied to the model so that its initial guess is revised against the retrieved evidence. Experimental results show that the proposed framework substantially increases classification accuracy: GPT-4 rises from 58.7% to 78.8% and DeepSeek-Coder from 57.7% to 76.0%. The accuracy of cf.llama-3-8b-instruct rises from 25.0% to 60.6%, which clearly shows the effectiveness of combining RAG with structured knowledge in improving the accuracy and reliability of LLMs.

Record entry date

1405/01/26

Title in English

A Framework for Automatic Code Security Assessment Using Large Language Models

Release date

1/7/2027

Record entered by (student)

Taha Khani Alamooti


Abstract in English

With the emergence of Large Language Models (LLMs), the field of code security analysis has entered a new phase. However, despite the inherent ability of these models to understand code structure, their performance in accurate and comprehensive vulnerability detection remains limited. These limitations stem from the reliance of LLMs on static, pre-trained knowledge and the lack of access to structured, domain-specific knowledge in code security, particularly with respect to the fine-grained classifications of the Common Weakness Enumeration (CWE). As a result, LLMs struggle to distinguish subtle semantic differences among various types of security weaknesses and exhibit low alignment with standard CWE identifiers. The objective of this research is to enhance the accuracy and reliability of LLMs in identifying and precisely classifying code vulnerabilities at the CWE level, thereby narrowing the gap between the potential capabilities of LLMs and the level of accuracy required in security assessment systems. The primary motivation of this study is to enrich the contextual knowledge available to the model in order to improve analysis quality and increase the effectiveness of models with lower computational capacity, enabling their practical use in security analysis. Based on this objective, this thesis proposes a Retrieval-Augmented Generation (RAG)-based framework incorporating hierarchical structural analysis of CWE. In the proposed framework, a knowledge base consisting of official CWE definitions, hierarchical relationships, and vulnerable code examples is generated and indexed, and a re-evaluation mechanism is applied to the model to revise its initial predictions using retrieved evidence. Experimental results demonstrate that the proposed framework significantly improves classification accuracy: the accuracy of GPT-4 increases from 58.7% to 78.8%, and that of DeepSeek-Coder rises from 57.7% to 76.0%. Furthermore, the accuracy of cf.llama-3-8b-instruct improves from 25.0% to 60.6%, clearly illustrating the effectiveness of integrating RAG with structured knowledge in enhancing the accuracy and reliability of LLMs.
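
The retrieve-then-re-evaluate idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the toy knowledge base, its paraphrased descriptions, the bag-of-words cosine retrieval, and the function names `retrieve`, `ancestors`, and `build_reevaluation_prompt` are all assumptions made for illustration, and the call to an actual LLM is left out.

```python
import re
from collections import Counter
from math import sqrt

# Toy knowledge base (assumption): paraphrased descriptions and a small
# slice of the CWE parent/child hierarchy, not the official CWE text.
KB = {
    "CWE-74": {"name": "Injection", "parent": None,
               "text": "special elements in output sent to a downstream component are not neutralized"},
    "CWE-77": {"name": "Command Injection", "parent": "CWE-74",
               "text": "command built from external input without neutralizing special elements"},
    "CWE-78": {"name": "OS Command Injection", "parent": "CWE-77",
               "text": "os shell command built from external input system exec popen"},
    "CWE-943": {"name": "Improper Neutralization in Data Query Logic", "parent": "CWE-74",
                "text": "data query logic built from external input without neutralizing special elements"},
    "CWE-89": {"name": "SQL Injection", "parent": "CWE-943",
               "text": "sql query built from external input select where execute cursor"},
}

def tokens(text):
    """Lowercased bag of alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(code, k=2):
    """Return the k knowledge-base IDs most similar to the code snippet."""
    q = tokens(code)
    ranked = sorted(KB, key=lambda cid: (-cosine(q, tokens(KB[cid]["text"])), cid))
    return ranked[:k]

def ancestors(cid):
    """Walk the parent links from a CWE entry up to the root."""
    chain = []
    parent = KB[cid]["parent"]
    while parent is not None:
        chain.append(parent)
        parent = KB[parent]["parent"]
    return chain

def build_reevaluation_prompt(code, initial_guess, k=2):
    """Combine retrieved entries with the guess and its ancestors into a prompt."""
    candidates = retrieve(code, k)
    related = [initial_guess] + ancestors(initial_guess) if initial_guess in KB else []
    ids = list(dict.fromkeys(candidates + related))  # de-duplicate, keep order
    evidence = "\n".join(f"- {cid} ({KB[cid]['name']}): {KB[cid]['text']}" for cid in ids)
    return (f"Initial guess: {initial_guess}\nRetrieved CWE evidence:\n{evidence}\n"
            f"Code:\n{code}\nConfirm or revise the CWE identifier.")

snippet = 'cursor.execute("SELECT * FROM users WHERE name = " + name)'
prompt = build_reevaluation_prompt(snippet, initial_guess="CWE-78")
```

In this sketch the model's first answer (here a hypothetical wrong guess, CWE-78) is placed next to the retrieved entries and its hierarchy ancestors, so a second model call could compare sibling and parent weaknesses before confirming or revising the identifier.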

Persian keywords

Software security, code review, large language models (LLM), retrieval-augmented generation (RAG), Common Weakness Enumeration (CWE)

English keywords

Software Security, Code Analysis, Large Language Models (LLM), Retrieval-Augmented Generation (RAG), Common Weakness Enumeration (CWE)

Author

Taha Khani Alamooti

Supervisor

Dr. Abdolahi

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=34663&Field=0&DTC=6