Parinaz Soltanzadeh

Title

Relation Extraction from Textual Data using Ensemble Learning

Degree level

Master's

Field of study

Computer Engineering - Software

Academic year

1401

Defense date

1404/07/05

Supervisor

Hossein Rahmani

Advisor

Faculty

Computer Engineering

Abstract

In today's world, a massive volume of textual data is produced daily through social media, emails, news, research articles, and other textual sources. These data are mostly unstructured or semi-structured and contain valuable information that can be organized as entities and the relations between them. Relation extraction, one of the fundamental tasks in natural language processing, plays an important role in identifying meaningful links between named entities in text. This process not only helps structure information but also has broad applications in question answering systems, knowledge graph construction, and text analysis. Many methods have been developed for relation extraction from text, and they fall mainly into two categories: traditional methods, comprising rule-based and machine learning methods, and deep learning-based methods. Nevertheless, one important challenge in this area is the lack of attention to the role of context in determining relations between entities. This research presents a novel method called ACORD, which combines the ANOVA metric for identifying context with an ensemble learning model to extract causal relations from text. The method first uses the ANOVA metric to identify the terms that are most influential in detecting relations. Then, by combining three models, it classifies the relations between entities while taking these key terms into account. Evaluation of ACORD on the SemEval 2010 Task 8 dataset shows that the method detects causal relations with 88% accuracy and improves overall accuracy by 4 percent compared with the baseline model. In addition, non-numerical analysis of the results makes it possible to identify discriminative terms for each class. This research is an important step toward improving relation extraction through context awareness and can serve as a basis for developing interpretable models in the future.

Data entry date

1404/08/20

Title in English

Relation Extraction from Textual Data using Ensemble Learning

Availability date

9/27/2026 12:00:00 AM

Record entered by (student)

Parinaz Soltanzadeh

Name: Parinaz Soltanzadeh
Author: Parinaz Soltanzadeh

Abstract in English

In today's world, a massive volume of textual data is generated daily through social media, emails, news, research articles, and other textual sources. These data are mostly unstructured or semi-structured and contain valuable information that can be organized in terms of entities and the relationships between them. Relation extraction, a fundamental task in natural language processing (NLP), plays a crucial role in identifying meaningful links between named entities in text. This process not only helps structure information but also has broad applications in question-answering systems, knowledge graph construction, and text analysis. Numerous methods have been developed for relation extraction from text, and they fall mainly into two groups: traditional approaches, including rule-based and machine learning methods, and deep learning-based approaches. However, one of the major challenges in this field is the lack of consideration for the role of context in determining relationships between entities. In this study, a novel method called ACORD is proposed, which combines the ANOVA metric for context identification with an ensemble learning model to extract causal relations from text. First, using the ANOVA metric, the most influential terms for relation detection are identified. Then, by combining three models, the relationships between entities are classified while taking these key terms into account. Evaluation of the ACORD method on the SemEval 2010 Task 8 dataset demonstrates that it can identify causal relations with an accuracy of 88%. Furthermore, non-numerical analysis of the results allows the identification of discriminative terms for each class. This research represents an important step toward improving relation extraction with context awareness and can serve as a foundation for the development of interpretable models in the future.
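The two stages named in the abstract (ANOVA-based term selection followed by a combination of three models) can be illustrated with a minimal sketch. This is not the ACORD implementation: the record does not name the three models, the features, or the voting scheme, so the toy sentences, the presence/absence term features, the three placeholder voters, and the hard majority vote below are all assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

def anova_f_scores(docs, labels):
    """One-way ANOVA F statistic of each term's presence (0/1) across classes."""
    classes = sorted(set(labels))
    n, k = len(docs), len(classes)
    token_sets = [set(d.lower().split()) for d in docs]
    vocab = sorted(set().union(*token_sets))
    class_sizes = Counter(labels)
    scores = {}
    for term in vocab:
        x = [1.0 if term in toks else 0.0 for toks in token_sets]
        grand_mean = sum(x) / n
        sums = defaultdict(float)
        for xi, y in zip(x, labels):
            sums[y] += xi
        means = {c: sums[c] / class_sizes[c] for c in classes}
        ss_between = sum(class_sizes[c] * (means[c] - grand_mean) ** 2 for c in classes)
        ss_within = sum((xi - means[y]) ** 2 for xi, y in zip(x, labels))
        if ss_within == 0:
            # No within-class variance: perfectly separating term, or a constant one.
            scores[term] = float("inf") if ss_between > 0 else 0.0
        else:
            scores[term] = (ss_between / (k - 1)) / (ss_within / (n - k))
    return scores

def top_terms(scores, k):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def majority_vote(predictions):
    """Hard voting: the most frequent label among the model predictions."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical toy data; the labels borrow two SemEval-2010 Task 8 class names.
docs = [
    "the virus causes the disease",
    "the storm causes the flood",
    "the spark causes the fire",
    "the book is on the table",
    "the cat is in the box",
    "the key is in the drawer",
]
labels = ["Cause-Effect"] * 3 + ["Other"] * 3

scores = anova_f_scores(docs, labels)
selected = top_terms(scores, 2)  # ['causes', 'is']

# Three placeholder voters standing in for the unnamed models (assumption).
def voter_causes(sentence):
    return "Cause-Effect" if "causes" in sentence.split() else "Other"

def voter_is(sentence):
    return "Other" if "is" in sentence.split() else "Cause-Effect"

def voter_weak(sentence):
    return "Other"  # deliberately uninformative, so the vote is not unanimous

voters = [voter_causes, voter_is, voter_weak]
prediction = majority_vote([v("the rain causes the delay") for v in voters])
```

On this toy data the terms "causes" and "is" separate the two classes perfectly, so they rank first, while "the" appears in every sentence and scores zero. Ranked terms of this kind are one way the per-class discriminative terms mentioned in the abstract could be inspected.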

Persian keywords

Relation extraction, Natural language processing, Context awareness, Ensemble learning, ANOVA metric, Knowledge graph, Causal relations

English keywords

Relation extraction, Natural language processing, Context awareness, Ensemble learning, ANOVA metric, Knowledge graph, Causal relations

Author

Parinaz Soltanzadeh

Supervisor

Hossein Rahmani

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=34036&Field=0&DTC=6