Amir Radnia

Title

Linking Related Persian Texts Using Named Entity Extraction and Topic Modeling

Degree

Master's

Field of study

Computer - Software

Year of study

1404

Defense date

1404/12/4

Supervisor

Dr. Hassan Naderi

Advisor

None

Faculty

Computer

Abstract

With the steady growth of Persian content in sources such as Wikipedia, news agencies, and social networks, the need for intelligent tools that analyze these texts and discover the connections among them is greater than ever. One suitable approach is to combine named entity recognition with topic modeling. This research aims to present a novel, integrated method for linking Persian texts and automatically building a knowledge graph through entity extraction and topic modeling. To this end, a complete version of the Persian Wikipedia corpus, comprising 1,011,252 documents, was used. In the first phase, named entities were extracted with the pre-trained Stanza model, which identified 19.5 million entity mentions and, ultimately, 1.45 million unique entities. In the second phase, four models (LDA, NMF, CTM, and BERTopic) were compared in order to uncover the topical structure of the texts. Evaluation with the coherence score showed that the NMF model with TF-IDF weighting and 240 topics achieved a score of 0.7694 and outperformed the other models, including the more advanced transformer-based ones. Finally, by combining the results of the two phases and using topical affinity as the criterion for inferring relations, a large, native knowledge graph was built with 1,449,584 nodes (entities) and 5,189,205 edges (semantic relations). Structural analysis of this graph showed a very low density (close to zero) and an average degree of 5, which indicates that random relations were successfully removed and meaningful, topic-oriented connections were preserved. The results of this research are applicable in areas such as semantic search engines, recommender systems, and Persian text content analysis.
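As a rough illustration of the topic-modeling phase described in the abstract, the following is a minimal sketch of NMF over TF-IDF weights, written in plain Python. It is not the thesis implementation: the toy tokenized corpus, the smoothed IDF formula, the multiplicative-update solver, and the helper names (`tfidf`, `nmf`, `frobenius_error`) are all assumptions made for the example, and it uses 2 topics rather than the 240 reported in the abstract.

```python
import math
import random


def tfidf(docs):
    """Return (vocabulary, matrix) with one TF-IDF row per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = {t: sum(1 for doc in docs if t in doc) for t in vocab}
    rows = []
    for doc in docs:
        row = []
        for t in vocab:
            tf = doc.count(t) / len(doc)
            # Smoothed IDF (an assumed variant; other definitions exist).
            idf = math.log((1 + n_docs) / (1 + df[t])) + 1.0
            row.append(tf * idf)
        rows.append(row)
    return vocab, rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, the quantity NMF tries to minimize."""
    wh = matmul(w, h)
    return math.sqrt(sum((v[i][j] - wh[i][j]) ** 2
                         for i in range(len(v)) for j in range(len(v[0]))))


def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Factor V (docs x terms) into W (docs x k) and H (k x terms), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    initial_error = frobenius_error(v, w, h)
    for _ in range(iters):
        # Multiplicative update: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # Multiplicative update: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h, initial_error


# Hypothetical, pre-tokenized toy documents (stand-ins for Wikipedia articles).
docs = [
    ["tehran", "city", "capital", "iran"],
    ["isfahan", "city", "iran", "bridge"],
    ["football", "team", "league", "goal"],
    ["team", "goal", "match", "league"],
]
vocab, v = tfidf(docs)
w, h, initial_error = nmf(v, k=2)
final_error = frobenius_error(v, w, h)

# Show the three highest-weighted terms of each topic row in H.
for topic_id, weights in enumerate(h):
    top = sorted(zip(weights, vocab), reverse=True)[:3]
    print(topic_id, [term for _, term in top])
```

Each row of `H` can be read as a topic (a weighting over terms) and each row of `W` as a document's topic mixture. At the scale of the corpus in the abstract, a sparse-matrix library implementation would be needed instead of these list-based helpers.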

Data entry date

1405/02/18

Title in English

Linking Persian Texts through Named Entity Recognition and Topic Modeling Techniques

Release date

2/23/2027 12:00:00 AM

Student who entered the data

Amir Radnia

Abstract in English

With the ever-increasing expansion of Persian content in sources such as Wikipedia, news agencies, and social networks, the need for intelligent tools to analyze and discover connections among these texts is felt more than ever. One suitable approach in this context is the combination of Named Entity Recognition methods with topic modeling. This research aims to present a novel and integrated method for establishing links between Persian texts and the automatic construction of a knowledge graph through entity extraction and topic modeling. For this purpose, a complete version of the Persian Wikipedia corpus, comprising 1,011,252 documents, was used. In the first phase, named entities were extracted using the pre-trained Stanza model, which led to the identification of 19.5 million entity mentions and ultimately 1.45 million unique entities. In the second phase, to discover the topical structure of the texts, four different models including LDA, NMF, CTM, and BERTopic were compared. Evaluations using the Coherence Score metric showed that the NMF model with TF-IDF weighting policy and 240 topics, achieving a score of 0.7694, performed better than other models, including more advanced transformer-based models. Finally, by combining the results of the previous two phases and using topical affinity as a criterion for inferring relationships, a massive and native knowledge graph was constructed, containing 1,449,584 nodes (entities) and 5,189,205 edges (semantic relations). Structural analysis of this graph revealed a very low density (near zero) and an average degree of 5, indicating the successful elimination of random relations and the preservation of meaningful, topic-oriented connections. The results of this research are applicable in areas such as semantic search engines, recommender systems, and Persian text content analysis.
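To make the "near zero" density concrete, the short sketch below computes graph density from the node and edge counts quoted in the abstract. The abstract does not say whether the graph is directed, so the `density` helper (a hypothetical name) evaluates both conventions for a simple graph without self-loops; this is an illustrative calculation, not the analysis performed in the thesis.

```python
def density(num_nodes, num_edges, directed=False):
    """Fraction of possible edges that are present in a simple graph."""
    possible = num_nodes * (num_nodes - 1)  # ordered node pairs
    if not directed:
        possible //= 2  # unordered node pairs
    return num_edges / possible


# Counts taken from the abstract.
NODES = 1_449_584
EDGES = 5_189_205

undirected = density(NODES, EDGES)
directed = density(NODES, EDGES, directed=True)
print(f"undirected density: {undirected:.3e}")
print(f"directed density:   {directed:.3e}")
```

Under either convention the value is on the order of a few millionths, which is consistent with the very sparse graph the abstract describes.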

Persian keywords

Natural language processing, named entity recognition, topic modeling, knowledge graph

English keywords

Natural Language Processing, Named Entity Recognition, Topic Modeling, Knowledge Graph

Author

Amir Radnia

Supervisor

Hassan Naderi

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=34764&Field=0&DTC=6