وسام المعموري

Title

Improving Arabic Named Entity Recognition Using Sequence-to-Sequence Models. Case Study: Islamic Hadiths

Degree level

Doctorate (PhD)

Field of study

Computer Engineering - Artificial Intelligence

Year of enrollment

1399

Defense date

1404/10/10

Supervisor

بهروز مينايي بيدگلي- سيد صالح اعتمادي

Advisor

Faculty

University Campus - School of Computer Engineering

Abstract

This research introduces a model for improving named entity recognition (NER) in Arabic text, with a specific focus on Islamic hadiths. To strengthen entity recognition, sequence-to-sequence models are used to provide a more precise and more complete analysis of the text. Using the Noor Hadith dataset, which contains 59,430 hadiths, a BIO tagging scheme was developed after the text was cleaned, stop words were removed, and the text was segmented. An analysis of text length showed a right-skewed distribution, since shorter texts were more common. Four models were selected, namely AraBERT, BiLSTM, a CNN-BiLSTM hybrid, and an AraBERT-LSTM hybrid, and the words in the dataset were classified into eight categories: person, imam, location, narrator, book, tribe, date, and event. The four models were evaluated using precision, recall, and F1 score, and the AraBERT-LSTM hybrid achieved an accuracy of 0.981. The hybrid NER model developed in this study also shows considerable promise for Arabic natural language processing. By combining different modeling strategies, the hybrid model attains higher accuracy and consistency, which helps it handle difficult tasks successfully. The hybrid model's performance on Arabic Islamic hadith texts opens the way to future applications in natural language processing and related research. This research is a useful resource for further efforts to improve such models and to make them applicable to other Arabic text collections.

Record entry date

1404/11/13

Title in English

Improving Arabic Named Entity Recognition Using Sequence-to-Sequence Models; Case Study: Islamic Hadiths

Availability date

1/25/2026 12:00:00 AM

Record entered by (student)

وسام المعموري


Abstract in English

The study presents a model for enhancing named entity recognition (NER) in Arabic text, with specific reference to Islamic hadiths. Sequence-to-sequence models are used to obtain a better and more precise analysis of the text and thereby improve entity recognition. Using the Noor Hadith dataset, which consists of 59,430 hadiths, a BIO tagging scheme was created after the text was cleaned, stop words were removed, and the text was segmented. An analysis of text length showed that the distribution was right-skewed, with short texts being more frequent. Four models were chosen, namely AraBERT, BiLSTM, a CNN-BiLSTM hybrid, and an AraBERT-LSTM hybrid, and the words in the dataset were classified into eight entity categories: person, imam, location, narrator, book, tribe, date, and event. The four models were evaluated using precision, recall, and F1 score, and the AraBERT-LSTM hybrid reached an accuracy of 0.981. The hybrid NER model created during the study is also very promising for natural language processing in Arabic. Applying different modeling strategies within the hybrid model yields higher accuracy and consistency, which leads to more successful solutions of challenging tasks. The experiment with the hybrid model on Arabic Islamic hadith texts has been fruitful and therefore paves the way for future applications in natural language processing and related studies. The research is a useful resource for other efforts to enhance such models and extend them to other collections of Arabic texts.
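As a rough illustration of the BIO tagging scheme and the precision, recall, and F1 evaluation mentioned in the abstract, the following minimal Python sketch converts entity spans to BIO tags and scores a prediction. It is not the thesis's implementation: the tag spellings, the function names, the toy five-token example, and the choice of entity-level exact-match scoring are all assumptions made for illustration, since this record does not describe those details.

```python
from typing import List, Set, Tuple

# The eight entity categories named in the abstract; tag spellings are assumed.
ENTITY_TYPES = ["PERSON", "IMAM", "LOCATION", "NARRATOR",
                "BOOK", "TRIBE", "DATE", "EVENT"]

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Turn entity spans into one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # remaining tokens of the entity
    return tags


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Recover entity spans from BIO tags; stray I- tags are ignored."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
        continues = tag.startswith("I-") and etype == tag[2:]
        if not continues:
            if etype is not None:
                spans.add((start, i, etype))
            start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return spans


def span_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 with exact span matching."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy example: five placeholder tokens, not taken from the Noor Hadith dataset.
gold_tags = spans_to_bio(5, [(0, 2, "NARRATOR"), (3, 4, "IMAM")])
pred_tags = ["B-NARRATOR", "I-NARRATOR", "O", "B-PERSON", "O"]
print(gold_tags)
print(span_prf(gold_tags, pred_tags))  # (0.5, 0.5, 0.5)
```

In this toy case the predicted narrator span matches the gold span exactly, while the imam is mislabeled as a person, so one of two predicted spans and one of two gold spans are correct.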

Persian keywords

Arabic Named Entity Recognition (NER), Hadith Text Analysis, Sequence-to-Sequence Models, Hybrid Neural Models, BIO Tagging Scheme

English keywords

Arabic Named Entity Recognition (NER), Hadith Text Analysis, Sequence-to-Sequence Models, Hybrid Neural Models, BIO Tagging Scheme

Author

Wessam Almamoori

Supervisor

Dr. Minaei, Dr. Etemadi

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=34462&Field=0&DTC=6