Homa Nasiri

Title

Stance Detection in Persian Texts Using Transfer Learning and Data Augmentation

Degree level

Master's

Field of study

Computer Engineering - Artificial Intelligence and Robotics

Years of study

1396 - 1400

Defense date

1400/12/9

Supervisor

Dr. Morteza Analoui

School

Computer Engineering

Abstract

With the spread of social networks such as Facebook, Instagram, and Twitter, a huge volume of information is produced and shared every day, and it can contain questionable and false content. Such content is produced with aims such as attracting an audience, influencing people's beliefs and decisions, increasing click revenue, and influencing important events such as political elections. Identifying this news by traditional, manual means is usually very time-consuming, costly, and exhausting, so fake news detection tools have become a basic necessity for freeing people from the confusion that this questionable and false content creates. Identifying fake news articles by understanding what other news outlets report about the same topic can be a valuable first step. This step is known as stance detection. This research focuses on the sparsity and shortage of high-quality data in Persian stance detection. Using EDA data augmentation methods, new samples were generated in the training data in an attempt to partly relieve the data sparsity of some classes of this task, which had caused those classes to go undetected, and to add some balance to the dataset. In prior work on Persian stance or rumor detection, data has typically been represented by prediction-based algorithms. In this research, unlike existing work, the ParsBERT embedder, which is a content-based (contextual) embedder, gave the model a different representation of each word according to the context in which it is used, so that the model could better identify the labels of each class. In addition, the pre-trained ParsBERT model was used in an attempt to compensate for the shortage of data in this area with the information and knowledge the model acquired during pre-training on different corpora. Finally, the ASHA algorithm was used to select suitable hyperparameters for training the model, which led to a better choice of hyperparameter combination than trial-and-error selection. The results of this research indicate that, with data augmentation methods, contextual representation of the data, and the ParsBERT model, the stance of a news item toward a given claim can be identified better than in existing work.
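The abstract names EDA (Easy Data Augmentation) but does not say which operations or parameters were used. EDA as usually described has four operations: synonym replacement, random insertion, random swap, and random deletion. The following is a minimal sketch of the two that need no synonym lexicon. The function names, the default values of `n_aug`, `p_delete`, and `n_swaps`, and the whitespace tokenization are illustrative assumptions, not the thesis's settings.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random position pairs exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token with probability p, always keeping at least one token."""
    if len(tokens) <= 1:
        return list(tokens)
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, n_aug=4, p_delete=0.1, n_swaps=1, seed=0):
    """Generate n_aug noisy variants of text, alternating swap and deletion.

    Whitespace tokenization is an assumption made for brevity; real Persian
    text would normally go through a proper tokenizer first.
    """
    rng = random.Random(seed)
    tokens = text.split()
    samples = []
    for k in range(n_aug):
        if k % 2 == 0:
            new_tokens = random_swap(tokens, n_swaps, rng)
        else:
            new_tokens = random_deletion(tokens, p_delete, rng)
        samples.append(" ".join(new_tokens))
    return samples
```

To move toward class balance, a function like `augment` would typically be applied only to training examples from under-represented stance classes, leaving the validation and test splits untouched.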

Record entry date

1401/04/06

Title in English

Persian Stance Detection with Transfer Learning and Data Augmentation

Release date

2/28/2023 12:00:00 AM

Student who entered the record

Homa Nasiri

Abstract in English

With the proliferation of social networks such as Facebook, Instagram, and Twitter, a large volume of information is generated and shared every day, some of it questionable or inaccurate. Such content is produced to attract an audience, influence people's beliefs and decisions, increase click revenue, and sway major events such as political elections. Identifying this news manually is usually time-consuming, costly, and tedious, so fake news detection tools have become a basic necessity for sparing people the confusion that this content creates. Identifying fake news articles by understanding what other news organizations report about the same topic can be a valuable first step. This step is known as stance detection. This research focuses on the scarcity of high-quality data in Persian stance detection. Using EDA data augmentation methods, we generate new samples in the training data to reduce the data scarcity of some classes, which had caused those classes to go undetected, and to add some balance to the dataset. In related work on Persian stance or rumor detection, data is typically represented by prediction-based embedding algorithms. In this research, we instead use ParsBERT content-based (contextual) embeddings, which represent each word according to the context in which it appears, to help the model identify the labels of each class better. The pre-trained ParsBERT model is also used to compensate for the lack of data in this field with the knowledge the model gained during pre-training on different corpora. Finally, the ASHA algorithm is used to select suitable hyperparameters for training the model, which led to a better combination of hyperparameters than trial-and-error selection.
This study indicates that, with data augmentation methods, content-based embeddings, and the ParsBERT model, the stance of a news item toward a given claim can be detected better than in existing work.
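ASHA (Asynchronous Successive Halving Algorithm) is an asynchronous variant of successive halving. The abstract does not describe the search space or the tooling used, so the following is only a sketch of the synchronous core idea: train many configurations on a small budget, keep the best fraction, and give the survivors a larger budget. The `toy_evaluate` function, the learning-rate grid, and the values of `eta`, `min_budget`, and `max_budget` are hypothetical and stand in for real validation scores and settings.

```python
import math


def successive_halving(configs, evaluate, min_budget=1, eta=3, max_budget=27):
    """Keep the top 1/eta of configs at each rung, multiplying the budget by eta.

    This is the synchronous form; ASHA applies the same promotion rule
    without waiting for every trial in a rung to finish.
    """
    survivors = list(configs)
    budget = min_budget
    while True:
        # Score every surviving config at the current budget, best first.
        scored = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        if len(scored) == 1 or budget >= max_budget:
            return scored[0]
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        budget = min(budget * eta, max_budget)


def toy_evaluate(config, budget):
    """Hypothetical stand-in for a validation score after `budget` epochs.

    It peaks near lr = 10 ** -4.5 and grows slightly with budget; a real
    run would train the model and return a measured metric instead.
    """
    return -abs(math.log10(config["lr"]) + 4.5) + 0.01 * budget


# Hypothetical learning-rate grid, for illustration only.
grid = [{"lr": lr} for lr in (1e-5, 2e-5, 3e-5, 5e-5, 1e-4, 3e-4, 1e-3, 3e-3, 1e-2)]
best = successive_halving(grid, toy_evaluate)
```

The appeal over manual trial and error is that poor configurations are discarded after a small budget, so most of the training time goes to the promising ones.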

Persian keywords

Stance detection, Persian stance detection, fake news detection, data augmentation, transfer learning, ParsBERT

English keywords

Stance Detection, Persian Stance Detection, Fake News Detection, Data Augmentation, Transfer Learning, ParsBERT

Author

Homa Nasiri

Supervisor

Dr. Morteza Analoui

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=26695&Field=0&DTC=6