-
Record number
26139
-
Author
مجيد زرهرن
-
Title
تشخيص نيمه نظارتي اخبار جعلي فارسي از طريق متن اخبار
-
Degree level
Master's
-
Field of study
Computer Engineering - Artificial Intelligence and Robotics
-
Academic year
1399-1400
-
Defense date
1400/08/30
-
Supervisor
سيدصالح اعتمادي
-
Faculty
Computer Engineering
-
Abstract
In today's world, many rumors circulate in both physical and online spaces, and the consequences and dangers of spreading fake news in a society are plain to everyone. Stance detection can be used to detect fake news. Stance detection is the automatic process of determining the stance of other sources or news agencies toward a claim. We detect fake news by using stance detection, keeping a record of each news agency's history of spreading rumors and fake news, and ranking the agencies accordingly. Notably, this model will be the first Persian fake news detection model in the country that uses stance detection. As in most natural language processing applications, labeling fake news is time-consuming, and the number of labeled examples in this area is limited. The approach we propose in this project is semi-supervised. Semi-supervised learning is a class of machine learning methods that uses unlabeled and labeled data together to improve learning accuracy. With semi-supervised learning, the number of examples in the training corpus can therefore be increased, because unlabeled examples can also be used. Ultimately, the larger number of examples is expected to lead to higher accuracy.
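The semi-supervised idea in the abstract above can be illustrated with self-training, the variant named in the English abstract of this record. The following is only a minimal sketch under stated assumptions, not the thesis's implementation: the toy nearest-centroid classifier, the confidence measure, the `threshold` and `max_rounds` parameters, and the `agree`/`disagree` labels are all hypothetical choices made for illustration.

```python
# Self-training sketch (illustrative only; every name here is hypothetical).
# A model trained on labeled data pseudo-labels the unlabeled pool, and
# confident predictions are added to the training set for the next round.

def fit_centroids(labeled):
    """Fit a toy nearest-centroid classifier; return predict(x) -> (label, confidence)."""
    sums, counts = {}, {}
    for x, y in labeled:
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Euclidean distance from x to each class centroid.
        dists = {
            y: sum((a - b) ** 2 for a, b in zip(x, c)) ** 0.5
            for y, c in centroids.items()
        }
        best = min(dists, key=dists.get)
        total = sum(dists.values())
        # Assumed confidence measure: share of total distance NOT spent on the winner.
        confidence = 1.0 if total == 0 else 1.0 - dists[best] / total
        return best, confidence

    return predict


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the labeled set with confident pseudo-labels, then refit."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        predict = fit_centroids(labeled)
        newly_labeled, still_unlabeled = [], []
        for x in pool:
            label, confidence = predict(x)
            if confidence >= threshold:
                newly_labeled.append((x, label))
            else:
                still_unlabeled.append(x)
        if not newly_labeled:
            break  # nothing confident enough; stop early
        labeled.extend(newly_labeled)
        pool = still_unlabeled
    return fit_centroids(labeled), labeled, pool


if __name__ == "__main__":
    seed_data = [((0.0, 0.0), "agree"), ((10.0, 10.0), "disagree")]
    pool = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
    model, grown, remaining = self_train(seed_data, pool)
    print(len(grown), remaining)
```

The threshold keeps ambiguous examples out of the training set, which is the usual guard against a model reinforcing its own mistakes. In practice the classifier would be a text model rather than a centroid rule.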
-
Record entry date
1400/12/03
-
Title in English
Semi-supervised Persian fake news detection by using news text
-
Release date
11/21/2022 12:00:00 AM
-
Record entered by (student)
مجيد زرهرن
-
Abstract in English
The spread of false information can cause a range of social and political problems. Detecting and tracking false information manually is difficult, especially given the pivotal role social media plays in spreading rumors. For example, The New York Times reported in October 2018 that genocide in Myanmar was incited by a fake news campaign on Facebook. Fake news detection is a complex task, even for trained experts, but the process can be broken down into smaller steps. Stance classification and fact-checking are among the first of these steps and play an important role in fake news detection. In this project we therefore focus on the fact-checking and stance detection tasks and develop a dataset for each. Automatic fact extraction and verification models require large amounts of annotated data, which may not be available for low-resource languages such as Persian. In this report we present our paper on ParsFEVER, the first publicly available Farsi dataset for fact extraction and verification. We improve the construction procedure of the standard English dataset (FEVER) for the case of low-resource languages. A model trained on our dataset attains 50.0% accuracy on claim classification and 28.1% on evidence retrieval on a held-out test set. In addition, we present the first Persian dataset for stance detection. We improved this dataset in several stages, and accuracy rose at each stage thanks to additional instances and new transformer-based models. We experiment with different methods to improve the stance detection results and use a semi-supervised approach (self-training) to reach better accuracy. Our final stance detection model predicts head-to-claim and article-to-claim stance with F1 scores of 0.86 and 0.82, respectively, when the training and test sets share no claims.
Finally, the proposed fake news model can use our stance detection system as a preprocessing step. The head-to-claim stance and the article-to-claim stance thus become two important features, alongside others such as the credibility of the news source, for determining the veracity of claims. The extracted stances are combined with the remaining features and fed into three dense neural network layers, followed by a final dense layer with a softmax activation function that selects the class (true, false, or unknown).
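The classifier described at the end of the abstract (stance features plus source credibility, three dense layers, then a softmax output over three classes) might be sketched as follows. This is a hedged illustration, not the thesis's code: the stance label set, the one-hot feature encoding, the hidden layer width of 16, the ReLU activations, and the random untrained weights are all assumptions, and the output of an untrained network is meaningless beyond being a valid probability distribution.

```python
import math
import random

CLASSES = ["true", "false", "unknown"]                    # classes named in the abstract
STANCES = ["agree", "disagree", "discuss", "unrelated"]   # assumed stance label set


def encode(head_stance, article_stance, source_credibility):
    """Hypothetical feature encoding: two one-hot stances plus one credibility score."""
    def one_hot(label):
        return [1.0 if s == label else 0.0 for s in STANCES]
    return one_hot(head_stance) + one_hot(article_stance) + [source_credibility]


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def dense(x, weights, bias, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * v for w, v in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def build_model(n_features, hidden=(16, 16, 16), seed=0):
    """Random, untrained weights for three hidden layers and one output layer."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden, len(CLASSES)]
    model = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        model.append((weights, [0.0] * n_out))
    return model


def predict_proba(model, features):
    x = list(features)
    for weights, bias in model[:-1]:       # three dense hidden layers (ReLU assumed)
        x = dense(x, weights, bias, relu)
    weights, bias = model[-1]              # final dense layer with softmax
    return softmax(dense(x, weights, bias, identity))


def classify(model, features):
    probs = predict_proba(model, features)
    return CLASSES[probs.index(max(probs))]


if __name__ == "__main__":
    features = encode("agree", "disagree", 0.7)
    model = build_model(len(features))
    print(predict_proba(model, features), classify(model, features))
```

A real system would train these weights on labeled claims, typically with a deep learning framework instead of hand-written layers.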
-
Persian keywords
پردازش زبان طبيعي , يادگيري عميق , تشخيص اخبار جعلي , يادگيري نيمه نظارتي
-
English keywords
Natural language processing, Deep learning, Fake news detection, Semi-supervised learning
-
Author
مجيد زرهرن
-
Supervisor
مجيد زرهرن