Zahra Akhgari Ziri

Record number
25419
Author
Zahra Akhgari
Title
Proposing an improved method for event detection on Twitter
Degree level
Master's
Field of study
Computer Engineering - Software
Academic year
1399-1400
Defense date
1400/06/29
Supervisor
Dr. Hossein Rahmani
Faculty
Computer Engineering
Abstract
With the spread of the Internet and smartphones, social networks have become a widely used and powerful means of communication among people in different communities. Social networks let users share varied content such as short texts, links, videos, and images with a network of their followers. Twitter is one of the most popular and most populous social networks and has grown rapidly. Over the course of a day, Twitter users post numerous tweets, ranging from everyday happenings in their personal lives to important world news and events. In most cases, events and news appear on Twitter before the news media cover them. Detecting these events is a challenging problem in text mining that has attracted the attention of many researchers. Deep learning is a subfield of machine learning that tries to extract high-level abstractions from data using a set of algorithms. This is done with a deep neural network that has multiple layers of linear and nonlinear transformations. Applying deep learning to text can make many natural language processing problems easier to solve. The goal of this research is to detect events on the Twitter social network with the help of deep learning; to this end, we propose two deep-learning-based models. The first proposed model (Sem-DED) has two parts: detecting events on Twitter and matching news to the detected events. In the event detection part, we first preprocess the text and then extract text embeddings by combining a Sentence Transformer model with the UMAP dimensionality reduction technique. Finally, the resulting tweet vector is used as input to the HDBSCAN algorithm to identify the topics of the tweets. Evaluation of the clustering results shows that the proposed method improves the F-score by at least 8 percent and at most 32 percent over other entity-based studies. We then use a neural network model to link news to the identified topics; the results show that the neural network model, with high accuracy (close to 97 percent), performs considerably better than other classification models.
In the second proposed model (DeepGraph), an online graph-based method built on deep learning is presented for event detection. This model combines the Word2vec model, community detection methods, and the WMD distance measure. The nodes of the graph are tweets; if two tweets share at least one named entity, their semantic similarity is computed with the WMD measure, and if that value exceeds a threshold, an edge is formed between them. An online greedy community detection algorithm and LDA are then used to detect events. A combination of the LDA model and the WMD distance measure is also used to merge duplicate events. The evaluation results show that, compared with three entity-based studies, we achieve an F-score improvement of at least 1 percent and at most 25 percent.
Data entry date
1400/08/01
Title in English
Proposing an improved method in event detection on Twitter
Date of use
9/20/2022 12:00:00 AM
Student who entered the data
Zahra Akhgari Ziri
Abstract in English
With the spread of the Internet and smartphones, the use of social networks as a powerful communication tool has become popular among people in different communities. Social networks enable users to share a variety of content, such as short texts, links, videos, and images, across a network of followers. Twitter is one of the most popular and fastest-growing social networks. Twitter users post many messages throughout the day, ranging from events in their daily lives to important world news and events. In most cases, events and news are published on Twitter before being covered by the news media. Identifying these events is a challenging topic in the field of text mining that has attracted the attention of many researchers. Deep learning is a sub-branch of machine learning that tries to extract high-level abstract concepts from data based on a set of algorithms. This process is performed using a deep neural network that has several layers of linear and nonlinear transformations. Applying deep learning to text can make many natural language processing problems easier to solve. The purpose of this study is to identify events on the Twitter social network with the help of deep learning; for this purpose, we propose two models based on deep learning. The first proposed model (Sem-DED) consists of two parts: identifying events on Twitter and matching news to the resulting events. In the event identification part, we first preprocess the text, then extract text embeddings by combining the Sentence Transformer model and the UMAP dimensionality reduction technique. Finally, the final tweet vector is used as the input of the HDBSCAN algorithm to identify the topics of the tweets. Evaluation of the clustering results shows that the proposed method has a minimum improvement of 8% and a maximum improvement of 32% in F-score compared to other entity-based studies. Then we use a neural network model to link news to the identified topics, and the results show that the neural network model, with high accuracy (about 97%), performs significantly better than other classification models.
In the second proposed model (DeepGraph), an online graph-based method built on deep learning is presented to identify events. In this model, a combination of the Word2vec model, community detection methods, and the WMD distance measure is used. The graph nodes are tweets, and if two tweets have at least one named entity in common, their semantic similarity is calculated using the WMD measure; if that value is more than a threshold, an edge is formed between them. A greedy online community detection algorithm and LDA are then used to identify events. The combination of the LDA model and the WMD distance measure is also used to merge duplicate events. The evaluation results show that, compared to three entity-based studies, we achieved a minimum improvement of 1% and a maximum improvement of 25% in F-score.
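The graph construction step of DeepGraph described in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the token-overlap (Jaccard) score is a stand-in for a WMD-based similarity over Word2vec vectors, connected components are a stand-in for the greedy online community detection step, and the names `build_tweet_graph`, `connected_components`, the `tokens`/`entities` keys, and the threshold value are all hypothetical. The sample tweets are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-overlap similarity in [0, 1]; a stand-in for a WMD-based score."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def build_tweet_graph(tweets, threshold=0.3, similarity=jaccard):
    """Return edges (i, j) between tweets that share an entity and are similar.

    Each tweet is a dict with hypothetical keys "tokens" and "entities";
    in practice the entities would come from a named-entity recognizer.
    """
    edges = []
    for i, j in combinations(range(len(tweets)), 2):
        # Only compare pairs with at least one named entity in common.
        if not set(tweets[i]["entities"]) & set(tweets[j]["entities"]):
            continue
        # Add an edge when the similarity exceeds the threshold.
        if similarity(tweets[i]["tokens"], tweets[j]["tokens"]) > threshold:
            edges.append((i, j))
    return edges


def connected_components(n, edges):
    """Group node ids 0..n-1 by connectivity (stand-in for community detection)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Invented sample data, for illustration only.
tweets = [
    {"tokens": ["earthquake", "hits", "tehran", "tonight"], "entities": ["tehran"]},
    {"tokens": ["strong", "earthquake", "hits", "tehran"], "entities": ["tehran"]},
    {"tokens": ["tehran", "derby", "ends", "in", "draw"], "entities": ["tehran"]},
    {"tokens": ["earthquake", "hits", "tokyo", "tonight"], "entities": ["tokyo"]},
]

edges = build_tweet_graph(tweets, threshold=0.3)
events = connected_components(len(tweets), edges)
print(edges)   # [(0, 1)]
print(events)  # [[0, 1], [2], [3]]
```

The first and last tweets have a high token overlap but no shared entity, so they are never compared; this is the effect of the entity filter the abstract describes. Because `similarity` is a parameter, a WMD-derived score could be swapped in without changing the graph-building logic.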
Persian keywords
Twitter, news, event detection, deep learning, Sentence Transformer, UMAP.
English keywords
Twitter, News, Event detection, Deep learning, Sentence Transformer, UMAP.
Link to this document:
http://dl.iust.ac.ir/dL/search/default.aspx?Term=25419&Field=0&DTC=6
