Abstract
With the spread of the Internet and smartphones, social networks have become a popular and powerful communication tool across many communities. They let users share a variety of content, such as text, links, videos, and images, with a network of followers. Twitter is one of the most popular and fastest-growing social networks. Its users post many messages throughout the day, covering events in their daily lives as well as important world news. In most cases, events and news appear on Twitter before they are covered by the news media. Identifying these events is a challenging text-mining problem that has attracted the attention of many researchers. Deep learning is a branch of machine learning that uses a set of algorithms to extract high-level abstract concepts from data. It does so with a deep neural network composed of several layers of linear and nonlinear transformations. Many natural language processing problems can be addressed more easily by applying deep learning to text. The purpose of this study is to identify events on the Twitter social network with the help of deep learning; to this end, we propose two deep-learning-based models.
The first proposed model (Sem-DED) consists of two parts: identifying events on Twitter and matching news to the identified events. In the event identification part, we first pre-process the text and then extract text embeddings by combining a Sentence Transformer model with the UMAP dimensionality reduction technique. The resulting tweet vectors are used as input to the HDBSCAN algorithm to identify the topics of the tweets. Evaluation of the clustering results shows that the proposed method improves the F-score by at least 8% and at most 32% compared with other existing studies. We then use a neural network model to link news to the identified topics; the results show that it achieves high accuracy (about 97%) and performs significantly better than other classification models.
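The event identification stage described above (pre-processing, sentence embeddings, UMAP, HDBSCAN) can be sketched roughly as follows. This is a minimal illustration and not the Sem-DED implementation: the cleaning rules in `clean_tweet`, the checkpoint name `all-MiniLM-L6-v2`, and the values of `n_components` and `min_cluster_size` are all assumptions chosen for the example, and the function names are hypothetical.

```python
import re


def clean_tweet(text: str) -> str:
    """Illustrative pre-processing (assumed rules, not the study's):
    drop URLs and @mentions, strip '#', collapse whitespace, lowercase."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = text.replace("#", " ")
    return re.sub(r"\s+", " ", text).strip().lower()


def cluster_tweets(tweets, model_name="all-MiniLM-L6-v2",
                   n_components=5, min_cluster_size=10):
    """Embed tweets, reduce dimensionality, and cluster them.

    Returns one integer label per tweet; HDBSCAN marks noise with -1.
    All default values here are assumptions for illustration.
    """
    # Third-party imports are kept local so clean_tweet works without them.
    from sentence_transformers import SentenceTransformer
    import umap
    import hdbscan

    cleaned = [clean_tweet(t) for t in tweets]
    # Sentence-level embeddings, one vector per tweet.
    embeddings = SentenceTransformer(model_name).encode(cleaned)
    # Reduce the embedding dimension before density-based clustering.
    reduced = umap.UMAP(n_components=n_components,
                        metric="cosine").fit_transform(embeddings)
    # Each resulting cluster is treated as a candidate topic or event.
    return hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(reduced)
```

Calling `cluster_tweets(list_of_tweet_strings)` requires the `sentence-transformers`, `umap-learn`, and `hdbscan` packages and a collection large enough for UMAP's neighborhood graph; tweets labeled -1 belong to no cluster.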
The second proposed model (DeepGraph) is an online graph-based method for event identification based on deep learning. It combines the Word2vec model, community detection methods, and the Word Mover's Distance (WMD) measure. The graph nodes are tweets. If two tweets share at least one named entity, their semantic similarity is calculated using WMD, and if it exceeds a threshold, an edge is formed between them. A greedy online community detection algorithm and LDA are then used to identify events. A combination of the LDA model and the WMD measure is also used to merge duplicate events. The evaluation results show that, compared with three existing studies, the proposed method improves the F-score by at least 1% and at most 25%.
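The edge rule of the tweet graph (a shared named entity first, then a similarity threshold) can be illustrated with a small sketch. It rests on several assumptions: the graph is built in batch rather than online, named entities are supplied as precomputed sets, the threshold of 0.3 is arbitrary, and a word-overlap (Jaccard) score stands in for a WMD-based similarity so that the example runs without word embeddings. The function names are hypothetical.

```python
from itertools import combinations


def jaccard_similarity(a: str, b: str) -> float:
    """Stand-in similarity (word overlap). This is NOT Word Mover's Distance;
    a WMD-based similarity would be passed in here in practice."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_tweet_graph(tweets, entities, similarity=jaccard_similarity,
                      threshold=0.3):
    """Return an adjacency dict {tweet index: set of neighbor indices}.

    tweets:   list of tweet strings (the graph nodes)
    entities: list of sets of named entities, one set per tweet (assumed given)
    """
    graph = {i: set() for i in range(len(tweets))}
    for i, j in combinations(range(len(tweets)), 2):
        # Only pairs sharing at least one named entity are compared at all.
        if not (entities[i] & entities[j]):
            continue
        # An edge is added when the similarity exceeds the threshold.
        if similarity(tweets[i], tweets[j]) > threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


tweets = [
    "earthquake hits tokyo tonight",
    "strong earthquake hits tokyo",
    "tokyo marathon results announced",
]
entities = [{"tokyo"}, {"tokyo"}, {"tokyo"}]
graph = build_tweet_graph(tweets, entities)
```

In this toy input all three tweets share the entity "tokyo", but only the two earthquake tweets are similar enough to be connected; a community detection step would then group connected tweets into candidate events.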