Abstract— The massive volume of comments on websites and social networks has made it possible to learn about people's opinions and preferences regarding goods and services on a large scale. To this end, sentiment analysis, which refers to determining the sentiment expressed in a text, has been proposed as an intelligent solution. From a methodological point of view, the combination of word embeddings and deep networks has recently become an effective approach to sentiment analysis. In this approach, word embeddings are obtained by training deep networks on a large text corpus, which produces a vector for each word. For Persian, previous methods have used formal corpora such as Wikipedia dumps. Because formal and informal Persian texts differ substantially, the resulting vectors often perform poorly on user comments from social networks and websites, which are usually written in informal language. To overcome this weakness, this paper constructs a large text corpus by integrating several different sources of informal comments and creates word vectors from it using the fastText algorithm. To make the best use of these vectors, an attention-based LSTM network is proposed, because the attention mechanism lets the model weight the role of each word in determining the sentiment of the text. The proposed method is evaluated on two datasets, “Taaghche” and “Filimo”, which are introduced in this paper. The results indicate a significant advantage of using informal vectors in sentiment analysis. The results also show that applying the attention model improves the performance of the deep network in the sentiment analysis of Persian texts.
Keywords— Sentiment Analysis, Word Embedding, LSTM Network, Attention Model
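
As an illustration of how an attention layer can let each word contribute to the final sentiment representation, the following is a minimal sketch in plain Python. It is not the network proposed in this paper: the abstract does not specify the scoring function, so dot-product scoring against a single context vector is assumed here, and the names `attention_pool`, `states`, and `query` as well as all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Combine per-word hidden states into one vector using attention weights.

    hidden_states: list of T vectors (e.g. LSTM outputs), each of length d.
    query: context vector of length d (assumed to be learned in a real model).
    Returns (weights, context), where context is the weighted sum of the states.
    """
    # Assumption: relevance of each word is the dot product with the query.
    scores = [sum(h_j * q_j for h_j, q_j in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(query)
    context = [
        sum(w * h[j] for w, h in zip(weights, hidden_states)) for j in range(dim)
    ]
    return weights, context


# Toy input: three "word" states of dimension 2 (made-up values).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attention_pool(states, query)
```

In this toy input the third state aligns best with the query, so it receives the largest weight. In a trained model the hidden states would come from the LSTM and the query would be a learned parameter.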