Abstract (in English)
Understanding natural language is one of the central goals of artificial intelligence, and natural language processing (NLP) has developed in pursuit of this goal. Important applications of NLP include web search, advertising, email filtering, customer service, and translation from one language to another. Traditionally, the rules governing language data were derived by hand and models were built on top of those rules, an approach with many limitations. Today, machine-learning methods are used instead; they offer advantages such as statistical inference in the presence of noise, the ability to exploit large volumes of data, and a focus on more general patterns.

In this project, a recurrent neural network (RNN) with a long short-term memory (LSTM) architecture is applied to the textual reviews in the IMDB dataset, which consists of movie reviews. Given a review, the trained model classifies it, after preprocessing, as positive or negative. In the LSTM setup used here, the word-embedding vectors are trained jointly with the LSTM model. This makes training considerably longer, but in return the model is not restricted to the vocabulary of a fixed, pre-trained embedding matrix, and an embedding suited to the task at hand is obtained. The first layer of the proposed model is a sequence of LSTM units, whose outputs form the corresponding learned representation of the input. While training the LSTM units, dropout (random exclusion of units) was applied; this again lengthened training somewhat, but it improved generalization and prevented overfitting, which raised the model's final accuracy. At the output of the LSTM layer, batch normalization was used; by suitably shifting and rescaling the data in feature space, it made training faster and more effective. Finally, a logistic-regression output layer was used to classify the input sequences.
This allowed all stages, from representation to classification, to be trained end to end with the backpropagation algorithm. The final classification accuracy of this model on the IMDB dataset reached 90%.
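The pipeline described above, trainable embeddings feeding LSTM units, dropout on the hidden state, and a logistic-regression output, can be illustrated with a minimal NumPy forward pass. This is a sketch for exposition only, not the thesis implementation: all sizes, initializations, and token ids are hypothetical, and batch normalization is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step: four gates computed from input x and previous state."""
    z = W @ x + U @ h_prev + b        # stacked pre-activations, shape (4*H,)
    H = h_prev.shape[0]
    i = sigmoid(z[0:H])               # input gate
    f = sigmoid(z[H:2*H])             # forget gate
    o = sigmoid(z[2*H:3*H])           # output gate
    g = np.tanh(z[3*H:4*H])           # candidate cell state
    c = f * c_prev + i * g            # long-term (cell) memory
    h = o * np.tanh(c)                # short-term (hidden) output
    return h, c

# Hypothetical sizes: embedding dim E, hidden dim H.
E, H = 8, 16
W = rng.normal(0, 0.1, (4 * H, E))
U = rng.normal(0, 0.1, (4 * H, H))
b = np.zeros(4 * H)

# Embedding lookup for a toy review of token ids; in the thesis this
# embedding matrix is trained jointly with the LSTM rather than fixed.
vocab = rng.normal(0, 0.1, (100, E))
tokens = [3, 17, 42, 7, 99]
h, c = np.zeros(H), np.zeros(H)
for t in tokens:
    h, c = lstm_step(vocab[t], h, c, W, U, b)

# Dropout during training: randomly zero hidden units, rescale the survivors.
keep = 0.5
mask = (rng.random(H) < keep) / keep
h_dropped = h * mask

# Logistic-regression output layer: sigmoid over a linear score of the state.
w_out = rng.normal(0, 0.1, H)
b_out = 0.0
p_positive = sigmoid(w_out @ h_dropped + b_out)
print(f"P(positive review) = {p_positive:.3f}")
```

Because every stage above is differentiable, the same graph can be trained end to end with backpropagation, which is what makes joint training of the embeddings and the LSTM possible.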