Abstract
Text summarization is the process of extracting salient information from a source text and presenting it to the user in the form of a summary. Manually summarizing large documents is very difficult for human beings. Summarization methods fall into two categories: extractive and abstractive. Extractive summarization uses statistical and linguistic features to identify the important parts of the text and fuse them into a shorter version, whereas abstractive summarization first builds an understanding of the whole document and then generates the summary. Abstractive methods are highly complex because they require extensive natural language processing. The research community has therefore focused more on extractive summarization, aiming for more coherent and meaningful summaries. Over the past decade, several extractive approaches to automatic summary generation have been developed, applying a range of machine learning and optimization techniques.
The proposed method is an extractive single-document summarization system for the Persian language that produces informative summaries of input texts. The important input sentences, which are to be included in the summary, are identified with the help of a knowledge graph. The general idea is to identify the entities in the raw text and the relations among the extracted entities. These relations help determine the important sentences by assigning a high rank to the sentences that contain the most related entities in the text. In addition, we use three other features: term frequency-inverse sentence frequency (TF-ISF), sentence position, and sentence length. The proposed method has been compared with three text summarization systems and techniques for the Persian language: FarsiSum, Ijaz, and HTM. It achieves significantly better results than the others. The purpose of this research is to develop sentence-ranking methods that generate effective summaries using a knowledge graph and achieve high ROUGE-1, recall, precision, and F-measure scores.
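As an illustration of the TF-ISF feature mentioned above, the following is a minimal Python sketch, not the system's actual implementation. It assumes whitespace tokenization (Persian text would need a proper tokenizer and normalizer), the common formulation tf(t, s) * log(N / sf(t)), and averaging over the tokens of a sentence; the function name `tf_isf_scores` is hypothetical.

```python
import math
from collections import Counter


def tf_isf_scores(sentences):
    """Return one TF-ISF score per sentence (mean over the sentence's tokens).

    Assumptions (not taken from the source): whitespace tokenization,
    ISF defined as log(N / sf), and averaging as the aggregation step.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)

    # Sentence frequency: number of sentences in which each term occurs.
    sf = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        # Sum of tf * isf over the distinct terms of this sentence.
        total = sum(count * math.log(n / sf[term]) for term, count in tf.items())
        scores.append(total / len(tokens))
    return scores


if __name__ == "__main__":
    example = ["a b", "a c c", "a"]
    for sentence, score in zip(example, tf_isf_scores(example)):
        print(f"{score:.4f}  {sentence}")
```

In such a sketch, a term that occurs in every sentence contributes nothing (its ISF is log 1 = 0), so sentences made up only of ubiquitous terms score zero. A full ranking would combine this score with the entity-relation, position, and length features; how those features are weighted is not specified here.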