Abstract
The World Wide Web contains a huge amount of information in many different structures. Computer scientists have been working on flexible and reliable systems that extract information from unstructured text and convert web pages into machine-readable structures. One of the major approaches to this goal is "Open Information Extraction". Information extraction systems often extract information as a relation together with its set of arguments. Running this process over a wide range of data, such as the Web, yields hundreds of collections containing millions of distinct relations with their arguments.
Despite advances in information extraction, open issues remain in areas such as quality and reliability, and these continue to challenge information extraction systems and methods in practice.
One of the major issues is that relations that are semantically identical are extracted under different names. In fact, any concept may be expressed in text by a set of different words. We refer to the presence of synonymous names in the extracted information as "ambiguity", and we call the task of identifying synonymous names among the entities "disambiguation of extracted information".
This thesis develops a system for disambiguating extracted information in the Persian language. We first examine the challenges facing such a system, as well as the challenges of word sense disambiguation in Persian. We then build a system that disambiguates the arguments extracted from Persian text and, at the same time, uses several methods to link that information to entities and concepts in BabelNet, a multilingual encyclopedic dictionary and knowledge base. Finally, the system is designed to display the results and to support various queries over the extracted information, whose results can be viewed.