لادن مداح

Title

A Mechanism for Optimizing Feature Extraction for Coreference Resolution in Persian

Degree level

Master's (M.Sc.)

Field of study

Software

Defense date

Bahman 1396 (January-February 2018)

Supervisor

Dr. بهروز مينايي

Faculty

Computer

Abstract

Coreference resolution is one of the important foundations of natural language processing. It has important applications in areas such as question answering, machine translation, automatic summarization, and named entity extraction. The task of coreference resolution is to resolve the noun phrases and pronouns in a text that refer to the same entity; understanding text documents, or even speech, therefore requires coreference to be resolved. Coreference resolution methods can be divided into two categories: linguistic methods and machine learning methods. Linguistic methods need more linguistic information, and their drawbacks are that they are more error-prone and time-consuming to run, whereas machine learning methods need less linguistic information and yield more reliable results. Identifying coreferent expressions with corpus-based machine learning methods has flourished in recent years and has produced good results. Such methods require an annotated corpus of adequate size. In this thesis we study the process of coreference resolution. To this end, the work rests on two pillars: an annotated corpus and the proposed algorithm for predicting coreferent noun phrases. In the first step, using the PCAC-2008 corpus, which carries mention and coreference annotations, we present a system that identifies the coreferent nouns in the text. We then extract positive and negative instances from the corpus according to the specified features and, by varying those features, aim to determine which groups of features have a notable effect on raising or lowering precision, recall, and ultimately the F-measure. Finally, we evaluate and compare the resulting instances using a neural network as the base learning algorithm.
The results of this research show that the system, using a neural network and specific syntactic features, reached an F-measure of 59.4 for pronoun resolution, which is better than the performance of other systems.
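
As an illustration of the evaluation measures named in the abstract, the following minimal sketch computes precision, recall, and the balanced F-measure over coreferent mention pairs. The abstract does not state which scoring scheme the thesis uses, so the pairwise scheme, the function name `precision_recall_f1`, and the toy mention identifiers are all assumptions made for this example.

```python
def precision_recall_f1(gold_pairs, predicted_pairs):
    """Pairwise precision, recall and balanced F-measure.

    Each pair is a tuple of two mention identifiers that are
    claimed to refer to the same entity (hypothetical format).
    """
    gold = set(gold_pairs)
    predicted = set(predicted_pairs)
    true_positives = len(gold & predicted)

    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data: three gold pairs, three predicted pairs,
# two of which are correct.
gold = {("m1", "m2"), ("m1", "m3"), ("m4", "m5")}
predicted = {("m1", "m2"), ("m4", "m5"), ("m2", "m6")}
p, r, f = precision_recall_f1(gold, predicted)
```

The scores are returned on a 0 to 1 scale; multiply by 100 to compare them with figures reported as percentages.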

Data entry date

1397/02/16

Release date

5/6/2018 12:00:00 AM

Student who entered the data

لادن مداح

Name: لادن مداح
Author: لادن مداح

Abstract in English

Coreference resolution is one of the core tasks in natural language processing. It has important applications in areas such as question answering, machine translation, automatic summarization, and named entity extraction. The task is to resolve the noun phrases and pronouns in a text that refer to the same entity; resolving coreference is therefore necessary for understanding text documents or even speech. Coreference resolution methods can be divided into two categories: linguistic methods and machine learning methods. Linguistic methods need more linguistic information, are more error-prone, and are time-consuming to run, while machine learning methods require less linguistic information and produce more reliable results. Recognizing coreferent expressions with corpus-based machine learning algorithms has flourished in recent years and has achieved acceptable results. Such methods require a labeled corpus of adequate size. In this thesis, we study the process of coreference resolution. The work rests on two pillars: an annotated corpus and the proposed algorithm for predicting coreferent noun phrases. In the first step, using the PCAC-2008 corpus, which carries both reference and coreference annotations, we present a system that identifies the coreferent nouns in the text. Next, we extract positive and negative samples from the corpus according to the specified features and, by varying those features, determine which of them have a notable effect on raising or lowering precision, recall, and the F-measure. In the last step, we use a neural network as the base learning algorithm to evaluate and compare the obtained samples.
The results show that the neural network learner, when given specific syntactic features, performs better than other work.

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=18772&Field=0&DTC=6