آسيه قنبرپور ليمويي

Title

Keyword Search in Graph Databases with an Emphasis on Keyword Weights

Degree level

PhD

Field of study

Software

Academic year

1391

Defense date

1397/8/23

Supervisor

Dr. حسن نادري

Faculty

Computer

Abstract

Keyword search, as an alternative to structured query languages, provides a simple, user-friendly interface for searching and retrieving information from graph-structured databases. Compared with classical information retrieval methods for databases, this approach preserves the user's abstraction from the structure of the search space. Keyword queries are expressed as a set of keywords, and their answers are sets of connected structures that show the relationships between the queried keywords in the graph. The simplicity of expressing queries in this approach means that the complexity of working with graph data is delegated entirely to the query processing stage. As a result, answering keyword queries requires complex textual and structural processing of graph data. One of the major challenges in keyword query processing is retrieving the set of answers relevant to the query, which generally requires a long processing time because of the large size of this set. In this thesis, methods are presented for retrieving the answers to a query with an emphasis on maintaining an approximate order of their final ranking. By approximately estimating the weight of incomplete answers, these methods try to retrieve the top answers before the others. Retrieving answers in an approximate order makes it possible to present the set of top answers before the entire set of relevant answers has been retrieved. These methods use indexing, partitioning, and pruning of the data graph to increase the efficiency of the system. The second major challenge in keyword search methods is determining the degree of relevance of answers in the form of subgraphs to the corresponding purely textual query. This degree of relevance depends on the textual content of the answer and its structural compactness. This challenge has rarely been discussed and studied in the literature, even though the accuracy of a keyword search system depends entirely on the order of the answer list. In this thesis, the degree of relevance of answers to the query is estimated by modeling the answer and the query and computing the closeness of these models. In modeling an answer, the structural features of the answer, together with the weights of the keywords in each node down to the attribute level, are aggregated into a single model.
This model is designed directly on subgraphs and is able to preserve the local importance of terms in nodes. The query is also modeled in two ways, simple and extended. The simple query model is estimated from the user's input keywords, whereas the extended model uses pseudo-feedback information to expand the query and estimate its model. The systems proposed in this thesis are designed within a general framework comprising data modeling, indexing of graph data, searching for relevant answers, and ranking the answer list. The results of the experimental evaluation of these systems on three real-world datasets confirm the effectiveness and efficiency of the proposed systems compared with other prominent systems in the field of keyword search.

Data entry date

1397/09/11

Title in English

Keyword Search on Graph Data Focusing on the Weights of Keywords

Date made available

11/14/2018 12:00:00 AM

Student who entered the data

اسيه قنبرپورليمويي

Abstract in English

Keyword search, as an alternative to structured query languages, provides a simple and user-friendly interface for searching and retrieving information from graph-structured databases. In contrast to classical retrieval methods in databases, keyword search preserves the user's abstraction from the database structure. Keyword queries are expressed as sets of keywords, and their answers take the form of sets of connected structures that show the relationships between the queried keywords in the database. The simplicity of querying in this approach shifts the complexity of working with graph data from the querying stage to the query processing stage. Therefore, answering keyword queries requires sophisticated textual and structural data processing. One of the major challenges in keyword query processing is retrieving the set of answers relevant to a query, which generally requires a long processing time because of the large size of this set. In this thesis, methods are developed to retrieve the answers to a query with an emphasis on maintaining an approximate order of their final ranking. By approximately estimating the weight of incomplete answers, these methods attempt to retrieve the best answers before the others. Enumerating answers in an approximate order makes it possible to present a set of top-k answers before the entire answer set has been retrieved. These methods also increase the efficiency of the system by limiting the search space through indexing, partitioning, and pruning techniques. The second major challenge in keyword search is determining the degree of relevance of an answer, which takes the form of a subgraph, to a textual query. This degree of relevance depends on the textual content of the answer and its structural compactness. This challenge is rarely discussed in the literature, even though the effectiveness of a keyword search system depends entirely on the order of the presented answers.
In this thesis, the degree of relevance of answers to the query is determined by modeling the answers and the query and computing the similarity between these models. In answer modeling, the structural characteristics of the answer, along with the weights of the queried keywords in each node down to the attribute level, are aggregated into a single model. This model is designed directly on subgraphs and is able to preserve the local importance of the keywords. The query is also modeled in two ways, simple and extended. The simple query model is estimated from the user's input keywords, while in the extended model, feedback information is used to expand the query and to provide a more accurate estimate of what the user is looking for. The systems proposed in this study are designed within a general framework that includes data modeling, indexing the graph data, retrieving relevant answers, and ranking the answer list. The results of the experimental evaluation of these systems on three real-world datasets confirm their efficiency and effectiveness compared to state-of-the-art systems in the field of keyword search.

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=19745&Field=0&DTC=6