Mohammad Sadegh Sohani (محمدصادق سوهاني)

Title

An approach to distributed keyword search on graphs using big data processing techniques

Degree level

Master's (M.Sc.)

Field of study

Software

Defense date

1398/10/29

Supervisor

Dr. Hassan Naderi (دكتر حسن نادري)

Faculty

Computer

Abstract

Today, interest in keyword search as a way to meet users' information needs across very large volumes of resources is growing rapidly. It is therefore necessary to provide methods and algorithms that let users search graph data for their keywords easily and efficiently, without complex syntactic rules. In this setting, keyword search focuses on finding graph substructures that contain the input keywords. Most existing methods in this area find minimal connected trees that cover all the keywords. Some recent studies propose finding subgraphs instead of minimal trees, because subgraphs give users more information. Because graph data is large and processing it on a single machine is costly, the proposed approach performs keyword search in a distributed manner: it divides the graph into partitions, places each partition on a machine, runs the search algorithm on each machine, and finally gathers the results on one machine. On each machine, the search algorithm finds suitable answers by looking for cliques that contain the keywords, using algorithms based on Bron-Kerbosch and Lawler. In addition, by minimizing the weight of consecutive vertices, we not only maximize the semantic relationship between consecutive keywords but also improve the quality of the approximate answers produced, because the answers produced by our proposed methods have a maximum distance of r between vertices. The advantages of the proposed methods thus include added parallel and distributed processing capability and improved efficiency and quality.
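
To illustrate the clique-based local search mentioned in the abstract, the following minimal Python sketch enumerates maximal cliques with the basic Bron-Kerbosch recursion and keeps those whose vertices together cover every query keyword. It is not the thesis's algorithm: the adjacency-dictionary graph, the `labels` map, the sample data, and the names `bron_kerbosch` and `keyword_cliques` are assumptions made for this example, and the Lawler-based part and the distance bound r are left out.

```python
from typing import Dict, Iterable, Iterator, List, Set

Graph = Dict[str, Set[str]]  # vertex -> set of neighbouring vertices (undirected, no self-loops)


def bron_kerbosch(graph: Graph, clique: Set[str], candidates: Set[str],
                  excluded: Set[str]) -> Iterator[Set[str]]:
    """Yield every maximal clique that extends `clique` (basic version, no pivoting)."""
    if not candidates and not excluded:
        yield set(clique)
        return
    for v in list(candidates):
        yield from bron_kerbosch(graph, clique | {v},
                                 candidates & graph[v], excluded & graph[v])
        # v has been fully explored: move it from the candidates to the excluded set.
        candidates = candidates - {v}
        excluded = excluded | {v}


def keyword_cliques(graph: Graph, labels: Dict[str, Set[str]],
                    query: Iterable[str]) -> List[Set[str]]:
    """Return the maximal cliques whose vertex labels cover every query keyword."""
    wanted = set(query)
    answers = []
    for clique in bron_kerbosch(graph, set(), set(graph), set()):
        covered = set().union(*(labels.get(v, set()) for v in clique))
        if wanted <= covered:
            answers.append(clique)
    return answers


# Hypothetical toy data: a triangle a-b-c plus a pendant edge c-d.
graph: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
labels = {"a": {"graph"}, "b": {"search"}, "c": {"keyword"}, "d": {"graph"}}

answers = keyword_cliques(graph, labels, ["graph", "search"])
print(answers)
```

Here the maximal cliques are {a, b, c} and {c, d}; only the first covers both "graph" and "search", so it is the only answer returned.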

Date of data entry

1398/11/03

Title in English

Distributed keyword search on graph data using big data processing techniques

Release date

1/19/2020 12:00:00 AM

Student who entered the data

Mohammad Sadegh Sohani (محمدصادق سوهاني)

Abstract in English

Interest in keyword search as a way to meet users' information needs over large volumes of resources is growing rapidly. It is therefore necessary to provide methods and algorithms that let users search graph data for their keywords easily, without complicated syntax. In this setting, keyword search focuses on finding graph substructures that contain the input keywords. Most existing methods in this field find minimal connected trees that cover all keywords. Some recent studies suggest finding subgraphs rather than minimal trees, because subgraphs provide more information to users. Because graph data is large and processing it on a single machine is costly, the proposed approach performs keyword search in a distributed manner: the graph is divided into partitions, each partition is transferred to a machine, the search algorithm runs on each machine, and the results are then aggregated on one machine. The search algorithm on each machine finds cliques that contain the keywords, using algorithms based on Bron-Kerbosch and Lawler. In addition, by minimizing the weight of consecutive vertices, we both maximize the semantic relationship between consecutive keywords and increase the quality of the approximate answers, because the answers produced by the proposed methods have a maximum distance of r between vertices. The advantages of the proposed methods therefore include parallel and distributed processing capability and improved efficiency and quality.
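
The partition, local search, and aggregation flow described in the abstract can be sketched roughly as follows. This is a simplified single-process illustration under stated assumptions, not the thesis's implementation: the `assignment` map from vertex to machine is hypothetical (the abstract does not say how the graph is partitioned), each loop iteration merely stands in for one machine, and `local_search` is a deliberately simple stand-in (connected components that cover all keywords) rather than the clique-based search.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

Graph = Dict[str, Set[str]]  # vertex -> set of neighbouring vertices (undirected)


def partition(graph: Graph, assignment: Dict[str, int]) -> List[Graph]:
    """Build one induced subgraph per machine id; edges crossing machines are dropped."""
    members: Dict[int, Set[str]] = {}
    for vertex, machine in assignment.items():
        members.setdefault(machine, set()).add(vertex)
    return [{v: graph[v] & group for v in group}
            for _, group in sorted(members.items())]


def local_search(sub: Graph, labels: Dict[str, Set[str]],
                 query: Set[str]) -> List[Set[str]]:
    """Stand-in local search: connected components whose labels cover the query."""
    seen: Set[str] = set()
    answers = []
    for start in sorted(sub):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first traversal inside the partition
            for neighbour in sub[queue.popleft()]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        covered = set().union(*(labels.get(v, set()) for v in component))
        if query <= covered:
            answers.append(component)
    return answers


def distributed_search(graph: Graph, labels: Dict[str, Set[str]],
                       query: Iterable[str], assignment: Dict[str, int]) -> List[Set[str]]:
    """Run the local search on each partition and gather all answers in one list."""
    wanted = set(query)
    results: List[Set[str]] = []
    for sub in partition(graph, assignment):  # each iteration stands in for one machine
        results.extend(local_search(sub, labels, wanted))
    return results


# Hypothetical toy data: a path a-b-c-d-e split across two machines.
graph: Graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
labels = {"a": {"graph"}, "b": {"search"}, "d": {"graph"}, "e": {"search"}}
assignment = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

answers = distributed_search(graph, labels, ["graph", "search"], assignment)
print(answers)
```

In this sketch an answer that would need the cut edge c-d is never found, because that edge is dropped during partitioning. How the thesis handles answers that span partitions is not stated in the abstract.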

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=21654&Field=0&DTC=6