مهدي بگوند

Title

A Distributed Method for Anonymization of Big Data

Degree level

Master's

Field of study

Software

Year of study

1395

Defense date

1399/3/17

Supervisor

Dr. Mohammad Abdollahi Azgomi - Dr. Mohsen Sharifi

School

Computer

Abstract

With the advancement of information technology, institutions increasingly need data held by other institutions. However, mutual trust between institutions may be lacking, and they do not want individuals' personal data to be disclosed, so they resist sharing their data. We therefore need methods that preserve privacy while data is published or transferred, so that institutions can use them to share data with one another. For this reason, privacy-preserving methods are recognized among researchers as an important topic. Anonymization methods are considered the most important privacy-preserving methods. Most existing research on anonymization is not suited to big data: most of the resulting algorithms are either not fast enough or cannot be distributed across multiple machines. Only a few algorithms suited to big data have been introduced, for example the Mondrian algorithm and its extensions. In this research, an algorithm for anonymizing big data is presented that, in addition to having adequate speed, can be distributed across multiple machines. Previous algorithms could not be run on large datasets because of their low speed and lack of distributability. Unlike previous algorithms such as the Mondrian algorithm and its extension (the Zakerzadeh algorithm), the proposed algorithm also supports string data. The proposed algorithm has been implemented in Python on the Spark framework and used in numerous experiments for evaluation. The results of these experiments show that the algorithm is approximately three times as fast as the Mondrian and Zakerzadeh algorithms, while the utility measure is maintained at a level close to theirs.

Data entry date

1399/04/14

Title in English

A Distributed Method for Anonymization of Big Data

Date available for use

6/7/2021 12:00:00 AM

Student who entered the data

مهدي بگوند

Name: مهدي بگوند
Author: مهدي بگوند

Abstract in English

With the advancement of information technology, institutions increasingly require data from other institutions. However, institutions may not trust one another and do not want personal data to be disclosed, which is why they are reluctant to share their data. We therefore need solutions that protect privacy while data is being released or transmitted, so that institutions can share data with each other. For this reason, privacy preservation is recognized among researchers as an important topic. Anonymization is one of the most important privacy-preserving approaches. Most existing research on anonymization is not suitable for big data: most of the resulting algorithms are either not fast enough or cannot be distributed across multiple machines. Only a few algorithms suited to big data have been introduced, such as the Mondrian algorithm and its extensions. In this research, an algorithm for big data anonymization is presented that is fast and can also be distributed across several machines. Unlike previous algorithms such as the Mondrian algorithm and its extension (the Zakerzadeh algorithm), the proposed algorithm also supports string data. The product resulting from the implementation of this algorithm gives data-driven institutions a tool that enables them to anonymize their big data and share it with other institutions. The proposed algorithm is implemented in Python on Spark's computational model and has been used in several experiments for evaluation. The results of these experiments show that the algorithm outperforms the previous algorithms in speed, while the utility measure is maintained at a level close to theirs.

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=22160&Field=0&DTC=6