Abstract
With the advancement of information technology, institutions increasingly need data held by other institutions. However, institutions often do not trust one another and are unwilling to disclose personal data, so they refuse to share it. Solutions are therefore needed that protect privacy while data is released or transmitted, so that institutions can share data with each other. For this reason, privacy protection has become an important topic among researchers. Anonymization is one of the most important privacy solutions. Most existing anonymization research is not suitable for big data: the algorithms are either not fast enough or cannot be distributed across multiple machines. Only a few algorithms have been introduced for big data, such as the Mondrian algorithm and its extensions.
This research presents an algorithm for big data anonymization that is fast and can be distributed across several machines. Unlike previous algorithms, such as the Mondrian algorithm and its extension (the Zakerzadeh algorithm), the proposed algorithm also supports string data. The implementation of this algorithm gives data-driven institutions a tool for anonymizing their big data and sharing it with other institutions.
The proposed algorithm is implemented in Python on Spark's computational model and was evaluated in several experiments. The results show that it performs better than the previous algorithms while keeping data utility close to theirs.