Abstract (English)
Entity resolution is the process of identifying and integrating records that refer to the same real-world entity. Standard methods use rule-based or machine-learning models to compare each pair of records and assign a score indicating whether the pair is a match or a non-match. However, exhaustively comparing all record pairs leads to quadratic matching complexity. Therefore, blocking methods are applied before matching to group records that likely refer to the same entity into small blocks; matching is then performed exhaustively only within each block. Several blocking methods have been proposed to efficiently partition the input data into manageable groups; they are generally categorized into schema-based blocking techniques, schema-agnostic blocking techniques, block-processing techniques, and meta-blocking techniques. Most of these methods do not consider semantic relationships among records. In this paper, we propose an efficient blocking strategy for entity resolution using deep learning. The proposed method is a semantic-aware meta-blocking approach: it captures the semantic similarity of records by applying locality-sensitive hashing (LSH) to word embeddings, achieving fast and reliable blocking in large-scale data environments. To improve the quality of the resulting blocks, it builds a weighted graph of semantically similar records and prunes its edges. We extensively compare the proposed method with 18 existing blocking methods on three real-world data sets. The experimental results show that the proposed method significantly outperforms all 18 methods on two relevant measures: F-measure and the pair-quality measure.
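To make the pipeline of embedding-based LSH blocking followed by graph pruning more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method proposed in this thesis. The record embeddings are hand-written toy vectors; a real pipeline would derive them from a word-embedding model. The LSH family is random-hyperplane (SimHash-style) hashing. Edge weights are cosine similarities. Pruning keeps only edges at or above the mean weight, similar in spirit to weighted edge pruning from the meta-blocking literature. The function names `lsh_blocks` and `prune_blocking_graph` and their parameters (`n_bits`, `n_tables`, `seed`) are hypothetical.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lsh_blocks(embeddings, n_bits=4, n_tables=3, seed=0):
    """Random-hyperplane LSH (an assumed choice of LSH family).

    In each of n_tables hash tables, a record's signature records which side
    of n_bits random hyperplanes its embedding lies on. Records with the same
    signature share a bucket, and buckets with at least two records are blocks.
    """
    rng = random.Random(seed)
    dim = len(next(iter(embeddings.values())))
    blocks = []
    for _ in range(n_tables):
        planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        buckets = defaultdict(list)
        for rid, vec in embeddings.items():
            signature = tuple(
                sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in planes
            )
            buckets[signature].append(rid)
        blocks.extend(b for b in buckets.values() if len(b) > 1)
    return blocks


def prune_blocking_graph(embeddings, blocks):
    """Meta-blocking step (hypothetical weighting and pruning rules).

    Every pair that co-occurs in some block becomes an edge weighted by the
    cosine similarity of its embeddings. Edges below the mean edge weight are
    pruned, and the surviving pairs are the candidates passed to the matcher.
    """
    edges = {}
    for block in blocks:
        for a, b in combinations(sorted(block), 2):
            edges[(a, b)] = cosine(embeddings[a], embeddings[b])
    if not edges:
        return set()
    threshold = sum(edges.values()) / len(edges)
    return {pair for pair, w in edges.items() if w >= threshold}


if __name__ == "__main__":
    # Assumption: toy 3-d vectors standing in for record-level word embeddings.
    records = {
        "r1": [0.90, 0.10, 0.00],
        "r2": [0.88, 0.12, 0.02],
        "r3": [0.00, 0.20, 0.95],
        "r4": [0.02, 0.18, 0.93],
        "r5": [-0.70, 0.60, 0.10],
    }
    blocks = lsh_blocks(records)
    candidates = prune_blocking_graph(records, blocks)
    n = len(records)
    print(f"all pairs: {n * (n - 1) // 2}, candidate pairs: {len(candidates)}")
    print(sorted(candidates))
```

In this sketch, increasing `n_bits` makes buckets smaller and more precise, while increasing `n_tables` gives semantically similar records more chances to collide. The pruning step then discards low-weight pairs, so only the surviving candidate pairs are sent to the matcher instead of all n(n-1)/2 pairs. Classic meta-blocking typically weights edges by block co-occurrence statistics rather than embedding similarity, so the cosine weighting here is a deliberate simplification.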