Abstract
Big data analysis requires the features of both batch and stream processing: it must discover patterns of data behavior from a large volume of historical data and, at the same time, adapt and operate in real time. Batch learning techniques can recognize patterns in data, but they are not adaptive and cannot identify new patterns in real time. Stream processing, in contrast, operates in real time, and its learning techniques are fast and incremental. Stream learning techniques respond to recent data, but their accuracy is usually lower than that of batch learning methods. Combining batch and stream processing into hybrid processing therefore brings together the advantages of both methods and yields processes capable of real-time, high-speed computation over large volumes of data. Most research on adapting machine learning algorithms to hybrid processing is limited to designing suitable infrastructure for realizing a hybrid architecture and using it in various applications. The algorithmic aspects of hybrid processing, however, have not been addressed: the specifications of batch and stream learning algorithms compatible with these processes, the communication models between batch and stream processing units, and the rules for combining the results of different processing layers. To address these challenges, this thesis proposes HDBS, a distributed hybrid solution compatible with machine learning algorithms, and focuses on the algorithmic aspects of hybrid processing. We then turn to the challenge of feature selection compatible with hybrid processing.
Because hybrid processing generally deals with data streams, batch feature selection techniques are not practical for it. Stream feature selection techniques can be used in hybrid processing, but none of the existing methods exploits the capacity of hybrid processing to select appropriate features, so they do not fit its nature. In the second part of the thesis, we therefore propose EHDBS, an enhanced version of HDBS that uses a dynamic feature selection method compatible with hybrid processing, together with intelligent selection of the base models to be combined. The evaluations show that the proposed solution increases the accuracy and speed of hybrid processing compared with standalone batch and stream processing, as well as with hybrid processing that does not use dynamic feature selection.