English Abstract
Discretization of numerical data is one of the most effective
data preprocessing tasks in both knowledge discovery and data mining.
Discretization models convert continuous numerical values into a finite set of categorical intervals.
Unlike unsupervised methods, which use simple rules to discretize continuous
attributes, supervised discretization algorithms take the class labels
of instances into consideration to achieve high accuracy. However, supervised
discretization of continuous features faces two significant
challenges. First, noisy class labels reduce the effectiveness of discretization.
Second, supervised algorithms have a high computational cost on large-scale datasets
and in Big Data environments, which decreases their efficiency.
Accordingly, to address these challenges, we devise a statistical unsupervised
discretization method named SUFDA. SUFDA empirically achieves
low time complexity while balancing the trade-off between accuracy and efficiency. At the same time, our
discretization model targets the high accuracy of supervised approaches. We
conduct a comprehensive performance evaluation on multiple publicly
available datasets. The results show that our unsupervised method achieves
better effectiveness than other discretization baselines. In addition,
we prove that our model has a lower time complexity
than the supervised approaches.
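The abstract does not describe how SUFDA works, so the sketch below does not implement SUFDA. It only illustrates the general distinction drawn above between unsupervised and supervised discretization, under stated assumptions. Equal-width binning is used as a common example of a simple unsupervised rule, and a single entropy-minimizing cut point is used as an example of a supervised, class-label-aware criterion. The function names `equal_width_bins`, `entropy`, and `best_entropy_cut` are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def equal_width_bins(values, k):
    """Unsupervised (illustrative): split [min, max] into k equal-width
    intervals and return a bin index per value. Class labels are ignored."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # Clamp the maximum value into the last bin (index k - 1).
    return [min(int((v - lo) / width), k - 1) for v in values]


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Supervised (illustrative): return the single cut point that minimizes
    the size-weighted class entropy of the two resulting intervals.
    Deliberately naive: sorts once, then rescans both sides for every
    candidate cut, so it costs O(n^2) time."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if score < best_score:
            # Place the cut at the midpoint between adjacent distinct values.
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_score = score
    return best_cut


values = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(equal_width_bins(values, 3))       # [0, 0, 0, 0, 2, 2, 2, 2]
print(best_entropy_cut(values, labels))  # 7.0
```

In this toy data, equal-width binning makes one pass over the values and ignores the labels, which leaves the middle bin empty. The supervised search uses the labels to find the boundary between the two classes, but it must sort the data and evaluate every candidate cut. This extra work is a small-scale version of the efficiency concern raised above.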