Abstract
New software is developed and released into production every day. One goal of programmers is to produce flexible programs that can be updated quickly and easily. Unfortunately, factors such as lack of time, high cost, heavy workload, and programmer negligence lead to software with inappropriate, complex, and hard-to-change structures. These structures often make further development of the software infeasible and can ultimately bring down the entire system. Developers therefore seek to identify and refactor these structures, known as software anti-patterns, as quickly as possible. Much research has applied artificial intelligence, especially machine learning, to identify and refactor anti-patterns more accurately, faster, and automatically. To the best of our knowledge, no comprehensive method has yet been introduced for identifying and refactoring all possible anti-patterns. Moreover, several methods treat identification as a classification problem and therefore require extracting a large number of features from the code structure. Although some previous works identify anti-patterns very accurately, there is still a need for a method that, in addition to system features, reveals the relationships between different structures. There is also a need for a practical method that is trained on the characteristics of each program or system and performs anti-pattern identification and refactoring accordingly.
In this dissertation, a framework based on probabilistic graphical models is proposed for both identifying and refactoring anti-patterns. First, the classes, the relationships between them, and the characteristics of each class are extracted from the source code, and these entities are then mapped to a graphical model. The mapping is implemented using various structures and is based on the existing causal relationships. Finally, given the known anti-patterns in the code, a suitable Bayesian network is trained that determines the probability of the presence or absence of anti-patterns based on the characteristics of neighboring classes. To evaluate the proposed approach, the model is trained on six different anti-patterns and six different Java programs. The proposed model identifies these anti-patterns with an average accuracy of 85.16% and a recall of 79%. In addition, several refactoring methods are introduced with the help of this model, and it is shown that they lead to a system with higher cohesion and lower coupling. The main aim of this work is to provide a model that preserves the relationships between software classes and represents a complete mapping of the code. The model can also be extended with other anti-patterns without disrupting the learning process, which makes the approach easily extensible.
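As a rough illustration of the idea of conditioning on neighboring classes, the following minimal sketch estimates a single conditional probability table, the building block of a Bayesian network, from a toy class graph. It is not the dissertation's model: the class names, method counts, labels, dependency edges, the `LARGE` threshold, and the two evidence variables are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: class name -> (method count, labeled as anti-pattern?)
classes = {
    "OrderManager": (42, True),
    "Order": (5, False),
    "Invoice": (7, False),
    "ReportEngine": (35, True),
    "Logger": (4, False),
    "Customer": (6, False),
}

# Hypothetical dependency edges between classes (treated as undirected).
edges = [
    ("OrderManager", "Order"),
    ("OrderManager", "Invoice"),
    ("ReportEngine", "Logger"),
    ("Order", "Customer"),
]

LARGE = 20  # assumed threshold for discretizing the method count


def neighbors(name):
    """Return the set of classes directly connected to `name`."""
    out = set()
    for a, b in edges:
        if a == name:
            out.add(b)
        if b == name:
            out.add(a)
    return out


def evidence(name):
    """Discretized evidence: (class is large, all of its neighbors are small)."""
    is_large = classes[name][0] >= LARGE
    small_neighbors = all(classes[n][0] < LARGE for n in neighbors(name))
    return (is_large, small_neighbors)


def fit_cpt():
    """Count labels per evidence combination: evidence -> [not anti, anti]."""
    counts = defaultdict(lambda: [0, 0])
    for name, (_, label) in classes.items():
        counts[evidence(name)][int(label)] += 1
    return counts


def p_anti(cpt, ev):
    """P(anti-pattern | evidence) with Laplace (add-one) smoothing."""
    neg, pos = cpt[ev]
    return (pos + 1) / (neg + pos + 2)


cpt = fit_cpt()
for name in classes:
    print(f"{name}: P(anti-pattern) = {p_anti(cpt, evidence(name)):.2f}")
```

In this toy setting, a large class surrounded only by small classes receives a high probability, while a small class next to a large one receives a low probability. A full Bayesian network would combine many such tables over a richer set of class characteristics and relationships.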