Abstract
Matrix multiplication is a core computational operation in a wide range of applications. Implementing it on general-purpose processing systems is inefficient for two reasons: the large volume of data transferred between memory and the computing resources, and the large number of multiplications and additions involved. Using a systolic architecture for the matrix multiplier unit addresses many of the data-transfer challenges. In a systolic architecture, a fixed number of processing elements (PEs) are placed side by side, and in each cycle data is passed rhythmically from each PE to its neighbors. At the same time, many applications can tolerate a certain amount of error. Approximate computing trades some computational accuracy for efficiency and can therefore improve hardware performance significantly. The main goal of this research is to design approximate matrix multiplier units based on the systolic architecture, in which the exact multipliers in the PEs are replaced with approximate ones. To this end, a large number of carry-disregarding approximate multipliers were first designed for use in the PEs. Compared to the exact multiplier, the proposed unsigned approximate multipliers (CDM8) improved critical path delay, power consumption, and area by 29%, 29%, and 30% on average, respectively. In addition, 35 signed approximate multipliers (SCDM8) were designed. Compared to the exact signed multiplier, they improved the same metrics by 26.6%, 27.7%, and 21%, respectively. Using the SCDM8s in the PEs, 35 approximate systolic matrix multiplier units were designed. Compared to the exact matrix multiplier unit, these units improved critical path delay, power consumption, and area by 29.9%, 14.6%, and 10.1% on average, respectively.
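To make the data flow and the multiplier substitution concrete, here is a minimal Python sketch of an output-stationary systolic array. The array is simulated cycle by cycle, and the multiplier inside every PE is passed in as a function, so an exact product can be swapped for an approximate one. Several details are assumptions and do not come from the thesis. The dataflow (output-stationary, skewed inputs) is illustrative. The function `approx_mul_cd` and its parameter `k` are hypothetical, because the abstract does not specify the CDM8 design. This sketch simply XORs the lowest `k` columns of the partial products, which drops their carries, and adds the remaining columns exactly.

```python
import operator
from typing import Callable, List, Optional

Matrix = List[List[int]]


def approx_mul_cd(a: int, b: int, k: int = 4, width: int = 8) -> int:
    """Hypothetical carry-disregarding unsigned multiplier (NOT the thesis's CDM8).

    Partial-product bits in the lowest k columns are combined with XOR, so every
    carry they would generate (including the carry into column k) is dropped;
    columns k and above are accumulated exactly. Assumes 0 <= a, b < 2**width.
    """
    low_mask = (1 << k) - 1
    low, high = 0, 0
    for i in range(width):
        if (b >> i) & 1:
            pp = a << i              # i-th partial product
            low ^= pp & low_mask     # column-wise sum of low bits, carries discarded
            high += pp & ~low_mask   # exact sum of the upper columns
    return high | low                # high is a multiple of 2**k, low < 2**k


def systolic_matmul(A: Matrix, B: Matrix,
                    mul: Callable[[int, int], int] = operator.mul) -> Matrix:
    """Cycle-by-cycle model of an output-stationary n x p systolic array.

    PE(i, j) keeps the running sum for C[i][j]. Operands of A move right and
    operands of B move down by one PE per cycle. Row i of A and column j of B
    enter with a skew of i and j cycles, so A[i][k] meets B[k][j] in PE(i, j)
    at cycle t = i + j + k. None marks an empty slot (a pipeline bubble).
    """
    n, m, p = len(A), len(B), len(B[0])
    acc = [[0] * p for _ in range(n)]
    a_reg: List[List[Optional[int]]] = [[None] * p for _ in range(n)]
    b_reg: List[List[Optional[int]]] = [[None] * p for _ in range(n)]
    for t in range(n + m + p - 2):
        # Latch new operands: edge PEs read the skewed inputs, inner PEs read neighbors.
        new_a: List[List[Optional[int]]] = [[None] * p for _ in range(n)]
        new_b: List[List[Optional[int]]] = [[None] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                if j == 0:
                    k = t - i
                    new_a[i][j] = A[i][k] if 0 <= k < m else None
                else:
                    new_a[i][j] = a_reg[i][j - 1]
                if i == 0:
                    k = t - j
                    new_b[i][j] = B[k][j] if 0 <= k < m else None
                else:
                    new_b[i][j] = b_reg[i - 1][j]
        a_reg, b_reg = new_a, new_b
        # Every PE holding a valid operand pair performs one multiply-accumulate.
        for i in range(n):
            for j in range(p):
                if a_reg[i][j] is not None and b_reg[i][j] is not None:
                    acc[i][j] += mul(a_reg[i][j], b_reg[i][j])
    return acc


A = [[12, 200], [37, 91]]
B = [[45, 3], [250, 18]]
exact = systolic_matmul(A, B)                       # [[50540, 3636], [24415, 1749]]
approx = systolic_matmul(A, B, mul=approx_mul_cd)   # same dataflow, approximate PEs
print(exact, approx)
```

Passing `mul=approx_mul_cd` changes the multiplier in every PE while the dataflow stays the same. This mirrors the idea of replacing the exact multipliers in the processing elements. The signed SCDM8 variants are not modeled here.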
Moreover, the proposed units were compared with existing approximate systolic multiplier units presented in recent years. They improved critical path delay, power consumption, and MRED (mean relative error distance) by 25.4%, 7.3%, and 54%, respectively, while their area was nearly the same. Thus, the proposed units strike a better balance between accuracy and hardware metrics.
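The error metric quoted above, MRED, is commonly defined as the mean of |approx − exact| / |exact| over the evaluated inputs. The sketch below computes it, together with the percentage-improvement arithmetic that the reported figures appear to use. Several points are assumptions. The helper names `mred` and `improvement_pct` are hypothetical. Skipping pairs whose exact value is zero is an assumed convention. Reading "improved by X%" as a relative reduction of a lower-is-better metric is an interpretation, not something the abstract states. The sample values are illustrative and are not results from the thesis.

```python
from typing import Sequence


def mred(exact: Sequence[int], approx: Sequence[int]) -> float:
    """Mean relative error distance: mean of |approx - exact| / |exact|.

    Pairs whose exact value is 0 are skipped (an assumed convention that
    avoids division by zero).
    """
    red = [abs(a - e) / abs(e) for e, a in zip(exact, approx) if e != 0]
    return sum(red) / len(red) if red else 0.0


def improvement_pct(baseline: float, proposed: float) -> float:
    """Relative reduction of a lower-is-better metric, in percent (assumed reading)."""
    return (baseline - proposed) / baseline * 100.0


# Illustrative values only: relative errors 0.1, 0.0 and 0.25 (the zero pair is skipped).
print(mred([10, 20, 0, 40], [9, 20, 3, 30]))
# An MRED falling from 0.10 to 0.046 is a reduction of about 54%.
print(improvement_pct(0.10, 0.046))
```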