Abstract (in English)
In recent years, deep neural networks have been applied to a wide range of tasks such as image classification, object detection, and speech recognition. As their applications grow, so do their capacity and scale, which requires more processing and memory resources for both inference and training. To work properly, deep neural networks usually have to be trained on large data sets to adapt them to specific applications. Such training often takes days, weeks, or even longer and consumes a significant amount of energy. There is therefore strong demand for accelerating the training process, either through new algorithms or through specialized accelerator architectures. However, few accelerators target training, and most existing accelerator architectures focus on inference. Training generally consists of forward propagation, back propagation, and weight update, repeated over many iterations. These operations are dominated by additions and multiplications, consume large amounts of memory and energy, and pose a challenge for hardware implementation. To address these challenges, most recent efforts to reduce the processing workload and shorten the training time of neural networks focus on approaches such as distributed training, data compression, and low-precision training.
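The three training steps mentioned above (forward propagation, back propagation, and weight update) can be illustrated with a minimal PyTorch sketch. The network shape, batch size, and learning rate below are illustrative assumptions and are not taken from this thesis.

```python
# Minimal sketch of one training iteration: forward propagation,
# back propagation, and weight update. Layer sizes, batch size, and
# learning rate are illustrative assumptions only.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(64, 784)             # one mini-batch of input activations
labels = torch.randint(0, 10, (64,))

outputs = model(inputs)                   # forward propagation (mostly multiply-accumulate)
loss = criterion(outputs, labels)
optimizer.zero_grad()
loss.backward()                           # back propagation of gradients
optimizer.step()                          # weight update
```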
In this research, we propose a new data-aware compression approach, called S2NN, that exploits sparsity and similarity in input vectors, weights, and gradients during the training phase of deep neural network accelerators, together with a new accelerator architecture whose data flow mitigates the irregularity introduced by compression. To evaluate the proposed method, we tested it on three deep neural networks. Our results show that, on average, it outperforms the Eyeriss accelerator by 27.1× in performance and 59.8× in energy consumption. Compared to sparsity-aware and similarity-aware accelerators, it is 4.7×, 6.8×, 3.9×, and 2.6× faster, and it reduces average energy consumption by 3.5×, 7.2×, 3.6×, and 2.5×, respectively.
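For illustration only, the sketch below shows one generic way a tensor's sparsity and value similarity can be exploited jointly: consecutive values are delta-encoded, and near-zero deltas are skipped. The threshold and encoding format are hypothetical assumptions; this is not the S2NN compression scheme or the proposed accelerator's data flow.

```python
# Illustrative sketch (not the S2NN scheme): delta-encode consecutive values
# so that similar neighbors produce small deltas, then store only deltas
# above a tolerance, so zeros and near-duplicates are skipped.
import numpy as np

def compress(vec, tol=1e-3):
    """Return (indices, deltas), keeping only deltas with magnitude above tol."""
    deltas = np.diff(vec, prepend=0.0)      # similarity -> small deltas
    keep = np.abs(deltas) > tol             # sparsity   -> many skipped entries
    return np.nonzero(keep)[0], deltas[keep]

def decompress(indices, deltas, length):
    full = np.zeros(length)
    full[indices] = deltas
    return np.cumsum(full)                  # rebuild values from kept deltas

vec = np.array([0.0, 0.0, 0.50, 0.50, 0.51, 0.0, 0.0, 0.75])
idx, d = compress(vec)
print(len(d), "of", len(vec), "values stored")                  # 4 of 8
print(np.allclose(decompress(idx, d, len(vec)), vec, atol=1e-2))  # True
```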