Abstract
Deep neural networks (DNNs) have been widely used in many research areas, and in many tasks they outperform earlier conventional methods. Automatic speech recognition (ASR) is one such application: using deep neural networks leads to higher recognition accuracy in ASR systems.
An important issue in ASR is compensating for the loss of accuracy caused by new speakers who were not seen during training. In real ASR applications, the speaker-independent model, which was trained on the training set, must therefore be adapted to each new speaker. Speaker adaptation methods lead to higher recognition accuracy for the new speaker.
One speaker adaptation method, originally developed for Gaussian mixture models (GMMs), is factor analysis (FA). Factor analysis methods model the underlying factors that make up the speech signal and discover the relationships between these factors.
In this research, we propose using bottleneck networks to extract gender and phoneme factors. Once the factors are learned, a factor analysis network learns the relationships between the two factors. To improve the speaker adaptation method further, we take two steps: first, we extract bottleneck features from two networks with different activation functions; second, we use adapted neurons in the factor analysis network.
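The bottleneck-extraction and factor-interaction steps described above can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration, not the method evaluated in this thesis: the networks are untrained with random weights, and the 39-dimensional input frames, the 40-unit bottleneck, the 61 TIMIT phone targets, the assignment of sigmoid and ReLU to the two networks, and the function names (`init_net`, `bottleneck`, `bilinear_interaction`) are all hypothetical. The bilinear layer is one generic way to let a network model interactions between two factor vectors; it is a stand-in, not necessarily the factor analysis network used here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def init_net(sizes, rng):
    """Random (untrained) weights; a hypothetical stand-in for a trained DNN."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def bottleneck(x, net, bn_layer, act):
    """Propagate x through the layers and return the pre-activation output
    of layer `bn_layer` (the narrow bottleneck); using the linear output
    is an assumption of this sketch."""
    h = x
    for i, (w, b) in enumerate(net):
        z = h @ w + b
        if i == bn_layer:
            return z
        h = act(z)
    raise ValueError("bn_layer is past the last layer")

def bilinear_interaction(f1, f2, w):
    """Hypothetical factor-interaction layer: out[n, k] = f1[n] @ w[k] @ f2[n]."""
    return np.einsum("ni,kij,nj->nk", f1, w, f2)

rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 39))  # 8 frames of 39-dim features (assumed)

# Two hypothetical bottleneck networks with different hidden activations.
phone_net = init_net([39, 256, 40, 256, 61], rng)  # 61 TIMIT phone labels (assumed)
gender_net = init_net([39, 256, 40, 256, 2], rng)  # male / female targets (assumed)

bn_phone = bottleneck(frames, phone_net, bn_layer=1, act=sigmoid)
bn_gender = bottleneck(frames, gender_net, bn_layer=1, act=relu)

# Concatenate the two bottleneck streams frame by frame.
concat = np.concatenate([bn_phone, bn_gender], axis=1)
print(concat.shape)  # (8, 80)

# Illustrative interaction between the gender and phoneme factor vectors.
w_inter = rng.standard_normal((16, 40, 40)) * 0.01
print(bilinear_interaction(bn_gender, bn_phone, w_inter).shape)  # (8, 16)
```

Concatenation simply stacks the two bottleneck vectors for each frame, so downstream layers see both representations side by side; the bilinear term instead lets every pair of gender and phoneme features interact multiplicatively.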
Evaluation on the TIMIT database shows that factor analysis and bottleneck feature concatenation improve the average monophone recognition accuracy by 2% and 0.8%, respectively. In addition, using adapted neurons in the factor analysis network increases monophone recognition accuracy by 0.6%.
Keywords: deep neural network, speaker adaptation, factor analysis, bottleneck features, speech recognition