محمد حسن سوهان آجيني

Title

Improving Speaker Adaptation Methods in Deep Neural Networks for Speech Recognition

Degree

Master of Science (M.Sc.)

Field of study

Artificial Intelligence

Defense date

Aban 1395 (Solar Hijri calendar; October/November 2016)

Supervisor

Dr. Ahmad Akbari

Advisor

Dr. Babak Nasersharif

Faculty

Computer Engineering

Abstract

Deep neural networks are being used in a growing number of fields. They have been applied to a wide range of tasks and have achieved higher accuracy than earlier models. Speech recognition is no exception: applying deep learning has increased recognition accuracy. One important problem in speech processing is compensating for the drop in recognition accuracy on new speakers. In practical speech recognition applications, a speaker-independent model trained on a training corpus must be adapted to each new speaker, and this adaptation increases recognition accuracy. One speaker adaptation method, originally developed for Gaussian mixture models, is factor analysis, which examines the constituent factors of speech and learns the relationships between them. In this research, two factors, phoneme and gender, are first extracted with bottleneck networks, and a factor analysis network then learns the relationship between the two factors. The adaptation method is improved in two ways. First, for each factor, bottleneck features are extracted from two networks with different activation functions and concatenated. Second, adapted neurons are used to train the factor analysis network. Evaluations on the TIMIT corpus show that factor analysis yields an average improvement of 2% in monophone classification, while feature concatenation yields an average improvement of 0.8% and adapted neurons an improvement of 0.6%.

Keywords: deep neural network, speaker adaptation, factor analysis, bottleneck features, speech recognition.
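The first improvement, extracting bottleneck features from two networks with different activation functions and concatenating them, can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the thesis implementation: the layer sizes (39-dimensional MFCC input, a 40-unit linear bottleneck, 48 phone classes), the choice of sigmoid and ReLU as the two activation functions, and the helper names `init_net` and `bottleneck_features` are all hypothetical, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def init_net(sizes, rng):
    # One (W, b) pair per layer transition; random here, trained in practice.
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def bottleneck_features(x, net, act, bn_layer):
    # Forward pass up to and including the bottleneck layer.
    # Assumption: hidden layers use `act`, the bottleneck output stays linear.
    h = x
    for i, (W, b) in enumerate(net[:bn_layer + 1]):
        z = h @ W + b
        h = z if i == bn_layer else act(z)
    return h

rng = np.random.default_rng(0)
sizes = [39, 256, 40, 256, 48]   # hypothetical: input, hidden, bottleneck, hidden, phone classes
net_sig = init_net(sizes, rng)   # network with sigmoid hidden units
net_relu = init_net(sizes, rng)  # network with ReLU hidden units

frames = rng.standard_normal((10, 39))  # 10 stand-in acoustic frames
bn_sig = bottleneck_features(frames, net_sig, sigmoid, bn_layer=1)
bn_relu = bottleneck_features(frames, net_relu, relu, bn_layer=1)

# Frame-wise concatenation of the two bottleneck representations.
features = np.concatenate([bn_sig, bn_relu], axis=1)
print(features.shape)  # (10, 80)
```

The idea behind concatenation is that networks with different nonlinearities may encode complementary information at the bottleneck, so the combined vector gives the downstream network a richer view of each factor.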

Date of data entry

1396/03/02 (Solar Hijri; 23 May 2017)


Data entered by

محمدحسن سوهان اجيني

Author: محمد حسن سوهان آجيني

Abstract (English)

Deep neural networks have been widely used in many research areas. These models have been applied to various tasks, where they outperform earlier conventional methods. Automatic speech recognition (ASR) is one such application, where deep neural networks have led to higher recognition accuracy. One of the important issues in ASR is compensating for the accuracy degradation on new speakers. In other words, in real ASR applications, a speaker-independent model trained on the training set must be adapted to a new speaker. Speaker adaptation methods lead to higher recognition accuracy for the new speaker. One speaker adaptation method, developed for Gaussian mixture models (GMMs), is called factor analysis (FA). Factor analysis methods investigate the fundamental constituent factors of the speech signal and learn the relationships between them. In this research, we propose using bottleneck networks to extract gender and phoneme factors. After learning the factors, we use a factor analysis network to learn the relationship between the two factors. To improve the speaker adaptation method, at the first step we extract bottleneck features from two networks with different activation functions and concatenate them. At the second step, we use adapted neurons in the factor analysis network. Evaluation on the TIMIT database shows that factor analysis and bottleneck feature concatenation improve average monophone recognition accuracy by 2% and 0.8%, respectively. In addition, using adapted neurons in the factor analysis network increases monophone recognition accuracy by 0.6%.

Keywords: deep neural network, speaker adaptation, factor analysis, bottleneck features, speech recognition
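One plausible reading of the factor analysis network is a small classifier that takes the phoneme-factor and gender-factor bottleneck features together and outputs monophone posteriors. The NumPy sketch below assumes exactly that and nothing more: the single tanh hidden layer, all dimensions, and the name `fa_network_forward` are hypothetical, the weights are untrained, and the adapted-neuron training described in the abstract is not modeled.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fa_network_forward(phone_feats, gender_feats, params):
    # Hypothetical FA network: join the two factor representations frame by frame.
    x = np.concatenate([phone_feats, gender_feats], axis=1)
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)       # hidden layer can model interactions between factors
    return softmax(h @ W2 + b2)    # per-frame monophone posteriors

rng = np.random.default_rng(1)
n_frames, d_phone, d_gender, d_hidden, n_phones = 10, 80, 80, 128, 48  # assumed sizes
params = (rng.standard_normal((d_phone + d_gender, d_hidden)) * 0.1, np.zeros(d_hidden),
          rng.standard_normal((d_hidden, n_phones)) * 0.1, np.zeros(n_phones))

posteriors = fa_network_forward(rng.standard_normal((n_frames, d_phone)),
                                rng.standard_normal((n_frames, d_gender)), params)
predicted = posteriors.argmax(axis=1)  # frame-level monophone decisions
```

Feeding both factors into a shared hidden layer, rather than classifying from the phoneme features alone, is what lets such a network learn how gender information shifts the phoneme representation.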

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=17269&Field=0&DTC=6