Hamed Riazati Seresht

Title

Improving the Performance of Environmental Sound Classification Based on a Deep Learning Model with Small Size and Low Complexity

Degree

Ph.D.

Field of study

Electrical Engineering (Electronics)

Year of study (Solar Hijri)

1394

Defense date (Solar Hijri)

1402/01/27

Supervisor

Dr. Karim Mohammadi

Faculty

Electrical Engineering

Abstract

Environmental sound classification is an important topic in a wide range of applications (such as smart cities and audio surveillance). Because of less structured patterns, low stationarity, and large intra- and inter-class variability, among other factors, it is one of the most challenging areas of audio recognition. Although deep-learning-based methods have outperformed traditional methods in recent years, these gains have mostly come with greater depth, computational complexity, and network size. In this thesis, we present a convolutional neural network model with small size and low computational complexity. Its design draws on the human auditory system and takes into account the temporal and spectral characteristics of environmental sounds, and within it we propose a novel global average pooling method called Sparse Salient Region Pooling. Treating the very large variability of input patterns as the main obstacle to optimal learning, the proposed feature pooling method creates an information bottleneck that guides the model to learn effectively from the patterns in the salient regions of the input. The evaluation results show that the proposed model achieves accuracies of 86.7% and 94.8% on ESC-50 and ESC-10, respectively; these accuracies are comparable to those of state-of-the-art methods despite a much lower computational complexity and model size. Moreover, with a model 98% smaller than the baseline model, the proposed model achieves a striking absolute improvement of 21.8% on ESC-50. In addition, to address the shortage of training data, we investigate transfer learning and present a new method to improve it. The main idea of the proposed method is to restrict updates to the subset of neurons/kernels identified as mainly responsible for errors in classifying the different classes. To this end, we formulate a nested optimization problem and propose an evolutionary approach to solve it.
Evaluation of the proposed method shows absolute improvements in classification accuracy over the conventional fine-tuning approach of 1.9% and 2.3% on ESC-50 and ESC-10, respectively; these improvements are obtained not by adding synthetic data but by more effective use of the knowledge already present in the pre-trained network.
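To make the pooling idea concrete, the following NumPy sketch contrasts a sparse, saliency-based global pooling with plain global average pooling (which averages every time-frequency position). It is an illustration under our own assumptions, not the thesis's SSRP: we assume the candidate regions are fixed-width, possibly overlapping time windows of the frequency-averaged feature map, and that only the `k` windows with the highest mean activation are averaged per channel. The function name `sparse_salient_region_pool` and the parameters `win` and `k` are hypothetical.

```python
import numpy as np


def sparse_salient_region_pool(fmap: np.ndarray, win: int = 4, k: int = 2) -> np.ndarray:
    """Hypothetical sparse, saliency-based global pooling (illustration only).

    fmap: feature maps of shape (channels, freq, time).
    win:  width in time frames of each candidate region (assumed parameter).
    k:    number of most salient regions kept per channel (assumed parameter).
    Returns one pooled value per channel, shape (channels,).
    """
    c, f, t = fmap.shape
    if win > t:
        raise ValueError("window wider than the time axis")
    # Average over frequency, then take the mean of every length-`win`
    # window along time using a cumulative sum (sliding mean).
    per_frame = fmap.mean(axis=1)                                   # (c, t)
    csum = np.cumsum(np.pad(per_frame, ((0, 0), (1, 0))), axis=1)   # (c, t + 1)
    region_means = (csum[:, win:] - csum[:, :-win]) / win           # (c, t - win + 1)
    # Keep only the k largest region means per channel: the sparse bottleneck.
    k = min(k, region_means.shape[1])
    top_k = np.sort(region_means, axis=1)[:, -k:]
    return top_k.mean(axis=1)


if __name__ == "__main__":
    feats = np.random.default_rng(0).random((8, 16, 32))  # (channels, freq, time)
    print(sparse_salient_region_pool(feats, win=4, k=2).shape)  # (8,)
```

With `win=1` and `k` equal to the number of frames, the function reduces to global average pooling; a smaller `k` narrows the bottleneck so that only the most salient regions contribute to each channel's output.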

Date of data entry (Solar Hijri)

1402/07/25

English title

Environmental Sound Classification based on a Small-Size Low-Complexity Deep Learning Model and Improved Transfer Learning

Data entered by (student)

Hamed Riazati Seresht

English abstract

Environmental Sound Classification (ESC) is important in a broad range of applications, such as smart cities, audio surveillance, and health care. Our aim in this thesis is to propose new approaches to the challenges of ESC. The main challenges are less stationary and more unstructured patterns, a lower signal-to-noise ratio than other audio signals such as speech and music, and large inter- and intra-class variations across sound classes; together, these make ESC harder than other audio classification tasks. Recently, deep learning approaches have overtaken traditional approaches and produced promising results. However, these improvements are often accompanied by greater network depth, computational complexity, and size, and they also require large amounts of labelled training data. In this thesis, we present a new small-size, low-complexity model for ESC based on convolutional neural networks. Taking into account the spectral and temporal characteristics of environmental sounds, and inspired by the human auditory system, our model jointly processes the spectral and temporal patterns of a two-dimensional time-frequency input representation extracted with a log-scale frequency axis. In addition, regarding the large variation of input patterns as one of the main obstacles to efficient learning, we propose a new global feature pooling method called Sparse Salient Region Pooling (SSRP). By imposing a regional bottleneck, SSRP guides the model to learn effectively from the more salient time-frequency regions. The experimental results show that the proposed model achieves accuracies of 86.7% and 94.8% on ESC-50 and ESC-10, respectively, which are comparable to those of state-of-the-art methods but are obtained with much lower computational complexity and a much smaller model.
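The record does not specify how the log-scale frequency axis is built. As a hedged illustration only, the NumPy sketch below computes one common kind of log-frequency representation: Hann-windowed STFT power summed into triangular bands whose edges are geometrically spaced between `fmin` and the Nyquist frequency, followed by a log. The function name `log_freq_spectrogram` and all default parameter values are hypothetical choices, not the thesis's settings.

```python
import numpy as np


def log_freq_spectrogram(signal, sr, n_fft=1024, hop=512, n_bands=64, fmin=50.0):
    """Log-power time-frequency map with a log-spaced frequency axis (sketch).

    Returns an array of shape (n_bands, n_frames).
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    # Slice the signal into overlapping frames and apply a Hann window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # n_bands + 2 geometrically spaced edges define n_bands triangular filters.
    edges = np.geomspace(fmin, sr / 2, n_bands + 2)
    fbank = np.zeros((n_bands, freqs.size))
    for b in range(n_bands):
        lo, mid, hi = edges[b], edges[b + 1], edges[b + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fbank[b] = np.clip(np.minimum(rising, falling), 0.0, None)
    band_power = power @ fbank.T                           # (n_frames, n_bands)
    return np.log(band_power.T + 1e-10)                    # small floor avoids log(0)


if __name__ == "__main__":
    sr = 22050
    noise = np.random.default_rng(0).normal(size=sr * 2)  # 2 s of white noise
    print(log_freq_spectrogram(noise, sr).shape)  # (64, 85)
```

Because the bands are geometrically spaced, low-frequency bands are narrow; with a short FFT some of them may contain no FFT bin and stay at the `1e-10` floor, so `n_fft`, `fmin`, and `n_bands` have to be chosen together.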
Compared with the baseline model, our model achieves a striking absolute accuracy improvement of 21.8% on ESC-50 with a 98% smaller model. To deal with insufficient labelled data, we focus on transfer learning, in which a network pre-trained on a related large-scale dataset is adapted to the target task. We present a new adaptation method whose main idea is to concentrate fine-tuning only on those neurons/kernels that actually need to change, namely those with the greatest impact on misclassifying the target data. To identify these neurons/kernels, we formulate a nested optimization problem and propose an effective evolutionary approach to solve it. Compared with the conventional fine-tuning approach, our method achieves absolute accuracy improvements of about 1.9% and 2.3% on ESC-50 and DCASE-17, respectively; these improvements come not from adding augmented data but from more efficient use of the knowledge stored in the pre-trained network.
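The nested problem can be illustrated with a toy example. In the sketch below, an assumption-laden stand-in and not the thesis's algorithm, the pre-trained network is reduced to a single linear softmax layer whose rows play the role of neurons/kernels. The inner problem fine-tunes only the rows selected by a binary mask (frozen rows receive zero update); the outer problem searches over masks with a plain genetic algorithm (truncation selection, uniform crossover, bit-flip mutation), scoring each mask by validation accuracy after fine-tuning minus a small per-unit penalty that favours sparse masks. The genetic algorithm is a generic substitute for the unspecified evolutionary approach, and all names (`masked_finetune`, `evolve_mask`, `penalty`, and so on) are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def masked_finetune(W, X, y, mask, lr=0.5, steps=50):
    """Inner problem: gradient descent on softmax cross-entropy, updating only rows with mask == 1."""
    W = W.copy()
    onehot = np.eye(W.shape[0])[y]
    for _ in range(steps):
        grad = (softmax(X @ W.T) - onehot).T @ X / len(X)  # (classes, features)
        W -= lr * grad * mask[:, None]                     # frozen rows get zero update
    return W


def accuracy(W, X, y):
    return float((np.argmax(X @ W.T, axis=1) == y).mean())


def evolve_mask(W0, train, val, pop=12, gens=10, p_mut=0.1, penalty=0.01, seed=0):
    """Outer problem: search binary masks with a simple genetic algorithm."""
    rng = np.random.default_rng(seed)
    n = W0.shape[0]

    def fitness(m):
        W = masked_finetune(W0, *train, m)
        # Validation accuracy minus a small cost per unfrozen unit (favours sparse masks).
        return accuracy(W, *val) - penalty * m.sum()

    population = rng.integers(0, 2, size=(pop, n))
    for _ in range(gens):
        scores = np.array([fitness(m) for m in population])
        parents = population[np.argsort(scores)[::-1][: pop // 2]]  # keep the best half
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(n) < 0.5, a, b)           # uniform crossover
            flip = rng.random(n) < p_mut                          # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        population = np.vstack([parents, np.array(children)])
    scores = np.array([fitness(m) for m in population])
    return population[int(np.argmax(scores))]
```

For example, `evolve_mask(W0, (X_train, y_train), (X_val, y_val))` returns a 0/1 vector marking which output units to unfreeze; in a real convolutional network the same masking would be applied to the gradients of selected kernels rather than to rows of a single weight matrix.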

Keywords (Persian)

Environmental sound classification, deep learning, transfer learning, regional saliency

Keywords (English)

Environmental sound classification, deep learning, transfer learning, regional saliency

Author

Hamed Riazati Seresht

Supervisor

Karim Mohammadi

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=28913&Field=0&DTC=6