Abstract (in English)
Facial expressions are the most dominant and natural means of non-verbal communication (e.g., expressing emotions); they play a crucial role in human interaction, and recognizing them is vital for understanding others' behavior and emotional state. Due to the high correlation between facial expression and mental state, facial expression recognition (FER) is one of the most challenging research problems in computer vision and related fields, and a large body of research has been devoted to it. In recent years, automatic FER has shifted to unconstrained conditions (a.k.a. in-the-wild settings), driven by the significant advances in deep learning techniques and by the high accuracy already achieved under constrained conditions, which leaves few serious challenges there. Unconstrained conditions involve various real-world challenges, such as occluded faces and variations in head pose, lighting, facial expression, and background. On the one hand, because existing datasets are small and their facial expression classes are imbalanced, training a deep neural network (DNN) for FER remains a very challenging task. On the other hand, DNNs suffer from problems such as overfitting, insufficient learning efficiency, and high computational complexity.
Automatic FER from static images consists of three main steps: image preprocessing, feature extraction, and classification. With DNNs, the feature extraction and classification steps can be merged into a single step (i.e., end-to-end training). To address the aforementioned problems, this thesis proposes a two-stage training approach based on a deep convolutional neural network with a small number of trainable parameters. In the first stage, the network is trained on the original training set only, for a limited number of epochs. In the second stage, the network is trained on an augmented training set, initialized with the weights from the first stage. To prevent overfitting and to increase the learning capacity and generalization of the network, Switchable Normalization (SN) is used for normalization and DropBlock for regularization. To overcome the class imbalance problem, two methods are adopted: (1) oversampling, by applying data augmentation to all classes except the majority classes, and (2) the focal loss, to ensure balanced learning across majority and minority classes. The experimental results indicate that the proposed method, with only 1.5M trainable parameters, achieves 85.76% and 85.81% accuracy on two in-the-wild benchmark datasets (RAF-DB and FERPlus), respectively. It is worth mentioning that the proposed method attains remarkable results compared with current state-of-the-art methods while being shallower than many of the networks proposed in the FER literature.
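To illustrate the two-stage schedule and the focal loss described above, the following is a minimal PyTorch-style sketch, not the thesis's actual implementation. The backbone, datasets, framework choice, and hyperparameters (gamma, alpha, epoch counts, learning rate, image size) are placeholder assumptions, and the SN and DropBlock layers of the real network are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    # Multi-class focal loss: FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t),
    # where p_t is the predicted probability of the ground-truth class.
    log_p = F.log_softmax(logits, dim=1)
    log_pt = log_p.gather(1, targets.unsqueeze(1)).squeeze(1)
    pt = log_pt.exp()
    return (-alpha * (1.0 - pt) ** gamma * log_pt).mean()

def run_stage(model, loader, optimizer, epochs):
    # One training stage: an ordinary supervised loop driven by the focal loss.
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = focal_loss(model(images), labels)
            loss.backward()
            optimizer.step()

# Toy stand-ins for the real backbone and the FER datasets (illustrative only).
num_classes = 7
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 48 * 48, num_classes))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
original_set = TensorDataset(torch.randn(64, 3, 48, 48), torch.randint(0, num_classes, (64,)))
augmented_set = TensorDataset(torch.randn(256, 3, 48, 48), torch.randint(0, num_classes, (256,)))

# Stage 1: a limited number of epochs on the original (unaugmented) training set.
run_stage(model, DataLoader(original_set, batch_size=32, shuffle=True), optimizer, epochs=2)
# Stage 2: continue from the stage-1 weights on the augmented, rebalanced training set.
run_stage(model, DataLoader(augmented_set, batch_size=32, shuffle=True), optimizer, epochs=5)

In the actual method, the second loader would hold the training set oversampled and augmented for all non-majority classes, rather than the random tensors used here as stand-ins.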