Mohammad Hossein Ahmadi

Title

Face frontalization in image sequences using GAN inversion

Degree level

Master's

Field of study

Computer Engineering - Artificial Intelligence and Robotics

Academic year

1400

Defense date

1402/10/26

Supervisor

Dr. Mohammad Reza Mohammadi

School

Computer Engineering

Abstract

Today, with the growing use of cameras in various domains and the sharp increase in the number of face images, processing them has become very important. This processing can be performed either by a computer or by a human observer. In both cases, the quality and complexity of the images being processed matter greatly: low quality or high complexity can degrade the analyses performed on them. One of the most important sources of complexity in a face image is its rotation. With the progress of deep networks, various supervised and unsupervised methods for face rotation and frontalization have been introduced in recent years. Nevertheless, these networks are very heavy and training them is very challenging. In addition, the quality of the images they generate is low. Another drawback of these networks is that they process only a single image, even though in many of today's applications a short video or a sequence of face images is available and more information can be used for face frontalization. This research attempts to partially resolve the above problems and to present a method with fewer training challenges for good-quality face frontalization on a sequence of images. To this end, since existing datasets are not suitable for processing image sequences, a method for preparing a suitable, high-quality dataset is first presented. Next, for single-image frontalization, a method based on GAN inversion is presented that uses the ideas of knowledge transfer and adapters to greatly reduce the number of trainable parameters and to improve quality. Then, building directly on this method, another method is presented for processing a sequence of images that, in addition to considering the information in each image independently, also takes the relationships between images into account. Finally, qualitative and quantitative evaluation of the results shows the method's good performance and high frontalization quality, with margins of 0.16 and 0.12 over previous methods on the LPIPS metric and the identity-feature distance, respectively.

Record entry date

1402/11/28

Title in English

Face frontalization in image sequences using GAN Inversion

Student who entered the record

Mohammad Hossein Ahmadi

Abstract in English

Nowadays, with the widespread use of cameras across various domains and the significant increase in the availability of facial images, processing these images has gained considerable importance. This processing can be done either automatically by a machine or manually by a human observer. In both scenarios, image quality and complexity are crucial factors that can significantly affect the final result. Pose variation is an important contributor to the complexity of face images. To tackle this challenge, and given the significant progress of deep learning models in recent years, many approaches have been proposed for face frontalization. However, these approaches employ heavy architectures and involve complicated training procedures. Moreover, they struggle to reconstruct the frontal-view image with high quality. Another drawback is that these methods reconstruct the frontal-view image from a single input image, even though multiple images are often available as frames of a video, which inherently contain more information. In this study, we aim to address the above challenges by proposing an approach that has fewer training challenges and is capable of reconstructing the frontal-view image with photorealistic quality. For this purpose, we first address the lack of an appropriate dataset for processing sequences of images by presenting a method for building a diverse, high-quality dataset. We then propose a novel single-input method for face frontalization. This method combines GAN inversion with a transfer learning approach to reconstruct the frontal-view image with high quality while keeping the number of trainable parameters small. Following this, by employing and modifying this method, we propose a new approach for processing sequences of images. This method leverages both the independent and the collective information in the frames to reconstruct the frontal-view image. Finally, through a comprehensive quantitative and qualitative analysis of our methods and a comparative evaluation against previous approaches, we demonstrate their ability to reconstruct frontal-view images with photorealistic quality and their superiority, reflected in a 0.15 improvement in the LPIPS metric and a 0.12 improvement in the distance between identity features.

Keywords in Persian

Generative adversarial network, GAN inversion, Face frontalization, Image-to-image translation, Deep learning

Keywords in English

Generative adversarial networks, GAN Inversion, Face frontalization, Image-to-image translation, Deep Learning

Author

Mohammad Hossein Ahmadi

Supervisor

Mohammad Reza Mohammadi

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=30505&Field=0&DTC=6