Mahdi Farvardin

Title

Discovering the Role of the Attention Mechanism in Vision Transformers

Degree

Master's

Field of study

Computer Engineering - Artificial Intelligence and Robotics

Year of study

1398

Defense date

1401/11/16

Supervisor

Morteza Analoui

Faculty

Computer Engineering

Abstract

With growing computational power and the availability of large amounts of data, deep learning and artificial intelligence have seen many successes and advances, producing striking results on a wide range of problems. On the other hand, the black-box nature of learning machines has led to distrust among researchers and users. In fields such as medicine and self-driving cars, or in any field where wrong decisions made by the system carry serious consequences, we cannot readily trust a system when we do not know why and how it reaches its decisions. This has given rise to a research area in which researchers examine different models and interpret them. Much work has now been done on the interpretability of various models in various domains. In this work, a series of experiments is designed to answer several questions about the ViT model. To run these experiments, a suitable dataset is first created. In the first part, we apply hypothesis testing and statistical tests to the attention scores obtained from the created dataset to answer the question of whether the model can detect the displacement of parts of an image. Two parts of each image are swapped in one of 4 modes: 1) both parts selected at random, 2) the two parts with the highest scores, 3) the two parts with the lowest scores, 4) the part with the highest score and the part with the lowest score. Then, applying hypothesis tests to the attention scores collected from the manipulated and original images and comparing the resulting values with alpha, set to 0.05 in this work, we show that the model can track the displacement of different parts of the image in most cases.
In the next part, we examine the effect of swapping parts of the image on the model's final classification. This is done by choosing different sizes for the swapped parts; 11 sizes are used, all of which divide 224 evenly. We show that as the size of the swapped parts increases, the error also increases. In the last part, salt, pepper, salt-and-pepper, Gaussian, Poisson, and speckle noise are added to the image at 6 different sizes with 3 different ways of selecting parts of the image; the model's classifications are examined and its robustness to these noises is assessed. We show that the model is not robust to speckle noise and can be misled by adding it.
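
The patch-swapping procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes a 224-pixel square input, represents the image as a plain list of rows, and implements only the random-selection mode, since the other three modes need attention scores from the model. The names `candidate_patch_sizes`, `swap_patches`, and `pick_random_pair` are hypothetical.

```python
import random

IMAGE_SIDE = 224  # assumption: square input whose side is 224 pixels


def candidate_patch_sizes(side=IMAGE_SIDE):
    """Return every patch size that tiles the image exactly, excluding the full image."""
    return [s for s in range(1, side) if side % s == 0]


def swap_patches(image, size, first, second):
    """Return a copy of `image` with two size x size patches exchanged.

    `image` is a list of rows; `first` and `second` are (row, col)
    indices on the patch grid, not pixel coordinates.
    """
    out = [row[:] for row in image]
    (r1, c1), (r2, c2) = first, second
    for dy in range(size):
        for dx in range(size):
            y1, x1 = r1 * size + dy, c1 * size + dx
            y2, x2 = r2 * size + dy, c2 * size + dx
            out[y1][x1], out[y2][x2] = image[y2][x2], image[y1][x1]
    return out


def pick_random_pair(size, side=IMAGE_SIDE, rng=random):
    """Random-selection mode: choose two distinct grid cells uniformly at random."""
    per_side = side // size
    cells = [(r, c) for r in range(per_side) for c in range(per_side)]
    return tuple(rng.sample(cells, 2))
```

Under the 224-pixel assumption, `candidate_patch_sizes()` yields 11 sizes (1, 2, 4, 7, 8, 14, 16, 28, 32, 56, 112), which agrees with the count given in the abstract, although the abstract does not list the sizes themselves.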

Record entry date

1402/02/16

Title in English

Discovering Attention Mechanism in Vision Transformers

Availability date

2/5/2024 12:00:00 AM

Student who entered the record

Mahdi Farvardin

Name: Mahdi Farvardin
Author: Mahdi Farvardin

Abstract in English

With the expanding use of machine learning algorithms over the last few decades, together with growing computing power and access to large amounts of data, deep neural networks and machine learning have achieved many successes and produced striking results on a wide range of problems. This performance is one of the main reasons these methods are used to tackle problems in many areas of daily life. On the other hand, this rapid growth has raised concerns among researchers, chiefly because neural networks are black boxes. In fields such as medicine or self-driving cars, or in any field where a wrong decision by the system can cause irreparable damage, we cannot readily trust a system when we do not know why and how it makes its decisions. These concerns have given rise to a research field in which researchers examine different models and interpret them, and much work has now been done on the interpretability of various models in various domains. In this work, a series of experiments is designed to answer several questions about the ViT model. To run these experiments, a suitable dataset was first created. In the first part, we apply hypothesis testing and statistical tests to the attention scores obtained from the created dataset to determine whether the model can detect the displacement of parts of an image. First, two parts of each image are swapped in one of 4 modes: 1) both parts selected at random, 2) the two parts with the highest scores, 3) the two parts with the lowest scores, 4) the part with the highest score and the part with the lowest score.
Then, applying hypothesis tests to the attention scores collected from the manipulated and original images and comparing the resulting values with alpha, which is set to 0.05 in this work, we show that the model can track the displacement of different parts of the image in most cases. In the next part, we examine the impact of swapping parts of the image on the model's final classification. This is done with 11 different patch sizes, all of which divide 224 evenly. We find that as the size of the swapped patches increases, the error also increases. In the last part, salt, pepper, salt-and-pepper, Gaussian, Poisson, and speckle noise are added to the image at 6 different sizes with three different ways of selecting patches; the model's classifications are examined and its robustness to these noises is assessed. We show that the model is not robust to speckle noise and can be misled by adding it.
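
The comparison against alpha = 0.05 can be illustrated with a small significance test on paired attention scores. The abstract does not name the statistical test that was used, so the paired sign-flip permutation test below is an assumption chosen for illustration, and the function names `paired_permutation_test` and `detects_displacement` are hypothetical.

```python
import random

ALPHA = 0.05  # significance level stated in the abstract


def paired_permutation_test(original, manipulated, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences.

    Returns an estimated p-value for the null hypothesis that the
    manipulation does not change the mean score. The choice of this
    particular test is an assumption, not taken from the thesis.
    """
    if len(original) != len(manipulated):
        raise ValueError("paired samples must have the same length")
    rng = random.Random(seed)
    diffs = [m - o for o, m in zip(original, manipulated)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis each difference is equally likely to be +d or -d.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped / len(diffs)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


def detects_displacement(original, manipulated, alpha=ALPHA):
    """True when the null hypothesis is rejected at the given alpha."""
    return paired_permutation_test(original, manipulated) < alpha
```

Here `original` and `manipulated` would hold one attention score per image, before and after the swap; a result below 0.05 is read as the attention scores having shifted in response to the swap.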

Persian keywords

Machine Learning, Deep Learning, Pretrained Models, Interpretability, Interpretable AI, Transformer, Vision Transformers

English keywords

Machine Learning, Deep Learning, Pretrained Models, Transformers, Vision Transformer, Interpretability, Interpretable AI

Author

Mahdi Farvardin

Supervisor

Morteza Analoui

Link to this document

https://dl.iust.ac.ir/dl/search/default.aspx?Term=28203&Field=0&DTC=6