Record number
9777
Author
محمدسعيد ارونقي
Title
فراتر از MM-BD: بهینه‌سازی در فضای نهفته برای تشخیص‌های قوی‌تر در مرحلهٔ پس‌آموزشِ بک‌دور
Degree level
Bachelor's
Field of study
Computer Science
Graduation year
1404
Supervisor
جواد وحيدي
Advisor
جواد وحيدي
Record entered by (student)
محمدسعيد ارونقي
Date of entry
1404/06/24
Faculty
Mathematics and Computer Science
English title
Beyond MM-BD: Latent-Space Optimization for Stronger Post-Training Backdoor Detections
Abstract
Backdoor attacks implant hidden triggers so that inputs containing a specific pattern are
mapped to an attacker-chosen target class, while clean inputs remain correctly classified.
Detecting such attacks after training, without access to the training set and without
knowing the trigger type, is of practical importance.
This thesis studies the Maximum-Margin Backdoor Detector (MM-BD, also called
UnivBD), which, for each class, optimizes an input that maximizes that class's logit
margin (its logit minus the largest other logit); a backdoor target class stands out
through an abnormally large maximum margin compared with the other classes. We (i)
faithfully replicate UnivBD on 50 CIFAR-10 classifiers and (ii) introduce two pragmatic
improvements aimed at signal stability and practical deployment:
1. Enhanced UnivBD (input-space): multiple random restarts per class, Adam with a
cosine learning-rate schedule, and total-variation (TV) plus L2 regularization to
stabilize the optimization and suppress spurious high-frequency artifacts.
2. Data-Latent UnivBD (feature-space): seed the optimization with real images and
move it from the pixels to a mid-level feature tensor, which preserves learned
semantics and produces clearer, more decisive per-class maxima.
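The per-class optimization shared by both variants can be sketched on a toy model. This is a minimal sketch under stated assumptions, not the thesis implementation: a linear head stands in for the network layers above the optimized tensor, the tensor is a flat vector clipped to [0, 1] (a pixel-style box; a post-ReLU feature tensor would only need non-negativity), the TV term uses squared neighbor differences on a 1-D signal, and every name and hyperparameter (`max_margin`, `restarts`, `lam_tv`, `lam_l2`, the toy weights) is hypothetical. The reported statistic for class c is the raw margin f_c(z) - max_{k != c} f_k(z) at the optimized z; for the data-latent variant the random start would be replaced by a real sample's features.

```python
import math
import random


def logits(W, b, z):
    """Hypothetical toy linear head: logit_k = W[k] . z + b[k]."""
    return [sum(w * x for w, x in zip(row, z)) + bk for row, bk in zip(W, b)]


def margin_and_grad(W, b, z, c):
    """Class-c margin (its logit minus the largest other logit) and its gradient w.r.t. z."""
    out = logits(W, b, z)
    k = max((i for i in range(len(out)) if i != c), key=lambda i: out[i])
    return out[c] - out[k], [wc - wk for wc, wk in zip(W[c], W[k])]


def penalty_grad(z, lam_tv, lam_l2):
    """Gradient of lam_tv * sum_i (z[i+1] - z[i])**2 + lam_l2 * sum_i z[i]**2 (smooth 1-D TV + L2)."""
    g = [2.0 * lam_l2 * x for x in z]
    for i in range(len(z) - 1):
        d = 2.0 * lam_tv * (z[i + 1] - z[i])
        g[i] -= d
        g[i + 1] += d
    return g


def max_margin(W, b, c, dim, restarts=4, steps=200, lr_max=0.1, lr_min=0.0,
               lam_tv=0.01, lam_l2=0.01, seed=0):
    """Best raw class-c margin over random restarts, using Adam and a cosine schedule."""
    rng = random.Random(seed)
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    best = -math.inf
    for _ in range(restarts):
        z = [rng.random() for _ in range(dim)]  # random start inside the [0, 1] box
        m, v = [0.0] * dim, [0.0] * dim          # Adam first/second moment estimates
        for t in range(1, steps + 1):
            # Cosine learning-rate schedule decaying from lr_max towards lr_min.
            lr = lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * (t - 1) / steps))
            _, g_margin = margin_and_grad(W, b, z, c)
            g_pen = penalty_grad(z, lam_tv, lam_l2)
            # Minimize (penalty - margin), i.e. maximize the regularized margin.
            g = [gp - gm for gp, gm in zip(g_pen, g_margin)]
            for i in range(dim):
                m[i] = beta1 * m[i] + (1 - beta1) * g[i]
                v[i] = beta2 * v[i] + (1 - beta2) * g[i] ** 2
                m_hat = m[i] / (1 - beta1 ** t)
                v_hat = v[i] / (1 - beta2 ** t)
                z[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
                z[i] = min(1.0, max(0.0, z[i]))  # project back onto the box
        margin, _ = margin_and_grad(W, b, z, c)
        best = max(best, margin)  # keep the best restart's unregularized margin
    return best


# Toy example: class 0 plays the role of an abnormally dominant class.
W = [[5.0, 5.0, 5.0, 5.0],
     [1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0]]
b = [0.0, 0.0, 0.0]
stats = [max_margin(W, b, c, dim=4) for c in range(3)]
print(stats)  # class 0 reaches 19.0; classes 1 and 2 stay at 0.0
```

Restarts matter little for this concave toy objective; they are included because a real network's margin landscape is non-convex, which is the motivation the thesis gives for them.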
On the fixed benchmark of 50 CIFAR-10 models, our data-latent variant improves
detection accuracy by 10 percentage points (from 58% to 68%) and substantially increases
precision. We analyze why feature-space probing yields stronger diagnostics than raw
pixels, discuss design trade-offs, and outline how the variants can be combined with
lightweight stability scores for future ensembling.
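The detection decision and the reported metrics can be sketched as follows. MM-BD uses its own statistical test to decide whether the largest class-wise statistic is abnormal; the median/MAD robust z-score below is a simpler, hypothetical stand-in, and the threshold `3.5`, the function names, and the toy inputs are assumptions for illustration, not thesis data. For scale, on the 50-model benchmark a 10-point accuracy gain corresponds to 5 more correct verdicts (34 instead of 29).

```python
import statistics


def anomaly_score(stats):
    """Robust z-score of the largest class statistic against the remaining classes.

    Hypothetical stand-in for MM-BD's test: median and MAD of the non-maximal classes.
    """
    top = max(stats)
    rest = sorted(stats)[:-1]  # drop one copy of the maximum
    med = statistics.median(rest)
    mad = statistics.median(abs(s - med) for s in rest) or 1e-12  # avoid division by zero
    return (top - med) / (1.4826 * mad)  # 1.4826 scales MAD to a normal std. deviation


def is_backdoored(stats, threshold=3.5):
    """Flag a model when its dominant class is a strong outlier (hypothetical threshold)."""
    return anomaly_score(stats) > threshold


def accuracy_precision(flags, labels):
    """Accuracy and precision of per-model verdicts (True = predicted/actually backdoored)."""
    tp = sum(f and l for f, l in zip(flags, labels))
    fp = sum(f and not l for f, l in zip(flags, labels))
    correct = sum(f == l for f, l in zip(flags, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    return correct / len(labels), precision


# Toy per-class statistics for one clean and one attacked model (illustrative only).
clean = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 1.15, 0.85, 1.1]
attacked = [19.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 1.15, 0.85, 1.1]
flags = [is_backdoored(clean), is_backdoored(attacked)]
print(flags, accuracy_precision(flags, [False, True]))
```

Precision counts only models flagged as backdoored, so a detector can raise it by producing fewer false alarms even when accuracy moves less.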
Keywords
Backdoor Attack, Post-Training Detection, Trustworthy AI