Ahmed Shareef

Title

In partial fulfillment of the requirements for the master's degree in Artificial Intelligence and Robotics

Degree level

Master's

Field of study

Computer Engineering, Artificial Intelligence and Robotics

Academic year

1402

Defense date

1404/12/02

Supervisor

Dr. Behrouz Minaei

Advisor

None

Faculty

Faculty of Computer Engineering

Abstract

The rise of digital health, together with the growing volume of health-related digital data being generated, has created an opportunity for new ways of using artificial intelligence (AI) to improve clinical decision-making. However, traditional AI algorithms that accept only a single input format are limited in their clinical applicability. This research describes an effort to overcome these limitations by building a multimodal clinical AI framework. The framework can integrate different types of patient data (such as structured electronic health records (EHR), unstructured clinical notes, and medical images) into a single source for pathology diagnoses that are both comprehensive and interpretable. A large language model (LLM) is integrated into the proposed framework. The LLM provides detailed, human-readable explanations of each prediction so that clinicians can understand how the AI-generated diagnostic reasoning works. With this information, clinicians can see the reasoning behind the AI's predictions and make the best possible decisions about patient care. Experimental evaluations were carried out on pathology, clinical-findings, and anatomical-site classification tasks using the MIMIC-IV and MIMIC-CXR datasets. The multimodal model achieved high predictive performance, with an overall accuracy of 74.8% and a macro-F1 of 67% for pathology classification, an accuracy of 56% and a macro-F1 of 27% for findings, and an accuracy of 74% with a macro-F1 of 71% for site classification. A stepwise analysis showed incremental gains from modality-specific embeddings, latent-space fusion, and end-to-end training. The system maintained meaningful performance on minority classes, indicating the robustness of the multimodal approach, while the LLM-based explanations were validated qualitatively to ensure clinical relevance and interpretability.

Record entry date

1405/02/15

Title in English

Fine-Tuning Multimodal Large Language Models for Clinical Diagnosis Reasoning

Release date

2/22/2026 12:00:00 AM

Student who entered the record

Ahmed Shareef

Name: Ahmed Shareef
Author: Ahmed Shareef

Abstract in English

With the advent of digital health and the growing volume of health-related digital data, there is an opportunity to use artificial intelligence (AI) in new ways to improve clinical decision-making. However, traditional AI algorithms are limited in their clinical applicability because they accept only one input type. The goal of this research is to address these limitations by constructing a multimodal clinical AI framework. This approach allows patient data from various sources (such as medical records, physician's notes, and X-rays) to be integrated into a single location, enabling a more comprehensive and clearer understanding of the patient's diagnostic situation. A large language model (LLM) is integrated into the proposed framework. The LLM produces detailed, human-readable explanations of each prediction so that clinicians can understand how the AI-generated diagnostic reasoning works. With this information, clinicians can see the reasoning behind the AI's predictions and make the best possible decisions about patient care. The method was assessed on three distinct classification tasks (pathology, clinical findings, and anatomical sites) using the MIMIC-IV and MIMIC-CXR datasets. The multimodal model achieved approximately 69% accuracy overall across the three tasks, indicating moderate to strong predictive performance. In more detail, the per-task results were: pathology (accuracy = 74.8%, macro-F1 = 0.670), findings (accuracy = 58.0%, macro-F1 = 0.270), and sites (accuracy = 74.0%, macro-F1 = 0.710).
A stepwise analysis showed incremental improvements from modality-specific embeddings, latent-space fusion, and end-to-end training. The system maintained meaningful performance on minority classes, which suggests that the multimodal approach is robust, and the LLM-based explanations were assessed qualitatively for clinical relevance and interpretability.
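
The abstract reports both accuracy and macro-F1, and the two can diverge sharply (for example 58.0% accuracy against a macro-F1 of 0.270 on the findings task). The sketch below illustrates why, using made-up toy labels rather than any data from the thesis: macro-F1 weights every class equally, so classes the model rarely predicts correctly pull it down even when accuracy looks reasonable. The final lines check that the unweighted mean of the three reported accuracies is close to the "approximately 69%" figure. That the overall figure is an unweighted mean is an assumption, since the abstract does not say how it was computed.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy, hypothetical labels: 8 majority-class and 2 minority-class samples,
# with a model that always predicts the majority class.
y_true = ["normal"] * 8 + ["rare"] * 2
y_pred = ["normal"] * 10

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.3f}, macro-F1 = {mf1:.3f}")

# Per-task accuracies as reported in the English abstract.
reported_accuracy = {"pathology": 74.8, "findings": 58.0, "sites": 74.0}
# Assumption: "overall" is taken here as the unweighted mean across tasks.
overall = sum(reported_accuracy.values()) / len(reported_accuracy)
print(f"mean accuracy across tasks = {overall:.1f}%")
```

In the toy case, accuracy is 0.8 while macro-F1 is well below it, because the minority class contributes an F1 of zero to the average.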

Persian keywords

Multimodal artificial intelligence, clinical decision support

English keywords

Multimodal AI, Clinical Decision Support

Author

Ahmed Shareef

Supervisor

Dr. Behrouz Minaei-Bidgoli

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=34823&Field=0&DTC=6