ميلاد غفوري ساربانقلي

Title

Using Continuous Vector Representation Methods in Genome Sequencing

Degree level

Master's

Field of study

Artificial Intelligence and Robotics

Defense date

1399/3/11

Supervisor

Dr. Adel Torkaman Rahmani (عادل تركمان رحماني)

School

Computer

Abstract

Biological sequences carry information that should be exploited in tasks such as classification, alignment, and sequencing. Extracting this information and using it in these tasks is complex and time-consuming. A suitable representation of these sequences can make them easier to interpret and to compute with. With the rapid growth of bioinformatics data, machine learning methods have found applications in many areas of bioinformatics. Genomics is one of the most important of these areas, and with the advent of "next-generation sequencing" technology the number of available sequences is growing exponentially. In the past, analyzing these sequences required foundational domain knowledge and studies to describe them. Using machine learning models, each biological sequence can be represented as an n-dimensional vector that captures its biophysical and biochemical properties, and this representation can be applied to a wide range of bioinformatics problems such as protein classification and structure prediction. Motivated by this, the present research demonstrates the effect of using vector representations in "sequencing", and specifically in the read assembly stage. The results show that applying methods based on artificial intelligence and machine learning can have a considerable effect on accuracy.

Record entry date

1399/07/16

Title in English

Using continuous vector space representations of sequences in whole genome sequencing

Availability date

5/31/2020 12:00:00 AM

Student who entered the record

ميلاد غفوري ساربانقلي

Author: ميلاد غفوري ساربانقلي

Abstract in English

Biological sequences contain information that should be used in applications such as classification, sequence alignment, and sequencing. Extracting this information and using it in these applications is complex and time-consuming. With a proper representation, these sequences become easier to understand and to compute with. With the advent of "Next Generation Sequencing", an abundance of sequence data is now available for a range of bioinformatics applications, which has made machine learning methods increasingly applicable to bioinformatics problems. In the past, describing and analyzing a biological sequence required basic knowledge of the biology domain. Using machine learning models, each biological sequence can be represented as an n-dimensional vector that characterizes the biophysical and biochemical properties of the sequence. These representation vectors can be applied to a wide range of problems in bioinformatics, such as protein family classification and structure prediction. This motivates us to demonstrate the efficacy of vector representations in the "Sequencing" application. Based on the results, it was observed that the use of methods based on artificial intelligence can have a significant impact on performance.

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=22452&Field=0&DTC=6