Ali Al-Sudani (طالب)

Title

Automated Skill Extraction from Online Job Advertisements Using Natural Language Processing (NLP) and Machine Learning

Degree level

Master's

Field of study

Software

Academic year

1402

Defense date

1404/11/19

Supervisor

Hassan Naderi

Advisor

Faculty

Computer Engineering

Abstract

The accelerating pace of digital change that is transforming the global economy has widened the "skills gap," exposing the weaknesses of conventional Labor Market Information (LMI) systems, which are slow to respond and imprecise. To address this shortcoming, this work proposes a fully automated analytical pipeline that uses Online Job Advertisements (OJA) to forecast skill demand. The main aim of the study was to examine how effectively a new hybrid framework improves forecasts by making specific use of implicit skills. The method used a dataset of 1,558 unique job postings collected over six weeks from popular sources such as LinkedIn. A fine-tuned RoBERTa model for Named Entity Recognition (NER) extracted technical and soft skills, which were then linked to latent feature spaces through Latent Dirichlet Allocation (LDA) to identify latent skill spaces. Three time-series models, ARIMA, Prophet, and Long Short-Term Memory (LSTM), were used for model evaluation. The results demonstrated the pipeline's ability to detect key skills such as "Communication" or "AWS," as well as its strength in detecting unseen market signals such as the need for "Cloud Infrastructure." Although high volatility compromised the achievement of acceptable forecasting accuracy, the results showed that the LSTM approach remained effective compared with more common statistical methods. Most importantly, the ablation study showed that adding implicit LDA topics to explicit NER data reduces the MAE of the forecasts by approximately 13%. The thesis concludes that, although more data over a longer time span is needed to address volatility, the proposed hybrid NLP architecture is effective at picking up key hidden market signals and provides a more reliable basis for developing future labor market information systems.

Data entry date

1404/11/26

Title in English

AUTOMATED SKILL EXTRACTION FROM ONLINE JOB ADVERTISEMENTS USING NLP AND MACHINE LEARNING

Release date

2/9/2026 12:00:00 AM

Student who entered the data

Ali Al-Sudani (طالب)

Abstract in English

The ever-quickening pace of digital evolution that is transforming the world economy has widened the "skill gap," revealing the disadvantages of common Labor Market Information (LMI) systems, which are characterized by slow response times and a lack of specificity. To address this drawback, this work proposes a fully automated analytical process that uses Online Job Advertisements (OJAs) to anticipate skills. The main purpose of the study was to investigate the effectiveness of a newly proposed hybrid framework that supports forecasting by making specific use of implicit skills. The methodology used a dataset of 1,558 unique job postings collected from popular sources such as LinkedIn over a period of six weeks. A fine-tuned RoBERTa model for Named Entity Recognition (NER) was used to extract both technical and soft skills, which were subsequently associated with latent feature spaces through Latent Dirichlet Allocation (LDA) in order to identify latent skill spaces. Three time-series models, ARIMA, Prophet, and Long Short-Term Memory (LSTM), were used for model evaluation. The results revealed not only the pipeline's capacity to detect key skills such as "Communication" or "AWS" but also its strength in detecting unseen market signals such as the need for "Cloud Infrastructure." Although high volatility compromised the achievement of an acceptable forecasting accuracy, the results revealed that the LSTM approach remained efficient compared with the more common statistical methods. Above all, the ablation study revealed that adding implicit LDA topics to explicit NER data reduced the mean absolute error (MAE) of the forecasts by approximately 13%.
This thesis concludes that, although more data over a longer timeline is required to address the volatility, the proposed hybrid NLP architecture is efficient in picking up key hidden market signals and forms a more reliable basis for the development of future labor market intelligence systems.
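
As a reading aid for the ablation figure reported in the abstract, the following minimal Python sketch shows how a mean absolute error and its relative reduction between two forecasts can be computed. The function names and all sample numbers are illustrative assumptions; they are not code or data from the thesis, and the toy numbers do not reproduce the reported 13%.

```python
from typing import Sequence


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average absolute difference between observed and forecast values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(baseline_mae: float, new_mae: float) -> float:
    """Fractional drop in MAE relative to the baseline (0.13 means 13% lower)."""
    return (baseline_mae - new_mae) / baseline_mae


# Hypothetical weekly demand counts for one skill; NOT data from the thesis.
observed = [10.0, 12.0, 9.0, 14.0]
ner_only_forecast = [8.0, 14.0, 11.0, 12.0]      # assumed: explicit NER features only
ner_plus_lda_forecast = [9.0, 13.0, 11.0, 12.0]  # assumed: NER plus LDA topic features

baseline = mean_absolute_error(observed, ner_only_forecast)
hybrid = mean_absolute_error(observed, ner_plus_lda_forecast)
reduction = relative_reduction(baseline, hybrid)
print(f"baseline MAE={baseline}, hybrid MAE={hybrid}, reduction={reduction:.0%}")
```

In an ablation of this kind, the same forecasting model would be evaluated twice on the same held-out period, once per feature set, so that the difference in MAE can be attributed to the added features.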

Persian keywords

Online Job Advertisements (OJA), Natural Language Processing (NLP), Named Entity Recognition (NER), Latent Dirichlet Allocation (LDA), Skill Demand Forecasting, Deep Learning (LSTM)

English keywords

Online Job Advertisements (OJAs), Natural Language Processing (NLP), Named Entity Recognition (NER), Latent Dirichlet Allocation (LDA), Skill Demand Forecasting, Deep Learning (LSTM)

Author

Ali Al-Sudani

Supervisor

Hassan Naderi

Link to this record

https://dl.iust.ac.ir/dl/search/default.aspx?Term=34604&Field=0&DTC=6