Studies of the Languages and Dialects of Western Iran (مطالعات زبانها و گویش های غرب ایران), Year (2024-3)

Title: Capabilities and Shortcomings of Persian Stemming in Natural Language Processing (امکانات و کاستی‌های ستاک‌یابی فارسی در پردازش زبان طبیعی)

Authors: Maryam Assadi, Vida Shaghaghi, Mohsen Kahani

Citation: BibTeX | EndNote

Abstract

This article presents a review of stemming techniques for the Persian language, encompassing structural methods, statistical approaches, and lookup tables. In addition, we explore the potential improvement of Persian stemming by drawing insights from theoretical research and experimental results on languages sharing common challenges with Persian. Through a meticulous analysis, we propose the incorporation of Byte Pair Encoding (BPE) and Sequence-to-Sequence (Seq2Seq) models into the Persian stemming framework. This recommendation is rooted in the unique strengths of these methods, tailored to address Persian's intricate morphology, extensive loanword integration, and script diversity. BPE excels in capturing prevalent morphemes and managing out-of-vocabulary terms, while Seq2Seq models show promise in decoding implicit morphological rules and accommodating linguistic idiosyncrasies. In light of Persian's status as a low-resource language in need of advanced technological resources, we put forward a novel enhancement for Persian stemming. This enhancement leverages both BPE and Seq2Seq models within a unified NLP pipeline, signifying a promising path for further research in Persian language processing. By harnessing linguistic insights, this approach has the potential to contribute significantly to bridging the digital language divide for Persian.

Keywords

morphology, morphological analysis, Persian language, stemming, Natural Language Processing (NLP), pre-processing, sequence to sequence model

@article{paperid:1102857,
author = {مریم اسدی and ویدا شقاقی and Kahani, Mohsen},
title = {امکانات و کاستی‌های ستاک‌یابی فارسی در پردازش زبان طبیعی},
journal = {مطالعات زبانها و گویش های غرب ایران},
year = {2024},
month = {March},
issn = {2345-2579},
keywords = {morphology; morphological analysis; Persian language; stemming; Natural Language Processing (NLP); pre-processing; sequence to sequence model},
}


%0 Journal Article
%T امکانات و کاستی‌های ستاک‌یابی فارسی در پردازش زبان طبیعی
%A مریم اسدی
%A ویدا شقاقی
%A Kahani, Mohsen
%J مطالعات زبانها و گویش های غرب ایران
%@ 2345-2579
%D 2024
