Title: Capabilities and Limitations of Persian Stemming in Natural Language Processing (امکانات و کاستیهای ستاکیابی فارسی در پردازش زبان طبیعی)
Authors: Maryam Assadi, Vida Shaghaghi, Mohsen Kahani
Abstract
This article presents a review of stemming techniques for the Persian language, encompassing structural methods, statistical approaches, and lookup tables. In addition, we explore the potential improvement of Persian stemming by drawing insights from theoretical research and experimental results on languages sharing common challenges with Persian. Through a meticulous analysis, we propose the incorporation of Byte Pair Encoding (BPE) and Sequence-to-Sequence (Seq2Seq) models into the Persian stemming framework. This recommendation is rooted in the unique strengths of these methods, tailored to address Persian's intricate morphology, extensive loanword integration, and script diversity. BPE excels in capturing prevalent morphemes and managing out-of-vocabulary terms, while Seq2Seq models show promise in decoding implicit morphological rules and accommodating linguistic idiosyncrasies. In light of Persian's status as a low-resource language in need of advanced technological resources, we put forward a novel enhancement for Persian stemming. This enhancement leverages both BPE and Seq2Seq models within a unified NLP pipeline, signifying a promising path for further research in Persian language processing. By harnessing linguistic insights, this approach has the potential to contribute significantly to bridging the digital language divide for Persian.
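To make the BPE idea in the abstract concrete, the following is a minimal sketch of how merge rules can be learned from word frequencies and then used to split an unseen word into frequent subword units. It is not the authors' implementation. The function names (`learn_bpe`, `merge_pair`, `segment`), the transliterated toy words, and their frequencies are all assumptions made up for illustration, and no Seq2Seq component is shown.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} mapping."""
    # Each word starts as a sequence of single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Split a (possibly unseen) word by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus: Latin transliterations with invented frequencies.
corpus = {"ketab": 5, "ketabha": 3, "ketabi": 2, "darsha": 2}
merges = learn_bpe(corpus, num_merges=5)
print(segment("ketabha", merges))
print(segment("ketabhayi", merges))  # a form that is not in the toy corpus
```

In this toy setting, frequent stems and suffixes tend to surface as single units, which is the property the abstract attributes to BPE. A real system would operate on Persian script and a much larger corpus.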
Keywords
morphology, morphological analysis, Persian language, stemming, Natural Language Processing (NLP), pre-processing, sequence to sequence model
@article{paperid:1102857,
author = {مریم اسدی and ویدا شقاقی and Kahani, Mohsen},
title = {امکانات و کاستیهای ستاکیابی فارسی در پردازش زبان طبیعی},
journal = {مطالعات زبانها و گویش های غرب ایران},
year = {2024},
month = {March},
issn = {2345-2579},
keywords = {morphology; morphological analysis; Persian language; stemming; Natural Language Processing (NLP); pre-processing; sequence to sequence model},
}
%0 Journal Article
%T امکانات و کاستیهای ستاکیابی فارسی در پردازش زبان طبیعی
%A مریم اسدی
%A ویدا شقاقی
%A Kahani, Mohsen
%J مطالعات زبانها و گویش های غرب ایران
%@ 2345-2579
%D 2024