LughaNet: Automated Arabic WordNet Construction and Evaluation Using Semantic Question Similarity

Journal of Soft Computing and Data Mining, Vol. 6, No. 1, June 2025, pp. 107-126

Authors: AMMAR DHEYAA NOOR AL-EDHARI, Mohsen Kahani

Abstract

Several efforts have been undertaken to enhance the Arabic lexicon and address the limitations of Arabic WordNet (AWN). However, previous attempts to build a comprehensive AWN have faced significant challenges, including limited coverage compared with English WordNet, reliance on incomplete bilingual dictionaries, and the inherent complexities of Arabic, such as lexical ambiguity and morphological richness. Traditional machine translation methods have proven inadequate for these issues, particularly in low-resource settings where large-scale parallel corpora are scarce. To overcome these limitations, this study introduces LughaNet, an automatically constructed Arabic WordNet built in six key stages: (1) aligning Princeton WordNet (PWN) synsets with Arabic words using bilingual dictionaries and large language models for machine translation; (2) selecting the most frequent candidate as the optimal translation, and refining or eliminating incorrect translations using BERT and cosine similarity; (3) extracting Arabic words from resources such as Wikipedia and the existing Arabic WordNet; (4) applying NLP methods, including Skip-gram with AraVec 2.0 embeddings, to extract synonyms from Arabic Wikipedia; (5) improving synonym selection accuracy using a pre-trained BERT model and cosine similarity; and (6) translating PWN glosses and examples into Arabic. This process produced 85,991 synsets. Evaluations show 64.23% coverage of dictionary terms and demonstrate LughaNet’s effectiveness in Arabic Semantic Question Similarity (ASQS) tasks, where it achieves an accuracy of 64.11%, a precision of 57.57%, a recall of 79.02%, and an F1 score of 66.61%, surpassing the F1 score of 56.62% obtained with the original Arabic WordNet. These results highlight LughaNet’s potential as a valuable resource for Arabic NLP research and applications.
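Stage (2) combines a frequency vote over candidate translations with a BERT and cosine-similarity filter. The abstract does not say in which order these are applied or what similarity threshold is used, so the following is only a minimal sketch under stated assumptions: candidates are filtered first and then voted on, the 0.8 threshold is arbitrary, and the hand-made 3-d vectors `SYNSET_VECTOR` and `CANDIDATE_VECTORS` are hypothetical stand-ins for BERT embeddings of a PWN synset and its Arabic candidates. The helper names `cosine` and `select_translation` are illustrative, not part of LughaNet.

```python
import math
from collections import Counter

# Hypothetical toy embeddings standing in for BERT vectors (values invented).
# SYNSET_VECTOR plays the role of an embedding of the PWN synset book.n.01
# ("a written work or composition ...").
SYNSET_VECTOR = [0.9, 0.1, 0.0]
CANDIDATE_VECTORS = {
    "كتاب": [0.88, 0.12, 0.05],  # 'book' as a written work (correct sense)
    "حجز": [0.1, 0.2, 0.95],     # 'to book / reserve' (wrong sense)
    "مؤلف": [0.5, 0.7, 0.2],     # 'author' (related, but not a translation)
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_translation(candidates, synset_vec, vectors, threshold=0.8):
    """Assumed order: drop candidates whose cosine similarity to the synset is
    below `threshold`, then return the most frequent survivor together with
    the sorted set of survivors. Returns (None, []) if nothing survives."""
    kept = [
        c for c in candidates
        if c in vectors and cosine(synset_vec, vectors[c]) >= threshold
    ]
    if not kept:
        return None, []
    best = Counter(kept).most_common(1)[0][0]
    return best, sorted(set(kept))


# Hypothetical candidates pooled from a dictionary and several MT outputs;
# the wrong sense "حجز" happens to be the most frequent one.
candidates = ["حجز", "كتاب", "حجز", "مؤلف", "حجز", "كتاب"]

print(select_translation(candidates, SYNSET_VECTOR, CANDIDATE_VECTORS))
# With the filter effectively disabled, the plain majority vote wins:
print(select_translation(candidates, SYNSET_VECTOR, CANDIDATE_VECTORS, threshold=-1.0))
```

With the filter in place the vote returns كتاب; with the threshold disabled it returns the wrong sense حجز. This illustrates why a similarity check against the synset is useful alongside raw frequency when a source word such as "book" is ambiguous.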
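The ASQS results are reported as accuracy, precision, recall and F1. Since F1 is the harmonic mean of precision and recall, the reported figures can be cross-checked. The sketch below is a generic illustration of the standard binary-classification formulas, not the authors' evaluation code; the helper names `f1_from_pr` and `metrics_from_counts` are hypothetical.

```python
def f1_from_pr(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall, f1_from_pr(precision, recall)


# Cross-check the reported LughaNet figures: P = 57.57%, R = 79.02%.
print(round(100 * f1_from_pr(0.5757, 0.7902), 2))
```

The reported precision and recall give an F1 of about 66.61%, matching the abstract. Because the harmonic mean is pulled toward the smaller of its inputs, the lower precision (57.57%) limits F1 despite the much higher recall.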

Keywords

WordNet construction, Arabic WordNet, synonyms extraction, Arabic semantic question similarity

BibTeX

@article{paperid:1103759,
author = {AL-EDHARI, AMMAR DHEYAA NOOR and Kahani, Mohsen},
title = {{LughaNet}: Automated {Arabic} {WordNet} Construction and Evaluation Using Semantic Question Similarity},
journal = {Journal of Soft Computing and Data Mining},
year = {2025},
volume = {6},
number = {1},
month = {June},
issn = {2716-621X},
pages = {107--126},
numpages = {20},
keywords = {WordNet construction; Arabic WordNet; synonyms extraction; Arabic semantic question similarity},
}

EndNote

%0 Journal Article
%T LughaNet: Automated Arabic WordNet Construction and Evaluation Using Semantic Question Similarity
%A AL-EDHARI, AMMAR DHEYAA NOOR
%A Kahani, Mohsen
%J Journal of Soft Computing and Data Mining
%V 6
%N 1
%P 107-126
%@ 2716-621X
%D 2025