Title: Combined Genome and Protein Statistical Features Improved the Prediction of Genes Encoding Antimicrobial Peptides: A Machine Learning Based Approach
Authors: Mahin Rasani, Karami, Mohammadreza Nassiri, Mojtaba Tahmoorespur, Mohammad Hadi Sekhavati
Abstract
Antimicrobial peptides (AMPs) are increasingly regarded as a promising class of next-generation antibiotics in drug development. Various computational approaches have been developed to predict AMPs, but most of them assess only the peptides' potency by analyzing their physicochemical characteristics. To the best of our knowledge, however, no previous study has predicted AMP-encoding genes in the genome from genomic characteristics. In the present study, a novel machine learning (ML)-based approach was developed to predict genes encoding AMPs in the genome from combined physicochemical, genomic, and protein statistical features. Various types of genome features and different ML algorithms were tested to compare the predictive abilities of the resulting models. The gene structures of 110 non-AMP-encoding and 158 AMP-encoding genes were examined, and 951 genomic and protein features were extracted for these genes in eleven genomic subdomains as well as in their 1 kb, 10 kb, and 100 kb upstream and downstream regulatory regions. Among the ML algorithms, the Naive Bayes model processed with an SVM training dataset was identified as the best model, with an accuracy of 99.63%, a precision of 99.41%, a recall of 100%, an F-measure of 99.7%, and an area under the curve (AUC) of 1. The results showed that, owing to the heterogeneity of our AMP dataset, using genome features as additional features improved the performance of all models compared with previous studies that relied solely on AMP sequence-based features.
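The reported F-measure can be cross-checked against the reported precision and recall, because the F-measure (F1) is their harmonic mean, F = 2PR / (P + R). The minimal Python sketch below assumes the conventional unweighted binary definitions (the abstract does not state how its metrics were averaged); the function names are hypothetical and are not taken from the authors' software.

```python
"""Standard binary-classification metrics, as a hedged illustration.

Assumes the conventional (unweighted, binary) definitions; the abstract does
not say how its metrics were averaged. Function names are hypothetical.
"""


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all genes classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: int, fp: int) -> float:
    """Fraction of genes predicted as AMP-encoding that truly are."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Fraction of true AMP-encoding genes that were recovered."""
    return tp / (tp + fn)


def f_measure(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * p * r / (p + r)


# Recompute F1 from the precision (99.41%) and recall (100%) quoted in the abstract.
reported_f1 = f_measure(0.9941, 1.0)
print(f"F1 = {reported_f1:.2%}")  # prints "F1 = 99.70%"
```

The recomputed value rounds to the 99.7% stated in the abstract, which is consistent with an unweighted F1; it does not by itself reveal the underlying confusion matrix.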
Keywords
antimicrobial peptides, feature selection, machine learning, statistical genome features
@article{paperid:1106278,
author = {Rasani, Mahin and Karami and Nassiri, Mohammadreza and Tahmoorespur, Mojtaba and Sekhavati, Mohammad Hadi},
title = {Combined Genome and Protein Statistical Features Improved the Prediction of Genes Encoding Antimicrobial Peptides: A Machine Learning Based Approach},
journal = {Iranian Journal of Applied Animal Science},
year = {2025},
volume = {15},
number = {2},
month = {June},
issn = {2251-628X},
pages = {191--209},
numpages = {18},
keywords = {antimicrobial peptides; feature selection; machine learning; statistical genome features},
}
%0 Journal Article
%T Combined Genome and Protein Statistical Features Improved the Prediction of Genes Encoding Antimicrobial Peptides: A Machine Learning Based Approach
%A Rasani, Mahin
%A Karami
%A Nassiri, Mohammadreza
%A Tahmoorespur, Mojtaba
%A Sekhavati, Mohammad Hadi
%J Iranian Journal of Applied Animal Science
%@ 2251-628X
%D 2025
