Title: Suitability of Sequence-Based Feature Vector for Classification Algorithm Improves Accuracy of Human Protein-Protein Interaction Prediction: A Red Blood Cell Case Study
Authors: Afsaneh Maali, Mahmood Akhavan Mahdavi, Reza Gheshlaghi
Abstract
Supervised learning algorithms are used to classify human protein-protein interaction information and consolidate existing data. These algorithms require a feature vector to generate a prediction model, and feature vectors can be constructed from various input data. A feature vector that suits the classification algorithm yields a more predictive model and higher prediction accuracy from low-dimensional vectors. To investigate the proper combination of feature sets and algorithms, three feature vectors (AA Frequency, AA Graphical Parameter, and AA Triplex) were constructed solely from the primary structure of human red blood cell proteins and then applied to five different classification methods. The support vector machine (SVM) algorithm produced the highest accuracy, 84.56%, with the AA Graphical Parameter feature set, and reached 80.65% with the AA Triplex feature set. Random forest (RF) achieved a high average accuracy of 83.69% across all three feature sets. Of the two Bayesian classifiers, TAN performed better than NB with all three feature sets. The artificial neural network (ANN) classifier showed the lowest average accuracy, 76%; however, where the AA Triplex feature set was used, its performance was comparable with that of TAN, with an accuracy of 77.90%. These figures demonstrate that selecting an appropriate feature set for a classification task yields higher accuracy, with the advantage of using low-dimensional feature vectors constructed from simpler data.
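As an illustration of what a sequence-only feature vector can look like, the following is a minimal sketch of an amino acid frequency encoding. It is not the authors' implementation: the abstract does not specify how AA Frequency is computed or how the two proteins of a pair are combined, so the 20-letter alphabet order, the function names `aa_frequency` and `pair_features`, the concatenation of the two per-protein vectors, and the example sequences are all assumptions made for illustration.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
# The order is an arbitrary choice that fixes the vector positions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequency(sequence):
    """Return a 20-dimensional amino acid frequency vector for one sequence."""
    # Count only standard residues; other symbols (e.g. X) are ignored.
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(seq_a, seq_b):
    """Build a 40-dimensional vector for a protein pair.

    Concatenating the two per-protein vectors is an assumed encoding,
    not one confirmed by the abstract.
    """
    return aa_frequency(seq_a) + aa_frequency(seq_b)


# Hypothetical short sequences, used only to show the call pattern.
features = pair_features("MVHLTPEEK", "MVLSPADKT")
print(len(features))  # 40
```

A vector of this kind, paired with an interacting or non-interacting label, could then be passed to any of the classifiers named in the abstract.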
Keywords
Classification algorithms, Protein-protein interaction prediction, Sequence-based feature vectors, Machine learning, Human protein-protein interaction, Accuracy of interaction prediction
@article{paperid:1056120,
author = {Maali, Afsaneh and Akhavan Mahdavi, Mahmood and Gheshlaghi, Reza},
title = {Suitability of Sequence-Based Feature Vector for Classification Algorithm Improves Accuracy of Human Protein-Protein Interaction Prediction: A Red Blood Cell Case Study},
journal = {Current Bioinformatics},
year = {2016},
volume = {11},
number = {2},
month = {April},
issn = {1574-8936},
pages = {291--300},
numpages = {10},
keywords = {Classification algorithms; Protein-protein interaction prediction; Sequence-based feature vectors; Machine learning; Human protein-protein interaction; Accuracy of interaction prediction},
}
%0 Journal Article
%T Suitability of Sequence-Based Feature Vector for Classification Algorithm Improves Accuracy of Human Protein-Protein Interaction Prediction: A Red Blood Cell Case Study
%A Maali, Afsaneh
%A Akhavan Mahdavi, Mahmood
%A Gheshlaghi, Reza
%J Current Bioinformatics
%@ 1574-8936
%D 2016