SubCellProt: Predicting protein subcellular localization using machine learning approaches
High-throughput proteome sequencing projects continue to produce enormous amounts of raw sequence data. However, most of these data are unannotated and therefore of limited use. One approach to deciphering the function of a protein is to determine its subcellular localization. Experimental approaches to proteome annotation, including determination of a protein's subcellular localization, are costly and labor-intensive. In silico methods offer an alternative to these experimental methods. Here, we present two machine learning approaches, k-Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN), for predicting the subcellular localization of a protein from its primary sequence, and use them to classify an unknown protein into one of 11 subcellular locations. The final prediction is based on a consensus of the predictions made by the two algorithms, and a probability is assigned to it.
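The abstract does not specify the sequence features, the value of k, the PNN smoothing parameter, or the exact consensus rule, so the following is only a minimal sketch under stated assumptions. It uses amino acid composition as a hypothetical 20-dimensional feature vector, Euclidean distance for k-NN, a Gaussian-kernel PNN with equal class priors, and a hypothetical consensus rule (average the two confidences when the classifiers agree; otherwise keep the more confident call at half its confidence). The helper names (`aa_composition`, `knn_predict`, `pnn_predict`, `consensus_predict`) and the toy sequences are illustrative and are not taken from SubCellProt.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq):
    """Hypothetical feature: fraction of each of the 20 standard amino acids."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [counts[a] / total for a in AMINO_ACIDS]


def sq_dist(x, y):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def knn_predict(train, query, k=3):
    """k-NN: majority vote among the k closest training vectors.
    train is a list of (feature_vector, label); returns (label, vote fraction)."""
    nearest = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)


def _log_mean_exp(values):
    """Numerically stable log(mean(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def pnn_predict(train, query, sigma=0.1):
    """PNN: per-class mean of Gaussian kernels (equal class priors assumed),
    normalized across classes; returns (label, normalized score)."""
    by_class = {}
    for x, label in train:
        by_class.setdefault(label, []).append(-sq_dist(x, query) / (2 * sigma ** 2))
    log_scores = {lab: _log_mean_exp(v) for lab, v in by_class.items()}
    top = max(log_scores.values())
    weights = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    z = sum(weights.values())
    label = max(weights, key=weights.get)
    return label, weights[label] / z


def consensus_predict(train, query, k=3, sigma=0.1):
    """Hypothetical consensus rule (an assumption, not the paper's method)."""
    knn_label, knn_p = knn_predict(train, query, k)
    pnn_label, pnn_p = pnn_predict(train, query, sigma)
    if knn_label == pnn_label:
        return knn_label, (knn_p + pnn_p) / 2
    # Disagreement: keep the more confident call, halving its probability.
    if knn_p >= pnn_p:
        return knn_label, knn_p / 2
    return pnn_label, pnn_p / 2


# Toy usage with made-up sequences (illustrative only, not real data).
toy = [
    ("KKRKRPKKKRK", "Nucleus"),
    ("PKKKRKVKRRK", "Nucleus"),
    ("LLLLALLAVAL", "Plasma membrane"),
    ("ALLVLLAILLF", "Plasma membrane"),
]
train = [(aa_composition(s), lab) for s, lab in toy]
print(consensus_predict(train, aa_composition("KRKKPKRKKRV")))
```

Working in the log domain for the PNN keeps the kernel sums from underflowing to zero when σ is small, which would otherwise make the normalization divide by zero.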
Location                 No. of sequences (training set)   No. of sequences (test set)
Nucleus                  4442                              227
Extracellular            5409                              324
Mitochondria             2803                              179
Cytoplasm                3419                              148
Chloroplast              3579                              307
Plasma membrane          4849                              303
Endoplasmic reticulum    828                               62
Golgi apparatus          287                               32
Peroxisome               186                               27
Lysosome                 150                               22
Vacuole                  204                               29
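To make the class imbalance in the table explicit, the short script below totals each column and computes each location's share of the training set. It assumes each flattened row splits into training and test counts as read here (for example, Nucleus as 4442 training and 227 test sequences); the names `COUNTS` and `summarize` are illustrative.

```python
# Per-location (training, test) sequence counts, as read from the table above.
COUNTS = {
    "Nucleus": (4442, 227),
    "Extracellular": (5409, 324),
    "Mitochondria": (2803, 179),
    "Cytoplasm": (3419, 148),
    "Chloroplast": (3579, 307),
    "Plasma membrane": (4849, 303),
    "Endoplasmic reticulum": (828, 62),
    "Golgi apparatus": (287, 32),
    "Peroxisome": (186, 27),
    "Lysosome": (150, 22),
    "Vacuole": (204, 29),
}


def summarize(counts):
    """Return column totals and each location's fraction of the training set."""
    train_total = sum(tr for tr, _ in counts.values())
    test_total = sum(te for _, te in counts.values())
    shares = {loc: tr / train_total for loc, (tr, _) in counts.items()}
    return train_total, test_total, shares


train_total, test_total, shares = summarize(COUNTS)
print(train_total, test_total)  # 26156 1660
for loc, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{loc:<22} {share:6.1%}")
```

Under this reading, extracellular proteins make up about 21% of the training set and lysosomal proteins about 0.6%, roughly a 36-fold difference. This is worth keeping in mind when interpreting majority-vote methods such as k-NN, which tend to favor well-represented classes.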