※ Computational resources of protein phosphorylation:
Last updated: Dec 17, 2014
Protein phosphorylation is the most ubiquitous post-translational modification (PTM) and plays important roles in most biological processes. Identification of site-specific phosphorylated substrates is fundamental for understanding the molecular mechanisms of phosphorylation. Besides experimental approaches, prediction of potential candidates with computational methods has also attracted great attention for its convenience and speed. In this review, we present a comprehensive but brief summary of computational resources for protein phosphorylation, including phosphorylation databases, prediction of non-specific or organism-specific phosphorylation sites, prediction of kinase-specific phosphorylation sites or phospho-binding motifs, and other tools. A testing data set taken from four high-throughput experiments is available at: Comparison_data.
We apologize that computational studies without web links to databases or tools are not included in this compendium, since it is not easy for experimentalists to use such studies directly. We are grateful for user feedback. Please contact Dr. Yu Xue or Yongbo Wang to add, remove or update one or more of the web links below.
<2> Prediction of non-specific or organism-specific phosphorylation sites
<3> Prediction of kinase-specific phosphorylation sites or phospho-binding motifs
<4> Miscellaneous tools
<5> Detection of potential phosphorylation sites from mass spectrometry data
1. Phospho.ELM 9.0 (PhosphoBase): contains 8,718 experimentally verified phosphorylated proteins from different species with 3,370 tyrosine, 31,754 serine and 7,449 threonine sites (Diella, et al., 2004; Diella, et al., 2008; Dinkel, et al., 2011).
2. PhosphoSitePlus: a new version of PhosphoSite; a web-based database that collects protein modification sites, including protein phosphorylation sites, from the scientific literature as well as high-throughput discovery programs. Currently, PhosphoSitePlus contains over 120,000 phosphorylation sites (Hornbeck, et al., 2012).
3. PhosphoNET: PhosphoNET presently holds data on over 950,000 known and putative phosphorylation sites (P-sites) in over 23,000 human proteins that have been collected from the scientific literature and other reputable websites. Over 19% of these phospho-sites have been experimentally validated. The rest have been predicted with a novel P-Site Predictor algorithm developed at Kinexus with academic partners at the University of British Columbia and Simon Fraser University.
4. HPRD release 9: HPRD currently contains information for 16,972 PTMs belonging to various categories, with phosphorylation (10,858), dephosphorylation (3,118) and glycosylation (1,860) forming the majority of the annotated PTMs. At least one enzyme responsible for PTMs has been annotated for 8,960 PTMs, which resulted in the documentation of 7,253 enzyme-substrate relationships (Keshava Prasad, et al., 2009).
5. PHOSIDA (Mirror website): a post-translational modification database that comprises more than 80,000 phosphorylated, N-glycosylated or acetylated sites from nine different species. All sites are obtained from high-resolution mass spectrometric data using the same stringent quality criteria. PHOSIDA consists of three main components: the database environment, the prediction platform and the toolkit section (Gnad, et al., 2007; Gnad, et al., 2009; Gnad, et al., 2011).
6. PhosphoPep v2.0: contains MS-derived phosphorylation data from 4 different organisms, including fly (Drosophila melanogaster), human (Homo sapiens), worm (Caenorhabditis elegans), and yeast (Saccharomyces cerevisiae) (Bodenmiller, et al., 2008).
7. PhosPhAt 4.0: contains information on Arabidopsis phosphorylation sites identified by mass spectrometry in large-scale experiments from different research groups, with 60,366 phospho-peptides matching 8,141 nonredundant proteins (Heazlewood, et al., 2008; Durek, et al., 2010; Zulawski, et al., 2013).
8. P(3)DB 3.5: hosts protein phosphorylation data for 9 species from 32 experimental studies, containing 16,477 phosphoproteins harboring 47,923 phosphosites. Built around these phosphorylation data, multiple related data and annotations are provided, including protein-protein interaction (PPI), gene ontology, protein tertiary structures, orthologous sequences, kinase/phosphatase classification and Kinase Client Assay (KiC Assay) data. In addition, it incorporates multiple network viewers for the above features, such as the PPI network, kinase-substrate network, phosphatase-substrate network, and domain co-occurrence network (Gao, et al., 2009; Yao, et al., 2012; Yao, et al., 2013).
9. Swiss-Prot knowledge base (Mirror website): for each protein annotation, the "Amino acid modifications" part of the "Sequence annotation (Features)" section collects the post-translational modification information of the protein (Farriol-Mathis, et al., 2004).
10. dbPTM 3.0: an informative resource of experimental post-translational modifications (PTMs) obtained from public resources as well as manually curated MS/MS peptides associated with PTMs from research articles for investigating the substrate specificity of PTM sites and functional association of PTMs between substrates and their interacting proteins (Lee, et al., 2006; Lu, et al., 2013).
11. SysPTM 2.0 (Mirror website): provides a systematic and sophisticated platform for proteomic PTM research, equipped not only with a knowledge base of manually curated multi-type modification data, but also with four fully developed, in-depth data mining tools (Li, et al., 2009).
13. NetworKIN 3.0: a method for predicting in vivo kinase-substrate relationships that augments consensus motifs with context for kinases and phosphoproteins. It is a valuable resource that opens a door to the computational discovery of phospho-regulatory networks (Linding, et al., 2007; Linding, et al., 2008; Horn, et al., 2014).
14. Phospho3D 2.0: is a database of three-dimensional structures of phosphorylation sites which stores information retrieved from the phospho.ELM database and which is enriched with structural information and annotations at the residue level (Zanzoni, et al., 2007; Zanzoni, et al., 2011).
17. ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites, containing 116,364 tryptic peptide product ion spectra entries of 48,218 different peptide sequence entries (Hummel, et al., 2009; Wienkoop, et al., 2012).
20. PhosSNP 1.0: a genome-wide analysis of genetic polymorphisms that influence protein phosphorylation in H. sapiens. It was estimated that ~69.76% of nsSNPs (non-synonymous SNPs) are potential phosSNPs (phosphorylation-related SNPs) (64,035) in 17,614 proteins (Ren, et al., 2010).
21. The Phosphorylation Site Database: provides ready access to information from the primary scientific literature concerning those proteins from prokaryotic organisms, i.e., the members of the domains Archaea and Bacteria, that have been reported to undergo covalent phosphorylation on the hydroxyl side chains of serine, threonine, and/or tyrosine residues (Wurgler-Murphy, et al., 2004).
22. PhosphoGRID version 3.2: an online database of experimentally verified in vivo protein phosphorylation sites in the model eukaryotic organism S. cerevisiae. The database includes results from high-throughput (HTP) MS proteomics studies as well as phosphosites identified in low-throughput (LTP) studies of individual proteins or protein complexes (Stark, et al., 2010; Stark, et al., 2013).
23. TAIR: maintains a database of genetic and molecular data for Arabidopsis thaliana. Protein data available from TAIR includes the complete protein sequence along with phosphorylation site annotations (Lamesch, et al., 2011).
25. dbPPT 1.0: a comprehensive resource of plant protein phosphorylation that contains 82,175 phosphorylation sites in 31,012 proteins from 20 plant organisms. The phosphorylation sites in dbPPT were manually curated from the literature, and datasets from other public databases were also integrated (Cheng, et al., 2014).
2. CRP: Cleaved Radioactivity of Phosphopeptides. CRP performs an in silico proteolytic cleavage of the sequence and reports the predicted Edman cycles in which radioactivity would be observed if a given serine, threonine or tyrosine were phosphorylated (Mackey, et al., 2003).
3. DISPHOS 1.3: uses disorder information to improve the discrimination between phosphorylation and non-phosphorylation sites, and predicts serine, threonine and tyrosine phosphorylation sites in proteins (Iakoucheva, et al., 2004).
6. PhosPhAt 4.0: version 2.2 used a set of 802 experimentally validated serine phosphorylation sites as its training data set, while the 3.0 predictor for phosphorylation sites in Arabidopsis was developed with an additional 1,818 threonine phosphorylation sites and 676 tyrosine sites (Heazlewood, et al., 2008; Durek, et al., 2010).
8. GANNPhos: uses a genetic algorithm integrated neural network (GANN) algorithm (Tang, et al., 2007). The tool is not available.
9. PHOSITE: is based on the case-based sequence analysis (Koenig and Grabe, 2004). The tool is not available.
10. PhosphoSVM: predicts phosphorylation sites by integrating various protein sequence attributes with a support vector machine. The model trained on animal phosphorylation sites was also applied to a plant phosphorylation site dataset as an independent test (Dou, et al., 2014).
12. PhosphoRice: a meta-predictor of rice-specific phosphorylation sites that uses a weighted voting strategy with parameters selected by restricted grid search and conditional random search (Que, et al., 2012).
1. GPS 2.1: the current version of the GPS system. We renamed the tool the Group-based Prediction System. The GPS 2.1 software is implemented in Java and can predict kinase-specific phosphorylation sites for 408 human protein kinases organized in a hierarchy (Xue, et al., 2008).
2. GPS 1.10: the old version of GPS. We designed a novel algorithm, GPS (Group-based Phosphorylation sites Prediction), and constructed an easy-to-use web server for experimentalists (Xue, et al., 2005; Zhou, et al., 2004).
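The weighted voting idea behind a meta-predictor such as PhosphoRice (item 12 above) can be illustrated with a minimal Python sketch. It is not the PhosphoRice implementation: the predictor names, scores, weights and the 0.5 threshold are all hypothetical, and the weights are fixed by hand here rather than selected by grid or random search.

```python
def weighted_vote(scores, weights, threshold=0.5):
    """Combine per-predictor scores (each in [0, 1]) into one weighted call.

    Returns the weighted average score and whether it reaches the threshold.
    """
    total_weight = sum(weights[name] for name in scores)
    combined = sum(weights[name] * score for name, score in scores.items()) / total_weight
    return combined, combined >= threshold


# Hypothetical member predictors, weights, and scores for one candidate site.
weights = {"predictor_a": 0.5, "predictor_b": 0.3, "predictor_c": 0.2}
scores = {"predictor_a": 0.9, "predictor_b": 0.4, "predictor_c": 0.2}

combined, is_site = weighted_vote(scores, weights)
print(f"combined score = {combined:.2f}, predicted site = {is_site}")
```

In this toy case the most heavily weighted predictor carries the vote even though the other two score the site below the threshold.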
4. ScanProsite: consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them (de Castro, et al., 2006; Hulo, et al., 2008).
6. Minimotif Miner: analyzes protein queries for the presence of short functional motifs that, in at least one protein, have been demonstrated to be involved in post-translational modifications (PTMs), binding to other proteins, nucleic acids, or small molecules, or protein trafficking (Balla, et al., 2006; Rajasekaran, et al., 2009).
7. PhosphoMotif Finder: contains known kinase/phosphatase substrate as well as binding motifs that are curated from the published literature. It reports the PRESENCE of any literature-derived motif in the query sequence (Amanchy, et al., 2007).
9. Predikin & PredikinDB 2.0: consists of two components: (i) PredikinDB, a database of phosphorylation sites that links substrates to kinase sequences and (ii) a Perl module, which provides methods to classify protein kinases, reliably identify substrate-determining residues, generate scoring matrices and score putative phosphorylation sites in query sequences (Saunders, et al., 2008; Saunders and Kobe, 2008).
10. ScanSite 2.0: searches for motifs within proteins that are likely to be phosphorylated by specific protein kinases or bind to domains such as SH2 domains, 14-3-3 domains or PDZ domains (Obenauer, et al., 2003).
11. NetPhosK 1.0: produces neural network predictions of kinase-specific eukaryotic protein phosphorylation sites. Currently NetPhosK covers the following kinases: PKA, PKC, PKG, CKII, Cdc2, CaM-II, ATM, DNA PK, Cdk5, p38 MAPK, GSK3, CKI, PKB, RSK, INSR, EGFR and Src (Blom, et al., 2004).
14. KinasePhos 1.0: predicts kinase-specific phosphorylation sites within given protein sequences. A profile Hidden Markov Model (HMM) is learned from each group of sequences surrounding the phosphorylated residues (Huang, et al., 2005).
15. KinasePhos 2.0: a new version of the kinase-specific phosphorylation site prediction tool, based on sequence-based amino acid coupling-pattern analysis and solvent accessibility as new features for an SVM (support vector machine) (Wong, et al., 2007).
18. CRPhos 0.8: predicts kinase-specific phosphorylation sites using conditional random fields. Its source code is free for academic research and can be compiled on Linux/Unix (Dang, et al., 2008).
19. AutoMotif 2.0: allows for identification of PTM (post-translational modification) sites, including phosphorylation sites in proteins. The AutoMotif Server 2.0 trained a support vector machine (SVM) for each type of PTM separately on proteins from the Swiss-Prot database (version 42.0) (Plewczynski, et al., 2005; Plewczynski, et al., 2008).
22. NetPhorest: is a non-redundant collection of 125 sequence-based classifiers for linear motifs in phosphorylation-dependent signaling. The collection contains both family-based and gene-specific classifiers (Miller, et al., 2008; Miller, et al., 2008; Horn, et al., 2014).
23. SiteSeek: is trained using a novel compact evolutionary and hydrophobicity profile to detect possible protein phosphorylation sites for a target sequence (Yoo, et al., 2008). The tool is not available.
24. PostMod: a prediction server for phosphorylation sites. The authors combined physicochemical information, motif information, and evolutionary information by simply comparing sequence similarities, and can predict phosphorylation sites for 48 different kinases (Jung, et al., 2010).
25. iGPS 1.0: we developed a software package of iGPS (GPS algorithm with the interaction filter, or in vivo GPS) mainly for the prediction of in vivo ssKSRs. Eukaryotic PKs were classified into a hierarchy with four levels: group, family, subfamily, and single PK. Based on the hypothesis that similar PKs recognize similar SLMs, we selected a predictor in GPS 2.0 for each PK and directly predicted the potential PKs for the non-annotated p-sites from the phosphoproteomic studies. Protein-protein interaction (PPI) information was then used as the major contextual factor to filter out potentially false-positive hits (Song, et al., 2012).
26. Musite: a tool for global prediction of general and kinase-specific phosphorylation sites. The authors collected phosphoproteomics data in multiple organisms from several reliable sources and used them to train prediction models by a comprehensive machine-learning approach that integrates local sequence similarities to known phosphorylation sites, protein disorder scores, and amino acid frequencies (Gao, et al., 2010; Yao, et al., 2012).
27. PlantPhos: is a web tool for predicting potential phosphorylation sites in plant proteins with various substrate motifs based on Hidden Markov Models (HMM) and Maximal Dependence Decomposition (MDD) (Lee, et al., 2011).
28. PSEA: the authors proposed a new method called PSEA (Phosphorylation Set Enrichment Analysis) to detect new sites phosphorylated by a specific kinase, kinase family or kinase group. For each query, they assigned a P-value according to its similarity to known phosphorylated sites. The smaller the P-value, the more significant the evidence that the given peptide is phosphorylated by the chosen kinase type (Suo, et al., 2014).
29. PKIS: a novel kinase identification web server based on the latest version of Phospho.ELM (9.0). PKIS incorporates support vector machines (SVMs) with the composition of monomer spectrum (CMS) to assign protein kinases to experimentally verified human P-sites with high specificity (Zou, et al., 2013).
30. PTMPred: applies a support vector machine (SVM) with a kernel matrix computed from PSPMs (position-specific propensity matrices) to predict post-translational modification sites (Xu, et al., 2014).
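Several of the tools above report the presence of known motifs in a query sequence (for example PhosphoMotif Finder, item 7). The following minimal Python sketch shows what such a scan looks like in its simplest form. It uses only the widely cited PKA consensus R-R-x-S/T as a regular expression; the motif table, function name and query sequence are hypothetical, and real tools rely on curated motif collections or scoring matrices rather than a single pattern.

```python
import re

# Hypothetical one-entry motif table. The lookahead lets overlapping matches be found.
MOTIFS = {
    "PKA (R-R-x-S/T)": re.compile(r"(?=(RR.[ST]))"),
}


def scan_motifs(sequence):
    """Return (motif name, 1-based position of the S/T acceptor, matched text)."""
    hits = []
    for name, pattern in MOTIFS.items():
        for match in pattern.finditer(sequence):
            start = match.start(1)          # 0-based start of the 4-residue match
            hits.append((name, start + 4, match.group(1)))
    return hits


# Made-up query sequence for illustration only.
hits = scan_motifs("MKRRASLGRRQTPE")
for name, position, text in hits:
    print(f"{name}: {text} with acceptor at position {position}")
```

A pattern match of this kind only indicates that a residue sits in a motif-like context; it says nothing about whether the site is phosphorylated in vivo.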
1. DOG 2.0: prepares publication-quality figures of protein domain structures. The scale of a protein domain and the position of a functional motif/site will be precisely calculated (Ren, et al., 2009).
2. Motif-X: is a software tool designed to extract overrepresented patterns from any sequence data set. The algorithm is an iterative strategy which builds successive motifs through comparison to a dynamic statistical background (Schwartz and Gygi, 2005; Chou and Schwartz, 2005).
3. Scan-X: is a software tool designed to find motifs (identified using motif-x) within any sequence data set. The first large scale scan was performed using all available human, mouse, fly and yeast phosphorylation and acetylation data to perform a scan for undiscovered sites (Schwartz, et al., 2008).
6. RLIMS-P: is a rule-based text-mining program specifically designed to extract protein phosphorylation information on protein kinase, substrate and phosphorylation sites from the abstracts (Hu, et al., 2005; Yuan, et al., 2006).
7. KEA: Kinase enrichment analysis (KEA) is a web-based tool with an underlying database providing users with the ability to link lists of mammalian proteins/genes with the kinases that phosphorylate them (Lachmann and Ma'ayan, 2009).
8. HemI 1.0: an easy-to-use tool that visualizes either gene or protein expression data in heatmaps. The heatmaps can be recolored, rescaled or rotated in a customized manner. In addition, HemI provides multiple clustering strategies for analyzing the data. Publication-quality figures can be exported directly (Deng, et al., 2014).
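The notion of an "overrepresented pattern" used by tools such as Motif-X (item 2 above) can be illustrated with a single-pass Python sketch. It is a simplification and not the Motif-X algorithm: it assumes pre-aligned peptides of equal length, scores each residue/position pair with a binomial tail probability against a fixed background, and does not iterate or update the background. All peptides, names and the pseudo-frequency floor are hypothetical.

```python
from math import comb


def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def most_enriched(foreground, background):
    """Return (0-based position, residue, p-value) with the smallest tail probability."""
    best = None
    for pos in range(len(foreground[0])):
        for residue in {seq[pos] for seq in foreground}:
            k = sum(seq[pos] == residue for seq in foreground)
            p_bg = sum(seq[pos] == residue for seq in background) / len(background)
            # Assumed floor so that residues unseen in the background do not give p = 0.
            p_bg = max(p_bg, 1.0 / (len(background) + 1))
            p_value = binom_tail(k, len(foreground), p_bg)
            if best is None or p_value < best[2]:
                best = (pos, residue, p_value)
    return best


# Made-up aligned 5-mers with the phospho-acceptor serine at index 3.
foreground = ["RRASL", "RRGSV", "KRASP", "RRTSA"]
background = ["AGLSP", "RKASE", "LPGSD", "EATSK", "GRDSL", "PLESG", "KKASA", "DEVSQ"]

best = most_enriched(foreground, background)
print(best)
```

The fixed central serine is never reported as enriched, because it is equally frequent in the foreground and the background.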
1. PhosphoScore: is a phosphorylation assignment program that is compatible with all levels of tandem mass spectrometry spectra (MSn) generated through the Bioworks/Sequest platform. The program utilizes a "cost function" which takes into account both the match quality and normalized intensity of observed spectral peaks compared to a theoretical spectrum. PhosphoScore was written in Java (Ruttenberg, et al., 2008).
8. PhosTShunter: a fast and reliable tool to detect phosphorylated peptides in liquid chromatography Fourier transform tandem mass spectrometry data sets (Kocher, et al., 2006). The tool is not available.
9. PhosphoScan: a probability-based method for phosphorylation site prediction using MS2/MS3 pair information (Wan, et al., 2008). The tool is not available.