§ Materials & Data preparation:

    We obtained the data set of phosphorylation sites from Phospho.ELM, which also includes the data of PhosphoBase. After removing the phosphorylation sites with ambiguous protein kinase (PK) information, 1404 items remained. We also manually checked recent publications and collected ~660 items. After clustering some homologous PKs with too few known phosphorylation sites into a single group, we obtained 71 PK groups covering 216 unique PKs, including ABL, ALK, AMPK, ATM, AURORA-A, AURORA-B, BTK, CAK, CaM-I/IV, CaM-II, CDKs, Chk1/Chk2, CK1, CK2, DAPK, DNA-PK, EGFR, EPHA/B, FAK, Fer, Fes, FGFR, Fms, Fgr, Fyn/Yes, GRK, GSK3, Hck, IGFR, IKK, ILK, IPL1(yeast), IR, JAK, KIS, KIT, LCK, LYN, MAPKK, MAPK, MAPKAPK2, MAPKK, MAPKKK, MET/RON, MLCK, MTOR, NIMA, P34CDC2, PAK, PDGFR, PDK, PHK, PKA, PKB, PKC, PKG, PKR, PLK, RAF1, RET, RLK, ROCK, S6K, SGK, SRC, SYK, TIE2, TYK2, TRK, VEGFR, and ZAP70. Kinases with few verified phosphorylation sites are excluded. The current version is 1.10, and the release notes can be downloaded here.

§ Method & Algorithm:

Scoring & Clustering
    A phosphorylation site with m upstream and n downstream amino acids respectively is called a phosphorylation site peptide PSP(m, n). Here we only consider the heptapeptide PSP(3, 3). We use the amino acid substitution matrix BLOSUM62 to evaluate the similarity between two peptide sequences with length 7 AA. Although other matrices could be used, the BLOSUM62 matrix is chosen here.
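
    To make the PSP(m, n) notation concrete, the following minimal sketch collects the PSP(3, 3) heptapeptides from a protein sequence. The function name `extract_psp`, the choice of S, T and Y as candidate residues, and the decision to skip sites that lie too close to either terminus are assumptions made for illustration; the server may treat terminal sites differently.

```python
from typing import List, Tuple


def extract_psp(sequence: str, m: int = 3, n: int = 3,
                residues: str = "STY") -> List[Tuple[int, str]]:
    """Return (1-based position, peptide) for every candidate site.

    A peptide spans m residues upstream and n residues downstream of the
    candidate residue. Sites without a full window are skipped (assumption).
    """
    seq = sequence.upper()
    peptides = []
    for i, aa in enumerate(seq):
        has_full_window = i - m >= 0 and i + n < len(seq)
        if aa in residues and has_full_window:
            peptides.append((i + 1, seq[i - m:i + n + 1]))
    return peptides


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the GPS data set.
    print(extract_psp("MKRRASVAAG"))  # [(6, 'RRASVAA')]
```

    With the defaults m = n = 3, every returned peptide is 7 residues long with the candidate site in the middle.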

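    For two amino acids a and b, let the substitution score between them in BLOSUM62 be Score(a, b). The similarity between two peptides A and B with length 7 AA is defined as:

        Similarity(A, B) = \sum_{i=1}^{7} Score(A[i], B[i])

    where A[i] and B[i] are the amino acids at position i of A and B.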
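
    As a minimal sketch, assuming the similarity is the sum of substitution scores over the seven aligned positions, the computation could look as follows. The names `peptide_similarity` and `toy_score` are hypothetical, and `toy_score` is a simple match/mismatch stand-in, not BLOSUM62. A real table can be passed in instead, for example one loaded with Biopython's `substitution_matrices.load("BLOSUM62")`.

```python
from typing import Callable


def peptide_similarity(a: str, b: str,
                       score: Callable[[str, str], float]) -> float:
    """Sum the substitution scores of two aligned 7-residue peptides."""
    if len(a) != 7 or len(b) != 7:
        raise ValueError("PSP(3, 3) peptides must be 7 residues long")
    return sum(score(x, y) for x, y in zip(a, b))


def toy_score(x: str, y: str) -> int:
    """Stand-in for Score(a, b): +4 for identity, -1 otherwise (NOT BLOSUM62)."""
    return 4 if x == y else -1


if __name__ == "__main__":
    # Five identical positions and two mismatches: 5 * 4 - 2 = 18.
    print(peptide_similarity("RRASVAA", "RRLSLAA", toy_score))  # 18
```

    Passing the scoring function as a parameter keeps the sketch independent of any particular substitution matrix.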


    Taking all the PSP(3, 3) of a given kinase K as nodes, we connect them with edges whose weight is the distance between the pair of nodes. The nodes can be partitioned into several clusters according to the distances between them. If a peptide sequence with length 7 AA is close enough to one of the clusters, we may assume that this peptide can also be phosphorylated by kinase K. We adopt the Markov Cluster Algorithm (MCL for short) to partition the above graph into several clusters.
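
    The clustering step can be illustrated with a small pure-Python version of the MCL iteration (expansion followed by inflation). This is a sketch under stated assumptions, not the server's implementation: the function name `mcl`, the inflation value, the added self-loops and the fixed iteration count are illustrative choices. MCL treats a larger edge weight as a stronger connection, so the sketch expects similarity-like weights; how distances are converted into such weights is not specified here.

```python
from typing import List, Sequence


def _normalise_columns(m: List[List[float]]) -> List[List[float]]:
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def _matmul(a: List[List[float]], b: List[List[float]]) -> List[List[float]]:
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(weights: Sequence[Sequence[float]], inflation: float = 2.0,
        iterations: int = 30, self_loop: float = 1.0) -> List[List[int]]:
    """Cluster nodes of a weighted graph; returns sorted lists of node indices."""
    n = len(weights)
    m = [[float(weights[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] += self_loop          # self-loops stabilise the iteration
    m = _normalise_columns(m)
    for _ in range(iterations):       # fixed count instead of a convergence test
        m = _matmul(m, m)             # expansion: flow along two-step walks
        m = _normalise_columns(       # inflation: strengthen strong flows
            [[v ** inflation for v in row] for row in m])
    clusters = []
    for row in m:                     # non-empty rows belong to attractor nodes
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical graph: nodes 0-2 are mutually similar, as are nodes 3-4.
    graph = [[0, 1, 1, 0, 0],
             [1, 0, 1, 0, 0],
             [1, 1, 0, 0, 0],
             [0, 0, 0, 0, 1],
             [0, 0, 0, 1, 0]]
    print(mcl(graph))  # [[0, 1, 2], [3, 4]]
```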

Group-based Phosphorylation Scoring method (GPS)
    Based on the above clustering and scoring strategies, we designed the following algorithm to generate a score for a potential phosphorylation site P of kinase K.
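
    The sketch below is one hedged reading of such a group-based score, not the published procedure. It assumes that the score of a candidate peptide for kinase K is the highest average similarity between the candidate and the known peptides of any one cluster of K. The names `group_based_score` and `identity_similarity` are hypothetical, and the toy similarity (count of identical positions) stands in for the BLOSUM62-based similarity.

```python
from typing import Callable, Sequence


def identity_similarity(a: str, b: str) -> float:
    """Toy similarity: number of identical aligned positions (NOT BLOSUM62)."""
    return float(sum(x == y for x, y in zip(a, b)))


def group_based_score(candidate: str,
                      clusters: Sequence[Sequence[str]],
                      similarity: Callable[[str, str], float]) -> float:
    """Best average similarity of the candidate to any cluster (assumption)."""
    averages = [
        sum(similarity(candidate, peptide) for peptide in cluster) / len(cluster)
        for cluster in clusters if cluster
    ]
    if not averages:
        raise ValueError("at least one non-empty cluster is required")
    return max(averages)


if __name__ == "__main__":
    # Hypothetical clusters of known PSP(3, 3) peptides for one kinase.
    clusters = [["RRASVAA", "RRLSLAA"], ["EEDSDEE", "DEESEDD"]]
    print(group_based_score("RRASLAA", clusters, identity_similarity))  # 6.0
```

    A site would then be predicted as positive when its score reaches the cut-off chosen for the kinase.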


§ FAQ:

Q1: How do I use it?

A1: Please click on the "Prediction" button. Then type in your sequence or paste it from a text editor. The input can be in FASTA format or a raw sequence. After that, choose the kinase(s) of interest and the corresponding cut-off score(s), and press the "Submit" button.

Q2: How well does the GPS web server perform?

A2: We use leave-one-out validation to estimate both sensitivity (Sn) and specificity (Sp) at different cut-off values. The default cut-off values balance Sn against Sp.
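
As an illustration of how Sn and Sp depend on the cut-off, the sketch below computes both from lists of scores for known positive and negative sites, using the standard definitions Sn = TP / (TP + FN) and Sp = TN / (TN + FP). The function name `sn_sp`, the example scores, and the rule that a score equal to the cut-off counts as positive are assumptions for illustration only.

```python
from typing import Sequence, Tuple


def sn_sp(positive_scores: Sequence[float],
          negative_scores: Sequence[float],
          cutoff: float) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the given cut-off."""
    tp = sum(s >= cutoff for s in positive_scores)  # true sites predicted
    fn = len(positive_scores) - tp                  # true sites missed
    tn = sum(s < cutoff for s in negative_scores)   # non-sites rejected
    fp = len(negative_scores) - tn                  # non-sites predicted
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical scores, not taken from the GPS validation.
    positives = [5.0, 3.0, 1.0, 4.0]
    negatives = [0.5, 2.0, 3.5, 1.0, 0.0]
    print(sn_sp(positives, negatives, cutoff=2.5))  # (0.75, 0.8)
```

Raising the cut-off generally lowers Sn and raises Sp, which is why a default value is chosen as a compromise between the two.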

Q3: Where can I get the detailed information about each kinase?

A3: You may click on the names of the kinases to get the corresponding information, including the Sn and Sp for each of the provided cut-off scores.

For publication of results, please cite the following articles:

1. Yu Xue, Fengfeng Zhou, Minjie Zhu, Guoliang Chen, and Xuebiao Yao. GPS: a comprehensive www server for phosphorylation sites prediction. Nucleic Acids Res. 2005 Jul 1;33(Web Server issue):W184-7.

2. Fengfeng Zhou, Yu Xue, Guoliang Chen, and Xuebiao Yao. GPS: a novel group-based phosphorylation predicting and scoring method. Biochem Biophys Res Commun. 2004 Dec 24;325(4):1443-8.

  Last update: Mar 23, 2007
Copyright © 2004-2009 The CUCKOO Workgroup, USTC, All Rights Reserved