§ Materials & Data preparation:
We obtained the data
set of phosphorylation sites from Phospho.ELM,
which also includes the data of PhosphoBase.
After removing the phosphorylation sites with ambiguous
protein kinase (PK) information, 1404 items remained. We also manually
checked recent publications and collected ~660 items.
After merging homologous PKs that have
too few known phosphorylation sites into single
groups, we obtained 71
PK groups with 216
unique PKs, including ABL, ALK, AMPK, ATM, AURORA-A, AURORA-B, BTK, CAK, CaM-I/IV, CaM-II, CDKs, Chk1/Chk2, CK1, CK2, DAPK, DNA-PK, EGFR, EPHA/B, FAK, Fer, Fes, FGFR, Fms, Fgr, Fyn/Yes, GRK, GSK3, Hck, IGFR, IKK, ILK, IPL1 (yeast), IR, JAK, KIS, KIT, LCK, LYN, MAPKK, MAPK, MAPKAPK2, MAPKK, MAPKKK, MET/RON, MLCK, MTOR, NIMA, P34CDC2, PAK, PDGFR, PDK, PHK, PKA, PKB, PKC, PKG, PKR, PLK, RAF1, RET, RLK, ROCK, S6K, SGK, SRC, SYK, TIE2, TYK2, TRK, VEGFR, and ZAP70. Kinases with too few verified
phosphorylation sites are excluded. The current version is 1.10, and the release notes can be downloaded here.
Method & Algorithm:
A phosphorylation site together with its m upstream and n downstream
amino acids is called a phosphorylation
site peptide, PSP(m, n). Here we only consider the
heptapeptide PSP(3, 3). We use the amino acid substitution
matrix BLOSUM62 to evaluate the similarity between
two peptide sequences of length 7 AA. Although other
matrices could be used, the BLOSUM62 matrix is chosen here.
For two amino acids
a and b, let the substitution score between them in
BLOSUM62 be Score(a, b). The similarity between two
peptides A and B of length 7 AA is defined as:
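The similarity formula itself is not shown in this text. Assuming it is the position-wise sum S(A, B) = Score(A[1], B[1]) + ... + Score(A[7], B[7]), the definitions above can be sketched in Python as follows. The function names, the "*" padding at sequence ends, and the toy scoring function are our own assumptions for illustration; a real BLOSUM62 table would replace toy_score.

```python
from typing import Callable


def extract_psp(sequence: str, site: int, m: int = 3, n: int = 3, pad: str = "*") -> str:
    """Return PSP(m, n) around the 0-based index `site`.

    Positions beyond either end of the sequence are filled with `pad`
    (an assumption; the text does not say how termini are handled).
    """
    left = sequence[max(0, site - m):site]
    right = sequence[site + 1:site + 1 + n]
    return left.rjust(m, pad) + sequence[site] + right.ljust(n, pad)


def peptide_similarity(a: str, b: str, score: Callable[[str, str], int]) -> int:
    """Sum the per-position substitution scores of two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return sum(score(x, y) for x, y in zip(a, b))


def toy_score(x: str, y: str) -> int:
    """Toy stand-in for Score(a, b): +4 for identity, -1 otherwise. NOT BLOSUM62."""
    return 4 if x == y else -1


protein = "MKRRASVAGTPEL"
psp = extract_psp(protein, site=5)  # the serine at 0-based index 5
print(psp)                          # RRASVAG
print(peptide_similarity(psp, "RRASLAG", toy_score))  # 6 matches, 1 mismatch -> 23
```

With the real matrix, only the score argument changes; the summation stays the same.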
Taking all the PSP(3, 3)
of a given kinase K as nodes, we connect them with
edges whose weights are the distances between the pairs
of nodes. The nodes can be partitioned into several
clusters according to the distances between them. If
a peptide sequence of length 7 AA is close enough
to one of the clusters, we may assume that this peptide
can also be phosphorylated by kinase K. We adopt the
Markov Cluster Algorithm (MCL) to partition
the above graph into several clusters.
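To make the clustering step concrete, here is a minimal pure-Python sketch of a basic MCL loop (expansion by matrix squaring, inflation by element-wise powering and column normalization). It is an illustration, not the server's implementation. We assume the edge weights are similarities (larger means closer), because MCL lets flow follow heavier edges; distances as mentioned above would first need to be converted. The self-loop rule, the inflation value 2.0 and the thresholds are also our assumptions.

```python
def normalize_columns(m):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain square matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster a weighted graph given as a symmetric list-of-lists matrix.

    Returns a list of clusters, each a list of node indices.
    """
    n = len(weights)
    m = [list(row) for row in weights]
    for j in range(n):
        # Self-loop as heavy as the heaviest edge at node j (1.0 if isolated).
        heaviest = max((weights[i][j] for i in range(n) if i != j), default=0.0)
        m[j][j] = heaviest if heaviest > 0 else 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow along two-step walks
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    clusters = []
    for row in m:
        # The non-zero columns of a surviving row form one cluster.
        members = [j for j, v in enumerate(row) if v > 1e-6]
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Two groups of peptides with no edges between the groups.
w = [
    [0, 10, 10, 0, 0],
    [10, 0, 10, 0, 0],
    [10, 10, 0, 0, 0],
    [0, 0, 0, 0, 10],
    [0, 0, 0, 10, 0],
]
print(mcl(w))  # [[0, 1, 2], [3, 4]]
```

A higher inflation value generally yields more, smaller clusters; which value suits phosphorylation site peptides is not stated here.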
Group-based Phosphorylation Scoring method (GPS)
Based on the above clustering and scoring strategies,
we designed the following algorithm to generate a score
for a potential phosphorylation site P of kinase K.
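The algorithm listing is not reproduced in this text. As one hedged reading, we assume that the score of P is the highest average similarity between P and the members of any cluster of K, and that P is predicted when this score reaches the chosen cut-off. The sketch below shows that assumed rule only; site_score, toy_score, the example clusters and the cut-off 20.0 are hypothetical.

```python
def peptide_similarity(a, b, score):
    """Position-wise sum of substitution scores (assumed similarity)."""
    return sum(score(x, y) for x, y in zip(a, b))


def toy_score(x, y):
    """Toy stand-in for Score(a, b); NOT the BLOSUM62 matrix."""
    return 4 if x == y else -1


def site_score(candidate, clusters, score):
    """Assumed rule: best average similarity of `candidate` to any cluster."""
    averages = [
        sum(peptide_similarity(candidate, member, score) for member in cluster) / len(cluster)
        for cluster in clusters
    ]
    return max(averages)


# Hypothetical clusters of known PSP(3, 3) for one kinase K.
clusters_of_k = [["RRASVAG", "RRASLAG"], ["EEDSDEE"]]
value = site_score("RRASVAG", clusters_of_k, toy_score)
print(value)          # (28 + 23) / 2 = 25.5
print(value >= 20.0)  # True under the hypothetical cut-off
```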
How to use it?
Click on the "Prediction"
button, then type in your sequence or paste
it from another text editor. The input can
be in FASTA format or a raw sequence. After that,
choose the kinase(s) of interest and the corresponding
cut-off score(s), and press the "Submit" button.
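If you prepare input with a script, the two accepted forms can be normalized as in the sketch below. It is only an illustration of FASTA versus raw input; parse_sequences is a hypothetical helper and says nothing about how the server itself reads the text box.

```python
def parse_sequences(text):
    """Return a list of (name, sequence) pairs from FASTA or raw input."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the record collected so far, if any.
            if name is not None or chunks:
                records.append((name or "unnamed", "".join(chunks)))
            name, chunks = line[1:].strip() or "unnamed", []
        else:
            # Keep letters only, so pasted digits and spaces are dropped.
            chunks.append("".join(c for c in line if c.isalpha()).upper())
    if name is not None or chunks:
        records.append((name or "unnamed", "".join(chunks)))
    return records


print(parse_sequences(">demo protein\nMKRR\nASVA\n"))  # [('demo protein', 'MKRRASVA')]
print(parse_sequences("mkrr asva 10"))                 # [('unnamed', 'MKRRASVA')]
```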
How about the performance of the GPS webserver?
We use leave-one-out validation
to estimate both sensitivity (Sn) and specificity
(Sp) under different cut-off values. The
default cut-off values balance
Sn and Sp.
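For readers who want to reproduce this kind of evaluation, the sketch below shows leave-one-out scoring together with the usual definitions Sn = TP / (TP + FN) and Sp = TN / (TN + FP). The scorer, the peptides and the cut-offs are toy assumptions, not the data or the scoring function behind the reported values.

```python
def identity(a, b):
    """Number of identical positions between two peptides."""
    return sum(x == y for x, y in zip(a, b))


def toy_scorer(candidate, training):
    """Toy scorer: best identity count against any training peptide."""
    return max(identity(candidate, t) for t in training)


def leave_one_out_scores(positives, scorer):
    """Score each known site against all the other known sites."""
    return [scorer(p, positives[:i] + positives[i + 1:]) for i, p in enumerate(positives)]


def sn_sp(pos_scores, neg_scores, cutoff):
    """Sensitivity and specificity when 'score >= cutoff' means predicted."""
    tp = sum(s >= cutoff for s in pos_scores)
    tn = sum(s < cutoff for s in neg_scores)
    return tp / len(pos_scores), tn / len(neg_scores)


positives = ["RRASVAG", "RRASLAG", "KRASVAG"]  # hypothetical known sites
negatives = ["EEDSDEE", "GGGSGGG"]             # hypothetical non-sites

pos_scores = leave_one_out_scores(positives, toy_scorer)
neg_scores = [toy_scorer(n, positives) for n in negatives]

for cutoff in (2, 3, 7):
    sn, sp = sn_sp(pos_scores, neg_scores, cutoff)
    print(cutoff, sn, sp)
```

Raising the cut-off can only lower Sn and raise Sp, which is why a default in between trades one against the other.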
Where can I get the detailed information about each kinase?
You may click on the names of the kinases to get the
corresponding information, including the Sn
and Sp for each of the provided cut-off scores.
Last update: Mar 23, 2007
Copyright © 2004-2009 The CUCKOO Workgroup, USTC,
All Rights Reserved