Kinase Substrates Machine Learning Case Study

For this case study, we trained a classifier to predict known interactions between kinases and their substrates, and then used it to predict kinases for proteins that have phosphosites with unknown regulatory kinases. The classifier performed modestly in cross-validation, with an area under the receiver operating characteristic curve (AUROC) of 0.88, an F1-score of 0.23, and a Matthews correlation coefficient of 0.22. Methods and results for this machine learning case study are described in detail in the Harmonizome publication.
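AUROC summarizes how well a classifier ranks positives above negatives across all thresholds, whereas F1 and MCC are computed at a single decision threshold, so the three numbers can diverge. The sketch below shows, in dependency-free Python, how each metric can be computed from held-out labels and predicted probabilities. The function names, the toy data, and the 0.5 decision threshold used for F1 and MCC are assumptions for illustration, not details taken from the Harmonizome publication.

```python
import math
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative (ties count 0.5).
    This pairwise form is O(P*N): fine for a sketch, slow for large data."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confusion(labels: Sequence[int], scores: Sequence[float],
              threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Counts (tp, fp, tn, fn) when scores >= threshold are called positive.
    The 0.5 default is an assumption, not the study's documented threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        called = s >= threshold
        if called and y == 1:
            tp += 1
        elif called:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_cc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy held-out data (hypothetical): 1 = known kinase-substrate pair, 0 = not known.
labels = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
tp, fp, tn, fn = confusion(labels, scores)
print(auroc(labels, scores))                   # → 0.875
print(f1(tp, fp, fn))                          # → 0.8
print(round(matthews_cc(tp, fp, tn, fn), 4))   # → 0.7071
```

On strongly imbalanced data, where true kinase-substrate pairs are rare, a classifier can rank well (high AUROC) while still producing many false positives at a fixed threshold (low F1 and MCC).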

Each row in the results table describes one kinase-substrate pair: the gene symbol of the kinase (Kinase column); the gene symbol of the phosphoprotein (Substrate column); the predicted probability that the kinase phosphorylates that protein (Probability column); whether the phosphorylation reaction is already known (Known column); whether the pair was used as an example for training the classifier (Training column); and the predicted false discovery rate if all predictions with an equal or higher probability were tested (FDR column).
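One way to read the FDR column's definition ("if all predictions with better or equivalent probability were tested") is as a cumulative, empirical estimate: rank the scored pairs by probability and, for each pair, compute the fraction of pairs at or above its probability that are not true interactions. The sketch below implements that reading on labeled data such as cross-validation output. It is an illustrative assumption: the function name `cumulative_fdr`, the 0/1 labels, and the tie handling are choices made here, treating every unlabeled pair as a false call is a conservative simplification, and the sketch is not guaranteed to reproduce the FDR values in the table (see the Harmonizome publication for the procedure actually used).

```python
from typing import List, Sequence


def cumulative_fdr(labels: Sequence[int], scores: Sequence[float]) -> List[float]:
    """For each prediction, the fraction of pairs labeled 0 among all pairs whose
    score is greater than or equal to its own score (ties are grouped together)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr = [0.0] * len(scores)
    called = false_calls = 0
    start = 0
    while start < len(order):
        # Extend the block over every pair tied with the current score, so that
        # "better or equivalent probability" includes all ties.
        end = start
        while end < len(order) and scores[order[end]] == scores[order[start]]:
            end += 1
        block = order[start:end]
        called += len(block)
        false_calls += sum(1 for i in block if labels[i] == 0)
        for i in block:
            fdr[i] = false_calls / called
        start = end
    return fdr


# Hypothetical scored pairs: 1 = known interaction, 0 = not known.
print(cumulative_fdr([0, 1, 0, 1], [0.76, 0.72, 0.72, 0.5]))
# → [1.0, 0.6666666666666666, 0.6666666666666666, 0.5]
```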

The table below shows predictions with probabilities greater than or equal to 0.5; the full table, including lower-probability predictions, is available for download.

Kinase Substrate Probability Known Training FDR
CDK1 MELK 0.76 false true 1.0
CDK1 DLGAP5 0.72 true true 0.5
PRKCD LRRC7 0.71 false true 0.67
CDK1 CENPF 0.71 false true 0.75
CDK2 E2F1 0.7 true true 0.6
CDK1 CDC6 0.69 false true 0.67
CDK1 NUSAP1 0.68 true true 0.5
CDK1 TPX2 0.68 true true 0.57
CAMK2A MYOM3 0.68 false false 0.62
MAPK1 FOS 0.67 true true 0.44
CDK1 RRM2 0.67 true true 0.4
CDK1 AURKA 0.67 true true 0.36
CDK1 CCNA2 0.66 false true 0.44
CDK1 TTK 0.66 false true 0.41
CDK1 CDC20 0.66 true true 0.38
CDK1 CDK1 0.66 true true 0.4
CDK2 MCM3 0.66 true true 0.43
CDK1 ASPM 0.66 false true 0.46
CDK1 CDT1 0.66 false true 0.42
CDK2 TOP2A 0.65 false true 0.45
CDK1 DTL 0.65 true true 0.43
CDK2 TK1 0.65 true true 0.45
CAMK2A HECW1 0.65 false false 0.47
CDK1 AURKB 0.65 false true 0.47
PLK1 CENPF 0.64 false true 0.52
CDK1 MCM6 0.64 false false 0.51
MAPK3 EGR1 0.64 false true 0.5
CDK1 CENPA 0.64 true true 0.48
PRKCD CRHR1 0.64 false true 0.52
CDK2 MYC 0.64 true true 0.5
SGK1 LRRC7 0.64 false true 0.5
CDK1 PLK1 0.64 false true 0.48
CDK2 CCND1 0.64 true true 0.46
SGK1 CRHR1 0.64 false true 0.48
CDK1 KIAA0101 0.63 false false 0.48
MAPK1 JUN 0.63 true true 0.47
CDK1 E2F1 0.63 true true 0.49
CAMK2A HPSE2 0.63 false false 0.49
CDK2 MYBL2 0.63 true true 0.5
LCK CD6 0.63 false true 0.51
CDK1 MAD2L1 0.63 false true 0.5
CDK1 STMN1 0.63 true true 0.48
CDK1 BUB1B 0.63 true true 0.5
CDK1 KIFC1 0.62 false true 0.5
CDK1 GTSE1 0.62 false true 0.49
CDK1 H2AFZ 0.62 false false 0.49
CDK1 CCNB2 0.62 false false 0.48
CDK2 MCM7 0.61 true true 0.5
LCK IL16 0.61 false true 0.51
MAPK14 CDKN1A 0.61 true true 0.5
MAPK1 EGR1 0.61 false true 0.51
CDK1 UBE2C 0.61 false false 0.51
CDK2 RFC3 0.6 false false 0.51
CDK2 MCM6 0.6 false false 0.51
CDK1 PRC1 0.6 true true 0.51
CDK1 MCM2 0.6 false true 0.52
PLK1 CDK1 0.6 false true 0.51
CDK2 BUB1 0.6 false true 0.5
CDK1 KIF23 0.6 false true 0.49
CDK2 MCM2 0.6 true true 0.48
MAPK14 JUN 0.6 true true 0.49
CDK1 CENPU 0.59 false true 0.51
MAPK1 JUNB 0.59 true true 0.5
PLK1 TTK 0.59 false true 0.51
CDK1 LMNB1 0.59 true true 0.5
FYN CD6 0.58 false true 0.52
CDK2 STMN1 0.58 true true 0.52
CDK2 TFDP1 0.58 false true 0.53
CDK2 CDC6 0.58 true true 0.52
CDK1 ATAD2 0.58 false true 0.53
CDK1 SMC4 0.58 false true 0.52
MAPK1 CCND1 0.57 false true 0.53
MAPK1 CDKN1A 0.57 true true 0.52
PRKACA LRRC7 0.57 false true 0.52
CDK1 CCNB1 0.57 true true 0.51
CDK2 PCNA 0.57 false true 0.52
CDK2 CDT1 0.57 true true 0.52
CDK1 CCND1 0.57 false true 0.53
CDK1 MCM7 0.57 true true 0.52
CDK2 SIN3A 0.57 true true 0.52
PLK1 NCAPH 0.57 false true 0.53
CDK1 RACGAP1 0.56 true true 0.52
CSNK2A2 CCND1 0.56 false true 0.53
MAPK14 DUSP1 0.56 false true 0.52
CDK1 GINS2 0.56 false true 0.51
MAPK14 MYC 0.56 true true 0.51
CSNK2A1 CCND1 0.56 false true 0.51
CDK2 BRCA1 0.56 true true 0.51
CDK1 MYBL2 0.56 true true 0.51
CDK1 MCM4 0.55 true true 0.52
CDK2 CHEK1 0.55 true true 0.52
LCK LILRB2 0.55 false false 0.51
GSK3B TP53 0.55 true true 0.51
CDK1 MKI67 0.55 true true 0.51
CSNK2A2 CCNB1 0.55 false true 0.51
CSNK2A1 JUNB 0.55 false true 0.53
PRKACA CRYBG3 0.55 false false 0.52
MAPK1 DUSP1 0.55 true true 0.51
CDK1 KIF11 0.55 true true 0.51
MAPK14 GADD45B 0.55 false true 0.51
CDK1 HMGB2 0.55 false true 0.52
CDK1 NCAPH 0.55 false true 0.52
FYN TIGIT 0.55 false false 0.52
GSK3B JUN 0.54 true true 0.52
PRKACA KRT72 0.54 false false 0.52
CDK2 CDKN1A 0.54 true true 0.53
CDK2 GMNN 0.54 false true 0.53
CSNK2A1 FOS 0.54 false true 0.53
MAPK14 TP53 0.54 true true 0.52
CDK1 NEK2 0.54 false true 0.53
PRKCA PRH2 0.54 false true 0.52
CDK2 PLK4 0.54 false true 0.52
MAPK14 MDM2 0.53 false true 0.52
CDK1 KPNA2 0.53 false true 0.52
SGK1 KIAA1211 0.53 false false 0.52
CDK1 FOS 0.53 false true 0.52
CDK1 FEN1 0.53 true true 0.51
CDK1 ANLN 0.53 true true 0.52
FYN CD3D 0.53 false true 0.52
CDK1 SKP2 0.53 false true 0.52
CDK1 CHEK1 0.53 true true 0.51
CDK1 ZNF148 0.53 false true 0.52
CSNK2A1 TP53 0.53 true true 0.51
PRKCA KIR3DL1 0.53 true true 0.52
CAMK2A PARK2 0.53 false true 0.52
PRKCA KRT72 0.53 false false 0.52
MAPK1 GADD45B 0.53 false true 0.52
MAPK14 EGR1 0.53 false true 0.53
CDK1 CDC7 0.53 true true 0.52
CSNK2A1 MYC 0.53 true true 0.52
MAPK3 FOS 0.53 true true 0.51
CDK2 AURKB 0.52 false true 0.54
CDK2 CCNT1 0.52 false true 0.55
GSK3B CCND1 0.52 true true 0.55
CDK2 CDC25B 0.52 false true 0.55
PRKCD KIAA1211 0.52 false false 0.55
CDK1 CDKN3 0.52 false false 0.55
SGK1 KRT72 0.52 false false 0.55
CDK1 RFC3 0.52 false false 0.55
CDK2 TYMS 0.52 false true 0.55
GSK3B RPTOR 0.52 true true 0.55
PRKACA KIR3DL1 0.52 false true 0.54
CDK2 EHMT2 0.52 false true 0.54
CDK2 SKP2 0.52 true true 0.52
PRKCD CRYBG3 0.52 false false 0.52
CDK1 KIF4A 0.52 false true 0.52
CDK1 ZWINT 0.52 false false 0.52
PRKCA MYH8 0.52 false false 0.52
CSNK2A1 EGFR 0.52 true true 0.52
AKT1 TP53 0.52 false true 0.52
SRC CRHR1 0.52 false true 0.53
MAPK1 GADD45A 0.52 false true 0.53
PRKCA CRHR1 0.52 false true 0.53
CDK2 CCNE2 0.51 false true 0.54
CDK2 FN1 0.51 false true 0.55
PLK1 BUB1 0.51 true true 0.54
CDK2 CDC25A 0.51 true true 0.54
PRKCA CDKN1A 0.51 true true 0.53
PRKCA KRT18 0.51 true true 0.53
PRKCA IL16 0.51 false true 0.53
GSK3B MCM3 0.51 false true 0.53
CDK1 NCAPG 0.51 true true 0.54
CAMK2A ZNF229 0.51 false false 0.54
PRKCD KRT72 0.51 false false 0.55
CDK1 CCNE1 0.51 false true 0.55
CDK2 CDC7 0.51 true true 0.55
MAPK1 BCL2 0.51 true true 0.54
CDK1 RAD51 0.51 false true 0.54
PLK1 TOP2A 0.51 true true 0.54
CDK2 SIRT6 0.51 false false 0.54
PRKACA SLC14A2 0.51 false false 0.54
CDK1 ASF1B 0.5 false true 0.53
CDK1 CDC45 0.5 false false 0.53
CDK2 ZNF485 0.5 false false 0.53
CDK1 CDC25A 0.5 true true 0.53
CDK4 SETD2 0.5 false true 0.53
MAPK9 CELF1 0.5 false true 0.53
CDK1 TUBG1 0.5 false true 0.54
MAPK3 JUN 0.5 true true 0.53
MAPK14 FOS 0.5 true true 0.53
CDK2 EGFR 0.5 true true 0.53
GSK3B SPEN 0.5 true true 0.53
MAPK14 BRCA1 0.5 false true 0.53
CDK2 CENPF 0.5 true true 0.53
CSNK2A1 RB1 0.5 false true 0.54
GSK3B NOTCH2 0.5 true true 0.53
CDK2 CDK1 0.5 false true 0.53
PLK1 DTL 0.5 false true 0.54
CDK1 CEP55 0.5 true true 0.53
CSNK2A1 BTG2 0.5 false true 0.54
LYN CD79A 0.5 true true 0.53
PRKCA CLDN18 0.5 false false 0.53
CDK2 BCL2 0.5 true true 0.53
CDK1 CHAF1A 0.5 false true 0.53
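Once the table is saved as plain text, it can be filtered programmatically, for example to list candidate new kinase-substrate relationships (pairs whose Known value is false). The sketch below assumes the rows are whitespace-delimited exactly as displayed here, with a header line followed by one pair per line; the layout of the downloadable full table may differ, and the names `Prediction`, `parse_table`, and `novel_candidates` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

EXPECTED_HEADER = ["Kinase", "Substrate", "Probability", "Known", "Training", "FDR"]


@dataclass
class Prediction:
    kinase: str
    substrate: str
    probability: float
    known: bool      # phosphorylation reaction already known
    training: bool   # pair was used as a training example
    fdr: float


def parse_table(text: str) -> List[Prediction]:
    """Parse the whitespace-delimited table (header line first) into records."""
    rows = [line.split() for line in text.strip().splitlines()]
    if rows[0] != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {rows[0]}")
    return [
        Prediction(k, s, float(p), kn == "true", tr == "true", float(f))
        for k, s, p, kn, tr, f in rows[1:]
    ]


def novel_candidates(preds: List[Prediction],
                     min_probability: float = 0.5) -> List[Prediction]:
    """Pairs at or above the threshold whose reaction is not yet known, highest first."""
    hits = [p for p in preds if p.probability >= min_probability and not p.known]
    return sorted(hits, key=lambda p: p.probability, reverse=True)


# Three rows copied from the table above.
sample = """Kinase Substrate Probability Known Training FDR
CDK1 MELK 0.76 false true 1.0
CDK1 DLGAP5 0.72 true true 0.5
CAMK2A MYOM3 0.68 false false 0.62
"""

for p in novel_candidates(parse_table(sample)):
    print(p.kinase, p.substrate, p.probability, p.training)
# → CDK1 MELK 0.76 True
# → CAMK2A MYOM3 0.68 False
```

The Training field is kept on each record so that pairs never used as training examples can be separated from those that were.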