Machine Learning Case Studies

The Harmonizome website provides an interface for searching and browsing gene-biological entity associations collected from over 100 datasets. Harmonizome data also has considerable potential for biological discovery: it can be used to build computational models that predict novel properties of genes or proteins, such as molecular interactions or disease associations. To facilitate such computational analysis, we have made each processed dataset available for download in several formats.

To demonstrate the value of Harmonizome data for computationally driven hypothesis generation, we developed four supervised machine learning case studies. Our approach was similar for each case study. First, we organized gene-biological entity associations from many Harmonizome datasets into a large feature matrix, with genes labelling the rows and biological entities (features) labelling the columns. Then, we trained a classifier to use the features to distinguish between genes (or pairs of genes) known to have a property of interest and genes (or pairs of genes) unlikely to have that property. Finally, we applied the classifier to make predictions about genes (or pairs of genes) for which knowledge is missing.
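
The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the method used in the case studies: the gene names, attribute names, and labels are invented toy data, and a small hand-written logistic regression stands in for whichever classifier was actually trained. The helper names (build_feature_matrix, train_logistic, predict_score) are hypothetical.

```python
import math

# Hypothetical toy associations: (gene, biological entity) pairs.
# Real Harmonizome downloads are far larger and use real identifiers.
associations = [
    ("GENE_A", "attr_membrane"), ("GENE_A", "attr_transport"),
    ("GENE_B", "attr_membrane"), ("GENE_B", "attr_transport"),
    ("GENE_C", "attr_nucleus"), ("GENE_C", "attr_dna_binding"),
    ("GENE_D", "attr_nucleus"),
    ("GENE_E", "attr_membrane"), ("GENE_E", "attr_transport"),
    ("GENE_F", "attr_nucleus"), ("GENE_F", "attr_dna_binding"),
]

def build_feature_matrix(pairs):
    """Step 1: genes label the rows, biological entities label the columns."""
    genes = sorted({g for g, _ in pairs})
    features = sorted({f for _, f in pairs})
    row = {g: i for i, g in enumerate(genes)}
    col = {f: j for j, f in enumerate(features)}
    matrix = [[0.0] * len(features) for _ in genes]
    for g, f in pairs:
        matrix[row[g]][col[f]] = 1.0  # 1.0 marks a known association
    return genes, features, matrix

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_score(weights, bias, x):
    """Probability-like score that a gene has the property of interest."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

def train_logistic(X, y, lr=0.5, epochs=500):
    """Step 2: fit a logistic regression by stochastic gradient descent."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = predict_score(weights, bias, xi) - yi
            weights = [w - lr * err * v for w, v in zip(weights, xi)]
            bias -= lr * err
    return weights, bias

genes, features, matrix = build_feature_matrix(associations)

# Assumed labels: 1 = known to have the property, 0 = unlikely to have it.
# Genes absent from this dict are the ones with missing knowledge.
labels = {"GENE_A": 1, "GENE_B": 1, "GENE_C": 0, "GENE_D": 0}

X_train = [matrix[i] for i, g in enumerate(genes) if g in labels]
y_train = [labels[g] for g in genes if g in labels]
weights, bias = train_logistic(X_train, y_train)

# Step 3: score the unlabeled genes.
scores = {
    g: predict_score(weights, bias, matrix[i])
    for i, g in enumerate(genes)
    if g not in labels
}
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{gene}\t{score:.3f}")
```

In this toy setup, an unlabeled gene that shares its associations with the positive examples receives a high score, while one that resembles the negative examples receives a low score. For case studies on pairs of genes, each row would represent a pair rather than a single gene.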

Methods and results for the machine learning case studies are described in detail in the Harmonizome publication. Here, we provide brief descriptions of the case studies, interactive tables for browsing the top predictions of the classifiers, and text files that contain the full results.

Case studies (each with a table to view and a table to download):
Ion Channel Predictions
Mouse Phenotype Predictions
GPCR-Ligand Interaction Predictions
Kinase-Substrate Interaction Predictions