Harmonizome Documentation (v0.4)

1. Harmonizome

Harmonizome is a multi-omics data integration platform developed by the Ma’ayan Lab. Genomics, proteomics, transcriptomics, metabolomics, and related fields continue to advance technologically and generate large quantities of data. In addition, curation efforts have succeeded in mining and aggregating text from decades of biomedical literature into online databases.

It has become more important than ever to harmonize and standardize disparate formats and serve that information to facilitate data reusability. The Harmonizome website allows users to view, download, and visualize data from 138 datasets from some of the most-used omics platforms from across the web. In order to accommodate different types of datasets with a large variety of attribute types, the Harmonizome web application and associated API provide an abstracted framework to store and serve metadata and functional associations from processed datasets.

2. Getting Started

2.1 Gene Pages

Gene pages are a key component of the Harmonizome website. Each gene page is divided into two sections: a metadata section and a functional association section.

On each gene’s page, information about that gene is displayed in the metadata section. Gene symbols have been converted to the symbols used in the NCBI Gene resource. The HGNC family field denotes the family that the gene belongs to, as assigned by the HUGO Gene Nomenclature Committee, if available. The name field contains the full name of the gene. The description field provides a description of the gene’s function, its level of study, and further information if available from NCBI Gene. The synonym field lists any known gene symbols that refer to the same gene. The protein field lists any proteins encoded by the gene, if available, and links to their corresponding Harmonizome pages. Finally, the NCBI Gene ID is listed, which links to the NCBI Gene page for the gene. Not all fields are available for every gene, and any fields that would be empty are not displayed. Links to access the gene’s information and download the gene’s associations through the API are also provided. Lastly, external links to view predicted functions, co-expressed genes, and tissue/cell line expression on ARCHS4 are available.

The second part of each gene’s page lists the gene’s functional associations. An expandable menu is displayed for each dataset that has a gene set functionally associated with the gene. Each row contains the name of the dataset, a link to that dataset’s page, and a summary of the functional associations listed. Inside the expanded menu, the gene sets with associations are displayed, along with the directions and scores of those associations, if available.

2.2 Dataset Pages

The processed datasets that comprise the knowledge base are another key component of the Harmonizome platform. Dataset pages are divided into four sections: metadata, data access, visualizations, and gene sets.

The metadata section of the dataset page displays information about the description, measurement, association, category, resource, citation(s), last updated, and statistics for the dataset. The resource field links to a resource page with information about the resource the dataset was sourced from. The citation links will display the corresponding publication in PubMed.

The data access section of the dataset page includes links to API access for the dataset and to the harmonizomedownloader.py module, which can be used to download the dataset’s files from Python. The downloads section contains a list of links to download individual files for the dataset. Hovering over the question mark icon next to each download link shows a description of that download type.

The visualization section displays all available visualizations for the dataset. By default, the visualizations are displayed as a row of static images. Clicking a visualization expands it for a closer view and, where available, loads its interactive version. Clicking another visualization in the row replaces the currently selected one, and clicking the currently selected visualization again minimizes it.

Lastly, the gene set section of the dataset page lists all of the gene sets for the dataset. Each of these will have a name which links to the gene set page for that gene set and a description, if available.

Due to the varied nature of data formats and availability between datasets, the processing pipeline is different for each individual dataset. Processing scripts are available as one of the download types, in the form of a gzip-compressed tar archive containing any scripts and mapping files used. The HarmonizomePythonScripts repository on GitHub also houses these processing scripts.
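
As a small illustration of working with such an archive, the sketch below lists the members of a gzip-compressed tar file using Python's standard tarfile module. The helper name list_archive_members and the file name scripts/process.py are made up for this example, and the archive is built in memory so that nothing needs to be downloaded; the real archives may be laid out differently.

```python
import io
import tarfile


def list_archive_members(fileobj):
    """Return the member names of a gzip-compressed tar archive."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as archive:
        return archive.getnames()


# Build a tiny in-memory archive so the sketch runs without a download.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    data = b"print('placeholder')\n"
    info = tarfile.TarInfo(name="scripts/process.py")  # made-up file name
    info.size = len(data)
    archive.addfile(info, io.BytesIO(data))
buffer.seek(0)

members = list_archive_members(buffer)
print(members)
```

For a downloaded file, the same helper could be called with a file opened in binary mode, for example open(path, "rb").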

2.3 Gene Set Pages

Gene set pages are the third key component of Harmonizome. These pages detail the information available for an attribute from a specific dataset. They are accessed through the functional associations found on gene pages and from the gene set sections of dataset pages. Gene set pages are divided into a metadata section and a gene association section.

The metadata section lists the gene set name, the dataset the gene set is from, the dataset category, the attribute type, and a description if available. There is also a link to search for similar terms, which will open a new tab with search results, using the name of the gene set as the query. Finally, another link is provided to access the gene set information through the API.

The gene association section lists all genes associated with the gene set. Depending on the data from the dataset, this can either be a list of associated genes, or the associations can be separated into positive and negative association lists. If the dataset has scores included in the association data, the associated genes will be sorted by their association scores within these tables.

4. REST API

As an alternative to the website, the Harmonizome knowledge base can also be accessed through a REST API. Developers can use the API to integrate data from Harmonizome into scripts or applications.

The base API URL /api/1.0/ will return a list of supported entities that can be added to the URL string. The list of currently supported entities is:

  • Attribute
  • Dataset
  • Gene
  • Gene Set
  • HGNC Family
  • Naming Authority
  • Protein
  • Resource
    
        GET /Harmonizome/api/1.0

        {
            "version": 1.0,
            "entities": [
                {
                    "entity": "attribute",
                    "href": "/api/1.0/attribute"
                },
                {
                    "entity": "dataset",
                    "href": "/api/1.0/dataset"
                },
                {
                    "entity": "gene",
                    "href": "/api/1.0/gene"
                },
                {
                    "entity": "gene set",
                    "href": "/api/1.0/gene_set"
                },
                ...
            ]
        }
    

Any of these entities can be appended to the base URL to return a list of that entity type. Each entry in the list has an identifying field (a name, or a symbol in the case of genes) and an href.

    
        GET /Harmonizome/api/1.0/gene

        {
            "count": 56720,
            "selection": [0, 100],
            "next": "/api/1.0/gene?cursor=100",
            "entities": [
                {
                    "symbol": "LOC105377913",
                    "href": "/api/1.0/gene/LOC105377913"
                },
                {
                    "symbol": "LOC105377912",
                    "href": "/api/1.0/gene/LOC105377912"
                },
                {
                    "symbol": "LOC105377911",
                    "href": "/api/1.0/gene/LOC105377911"
                },
                ...
            ]
        }
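
As a rough illustration of how a list response might be consumed from Python, the sketch below parses a payload shaped like the gene list above and maps each symbol to its href. The helper name index_entities is hypothetical and not part of any Harmonizome module, and the sample payload is copied from the example rather than fetched live.

```python
import json

# Sample payload copied from the gene list example (three entries only).
SAMPLE_RESPONSE = """
{
    "count": 56720,
    "selection": [0, 100],
    "next": "/api/1.0/gene?cursor=100",
    "entities": [
        {"symbol": "LOC105377913", "href": "/api/1.0/gene/LOC105377913"},
        {"symbol": "LOC105377912", "href": "/api/1.0/gene/LOC105377912"},
        {"symbol": "LOC105377911", "href": "/api/1.0/gene/LOC105377911"}
    ]
}
"""


def index_entities(response, key="symbol"):
    """Map each entity's identifying field to its href.

    The default key "symbol" matches the gene list example; other entity
    types may use a different field, so the key is left configurable.
    """
    return {entity[key]: entity["href"] for entity in response["entities"]}


gene_index = index_entities(json.loads(SAMPLE_RESPONSE))
print(gene_index["LOC105377912"])
```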
    

4.1 Cursor

To minimize wait times and database queries, entity lists are paginated using a cursor. By default, accessing a list of entities will return the first 100 entities of that type. However, the cursor argument can be added to the URL to specify a start index other than 0.

    
        GET /Harmonizome/api/1.0/gene?cursor=3141

        {
            "count": 58358,
            "selection": [3141, 3241],
            "next": "/api/1.0/gene?cursor=3241",
            ...
        }
    

If the cursor value is larger than the count available for that entity type, an error will be returned.
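
One way to walk through a paginated list is to follow the next href of each response until none is left. The sketch below shows that loop with the HTTP call abstracted behind a fetch callable, so it can run against stand-in data. The function name iter_pages is hypothetical, the fake pages are made up for illustration, and the stopping rule assumes that the last page omits next (or sets it to null), which this documentation does not confirm.

```python
def iter_pages(fetch, first_href, max_pages=None):
    """Yield successive list responses by following each page's "next" href.

    `fetch` is any callable that takes an href and returns the decoded JSON
    response as a dict. Iteration stops when a page has no "next" entry
    (an assumption about the final page) or when max_pages is reached.
    """
    href = first_href
    pages = 0
    while href and (max_pages is None or pages < max_pages):
        page = fetch(href)
        yield page
        pages += 1
        href = page.get("next")


# Stand-in for the real API: three made-up pages keyed by href.
FAKE_PAGES = {
    "/api/1.0/gene": {
        "count": 250, "selection": [0, 100],
        "next": "/api/1.0/gene?cursor=100", "entities": [],
    },
    "/api/1.0/gene?cursor=100": {
        "count": 250, "selection": [100, 200],
        "next": "/api/1.0/gene?cursor=200", "entities": [],
    },
    "/api/1.0/gene?cursor=200": {
        "count": 250, "selection": [200, 250], "entities": [],
    },
}

selections = [
    page["selection"]
    for page in iter_pages(FAKE_PAGES.__getitem__, "/api/1.0/gene")
]
print(selections)
```

Passing max_pages is a simple guard against pulling an entire entity type by accident, since each page is a separate request.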

4.2 Entities

The href property of an entity returned from an entity list query can be used in the URL to get specific information for that entity. In this case, rather than returning a list of entities, the API response will include a single entry with fields specific to that entity type and information for each.

    
        GET /Harmonizome/api/1.0/gene/nanog

        {
            "symbol": "NANOG",
            "name": "Nanog homeobox",
            "ncbiEntrezGeneId": 79923,
            "ncbiEntrezGeneUrl": "http://www.ncbi.nlm.nih.gov/gene/79923",
            ...
        }
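
Because the href values in responses begin at /api/1.0 while the request examples begin at /Harmonizome/api/1.0, a client has to join each href onto a base URL before requesting it. The sketch below shows one way to do that. The host in BASE_URL is an assumption, and the helper names href_to_url and fetch_entity are hypothetical; fetch_entity is defined but not called here because it performs a network request.

```python
import json
import urllib.request

# Assumption: the API is served from this host, under the /Harmonizome prefix
# that appears in the request examples. Adjust for your deployment.
BASE_URL = "https://maayanlab.cloud/Harmonizome"


def href_to_url(href, base_url=BASE_URL):
    """Join an href such as "/api/1.0/gene/NANOG" onto the base URL."""
    return base_url.rstrip("/") + "/" + href.lstrip("/")


def fetch_entity(href, base_url=BASE_URL):
    """Fetch a single entity and decode its JSON body (network call)."""
    with urllib.request.urlopen(href_to_url(href, base_url)) as response:
        return json.load(response)


nanog_url = href_to_url("/api/1.0/gene/NANOG")
print(nanog_url)
```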
    

4.3 Associations

When accessing information for a gene or gene set, the showAssociations=true argument can be included in the URL to return a list of associated entities in addition to that entity’s base information. All associations have a threshold value. Unsigned and positive associations have a threshold value of 1, while negative associations have a threshold value of -1. Some datasets also have standardized values available, which are derived from values in the source data. When available, these can serve as a measure of the strength of associations within a dataset or gene set.

    
        GET /Harmonizome/api/1.0/gene/nanog?showAssociations=true

        {
            "symbol": "NANOG",
            ...
            "associations": [
                {
                    "geneSet": {
                        "name": "V/Allen Brain Atlas Adult Human Brain Tissue Gene Expression Profiles",
                        "href": "/api/1.0/gene_set/V/Allen+Brain+Atlas+Adult+Human+Brain+Tissue+Gene+Expression+Profiles"
                    },
                    "thresholdValue": 1,
                    "standardizedValue": 1.33291
                },
                ...
            ]
        }
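
To make the threshold and standardized values concrete, the sketch below splits an associations list into positive (or unsigned) and negative groups and orders each group by strength. The function name split_associations is hypothetical. Ranking by the absolute standardized value, and treating a missing value as 0, are assumptions made for this example. Only the first sample entry comes from the response above; the other three are made up.

```python
def split_associations(associations):
    """Split associations by threshold value and sort each group by strength.

    Entries with thresholdValue 1 (unsigned or positive) go in the first
    list and entries with thresholdValue -1 in the second. Strength is the
    absolute standardizedValue; entries without one sort last (assumption).
    """
    def strength(association):
        return abs(association.get("standardizedValue", 0.0))

    positive = [a for a in associations if a["thresholdValue"] > 0]
    negative = [a for a in associations if a["thresholdValue"] < 0]
    return (
        sorted(positive, key=strength, reverse=True),
        sorted(negative, key=strength, reverse=True),
    )


SAMPLE_ASSOCIATIONS = [
    {
        "geneSet": {"name": "V/Allen Brain Atlas Adult Human Brain Tissue Gene Expression Profiles"},
        "thresholdValue": 1,
        "standardizedValue": 1.33291,
    },
    # The entries below are invented purely to exercise the function.
    {"geneSet": {"name": "example set A"}, "thresholdValue": -1, "standardizedValue": -2.5},
    {"geneSet": {"name": "example set B"}, "thresholdValue": 1, "standardizedValue": 2.1},
    {"geneSet": {"name": "example set C"}, "thresholdValue": 1},
]

positive, negative = split_associations(SAMPLE_ASSOCIATIONS)
positive_names = [a["geneSet"]["name"] for a in positive]
negative_names = [a["geneSet"]["name"] for a in negative]
print(positive_names)
print(negative_names)
```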
    

4.4 Python API Module

The harmonizomeapi.py module can be downloaded to interact with the Harmonizome API from Python. Calling get on a supported entity returns information about that entity in the same JSON format as the web interface. The name of the entity can be omitted to return a list of that entity type. To get more of the same entity type, next can be called on the entity list.

    
        from harmonizomeapi import Harmonizome, Entity
        pid_dataset = Harmonizome.get(Entity.DATASET, name='PID pathways')

        entity_list = Harmonizome.get(Entity.GENE)
        
        more = Harmonizome.next(entity_list)
    

4.5 Knowledge Graph API

Harmonizome-KG can also be accessed via the API from https://harmonizome-kg.maayanlab.cloud/api/knowledge_graph which takes the following parameters:

    
        {
            "start": "string",
            "start_field": "label",
            "start_term": "string",
            "end": "string",
            "end_field": "string",
            "end_term": "string",
            "limit": 5,
            "relation": [
                {
                    "name": "string",
                    "limit": 5
                }
            ],
            "path_length": 1,
            "remove": [
                "string"
            ]
        }
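
The queries in the subsections that follow all send these parameters as a JSON string in a filter query parameter. As a minimal sketch of that encoding step, the helper below assembles the payload, drops any optional parameters left as None, and builds the request URL without sending it. The function name build_kg_url is hypothetical; whether the server requires or ignores omitted fields is not stated here, so leaving them out is an assumption.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse

KG_ENDPOINT = "https://harmonizome-kg.maayanlab.cloud/api/knowledge_graph"


def build_kg_url(start, start_term, start_field="label", **optional):
    """Build a knowledge graph query URL from the documented parameters.

    Optional parameters (end, end_field, end_term, limit, relation,
    path_length, remove) are passed as keyword arguments; any that are
    None are left out of the payload.
    """
    payload = {"start": start, "start_field": start_field, "start_term": start_term}
    payload.update({key: value for key, value in optional.items() if value is not None})
    # The payload travels as a JSON string inside the "filter" parameter.
    return KG_ENDPOINT + "?" + urlencode({"filter": json.dumps(payload)})


url = build_kg_url("Gene", "STAT3", limit=10, end=None)

# Decode the query string again to confirm what would be sent.
decoded = json.loads(parse_qs(urlparse(url).query)["filter"][0])
print(decoded)
```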
    

The following examples walk through some common queries against the API.

4.5.1 Single term search

Suppose we want to find the immediate neighbors of STAT3. To do this we need to define start and start_term:


        import requests
        import json

        payload = {
            "start": "Gene",
            # metadata field to query (default: label)
            "start_field": "label",
            "start_term": "STAT3",
            "limit": 10
        }

        # The query is sent as a JSON string in the "filter" parameter
        res = requests.get("https://harmonizome-kg.maayanlab.cloud/api/knowledge_graph", params={"filter": json.dumps(payload)})
        if res.ok:
            results = res.json()

This returns a subnetwork containing the immediate neighbors of STAT3.

4.5.2 Query only certain relationships

Suppose we only care about STAT3's participation in GO Biological Process terms. To restrict the query, we can add the relation field:


        import requests
        import json

        payload = {
            "start": "Gene",
            # metadata field to query (default: label)
            "start_field": "label",
            "start_term": "STAT3",
            # only follow this relationship type
            "relation": [{
                "name": "participates_in_(GO Bio Process 2023)",
                "limit": 5
            }]
        }

        res = requests.get("https://harmonizome-kg.maayanlab.cloud/api/knowledge_graph", params={"filter": json.dumps(payload)})
        if res.ok:
            results = res.json()

    

4.5.3 Removing a node

Nodes can be removed from the subnetwork by sending their IDs to the server in the remove field:


        import requests
        import json

        payload = {
            "start": "Gene",
            # metadata field to query (default: label)
            "start_field": "label",
            "start_term": "STAT3",
            "relation": [{
                "name": "participates_in_(GO Bio Process 2023)",
                "limit": 5
            }],
            # IDs of nodes to exclude from the returned subnetwork
            "remove": [
                "GO:0045944"
            ]
        }

        res = requests.get("https://harmonizome-kg.maayanlab.cloud/api/knowledge_graph", params={"filter": json.dumps(payload)})
        if res.ok:
            results = res.json()

    

4.5.4 Starting from a term and ending with a node type

Suppose we want to find HPO nodes that are connected to the gene STAT3 via the shortest path:


        import requests
        import json

        payload = {
            "start": "Gene",
            "start_field": "label",
            "start_term": "STAT3",
            # node type to end on; no end_term is given
            "end": "HPO"
        }

        res = requests.get("https://harmonizome-kg.maayanlab.cloud/api/knowledge_graph", params={"filter": json.dumps(payload)})
        if res.ok:
            results = res.json()

    

4.5.5 Two term search

For this example we want to find the shortest path between STAT3 and MAPK1:


        import requests
        import json

        payload = {
            "start": "Gene",
            "start_field": "label",
            "start_term": "STAT3",
            # second term: the node at the other end of the path
            "end": "Gene",
            "end_field": "label",
            "end_term": "MAPK1"
        }

        res = requests.get("https://harmonizome-kg.maayanlab.cloud/api/knowledge_graph", params={"filter": json.dumps(payload)})
        if res.ok:
            results = res.json()

    

5. Additional User Support

5.1 Contact

The contact form, available in the footer of every page, allows users to contact the maintainers of the site. To submit a question, issue, or feature suggestion, select the appropriate option from the drop-down menu. In the next field, enter an email address to which responses should be sent. Finally, use the details box to describe the question, issue, or suggestion. After the request is submitted, a confirmation or error screen will appear. Contact submissions are reviewed manually, so please allow a few days for a response.

5.2 Terms and License

Harmonizome is available under the Creative Commons BY-NC-SA 4.0 license for non-commercial use. Users can share and adapt the platform with proper attribution, for non-commercial purposes, under the terms of the license. For commercial use, please contact Mount Sinai Innovation Partners.