Functional predictions of human long non-coding RNAs based on lncRNA-gene co-expression correlations.
A long non-coding RNA (lncRNA) is a transcript longer than 200 nucleotides that is not translated into protein. Based on gene-gene co-expression correlations computed from ARCHS4's processed RNA-seq samples, we present 18,705 human and 11,274 mouse landing pages for long non-coding RNAs that include expression statistics across tissues and cell lines, predicted biological functions, pathway membership, subcellular localization, and predicted small molecules and CRISPR KO genes that may regulate their expression.
Based on lncRNA-gene co-expression, this report provides predictions of the biological functions of ( ), displays the expression statistics of across tissues and cell lines, and predicts small molecules that may specifically up- or down-regulate the expression of .
The genomic coordinates for are obtained from GENCODE (gencode..long_noncoding_RNAs.gtf)[1].
CSV Table 1. Genomic coordinates for .
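A coordinate lookup of this kind can be sketched by scanning the GTF for gene-level records. The sketch below assumes the standard 9-column, tab-separated GTF layout with `gene_id` and `gene_name` attributes (as GENCODE files use); the function names are hypothetical and this is not the report's actual parser.

```python
# Minimal sketch (assumption: standard 9-column, tab-separated GENCODE GTF with
# gene_id and gene_name attributes); function names are hypothetical.
from typing import Iterable, Iterator


def parse_attributes(field: str) -> dict[str, str]:
    """Turn 'key "value"; key "value";' into a dict, keeping the first value per key."""
    attrs: dict[str, str] = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs.setdefault(key, value.strip('"'))
    return attrs


def gene_coordinates(lines: Iterable[str], query: str) -> Iterator[dict]:
    """Yield gene-level records whose unversioned gene_id or gene_name equals query."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # keep only gene features, not transcripts or exons
        attrs = parse_attributes(cols[8])
        unversioned_id = attrs.get("gene_id", "").split(".")[0]
        if query in (unversioned_id, attrs.get("gene_name")):
            yield {
                "chromosome": cols[0],
                "start": int(cols[3]),  # GTF coordinates are 1-based and inclusive
                "end": int(cols[4]),
                "strand": cols[6],
                "gene_id": attrs.get("gene_id"),
                "gene_name": attrs.get("gene_name"),
            }


# Usage with a gzipped annotation file (the path is a placeholder):
# import gzip
# with gzip.open("path/to/annotation.gtf.gz", "rt") as handle:
#     print(list(gene_coordinates(handle, "HOTAIR")))
```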
The PubMed API was used to generate AutoRIF data for . The PubMed IDs and publication dates of all articles mentioning the lncRNA were collected automatically. PubMed was queried with the Ensembl ID[2], the lncRNA gene symbol from GENCODE[1], and any previous symbols found in the HGNC database[3], combined with the terms ‘lncRNA’ or ‘long non-coding RNA’ (e.g., “(ENSG00000228630 OR HOTAIR) AND (lncRNA OR long non-coding RNA)”).
Figure 1. Publications mentioning the lncRNA . In total, 1,102 publications mentioned from 1992 to 2021.
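The query construction described above can be sketched as follows. This assumes that "the PubMed API" refers to NCBI E-utilities ESearch; only the request URL is built (no network call), and collecting publication dates would need a further E-utilities request such as ESummary. The function names are hypothetical.

```python
# Sketch assuming the "PubMed API" is NCBI E-utilities ESearch; only the request
# URL is built here, and no network call is made. Function names are hypothetical.
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_autorif_query(ensembl_id: str, symbol: str,
                        previous_symbols: tuple[str, ...] = ()) -> str:
    """Combine identifiers with OR and require an lncRNA-related term."""
    identifiers: list[str] = []
    for ident in (ensembl_id, symbol, *previous_symbols):
        if ident and ident not in identifiers:  # drop blanks and duplicates, keep order
            identifiers.append(ident)
    return f"({' OR '.join(identifiers)}) AND (lncRNA OR long non-coding RNA)"


def esearch_url(term: str, retmax: int = 10000) -> str:
    """Build an ESearch request that returns matching PubMed IDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_autorif_query("ENSG00000228630", "HOTAIR")
print(query)  # (ENSG00000228630 OR HOTAIR) AND (lncRNA OR long non-coding RNA)
```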
Using the loaded lncRNA-gene correlation matrix, we report the genes that are most positively correlated with .
There are no genes positively correlated with .
CSV Table 4a. Top 100 genes positively correlated with ranked by Pearson’s correlation coefficients.
Using the loaded lncRNA-gene correlation matrix, we report the genes that are most negatively correlated with .
There are no genes negatively correlated with .
CSV Table 4b. Top 100 genes negatively correlated with ranked by Pearson’s correlation coefficients.
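Tables 4a and 4b can be read as the two ends of one sorted row of the correlation matrix. A minimal sketch, assuming that genes with r > 0 populate the positive table, genes with r < 0 populate the negative table, and the lncRNA's own entry is excluded (the function name is hypothetical):

```python
# Minimal sketch: rank one lncRNA's row of the lncRNA-gene correlation matrix.
# Assumption: r > 0 goes to the positive table, r < 0 to the negative table.
from typing import Optional


def top_correlated(correlations: dict[str, float], n: int = 100,
                   exclude: Optional[str] = None):
    """Return (positive, negative) lists of (gene, r), strongest first."""
    ranked = sorted(
        ((gene, r) for gene, r in correlations.items() if gene != exclude),
        key=lambda pair: pair[1],
        reverse=True,
    )
    positive = [(gene, r) for gene, r in ranked if r > 0][:n]
    # Walk the ranking from the bottom so the most negative r comes first.
    negative = [(gene, r) for gene, r in reversed(ranked) if r < 0][:n]
    return positive, negative
```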
Below we list the top 100 lncRNAs, out of all lncRNAs within our database, that correlate most with based on their Pearson correlation coefficients.
There are no lncRNAs correlated with .
CSV Table 5. Top 100 lncRNAs that correlate most with ranked by Pearson correlation coefficients.
Interactive network visualization of the top 100 genes correlated with . Each node represents a gene and is colored by chromosome location, except for the bright red node which represents the lncRNA . The thickness of the edges corresponds to Pearson correlation coefficients. Clicking on a gene node will highlight its corresponding edges in orange. Hovering over a node will display the gene name and chromosome location.
Network Methods: All pairwise correlations among the top 100 genes correlated with are extracted. The network is initialized with the 3 highest-correlation edges of each gene node, and edges with weights < 0.3 are then dropped. To prune the network further, the lowest-weight edge of each hub node is dropped; initially, a hub node is defined as a node with > 10 edges. This pruning step is repeated until the network has an average of < 3 edges per node. The top 5 edges for are always shown, regardless of their weights.
Download the interactive network visualization of the top 100 genes correlated with : HTML, node metadata CSV, edge metadata CSV.
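The network construction steps can be sketched as below. Two details are assumptions, not stated in the text: "edges per node" is taken as the mean degree 2|E|/|V| over the gene nodes, and when no node exceeds the current hub threshold, the threshold is lowered by 1 so that pruning can continue (one reading of "at the start, a hub node is defined as..."). `prune_network` and its parameters are hypothetical names.

```python
# Sketch of the pruning steps. Assumptions: "edges per node" is the mean degree
# 2|E|/|V| over gene nodes, and the hub threshold drops by 1 when no hub remains.
from collections import defaultdict


def prune_network(corr, genes, focus, k_init=3, min_weight=0.3,
                  hub_start=10, max_avg_degree=3.0, focus_edges=5):
    """corr[a][b] is the Pearson r between nodes a and b; returns {frozenset({a, b}): r}."""
    genes = [g for g in genes if g != focus]
    edges = {}
    # 1. Seed the network with each gene's k_init strongest edges to other genes.
    for g in genes:
        partners = sorted((h for h in genes if h != g), key=lambda h: corr[g][h], reverse=True)
        for h in partners[:k_init]:
            edges[frozenset((g, h))] = corr[g][h]
    # 2. Drop weak edges.
    edges = {e: w for e, w in edges.items() if w >= min_weight}
    # 3. Repeatedly remove the weakest edge of every hub until the network is sparse enough.
    threshold = hub_start
    while genes and 2 * len(edges) / len(genes) >= max_avg_degree:
        degree = defaultdict(int)
        for edge in edges:
            for node in edge:
                degree[node] += 1
        hubs = [node for node, d in degree.items() if d > threshold]
        if not hubs:
            threshold -= 1  # assumed relaxation of the hub definition
            continue
        weakest = {min((e for e in edges if hub in e), key=edges.__getitem__) for hub in hubs}
        for edge in weakest:
            del edges[edge]
    # 4. Always show the focus lncRNA's strongest edges, whatever their weight.
    for g in sorted(genes, key=lambda h: corr[focus][h], reverse=True)[:focus_edges]:
        edges[frozenset((focus, g))] = corr[focus][g]
    return edges
```

The loop always terminates: each pass either removes at least one edge or lowers the threshold, and at a threshold of 0 any remaining edge makes its endpoints hubs.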
The top 200 genes most correlated with are submitted to Enrichr[6-8] for enrichment analysis. NOTE: Only genes with official Entrez gene symbols are submitted to Enrichr; Ensembl IDs that do not map to an official gene symbol are dropped.
Number of genes | Positively correlated | Negatively correlated |
---|---|---|
25 | | |
50 | | |
100 | | |
200 | | |
300 | | |
500 | | |
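The gene lists offered for each list size can be sketched as below, applied once to the positively ranked genes and once to the negatively ranked genes. Assumptions: unmapped Ensembl IDs are removed before the list is truncated, and a symbol reached by several Ensembl IDs keeps only its best rank. The function name and the mapping argument are hypothetical. Each resulting list could then be posted to Enrichr as newline-separated text through its `addList` endpoint.

```python
# Sketch: build ranked symbol lists for each list size. Assumptions: unmapped
# Ensembl IDs are removed before truncation; duplicate symbols keep their best rank.
from typing import Mapping, Sequence

LIST_SIZES = (25, 50, 100, 200, 300, 500)


def enrichr_gene_lists(ranked_ids: Sequence[str], ensembl_to_symbol: Mapping[str, str],
                       sizes: Sequence[int] = LIST_SIZES) -> dict[int, list[str]]:
    """ranked_ids: Ensembl IDs ordered from most to least correlated."""
    symbols: list[str] = []
    for ensembl_id in ranked_ids:
        symbol = ensembl_to_symbol.get(ensembl_id)
        if symbol and symbol not in symbols:  # drop unmapped IDs and repeated symbols
            symbols.append(symbol)
    return {n: symbols[:n] for n in sizes}
```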
For each Enrichr library, we compute a mean Pearson correlation coefficient for each gene set by averaging the Pearson correlation coefficients between and each gene in the set. Terms with high mean Pearson correlation coefficients are prioritized and predicted to be associated with .
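The term-ranking step can be sketched as follows. The library is assumed to be a plain mapping from term to gene list, and genes absent from the correlation matrix are assumed to be ignored; `rank_terms` is a hypothetical name.

```python
# Sketch of ranking library terms by mean correlation with the lncRNA.
# Assumption: genes missing from the correlation matrix are ignored.
from statistics import fmean


def rank_terms(library: dict[str, list[str]],
               corr_with_lncrna: dict[str, float]) -> list[tuple[str, float]]:
    """library: {term: [genes]}; corr_with_lncrna: {gene: Pearson r with the lncRNA}."""
    scores = {}
    for term, genes in library.items():
        values = [corr_with_lncrna[g] for g in genes if g in corr_with_lncrna]
        if values:
            scores[term] = fmean(values)  # mean Pearson r over the term's genes
    # Highest mean correlation first: these terms are prioritized.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```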
Subcellular localization information was sourced from lncATLAS[5]. If lncATLAS does not contain information for the entered lncRNA, predicted scores are shown in red. Predicted scores were calculated with unsupervised learning using ranked correlations from ARCHS4 and the available localization data from lncATLAS[5].
Localization information for sourced from lncATLAS (blue) or the predicted localization (red) calculated using the correlation matrix and the available data from lncATLAS.
This part of the report provides the median expression of the lncRNA in various tissues and cell lines. Samples from ARCHS4[4] were automatically labeled by tissue type or cell line of origin. Tissues and cell lines with fewer than 20 samples were removed, and the median expression and other statistics of were then calculated for each remaining tissue type and cell line.
Figure 5. Expression statistics for the lncRNA in various tissue types.
Download table with expression statistics for in various tissue types: CSV
Figure 6. Expression statistics for the lncRNA in the top 30 cell lines.
Download table with expression statistics for in various cell lines: CSV
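The per-tissue and per-cell-line summaries can be sketched as a group-by with a minimum-size filter. This assumes the samples arrive as (label, expression) pairs for one lncRNA and that the "other statistics" are box-plot style summaries (quartiles, minimum, maximum); `expression_stats` is a hypothetical name.

```python
# Minimal sketch: per-label expression summaries for one lncRNA. Assumptions:
# input is (label, expression) pairs; "other statistics" = quartiles, min, max.
from collections import defaultdict
from statistics import median, quantiles


def expression_stats(samples, min_samples=20):
    groups = defaultdict(list)
    for label, value in samples:
        groups[label].append(value)
    stats = {}
    for label, values in groups.items():
        if len(values) < min_samples:
            continue  # tissues/cell lines with fewer than 20 samples are removed
        q1, _, q3 = quantiles(values, n=4)
        stats[label] = {"n": len(values), "median": median(values), "q1": q1, "q3": q3,
                        "min": min(values), "max": max(values)}
    # Order labels by median expression, highest first (e.g., to pick the top 30 cell lines).
    return dict(sorted(stats.items(), key=lambda item: item[1]["median"], reverse=True))
```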
We applied UMAP[9] to visualize lncRNA expression across 3,000 randomly selected samples (with tissue type and cell line labels) from ARCHS4[4]. The samples were first log2 transformed and quantile normalized along the gene axis; UMAP was then applied to the lncRNA expression data, with samples as features. Each data point represents a single lncRNA (n=). The black arrow points to the location of .
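The preprocessing before UMAP can be sketched in plain Python. Assumptions: a pseudocount of 1 is added before the log2 transform, and "quantile normalized along the gene axis" is read as forcing every sample (column) to share one distribution of values across genes; ties simply receive distinct ranks. The function name is hypothetical.

```python
# Sketch of log2 + quantile normalization. Assumptions: pseudocount of 1 before
# log2; every sample (column) is forced onto one shared distribution across genes.
import math


def log2_quantile_normalize(matrix: list[list[float]]) -> list[list[float]]:
    """matrix[g][s] is the expression of gene g in sample s."""
    logged = [[math.log2(x + 1) for x in row] for row in matrix]
    n_genes, n_samples = len(logged), len(logged[0])
    columns = [[logged[g][s] for g in range(n_genes)] for s in range(n_samples)]
    # Reference distribution: mean across samples of the k-th smallest value.
    sorted_columns = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_columns) / n_samples for k in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        # Replace each value with the reference value of the same rank.
        for rank, g in enumerate(sorted(range(n_genes), key=col.__getitem__)):
            normalized[g][s] = reference[rank]
    return normalized
```

The normalized rows belonging to lncRNAs could then be embedded with the umap-learn package, for example `umap.UMAP(n_components=2).fit_transform(lncrna_rows)`, so that each point is an lncRNA and the samples act as features.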
Approximately 1.4 million Level 5 L1000 chemical perturbation gene expression signatures were downloaded from SigCom LINCS (https://maayanlab.cloud/sigcom-lincs)[10]. For each signature-lncRNA pair, a mean Pearson correlation coefficient was computed by averaging the Pearson correlation coefficients between the lncRNA and all genes in the signature. For each signature, all lncRNAs were then ranked by this mean coefficient, and the 1,000 lncRNAs with the highest coefficients were retained. The top 500 lncRNA-L1000 signature associations for are reported here, separated by direction. If is highly correlated with the genes up-regulated by a specific small molecule, that small molecule is predicted to up-regulate .
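The signature scoring and per-lncRNA lookup can be sketched as below. Assumptions: each signature is split into "up" and "down" gene sets, genes missing from the correlation matrix are skipped, and the 500 reported associations are taken per direction. Both function names are hypothetical. A perturbation retained through its "up" gene set is a candidate up-regulator of the lncRNA.

```python
# Sketch of signature scoring. Assumptions: signatures are split into "up"/"down"
# gene sets, missing genes are skipped, and top_n is applied per direction.
from collections import defaultdict
from statistics import fmean


def score_signatures(signatures, lncrna_gene_corr, top_k=1000):
    """signatures: {sig_id: {"up": [...], "down": [...]}}; lncrna_gene_corr: {lncRNA: {gene: r}}."""
    retained = {}
    for sig_id, gene_sets in signatures.items():
        for direction, genes in gene_sets.items():
            scored = []
            for lncrna, row in lncrna_gene_corr.items():
                values = [row[g] for g in genes if g in row]
                if values:
                    scored.append((lncrna, fmean(values)))  # mean Pearson r over the gene set
            scored.sort(key=lambda item: item[1], reverse=True)
            retained[(sig_id, direction)] = scored[:top_k]  # keep the top lncRNAs per signature
    return retained


def associations_for(retained, lncrna, top_n=500):
    """Collect one lncRNA's retained signatures, separated by direction, strongest first."""
    by_direction = defaultdict(list)
    for (sig_id, direction), ranked in retained.items():
        for name, score in ranked:
            if name == lncrna:
                by_direction[direction].append((sig_id, score))
    return {d: sorted(pairs, key=lambda item: item[1], reverse=True)[:top_n]
            for d, pairs in by_direction.items()}
```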
The prioritized small molecules below are predicted to specifically up-regulate .
There are no small molecules predicted to specifically up-regulate the expression of .
CSV Table 6. L1000 small molecules predicted to up-regulate the lncRNA .
The prioritized small molecules below are predicted to specifically down-regulate .
There are no small molecules predicted to specifically down-regulate the expression of .
CSV Table 7. L1000 small molecules predicted to down-regulate the lncRNA .
The prioritized CRISPR KO genes below are predicted to specifically up-regulate .
There are no CRISPR KO genes predicted to specifically up-regulate the expression of .
CSV Table 8. L1000 CRISPR KO genes predicted to up-regulate the lncRNA .
The prioritized CRISPR KO genes below are predicted to specifically down-regulate .
There are no CRISPR KO genes predicted to specifically down-regulate the expression of .
CSV Table 9. L1000 CRISPR KO genes predicted to down-regulate the lncRNA .
[1] Frankish A, Diekhans M, Jungreis I, Lagarde J, Loveland JE, Mudge JM, Sisu C, Wright JC, Armstrong J, Barnes I: GENCODE 2021. Nucleic Acids Research 2021, 49(D1):D916-D923.
[2] Howe KL, Achuthan P, Allen J, Allen J, Alvarez-Jarreta J, Amode MR, Armean IM, Azov AG, Bennett R, Bhai J: Ensembl 2021. Nucleic Acids Research 2021, 49(D1):D884-D891.
[3] Tweedie S, Braschi B, Gray K, Jones TEM, Seal RL, Yates B, Bruford EA: Genenames.org: the HGNC and VGNC resources in 2021. Nucleic Acids Research 2021, 49(D1):D939-D946.
[4] Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma'ayan A: Massive mining of publicly available RNA-seq data from human and mouse. Nature Communications 2018, 9(1):1366.
[5] Mas-Ponte D, Carlevaro-Fita J, Palumbo E, Pulido TH, Guigo R, Johnson R: LncATLAS database for subcellular localization of long noncoding RNAs. RNA 2017, 23(7):1080-1087.
[6] Xie Z, Bailey A, Kuleshov MV, Clarke DJB, Evangelista JE, Jenkins SL, Lachmann A, Wojciechowicz ML, Kropiwnicki E, Jagodnik KM: Gene Set Knowledge Discovery with Enrichr. Current Protocols 2021, 1(3):e90.
[7] Chen EY, Tan CM, Kou Y, Duan Q, Wang Z, Meirelles GV, Clark NR, Ma'ayan A: Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinformatics 2013, 14(1):128.
[8] Kuleshov MV, Jones MR, Rouillard AD, Fernandez NF, Duan Q, Wang Z, Koplev S, Jenkins SL, Jagodnik KM, Lachmann A: Enrichr: a comprehensive gene set enrichment analysis web server 2016 update. Nucleic Acids Research 2016, 44(W1):W90-W97.
[9] McInnes L, Healy J, Melville J: UMAP: Uniform Manifold Approximation and Projection for dimension reduction. arXiv preprint arXiv:1802.03426 2018.
[10] Evangelista JE, Clarke DJB, Xie Z, Lachmann A, Jeon M, Chen K, Jagodnik KM, Jenkins SL, Kuleshov MV, Wojciechowicz ML, Schürer SC, Medvedovic M, Ma'ayan A: SigCom LINCS: data and metadata search engine for a million gene expression signatures. Nucleic Acids Research 2022, 50(W1):W697-W709.