ARCHS4 Downloads


This section contains all files created for the ARCHS4 website. The methods are described at . For help accessing the files, refer to the Help section or contact us directly. The database will be updated regularly, and old versions of the files will remain accessible.

If you would like to receive updates on the ARCHS4 data and stay informed about new data releases, consider signing up for the newsletter.

Expression (gene level) | Expression (transcript level) | TPM (transcript level) | Expression (Affymetrix arrays) | t-SNE sample coordinates | t-SNE gene coordinates | Gene correlation | PrismExp predictions | JL transformed expression | Kallisto index files | GEO expression | recount2 expression | GitHub repository

ARCHS4 Version 2.3 (Ensembl 107)

This is the newest data release of the ARCHS4 gene expression collection. The Ensembl annotation version is now 107, which significantly increases the number of coding and non-coding genes. A redesigned pipeline was used to create the files. Older versions are still available.

Expression (gene level)

Expression files for mouse and human in HDF5 format. All counts are at the gene level (Entrez Gene Symbol). For compression, the Kallisto pseudocounts are rounded to integer values.

Human

human_gene_v2.3.h5

Date: 3-11-2024
Size: 41G
SHA1: 31a37ba49ab5cf812a0be9f180d75b3b29e07ea2

Mouse

mouse_gene_v2.3.h5

Date: 3-11-2024
Size: 34G
SHA1: 45f1a98bf2da06ecb249f25be08126cb63553d9c
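
Because the files are tens of gigabytes in size, it can be worth checking a finished download against the SHA1 value listed with each file. The following is a minimal sketch using only the Python standard library; the function names and the chunk size are illustrative choices, not part of ARCHS4.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a very large file never has to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_sha1(path, listed_sha1):
    """Compare a file's digest with the listed value, ignoring case and whitespace."""
    return sha1_of_file(path) == listed_sha1.strip().lower()
```

Assuming human_gene_v2.3.h5 has been downloaded to the working directory, `matches_listed_sha1("human_gene_v2.3.h5", "31a37ba49ab5cf812a0be9f180d75b3b29e07ea2")` should return True for an intact copy.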

ARCHS4 Version 2.2 (Ensembl 107)

Expression (gene level)

Expression files for mouse and human in HDF5 format. All counts are at the gene level (Entrez Gene Symbol). For compression, the Kallisto pseudocounts are rounded to integer values.

Human

human_gene_v2.2.h5

Date: 5-30-2023
Size: 35G
SHA1: 20f4063e264437c78c7875d74a31685e2ca2a18d

Mouse

mouse_gene_v2.2.h5

Date: 5-30-2023
Size: 29G
SHA1: 4308672c7a8e93fb063c7c289564105af60aaaf7

Expression (transcript level)

Transcript-level expression files for mouse and human in HDF5 format. All counts are at the transcript level (Ensembl ID). For compression, the Kallisto pseudocounts are rounded to integer values.

Human

human_transcript_v2.2.h5

Date: 5-30-2023
Size: 107G
SHA1: 9d129b0c06531ec12847fc42f988818e0e7afcf7

Mouse

mouse_transcript_v2.2.h5

Date: 5-30-2023
Size: 65G
SHA1: cb7c8a7e0e1996067208cb526ff1e9670bf4335d

Expression (transcript TPM)

TPM transcript expression files for mouse and human in HDF5 format. All TPM values are at the transcript level (Ensembl ID).

Human

human_tpm_v2.2.h5

Date: 5-30-2023
Size: 218G
SHA1: 85bd7ea1fc9aa8bb6c1dda927c9de17b11ba9933

Mouse

mouse_tpm_v2.2.h5

Date: 5-30-2023
Size: 126G
SHA1: 496226cbe8b8148efbe84db0f8e72d7b9af27790
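
For readers unfamiliar with the unit, TPM (transcripts per million) scales each transcript's length-normalized count so that the values of one sample sum to one million. The sketch below illustrates that standard definition only. Kallisto reports TPM values itself, and this page does not state how the ARCHS4 files were assembled, so the function and its argument names are hypothetical.

```python
def counts_to_tpm(counts, effective_lengths):
    """Convert the per-transcript counts of one sample to TPM.

    counts and effective_lengths are equal-length sequences, one entry per
    transcript; lengths must be positive.
    """
    if len(counts) != len(effective_lengths):
        raise ValueError("counts and effective_lengths must have the same length")
    # Reads per base: longer transcripts collect more reads at equal abundance.
    rates = [count / length for count, length in zip(counts, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so the sample sums to one million.
    return [rate / total * 1e6 for rate in rates]
```

Because every sample sums to the same total, TPM values are comparable across samples in a way that raw counts are not.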


Expression (gene level)

Expression files for mouse and human in HDF5 format. All counts are at the gene level (Entrez Gene Symbol). For compression, the Kallisto pseudocounts are rounded to integer values.

Human

human_matrix_v1.11.h5

Date: 11-16-2021
Size: 17G

Mouse

mouse_matrix_v1.11.h5

Date: 11-16-2021
Size: 17G

Expression (transcript level)

Expression files for mouse and human in HDF5 format. All measurements are at the transcript level (Ensembl ID). For compression, the Kallisto pseudocounts are rounded to integer values.

Human

human_transcript_v1.11.h5

Date: 11-16-2021
Size: 48G

Mouse

mouse_transcript_v1.11.h5

Date: 11-16-2021
Size: 33G

TPM (transcript level)

Expression files for mouse and human in HDF5 format. All measurements are at the transcript level (Ensembl ID). The files are very large and values are not rounded.

Human

human_tpm_v1.11.h5

Date: 11-16-2021
Size: 101G

Mouse

mouse_tpm_v1.11.h5

Date: 11-16-2021
Size: 64G

Expression (Affymetrix arrays)

Expression files for human and mouse Affymetrix arrays. The collection contains 262,468 human samples and 86,012 mouse samples. All measurements are at the probe level. Values are taken as stored in GEO. For compression, values are stored as 16-bit floats.

Human

GPL570_expression.h5

Date: 5/2021
Size: 14.7 GB

GPL571_expression.h5

Date: 5/2021
Size: 936 MB

GPL6244_expression.h5

Date: 5/2021
Size: 2 GB

GPL96_expression.h5

Date: 5/2021
Size: 1.66 GB

Mouse

GPL6246_expression.h5

Date: 5/2021
Size: 1.68 GB

GPL1261_expression.h5

Date: 5/2021
Size: 4.57 GB
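
Storing values as 16-bit floats halves the file size relative to 32-bit floats but limits precision. The sketch below shows the effect with the standard library's `struct` module, assuming the stored values follow IEEE 754 half precision; the helper names are illustrative.

```python
import struct


def as_float16(value):
    """Round-trip a float through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def float16_error(value):
    """Absolute error introduced by storing value as a 16-bit float."""
    return abs(as_float16(value) - value)


# Half precision keeps 11 significant bits. Values between 1024 and 2048 are
# therefore stored in steps of 1.0, and the largest finite value is 65504.
small = as_float16(7.25)      # exactly representable, comes back unchanged
large = as_float16(1234.567)  # rounded to the nearest step of 1.0
```

In practice this means large intensities lose their fractional part, which is worth keeping in mind when comparing the stored values with the originals in GEO.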

t-SNE sample coordinates

Gene expression reduced to 3 dimensions. The files contain 4 columns: the first 3 hold the dimensions x, y, z, and the last holds the numeric part of the GSM ID (GSM123456 -> 123456).

Human

sample_human_tsne.csv v2

Date: 3/2018
Size: 2.9 MB

Mouse

sample_mouse_tsne.csv v2

Date: 3/2018
Size: 3.5 MB
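
The four-column layout described above can be read with a few lines of Python. This is a sketch under assumptions the page does not confirm: the file is taken to be comma-separated, and any header row or malformed line is simply skipped. The function name is hypothetical.

```python
import csv


def read_sample_coordinates(handle):
    """Yield (gsm_id, (x, y, z)) tuples from a 4-column coordinate CSV."""
    for row in csv.reader(handle):
        if len(row) != 4:
            continue  # skip blank or malformed lines
        try:
            x, y, z = (float(value) for value in row[:3])
            # The last column holds only the numeric part of the GSM ID.
            gsm_number = int(float(row[3]))
        except ValueError:
            continue  # skip a header row, if the file has one
        # Restore the full accession: 123456 -> GSM123456.
        yield f"GSM{gsm_number}", (x, y, z)
```

A typical call would be `dict(read_sample_coordinates(open("sample_human_tsne.csv", newline="")))`, which maps each GSM accession to its coordinates.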

t-SNE gene coordinates

Gene expression reduced to 3 dimensions. The files contain 4 columns: the first 3 hold the dimensions x, y, z, and the last holds the Entrez gene symbol.

Human

gene_human_tsne.csv v2

Date: 3/2018
Size: 741.0 KB

Mouse

gene_mouse_tsne.csv v2

Date: 3/2018
Size: 606.0 KB

Gene correlation

Pairwise Pearson correlation of genes across expression samples.

Human

human_correlation.rda v1.0

Date: 10/2017
Size: 5.0 GB

Mouse

mouse_correlation.rda v1.0

Date: 8/2017
Size: 3.0 GB

Pairwise Pearson correlation of genes across expression samples. The files are in Feather format and can be loaded directly into memory in Python and R.

Human

human_correlation_archs4.f v1.0

Date: 7/2020
Size: 5.3 GB

Mouse

mouse_correlation_archs4.f v1.0

Date: 7/2020
Size: 3.3 GB
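
To make the content of these matrices concrete, the sketch below computes pairwise Pearson correlations for a handful of genes in plain Python. It illustrates the statistic only: any normalization or sample selection applied before the ARCHS4 matrices were computed is not described on this page, and the function names and input layout are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [value - mean_x for value in x]
    dy = [value - mean_y for value in y]
    covariance = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        return float("nan")  # a constant profile has no defined correlation
    return covariance / norm


def correlation_matrix(expression):
    """Map gene -> gene -> correlation, given gene -> values across the same samples."""
    genes = sorted(expression)
    return {
        g: {h: pearson(expression[g], expression[h]) for h in genes}
        for g in genes
    }
```

For genome-scale data a vectorized library routine would be used instead; the nested loops here only serve to show what each matrix entry means.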

PrismExp predictions

Gene function predictions from PrismEXP using 300 correlation matrices. The files are in Feather format and can be loaded directly into memory in Python and R. The archive contains 155 zipped files.
prismx_prediction.zip v1.0
Date: 4/2021
Size: 4.44 GB

JL transformed expression

Gene expression compressed with the Johnson-Lindenstrauss transformation. The RDA files can be loaded into a running R environment with the "load" command. Each file creates two variables: the transform matrix used for the projection and the jl_expression matrix. The original dimensions are reduced to 1000. The original distances and correlations between samples should be approximately preserved.

Human

compressed_human_1000.rda v2

Date: 3/2018
Size: 1.0 GB

Mouse

compressed_mouse_1000.rda v2

Date: 3/2018
Size: 1.19 GB
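
The idea behind the compression can be shown with a small random projection. The sketch below uses a Gaussian random matrix, which is one common Johnson-Lindenstrauss construction; how the ARCHS4 transform matrix was actually generated is not stated here, so treat this as an illustration of the principle with hypothetical function names.

```python
import math
import random


def make_projection(original_dim, reduced_dim, seed=0):
    """Build a reduced_dim x original_dim Gaussian random projection matrix."""
    rng = random.Random(seed)
    # Scaling by 1/sqrt(reduced_dim) keeps squared distances unbiased.
    scale = 1.0 / math.sqrt(reduced_dim)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(original_dim)]
        for _ in range(reduced_dim)
    ]


def project(transform, vector):
    """Apply the projection matrix to one sample vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in transform]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
```

Projecting two samples with the same matrix and comparing `euclidean` before and after gives a ratio close to 1, and the approximation tightens as the reduced dimension grows.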

Kallisto index files

Kallisto index files used for the alignment process. The index files were built using Ensembl annotation version 87 for human and 88 for mouse, with the reference cDNA files Homo_sapiens.GRCh38.cdna.all.fa.gz and Mus_musculus.GRCm38.cdna.all.fa.gz.

Human

human_index.idx v1

Date: 6/2018
Size: 2.2 GB

Mouse

mouse_index.idx v1

Date: 6/2018
Size: 1.8 GB

GEO expression

Gene counts from GEO series with raw gene counts. The samples in the H5 file are also in the current version (v2.1) of ARCHS4. Because some samples are missing in GEO, only 340713 samples processed by GEO overlap with ARCHS4.

Human

geo_human_v1.h5

Date: 4/2023
Size: 17 GB

Mouse

Currently not available on GEO

recount2 expression

Gene counts from GTEx and TCGA from the recount2 project. The reads for these samples were aligned with a different pipeline, resulting in significant differences from the ARCHS4 gene expression. Genes that did not overlap with the genes in the ARCHS4 data were removed.

GTEx

gtex_matrix.h5 recount2

Date: 10/2017
Size: 589.5 MB

TCGA

tcga_matrix.h5 recount2

Date: 10/2017
Size: 696.9 MB
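
The gene filtering mentioned above amounts to keeping only the rows whose gene also appears in a reference list. A minimal sketch of such a filter follows; the function name and the in-memory layout (a list of gene symbols alongside a list of per-gene rows) are assumptions made for illustration and do not describe the scripts used to build the files.

```python
def restrict_to_shared_genes(genes, matrix, reference_genes):
    """Keep only the rows whose gene is in reference_genes, preserving order.

    genes: list of gene symbols, one per row of matrix.
    matrix: list of rows, each a list of counts across samples.
    reference_genes: iterable of gene symbols to keep.
    """
    if len(genes) != len(matrix):
        raise ValueError("genes and matrix must have the same number of rows")
    reference = set(reference_genes)  # set lookup keeps the filter linear
    kept = [(gene, row) for gene, row in zip(genes, matrix) if gene in reference]
    return [gene for gene, _ in kept], [row for _, row in kept]
```

The same filter applied in the other direction gives two matrices with identical gene sets, which is a prerequisite for comparing samples across the two collections.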

GitHub repository

The scripts used to process the ARCHS4 data are located at the link below. In its current state, the project is not easily adapted. We are working on making the software more accessible in the future.

https://github.com/MaayanLab/archs4



© Ma'ayan Lab.