This section contains all files created for the ARCHS4 website. The methods are described at
. For help in accessing the files, refer to the Help section or contact us directly. The
database will be updated regularly, and older versions of the files will remain accessible.
Expression (gene level)
Human
Mouse
Expression files for mouse and human in HDF5 format. All counts are at the gene level (Entrez gene
symbol). For compression purposes, the Kallisto pseudocounts are rounded to integer values.
Expression (transcript level)
Human
Mouse
Expression files for mouse and human in HDF5 format. All measurements are at the transcript level
(Ensembl ID). For compression purposes the Kallisto pseudocounts are rounded to integer values.
TPM (transcript level)
Human
Mouse
Expression files for mouse and human in HDF5 format. All measurements are at the transcript level
(Ensembl ID). The files are very large and values are not rounded.
Expression (Affymetrix arrays)
Human
Mouse
Expression files for human and mouse Affymetrix arrays. The collection contains 262,468 human
samples and 86,012 mouse samples. All measurements are at the probe level. Values are taken as
stored in GEO. For compression reasons values are stored as 16-bit floats.
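To give a feel for what 16-bit storage means for precision, the sketch below round-trips a few values through IEEE 754 half precision using Python's standard `struct` module. The values are hypothetical and chosen only for illustration; they are not taken from the ARCHS4 files.

```python
import struct

def to_float16_and_back(value: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision."""
    # "<e" is the struct format code for a little-endian 16-bit float.
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Hypothetical probe values, for illustration only.
probe_values = [0.5, 7.25, 1234.56]
stored = [to_float16_and_back(v) for v in probe_values]

for original, restored in zip(probe_values, stored):
    error = abs(original - restored)
    print(f"{original} -> {restored} (absolute error {error:.4f})")
```

Small values survive unchanged, while values above 1024 are spaced a whole unit or more apart in half precision, so some rounding is expected for large intensities.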
t-SNE sample coordinates
Human
Mouse
Gene expression reduced to 3 dimensions. The files contain 4 columns with the first 3 containing
dimensions x, y, z and the last column containing the numeric part of the GSM id (GSM123456 ->
123456).
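The four-column layout described above can be read with a few lines of Python. This is a minimal sketch that assumes comma-separated rows without a header line; the actual delimiter and header of the downloaded files may differ, and the function name and example rows are hypothetical.

```python
import csv
import io

def parse_sample_coordinates(text):
    """Parse rows of x, y, z, numeric GSM part into (gsm_id, (x, y, z)) pairs."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        x, y, z = (float(value) for value in row[:3])
        # Rebuild the full accession from its numeric part: 123456 -> GSM123456.
        gsm_id = f"GSM{int(float(row[3]))}"
        samples.append((gsm_id, (x, y, z)))
    return samples

# Hypothetical rows, assuming a comma-separated file without a header.
example = "1.5,-2.0,0.25,123456\n-3.0,4.5,1.0,2345678\n"
coords = parse_sample_coordinates(example)
print(coords[0])
```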
t-SNE gene coordinates
Human
Mouse
Gene expression reduced to 3 dimensions. The files contain 4 columns with the first 3 containing
dimensions x, y, z and the last column containing Entrez gene symbol.
Gene correlation
Human
Mouse
Pairwise Pearson correlation of genes across expression samples. The files are in
feather format and can be loaded
directly into memory in Python and R.
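As an illustration of what each entry of such a matrix represents, the sketch below computes pairwise Pearson correlations for a toy expression table in plain Python. The gene names and values are made up, and this is not the code used to produce the ARCHS4 files.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a gene with constant expression has zero variance and would
    # need special handling; it is not covered in this sketch.
    return covariance / math.sqrt(var_x * var_y)

# Toy data: each gene maps to its expression across four samples.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0],
    "GENE_C": [4.0, 3.0, 2.0, 1.0],
}

genes = sorted(expression)
correlation = {
    (a, b): pearson(expression[a], expression[b]) for a in genes for b in genes
}
print(correlation[("GENE_A", "GENE_B")], correlation[("GENE_A", "GENE_C")])
```

Here GENE_A and GENE_B rise together and GENE_C falls as GENE_A rises, so the first pair is perfectly correlated and the second perfectly anti-correlated.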
PrismEXP predictions
Gene function predictions from PrismEXP using 300 correlation matrices. The files are in
feather format and can be loaded
directly into memory in Python and R. The download is a zip archive containing 155 files.
JL transformed expression
Human
Mouse
Gene expression compressed with the Johnson-Lindenstrauss transformation. The RDA files can be
loaded into a running R environment with the "load" command. The files create two variables: the
transform matrix used for the projection and the jl_expression matrix. The original dimensions
are reduced to 1000. The original distances and correlations of the samples should be approximately preserved.
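The idea behind the transformation can be sketched with a Gaussian random projection, one common way to build a Johnson-Lindenstrauss transform. How the ARCHS4 transform matrix was actually constructed is not stated here, so the matrix construction, the dimensions (1000 reduced to 300, kept small for speed) and the random test vectors below are all assumptions for illustration.

```python
import math
import random

def random_projection_matrix(original_dim, reduced_dim, seed=0):
    """Gaussian projection matrix scaled so that distances are preserved on average."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(reduced_dim)
    return [
        [rng.gauss(0.0, 1.0) * scale for _ in range(original_dim)]
        for _ in range(reduced_dim)
    ]

def project(matrix, vector):
    """Multiply the projection matrix with one sample vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two hypothetical samples with 1000 features each.
rng = random.Random(1)
original_dim, reduced_dim = 1000, 300
sample_a = [rng.random() for _ in range(original_dim)]
sample_b = [rng.random() for _ in range(original_dim)]

transform = random_projection_matrix(original_dim, reduced_dim)
distance_before = euclidean(sample_a, sample_b)
distance_after = euclidean(project(transform, sample_a), project(transform, sample_b))

ratio = distance_after / distance_before
print(f"distance ratio after projection: {ratio:.3f}")
```

A ratio close to 1 indicates that the distance between the two samples is roughly the same before and after the projection.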
Kallisto index files
Human
Mouse
Kallisto index files used for the alignment process. The index files were built using the
Ensembl annotation version 87 for human and 88 for mouse and reference cDNA Homo_sapiens.GRCh38.cdna.all.fa.gz and
Mus_musculus.GRCm38.cdna.all.fa.gz.
recount2 expression
GTEx
TCGA
Gene counts from GTEx and TCGA from the
recount2 project. The reads for these samples were aligned with a
different pipeline, resulting in significant differences from the ARCHS4 gene expression. Genes
that did not overlap with the genes in the ARCHS4 data were removed.
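The overlap filter described above amounts to keeping only the genes present in both gene sets. A minimal sketch, with a hypothetical function name and made-up gene symbols and counts:

```python
def keep_shared_genes(counts_by_gene, reference_genes):
    """Keep only the genes that also appear in the reference gene list."""
    reference = set(reference_genes)
    return {
        gene: counts
        for gene, counts in counts_by_gene.items()
        if gene in reference
    }

# Hypothetical counts per gene and a hypothetical reference gene list.
external_counts = {"GENE_A": [10, 12], "GENE_B": [0, 3], "GENE_X": [7, 7]}
reference_gene_list = ["GENE_A", "GENE_B", "GENE_C"]

filtered = keep_shared_genes(external_counts, reference_gene_list)
print(sorted(filtered))
```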
GitHub repository
The scripts used to process the ARCHS4 data are located at the link below. In its current
state, the project is not easily adapted. We are working on making the software more accessible in
the future.
https://github.com/MaayanLab/archs4