ARCHS4: Massive Mining of Publicly Available RNA-seq Data from Human and Mouse



All RNA-seq and ChIP-seq sample and signature search (ARCHS4) (https://maayanlab.cloud/archs4/) is a resource that provides access to gene and transcript counts uniformly processed from all human and mouse RNA-seq experiments from the Gene Expression Omnibus (GEO) and the Sequence Read Archive (SRA). The ARCHS4 website makes the uniformly processed data available for download and programmatic access in H5 format, and through a three-dimensional interactive viewer and search engine. Users can search and browse the data by metadata-enhanced annotations, and can submit their own gene sets for search. Subsets of selected samples can be downloaded as a tab-delimited text file that is ready for loading into the R programming environment. To generate the ARCHS4 resource, the kallisto aligner is applied in an efficient parallelized cloud infrastructure. Human and mouse samples are aligned against GRCh38 and GRCm39 with Ensembl annotation (Ensembl 107).
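The tab-delimited download can also be read outside of R. The following is a minimal Python sketch, assuming the file has a header row of sample identifiers followed by one row per gene, with the gene symbol in the first column. That layout, the function name read_expression_table, and the sample identifiers in the example are assumptions made for illustration and are not taken from the ARCHS4 documentation, so check them against an actual download.

```python
import csv
import io


def read_expression_table(handle):
    """Parse a tab-delimited expression matrix.

    Assumed layout (not confirmed by the ARCHS4 documentation):
    a header row of sample IDs, then one row per gene with the
    gene symbol in the first column and counts in the rest.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # the first header cell labels the gene column
    counts = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        counts[row[0]] = [float(value) for value in row[1:]]
    return samples, counts


# Tiny made-up table; the sample IDs are hypothetical placeholders.
example = (
    "gene\tGSM0000001\tGSM0000002\n"
    "TP53\t120\t98\n"
    "GAPDH\t5021\t4876\n"
)
samples, counts = read_expression_table(io.StringIO(example))
print(samples)
print(counts["TP53"])
```

For a real file, pass an open file handle, for example open(path, newline=""), in place of the io.StringIO object.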

Please acknowledge ARCHS4 in your publications by citing the following reference:
Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma’ayan A. Massive mining of publicly available RNA-seq data from human and mouse. Nature Communications 9. Article number: 1366 (2018), doi:10.1038/s41467-018-03751-6


If you would like to receive updates on the ARCHS4 data and stay informed about new data releases, consider signing up for the newsletter.

ARCHS4 now comes with an official Python package, ARCHS4py, to facilitate extraction of data from the H5 files. It also provides convenience functions such as normalization and metadata search. The software can be installed using pip. Full documentation is available on the ARCHS4py GitHub page.
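To make the two convenience operations concrete, here is a plain-Python sketch of what a metadata search and a count normalization can look like. It does not use or reproduce the ARCHS4py API: the function names search_metadata and log2_cpm, the input structures, and the choice of log2 counts-per-million as the normalization are all assumptions made for illustration. The package's own functions and supported methods are described in its GitHub documentation.

```python
import math


def search_metadata(sample_metadata, term):
    """Return IDs of samples whose metadata text contains `term` (case-insensitive).

    `sample_metadata` is a hypothetical dict: sample ID -> free-text description.
    """
    term = term.lower()
    return [sid for sid, text in sample_metadata.items() if term in text.lower()]


def log2_cpm(counts_by_gene):
    """Scale raw counts to counts per million per sample, then take log2(x + 1).

    `counts_by_gene` is a hypothetical dict: gene -> list of raw counts,
    one entry per sample. Samples with zero total counts map to 0.0.
    """
    n_samples = len(next(iter(counts_by_gene.values())))
    # Library size (total counts) of each sample.
    totals = [sum(row[j] for row in counts_by_gene.values()) for j in range(n_samples)]
    normalized = {}
    for gene, row in counts_by_gene.items():
        normalized[gene] = [
            math.log2(row[j] / totals[j] * 1e6 + 1) if totals[j] > 0 else 0.0
            for j in range(n_samples)
        ]
    return normalized


# Made-up inputs for illustration only.
metadata = {"S1": "Human myoblast culture", "S2": "Mouse liver tissue"}
raw = {"A": [10, 0], "B": [30, 0]}

hits = search_metadata(metadata, "MYOBLAST")
norm = log2_cpm(raw)
print(hits)
print(norm["A"])
```

Log-transformed counts per million is only one common way to make samples with different sequencing depths comparable; other schemes, such as quantile normalization, follow the same pattern of taking a count matrix and returning a rescaled one.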

The ARCHS4 database now includes 35,000 samples from additional species, such as C. elegans and Drosophila melanogaster. Expression data for genes and transcripts can be downloaded in H5 format from the ARCHS4 ZOO download section.


If you have your own RNA-seq data in FASTQ format and would like to process it with the ARCHS4 pipeline, you can use our new web service, Elysium. Elysium uses the ARCHS4 pipeline to align FASTQ files and generates ARCHS4-compatible gene expression profiles ready for downstream analyses.

