About
Elysium is a free-to-use online RNA-seq alignment tool. It supports fast data upload and processing on a
scalable cloud infrastructure. This help page explains the functionality of the Elysium website. This is
still an early version of the software, so you might encounter unforeseen issues. Please help us improve
the experience by giving us feedback.
Example workflow
Three steps are required to retrieve transcript quantification and gene counts: log into the system,
upload data, and submit alignment jobs to the processing queue. The alignment algorithm used is Kallisto.
Qualitative alignment comparisons between Kallisto and STAR were performed for the ARCHS4 publication.
Account creation and Login
The starting page will display a login field for email and password. If you do not have an
account with Elysium you can enter your email of choice and password. The account will
automatically be created and you will be redirected into the workspace view. All data and jobs
submitted will be accessible through those credentials. You can use the same email and password
to access results at a later time.
Creating a user account and using this service are currently free of charge, and we hope to keep
them free in the future. Due to large data sizes we cannot guarantee the availability of data and
service indefinitely, so please back up your files. We will not share your data now or in the
future.
Uploading files
Elysium uses cloud computing, so the data has to be moved to where the compute instances are. Data
is uploaded to Amazon S3. The currently supported data format is gzip-compressed FASTQ (FASTQ.gz).
If you are using Windows we recommend 7-Zip for compressing files.
The maximum file size currently supported for upload is 5 GB.
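As a rough illustration of preparing a file for upload, the following Python sketch compresses a FASTQ file with gzip and checks it against the 5 GB limit. The function names are made up for this example, and interpreting 5 GB as decimal gigabytes is an assumption; Elysium may measure the limit differently.

```python
import gzip
import os
import shutil
import tempfile

# Upload limit stated on this page; decimal gigabytes is an assumption.
MAX_UPLOAD_BYTES = 5 * 1000**3

def compress_fastq(fastq_path):
    """Gzip a FASTQ file next to the original and return the new path."""
    gz_path = fastq_path + ".gz"
    with open(fastq_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path

def fits_upload_limit(path, limit=MAX_UPLOAD_BYTES):
    """Return True if the file is no larger than the upload limit."""
    return os.path.getsize(path) <= limit

# Usage with a tiny made-up FASTQ record
workdir = tempfile.mkdtemp()
example = os.path.join(workdir, "sample.fastq")
with open(example, "w") as handle:
    handle.write("@read1\nACGT\n+\nIIII\n")
gz_file = compress_fastq(example)
print(gz_file, fits_upload_limit(gz_file))
```

The original file is left untouched, so the compressed copy can be deleted after a successful upload.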
Submitting alignment job
Elysium supports single-end and paired-end read alignment. Once files have been uploaded to the
workspace, one or two files can be selected from the list. Upon selection the submission dialog
will appear. Here you can choose the name of the job and the species genome against which the
alignment should be performed.
Once submitted, the job is sent to a queue and processed on a first-come, first-served basis.
Retrieving results
Processing takes about 15 minutes, independent of the number of samples. After successful
completion of the alignment the result section will contain 3 links labeled Transcript Counts,
Gene Counts and Alignment Info. The Transcript Counts are the raw Kallisto output. The Gene
Counts result contains counts per gene, with all corresponding transcript counts summed up.
Alignment Info contains Kallisto information about parameters and alignment statistics. The gene
counts are compatible with the ARCHS4 resource.
Job queue status
Elysium uses a scalable cloud computing backend. Once a job is submitted it enters a waiting queue. All
jobs are processed on a first-come, first-served basis. Most submitted alignment jobs should complete in about
10-15 minutes. The state of the job is displayed in the job queue view. A submitted job can be in one of four
states:
Pending: job is waiting to be processed by an available resource
Submitted: job has left the queue and is currently being processed
Completed: job completed successfully
Failed: job timed out during the submission process
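The four states can be modeled as a small state set in which Completed and Failed are final. The Python sketch below is only an illustration of that idea: `JobState`, `wait_for_job` and the `get_state` callable are hypothetical names, and this page does not describe a programmatic way to query job status.

```python
import time
from enum import Enum

class JobState(Enum):
    PENDING = "Pending"      # waiting for an available resource
    SUBMITTED = "Submitted"  # left the queue, currently processing
    COMPLETED = "Completed"  # finished successfully
    FAILED = "Failed"        # timed out during submission

# States after which a job will not change any more
TERMINAL_STATES = {JobState.COMPLETED, JobState.FAILED}

def wait_for_job(get_state, poll_seconds=60, max_polls=30, sleep=time.sleep):
    """Poll get_state() until the job reaches a terminal state.

    get_state is a hypothetical callable returning one of the four state
    labels; Elysium itself shows the state in the job queue view.
    """
    state = JobState(get_state())
    polls = 1
    while state not in TERMINAL_STATES and polls < max_polls:
        sleep(poll_seconds)
        state = JobState(get_state())
        polls += 1
    return state

# Usage with a simulated job that moves through the queue
simulated = iter(["Pending", "Submitted", "Submitted", "Completed"])
final_state = wait_for_job(lambda: next(simulated), sleep=lambda s: None)
print(final_state.value)
```

With the default settings the loop gives up after 30 polls, which comfortably covers the 10-15 minutes most jobs need.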
Output format
After the alignment completes, Elysium displays 3 text files for download in the job list on the right of
the screen. The files are accessible through the user credentials provided on job submission.
Transcript Counts
The Transcript Counts file contains 5 columns. The first column contains the Ensembl transcript ID (e.g.
ENST00000448914.1), followed by transcript length and effective transcript length. The last two columns are
the transcript quantification and TPM. Kallisto's transcript quantification does not have to be an integer,
but it can be treated as a read count.
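The five columns described above could be read with a few lines of Python, for example as sketched below. The sketch assumes a tab-separated file and tolerates an optional header row such as the one Kallisto writes; the example rows are made up and the dictionary keys are chosen for this illustration only.

```python
import csv
import io

def read_transcript_counts(handle):
    """Parse the 5-column transcript table into a list of dicts.

    Assumes tab-separated columns in the order described on this page; a
    header row, if present, is skipped because its length is not numeric.
    """
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if len(fields) != 5:
            continue
        try:
            length = int(float(fields[1]))
        except ValueError:
            continue  # header line, e.g. "target_id  length  ..."
        rows.append({
            "transcript": fields[0],
            "length": length,
            "eff_length": float(fields[2]),
            "est_counts": float(fields[3]),
            "tpm": float(fields[4]),
        })
    return rows

# Made-up example content, not real Elysium output
example = io.StringIO(
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "ENST00000448914.1\t13\t14.0\t0\t0\n"
    "ENST00000415118.1\t8\t9.0\t2.6\t10.0\n"
)
transcripts = read_transcript_counts(example)
# Treat the non-integer quantification as a read count by rounding
rounded = {t["transcript"]: round(t["est_counts"]) for t in transcripts}
print(rounded)
```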
Gene Counts
This file contains 2 columns. The first contains official gene symbols and the second the sum of all
transcript counts mapping to the corresponding gene.
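The summation of transcript counts per gene can be pictured with the following Python sketch. It is not the code Elysium runs: the `transcript_to_gene` lookup and all IDs and symbols are hypothetical, and how Elysium treats transcripts without a gene symbol is not stated here (this sketch simply ignores them).

```python
from collections import defaultdict

def sum_counts_per_gene(transcript_counts, transcript_to_gene):
    """Sum transcript-level counts into gene-level counts.

    transcript_to_gene is a hypothetical lookup from Ensembl transcript ID
    to gene symbol; transcripts without a symbol are ignored here.
    """
    gene_counts = defaultdict(float)
    for transcript, count in transcript_counts.items():
        gene = transcript_to_gene.get(transcript)
        if gene is not None:
            gene_counts[gene] += count
    return dict(gene_counts)

# Made-up IDs and symbols for illustration only
transcript_counts = {"ENST_A.1": 10.5, "ENST_B.2": 4.5, "ENST_C.1": 7.0, "ENST_D.1": 1.0}
transcript_to_gene = {"ENST_A.1": "GENE1", "ENST_B.2": "GENE1", "ENST_C.1": "GENE2"}
gene_counts = sum_counts_per_gene(transcript_counts, transcript_to_gene)
for gene, count in sorted(gene_counts.items()):
    print(gene + "\t" + str(count))
```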
Alignment Info
Kallisto generates an information file containing the number of reads in the input file as well as the
number of successfully aligned reads. For paired-end jobs it additionally estimates the fragment length.
For single-file alignments a default fragment length of 200 bp is used.
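From the two read numbers an alignment rate can be derived. The Python sketch below assumes the file contains a line in the style the Kallisto command-line tool prints during quantification ("processed N reads, M reads pseudoaligned"); whether the Elysium Alignment Info file uses exactly this layout is an assumption, and the numbers are made up.

```python
import re

# Assumed line layout, modeled on Kallisto's console output
READS_PATTERN = re.compile(r"processed ([\d,]+) reads, ([\d,]+) reads pseudoaligned")

def alignment_rate(info_text):
    """Return (processed, aligned, fraction aligned) from the info text."""
    match = READS_PATTERN.search(info_text)
    if match is None:
        raise ValueError("no read statistics found; the file layout may differ")
    processed = int(match.group(1).replace(",", ""))
    aligned = int(match.group(2).replace(",", ""))
    fraction = aligned / processed if processed else 0.0
    return processed, aligned, fraction

# Made-up numbers for illustration
example_info = "[quant] processed 1,000,000 reads, 850,000 reads pseudoaligned\n"
processed, aligned, fraction = alignment_rate(example_info)
print(f"{aligned} of {processed} reads aligned ({fraction:.1%})")
```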
Compare gene expression to ARCHS4
The Elysium Gene Counts files are compatible with ARCHS4. Similar samples can be highlighted in the 3D t-SNE
plot. This requires some processing of the Gene Counts file generated by Elysium. First download
human_matrix.h5 or mouse_matrix.h5, depending on the species of interest.
The gene expression sample has to be placed into the context of gene expression in the ARCHS4 data. The R
script below builds two files containing the gene symbols of genes with relatively high and relatively
low expression.
# load libraries; both have to be installed from Bioconductor first
library("rhdf5")
library("preprocessCore")
# set desired values here
matrix_file = "human_matrix.h5"
# file name of elysium gene count file
elysium_file = "elysium-genecounts.tsv"
# number of random samples the gene expression will be normalized against
numberRandomSamples = 2000
# number of genes in up and down gene files
genesetSize = 200
# Retrieve information from compressed data
samples = h5read(matrix_file, "meta/Sample_geo_accession")
genes = h5read(matrix_file, "meta/genes")
# select numberRandomSamples random samples from ARCHS4
sample_locations = sample(1:length(samples), numberRandomSamples)
# extract gene expression from compressed data
expression = h5read(matrix_file, "data/expression", index=list(1:length(genes), sample_locations))
H5close()
rownames(expression) = genes
colnames(expression) = samples[sample_locations]
# load your gene expression file from Elysium (Gene Counts)
genecounts = read.table(elysium_file, sep="\t", stringsAsFactors=F)
counts = round(genecounts[,2])
names(counts) = genecounts[,1]
# append your gene expression profile to the end as a column
exp = cbind(expression, counts[genes])
colnames(exp)[numberRandomSamples+1] = "mygenecounts"
# normalize for gene count differences
qexp = normalize.quantiles(exp)
dimnames(qexp) = dimnames(exp)
# scale the rows of the log2 transformed matrix with a z-score normalization
zexp = t(scale(t(log2(1+qexp))))
zexp[is.na(zexp)] = 0
# get genesetSize top and bottom z-score genes (smallest to largest z-score)
orderGenes = order(zexp[, numberRandomSamples+1])
bottomGenes = zexp[orderGenes[1:genesetSize], numberRandomSamples+1]
topGenes = zexp[rev(orderGenes)[1:genesetSize], numberRandomSamples+1]
# print output
writeLines(names(bottomGenes), con="down_genes.txt")
writeLines(names(topGenes), con="up_genes.txt")
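To make the normalization steps of the R script easier to follow, here is a toy walk-through in plain Python on four made-up genes. It is not a replacement for the script: the quantile normalization is simplified (ties are broken by position and missing values are not handled, unlike preprocessCore), and all names and numbers are invented for illustration.

```python
import math
from statistics import mean, stdev

def quantile_normalize(columns):
    """Give every column the same distribution (mean of the sorted columns).

    Simplified: ties are broken by position, missing values are not handled.
    """
    n_rows = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [mean(col[i] for col in sorted_cols) for i in range(n_rows)]
    normalized = []
    for col in columns:
        order = sorted(range(n_rows), key=lambda i: col[i])
        new_col = [0.0] * n_rows
        for rank, row in enumerate(order):
            new_col[row] = rank_means[rank]
        normalized.append(new_col)
    return normalized

def row_zscores(columns):
    """Z-score each gene (row) of log2(1 + x) across all samples (columns)."""
    logged = [[math.log2(1 + v) for v in col] for col in columns]
    z_cols = [[0.0] * len(logged[0]) for _ in logged]
    for row in range(len(logged[0])):
        values = [col[row] for col in logged]
        mu = mean(values)
        sd = stdev(values)
        for c, v in enumerate(values):
            # genes without variation get 0, like the NA replacement in the R script
            z_cols[c][row] = (v - mu) / sd if sd > 0 else 0.0
    return z_cols

# Toy data: 4 genes, 3 reference samples plus the user's sample as last column
genes = ["G1", "G2", "G3", "G4"]
reference = [[10, 200, 50, 5], [12, 180, 60, 4], [9, 220, 55, 6]]
my_counts = [300, 40, 52, 0]
z = row_zscores(quantile_normalize(reference + [my_counts]))

# Rank genes by the z-score of the user's sample (last column)
ranked = sorted(zip(z[-1], genes))
down_genes = [gene for _, gene in ranked[:1]]
up_genes = [gene for _, gene in ranked[-1:]]
print(up_genes, down_genes)
```

In this toy data G1 is much higher and G2 much lower in the user's sample than in the reference samples, so they end up in the up and down lists respectively.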
Now navigate to the data visualization of ARCHS4 and select the Signature tab on the left. Make sure the
correct species is active. Copy the up and down gene lists into the corresponding fields and submit the
search. Similar samples should now be selected.
Pipeline
The Elysium pipeline uses two APIs for uploading data to S3 and scheduling alignment jobs. The
upload API, Charon, generates encrypted tokens that allow users to upload data directly to S3.
Elysium is the API that controls job submissions to the cloud backend and contains the Elysium interface.
To process the FASTQ files we deploy Docker containers to an Amazon compute cluster and scale the
resources depending on demand.
Human samples are aligned against the GRCh38 human reference genome, and mouse samples against the GRCm38
mouse reference genome. For transcript annotation Homo_sapiens.GRCh38.cdna.all.fa.gz and
Mus_musculus.GRCm38.cdna.all.fa.gz from Ensembl are used. The Kallisto index files can be accessed at the
ARCHS4 download page.
Terms of use
Source code is available on GitHub under the Apache License 2.0.
Commercial users should contact Mount Sinai Innovation Partners at MSIPInfo@mssm.edu.
GitHub repositories
https://github.com/maayanlab/charon
https://github.com/maayanlab/elysium
Citation
A manuscript describing the Elysium API is currently deposited in the preprint repository bioRxiv:
https://www.biorxiv.org/content/early/2018/08/02/382937
You can also consider citing ARCHS4, as Elysium shares its alignment backend with that publication.
Please acknowledge ARCHS4 in your publications by citing the following reference:
Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma’ayan A. Massive mining of
publicly available RNA-seq data from human and mouse. Nature Communications 9. Article number: 1366 (2018),
doi:10.1038/s41467-018-03751-6
https://www.nature.com/articles/s41467-018-03751-6
Disclaimer
Elysium is not to be used for treating or diagnosing human subjects. Elysium or any documents available from
this server are provided as is without any warranty of any kind, either express, implied, or statutory,
including, but not limited to, any implied warranties of merchantability, fitness for particular purpose and
freedom from infringement, or that Elysium or any documents available from this server will be error free.
The Ma'ayan lab makes no representations that the use of Elysium or any documents available from this server
will not infringe any patent or proprietary rights of third parties. In no event will the Ma'ayan lab or any
of its members be liable for any damages, including but not limited to direct, indirect, special or
consequential damages, arising out of, resulting from, or in any way connected with the use of Elysium or
documents available from this server.