Elysium: cloud alignment

Elysium is a lightweight sequence alignment server that enables free alignment of FASTQ files against the human or mouse genome. The pipeline is compatible with the ARCHS4 transcript quantification pipeline, which uses the Kallisto aligner. Uploads can be in FASTQ or FASTQ.gz format, and the maximum file size is 5 GB per file.
Upload FASTQ to the cloud
Schedule alignment
Retrieve results

Please note that your uploaded FASTQ.gz files may be deleted after 1 week. The alignment results will stay available online indefinitely. An alignment job should take about 10 to 15 minutes to complete. If this is your first time using Elysium, please visit the help section. To begin, set up an account using your e-mail address and a password. These credentials will be used to create your private account. Your uploaded data will not be accessible to others.


A publication that describes the Elysium project is currently available as a preprint in bioRxiv.
Alexander Lachmann, Zhuorui Xie, Avi Ma'ayan. Elysium: RNA-seq Alignment in the Cloud. bioRxiv 382937; doi: https://doi.org/10.1101/382937 https://www.biorxiv.org/content/early/2018/08/02/382937



Elysium is a new application under development, so please send us feedback so we can improve it.



© Ma'ayan Lab.
My name is Pipetti and I am here to help you with your RNA-seq alignment needs!

About

Elysium is a free-to-use online RNA-seq alignment tool. It supports fast data upload and processing on a scalable cloud infrastructure. This help page explains the functionality of the Elysium website. This is still an early version of the software, and you might encounter unforeseen issues. Please help us improve the experience by giving us feedback.

Example workflow

To retrieve transcript quantification and gene counts, three steps have to be performed: log into the system, upload data, and submit alignment jobs to the processing queue. The alignment algorithm used is Kallisto. Qualitative alignment comparisons between Kallisto and STAR were performed for the ARCHS4 publication.

Account creation and Login

The starting page displays a login field for email and password. If you do not have an account with Elysium, enter an email address and a password of your choice. The account will be created automatically and you will be redirected to the workspace view. All data and jobs submitted will be accessible through those credentials. You can use the same email and password to access results at a later time.

Creating user accounts and using this service are currently free of charge, and we hope to keep the service free in the future. Due to large data sizes we cannot guarantee the availability of data and service indefinitely, so please back up your files. We will not share your data now or in the future.

Uploading files

Elysium uses cloud computing, so the data has to be moved to where the compute instances are. Data is uploaded to Amazon S3. The currently supported data format is FASTQ.gz. If you are using Windows, we recommend 7-Zip for compressing your files.

The maximum file size currently supported for upload is 5 GB.
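
The format and size limits above can be checked locally before starting an upload. The following is a minimal sketch in base R. The helper name `check_upload`, the accepted file extensions, and the reading of "5 GB" as 5 * 10^9 bytes are assumptions made for illustration; they are not part of Elysium.

```r
# Hypothetical pre-upload check; not part of Elysium itself.
# Assumption: "5 GB" is interpreted as 5 * 10^9 bytes.
max_bytes <- 5 * 1000^3

check_upload <- function(path, size = file.size(path)) {
  # Assumption: files are named *.fastq or *.fastq.gz (case-insensitive)
  format_ok <- grepl("\\.fastq(\\.gz)?$", path, ignore.case = TRUE)
  # file.size() returns NA for a missing file, which counts as a failed check
  size_ok <- !is.na(size) && size <= max_bytes
  list(file = basename(path), format_ok = format_ok, size_ok = size_ok)
}

# The size can be passed explicitly to try the check without a real file
str(check_upload("sample_1.fastq.gz", size = 2e9))
str(check_upload("sample.bam", size = 6e9))
```

When `size` is omitted, the function reads the size of the file at `path` from disk.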

Submitting alignment job

Elysium supports single-end and paired-end read alignment. Once files have been uploaded to the workspace, one file (single-end) or two files (paired-end) can be selected from the list. Upon selection the submission dialog will appear. Here you can enter the name of the job and choose the species genome against which the alignment should be performed.

Once submitted, the job is sent to a queue and processed on a first-come, first-served basis.
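
When many files are uploaded, it can help to work out in advance which files belong together as a paired-end job and which are single-end. The sketch below groups file names in base R. It assumes the common convention that mates are named with a trailing `_1` and `_2` before the extension; this naming scheme and the example file names are assumptions, not something Elysium requires.

```r
# Invented file names; mates are assumed to end in _1 / _2 before the extension
files <- c("liver_1.fastq.gz", "liver_2.fastq.gz", "brain.fastq.gz")

# Strip the format extension, then a trailing mate tag, to get a sample id
stem <- sub("\\.fastq(\\.gz)?$", "", files)
sample_id <- sub("_[12]$", "", stem)

# One list entry per sample, holding the file(s) to select for that job
jobs <- split(files, sample_id)

# Two files for a sample means a paired-end job, one file a single-end job
layout <- ifelse(lengths(jobs) == 2, "paired", "single")
print(jobs)
print(layout)
```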

Retrieving results

Processing takes about 15 minutes, independent of the number of samples. After successful completion of the alignment the result section will contain 3 links labeled Transcript Counts, Gene Counts and Alignment Info. The Transcript Counts are the raw Kallisto output. The Gene Counts result contains counts per gene, with the counts of all corresponding transcripts summed up. Alignment Info contains Kallisto information about parameters and alignment statistics. The gene counts are compatible with the ARCHS4 resource.


Job queue status

Elysium uses a scalable cloud computing backend. Once a job is submitted it enters a waiting queue. All jobs are processed in first-come, first-served order. Most submitted alignment jobs should complete in about 10-15 minutes. The state of the job is displayed in the job queue view. A submitted job can be in one of four states:
Pending: job is waiting to be processed by an available resource
Submitted: job left queue and is currently processing
Completed: job completed successfully
Failed: job timed out during submission process

Output format

After the alignment completes, Elysium will display 3 text files for download in the job list on the right of the screen. The files are accessible through the user credentials provided on job submission.



Transcript Counts
The Transcript Counts file contains 5 columns. The first column contains the Ensembl transcript ID (e.g. ENST00000448914.1), followed by transcript length and effective transcript length. The last two columns are the transcript quantification and TPM. Kallisto's transcript quantification does not have to be an integer but can be treated as a read count.
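
To illustrate how the last two columns relate, the sketch below recomputes TPM from estimated counts and effective lengths on a toy table in base R. The column names follow the header of Kallisto's abundance.tsv (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`); that the Elysium download carries the same header is an assumption, and all values are invented.

```r
# Toy table mimicking the columns described above; all values are invented
tx <- data.frame(
  target_id  = c("tx_A", "tx_B", "tx_C"),
  length     = c(1000, 2000, 500),
  eff_length = c(800, 1800, 300),
  est_counts = c(80.5, 360, 30),
  stringsAsFactors = FALSE
)

# TPM: counts per effective base, rescaled so the column sums to one million
rate <- tx$est_counts / tx$eff_length
tx$tpm <- rate / sum(rate) * 1e6

# Estimated counts can be fractional; round them if a tool expects integers
tx$rounded_counts <- round(tx$est_counts)
print(tx)
```

Because TPM is normalized by effective length, a transcript with more reads does not necessarily have a higher TPM than a shorter transcript with fewer reads.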

Gene Counts
This file contains 2 columns. The first is official gene symbols and the second is the sum of all transcript counts mapping to the corresponding gene.
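
The summation described above can be sketched in base R as follows. The transcript names, counts, and the transcript-to-gene mapping are invented for illustration; the mapping Elysium actually uses is not shown on this page.

```r
# Invented transcript counts and an invented transcript-to-gene mapping
tx_counts <- c(tx_A1 = 10.5, tx_A2 = 4.5, tx_B1 = 7, tx_C1 = 0)
tx2gene <- c(tx_A1 = "GENE_A", tx_A2 = "GENE_A",
             tx_B1 = "GENE_B", tx_C1 = "GENE_C")

# Sum the counts of all transcripts that map to the same gene symbol
gene_sums <- tapply(tx_counts, tx2gene[names(tx_counts)], sum)

# Two-column layout: gene symbol and summed count
gene_counts <- data.frame(symbol = names(gene_sums),
                          count = as.vector(gene_sums),
                          stringsAsFactors = FALSE)
print(gene_counts)
```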

Alignment Info
Kallisto generates an information file containing the number of reads in the input file as well as the number of successfully aligned reads. For paired-end jobs it additionally estimates the fragment length. For single-file alignment a default fragment length of 200 bp is used.


Compare gene expression to ARCHS4

The Elysium Gene Counts files are compatible with ARCHS4. Similar samples can be highlighted in the 3D t-SNE plot. This requires some processing of the Gene Counts file generated by Elysium. First download human_matrix.h5 or mouse_matrix.h5, depending on the species of interest.

The gene expression sample has to be placed into the context of gene expression in the ARCHS4 data. The R script below builds two files containing the gene symbols of genes with relatively high expression and relatively low expression.

# load libraries; they first have to be installed from Bioconductor
library("rhdf5")
library("preprocessCore")

# set desired values here
matrix_file = "human_matrix.h5"

# file name of elysium gene count file
elysium_file = "elysium-genecounts.tsv"

# number of random samples the gene expression will be normalized against
numberRandomSamples = 2000

# number of genes in up and down gene files
genesetSize = 200

# Retrieve information from compressed data
samples = h5read(matrix_file, "meta/Sample_geo_accession")
genes = h5read(matrix_file, "meta/genes")

# select numberRandomSamples random samples from ARCHS4
sample_locations = sample(1:length(samples), numberRandomSamples)

# extract gene expression from compressed data
expression = h5read(matrix_file, "data/expression", index=list(1:length(genes), sample_locations))
H5close()
rownames(expression) = genes
colnames(expression) = samples[sample_locations]

# load your gene expression file from Elysium (Gene Counts)
genecounts = read.table(elysium_file, sep="\t", stringsAsFactors=FALSE)
counts = round(genecounts[,2])
names(counts) = genecounts[,1]

# append your gene expression profile to the end as a column
# (named "combined" to avoid masking the base R function exp)
combined = cbind(expression, counts[genes])
colnames(combined)[numberRandomSamples+1] = "mygenecounts"

# normalize for gene count differences
qexp = normalize.quantiles(combined)
dimnames(qexp) = dimnames(combined)

# scale the rows of the log2 transformed matrix with a z-score normalization
zexp = t(scale(t(log2(1+qexp))))
zexp[is.na(zexp)] = 0

# get genesetSize top and bottom z-score genes (smallest to largest z-score)
orderGenes = order(zexp[, numberRandomSamples+1])
bottomGenes = zexp[orderGenes[1:genesetSize], numberRandomSamples+1]
topGenes = zexp[rev(orderGenes)[1:genesetSize], numberRandomSamples+1]

# print output
writeLines(names(bottomGenes), con="down_genes.txt")
writeLines(names(topGenes), con="up_genes.txt")
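
The script above matches the Elysium counts to the ARCHS4 gene list by name with `counts[genes]`, so any ARCHS4 gene missing from the Elysium file becomes NA. Before running it on real data, it can help to check how many symbols actually match. The sketch below shows that check on toy stand-ins for `genes` and `counts`; the count values and the symbol `FAKE1` are invented.

```r
# Toy stand-ins for `genes` (ARCHS4 symbols) and `counts` (Elysium gene counts)
genes  <- c("TP53", "GAPDH", "ACTB", "MYC")
counts <- c(GAPDH = 1520, ACTB = 2310, TP53 = 87, FAKE1 = 5)

# Indexing by name reorders counts to ARCHS4 gene order;
# genes absent from the Elysium file become NA
aligned <- counts[genes]
matched <- !is.na(aligned)
cat(sum(matched), "of", length(genes), "ARCHS4 genes found in the Elysium file\n")

# Symbols present in the Elysium file but absent from ARCHS4
print(setdiff(names(counts), genes))
```

A low match count usually points to a species mismatch between the Gene Counts file and the downloaded matrix file.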

Now navigate to the data visualization of ARCHS4 and select the Signature tab on the left. Make sure the correct species is active. Copy the up and down gene lists into the corresponding fields and submit the search. Similar samples should now be selected.

Pipeline

The Elysium pipeline uses two APIs for uploading data to S3 and scheduling alignment jobs. The upload API, Charon, generates encrypted tokens that allow users to upload data directly to S3. Elysium is the API that controls job submissions to the cloud backend and contains the Elysium interface. For processing of the FASTQ files we deploy Docker containers to an Amazon compute cluster and scale the resources depending on demand.

Human samples are aligned against the GRCh38 human reference genome, and mouse samples against the GRCm38 mouse reference genome. For transcript annotation Homo_sapiens.GRCh38.cdna.all.fa.gz and Mus_musculus.GRCm38.cdna.all.fa.gz from Ensembl are used. The Kallisto index files can be accessed at the ARCHS4 download page.

Terms of use

Source code is available on GitHub under the Apache Licence 2.0. Commercial users should contact Mount Sinai Innovation Partners at MSIPInfo@mssm.edu.

GitHub repositories

https://github.com/maayanlab/charon
https://github.com/maayanlab/elysium

Citation

The manuscript describing Elysium is currently available as a preprint on bioRxiv.
https://www.biorxiv.org/content/early/2018/08/02/382937

Consider also citing ARCHS4, since Elysium shares its alignment backend with that project.
Please acknowledge ARCHS4 in your publications by citing the following reference: Lachmann A, Torre D, Keenan AB, Jagodnik KM, Lee HJ, Wang L, Silverstein MC, Ma’ayan A. Massive mining of publicly available RNA-seq data from human and mouse. Nature Communications 9. Article number: 1366 (2018), doi:10.1038/s41467-018-03751-6
https://www.nature.com/articles/s41467-018-03751-6

Disclaimer

Elysium is not to be used for treating or diagnosing human subjects. Elysium or any documents available from this server are provided as is without any warranty of any kind, either express, implied, or statutory, including, but not limited to, any implied warranties of merchantability, fitness for particular purpose and freedom from infringement, or that Elysium or any documents available from this server will be error free. The Ma'ayan lab makes no representations that the use of Elysium or any documents available from this server will not infringe any patent or proprietary rights of third parties. In no event will the Ma'ayan lab or any of its members be liable for any damages, including but not limited to direct, indirect, special or consequential damages, arising out of, resulting from, or in any way connected with the use of Elysium or documents available from this server.