L1000 CD Signatures
Gene expression signatures calculated from the LINCS L1000 dataset using the characteristic direction method
1. Install MongoDB.
2. Download the two files listed under "Signatures" and place them in the MongoDB data directory (the dbpath used by mongod).
3. Start mongod.
4. Open a mongo shell and switch to the LINCS_L1000_CD database. The cpc2014 collection in this database stores all the processed LINCS L1000 data.
5. Optionally, download and install Robomongo, which provides a GUI for browsing the data.
6. Refer to the MongoDB documentation for query specifications and drivers for different languages.
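Steps 4 and 6 can also be carried out from Python. The following is a minimal sketch, assuming mongod is listening on the default localhost:27017 and the third-party PyMongo driver is installed. The helper names are hypothetical. No field names are assumed: the sketch only lists the keys of one stored document so the schema can be inspected.

```python
def get_collection(uri="mongodb://localhost:27017",
                   db_name="LINCS_L1000_CD", coll_name="cpc2014"):
    """Return a handle to the cpc2014 collection.

    The URI is an assumed default; adjust it to match your mongod setup.
    """
    # Imported here so the rest of the module works without the driver.
    from pymongo import MongoClient  # third-party: pip install pymongo
    client = MongoClient(uri)
    return client[db_name][coll_name]


def describe_collection(collection):
    """Return (approximate document count, sorted field names of one document)."""
    count = collection.estimated_document_count()
    sample = collection.find_one() or {}
    return count, sorted(sample.keys())


# Usage, with mongod running:
#   count, fields = describe_collection(get_collection())
#   print(count, fields)
```

Listing the fields first avoids guessing at the document layout before writing real queries.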
rid.json is an array of probe IDs, listed in the same order as the genes in the processed data.
apiRowMeta.json contains the metadata for each probe ID, downloaded from the Broad API. It can be used to convert probe IDs to gene symbols and to determine whether a probe ID is a landmark gene.
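The two files can be combined to annotate each position in a signature. The following is a minimal sketch, assuming apiRowMeta.json is a list of records and that each record uses the keys pr_id, pr_gene_symbol and is_lm. These key names are assumptions, not confirmed by this README, so they are exposed as parameters. The helper names are hypothetical.

```python
import json


def load_json(path):
    """Load a JSON file such as rid.json or apiRowMeta.json."""
    with open(path) as handle:
        return json.load(handle)


def annotate_probes(rids, row_meta, id_key="pr_id",
                    symbol_key="pr_gene_symbol", landmark_key="is_lm"):
    """Return (probe_id, gene_symbol, is_landmark) tuples in rid.json order.

    The default key names are assumptions; inspect one record of
    apiRowMeta.json and adjust them if they differ.
    """
    by_id = {record[id_key]: record for record in row_meta}
    annotated = []
    for rid in rids:
        record = by_id.get(rid, {})  # probes without metadata map to None/False
        annotated.append(
            (rid, record.get(symbol_key), bool(record.get(landmark_key)))
        )
    return annotated


# Usage:
#   rows = annotate_probes(load_json("rid.json"), load_json("apiRowMeta.json"))
#   landmark_positions = [i for i, row in enumerate(rows) if row[2]]
```

Because the output keeps the rid.json order, position i of the result describes element i of a processed expression vector.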
MongoDB files are listed here:
1.3 million gene expression profiles generated by the Broad Institute's LINCS Project team. The matrices are stored in a binary format called GCTX, which provides efficient random access to subsets of the data. MATLAB, Python, and R routines to parse GCTX files are available here.
- q2norm_n1328098x22268.gctx (110.28 GB): Gene expression profiles of both directly measured landmark transcripts and imputed genes. Normalized using invariant set scaling followed by quantile normalization.
- modzs_n476251x22268.gctx (39.56 GB): Moderated z-scores, where replicate signatures are condensed using a weighted averaging procedure.