MNN2017
This repository contains the code for the paper "Correcting batch effects in single-cell RNA sequencing data by matching mutual nearest neighbours" by Haghverdi et al. (2018).
Note: Further updates and development of the analysis and simulation code will take place at https://github.com/MarioniLab/FurtherMNN2018. If you have general questions regarding the code (i.e., not specifically involving the manuscript), please post your issues at the above repository instead.
To generate the simulation figures in the main text (uneven composition of cell types) and the supplement (identical composition), enter the Simulations directory.
First run the source file simulateBatches.R, then run the source file plotCorrections.R.
To generate the haematopoietic data figures in the main text, enter the Haematopoiesis directory.
First run the source file prepareData.R, then the plotCorrections.R script.
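The two-step runs described above for the Simulations and Haematopoiesis directories could be scripted roughly as follows. This is a minimal sketch and not part of the repository: the run_r_scripts helper and the DRY_RUN variable are hypothetical, and it assumes that Rscript is on the PATH and that each script can be run non-interactively from inside its own directory. By default the sketch only prints the commands it would run; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not part of the repository): run the given R scripts,
# in order, from inside the given directory.
# DRY_RUN=1 (the default) only prints each command; DRY_RUN=0 executes it.
run_r_scripts() {
    dir="$1"
    shift
    for script in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "(cd ${dir} && Rscript ${script})"
        else
            # The subshell keeps the caller's working directory unchanged.
            (cd "${dir}" && Rscript "${script}")
        fi
    done
}

# Simulation figures: simulate the batches first, then plot the corrections.
run_r_scripts Simulations simulateBatches.R plotCorrections.R

# Haematopoietic figures: prepare the data first, then plot the corrections.
run_r_scripts Haematopoiesis prepareData.R plotCorrections.R
```

Because the script uses set -e, a failure in the first script of a pair stops the run before the plotting script is attempted.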
For the pancreas figures:
Download gene expression data from the four public data sets (gene-by-cell matrices, metadata and highly variable gene lists) by running the bash script DownloadData.sh.
This will download a zipped file containing the raw count matrices for GSE86473.
The remaining data sets are downloaded directly in the data processing scripts for the appropriate studies (denoted by the GEO/ArrayExpress accession numbers from the manuscript).
To run the data processing and normalization, move to the Pancreas directory and execute the script normalizePancreas.R.
To calculate the highly variable genes, execute (or source) the script findHighlyVariableGenes.R.
To assign cell type labels according to the approaches described for each study, run the script assignCellTypeLabels.R.
To generate any of the pancreas data set results, you must first run the source file PancreasProcessingCorrection.R in the Pancreas folder.
You will need to create a directory called 'results', into which all figures and batch corrected data will be saved.
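Creating the output directory ahead of time might look like this. The sketch assumes that results is expected in the working directory from which the pancreas scripts are run; the text above does not say so explicitly, so adjust the path if the scripts look elsewhere.

```shell
# Assumption: run this from the directory in which the pancreas scripts are executed.
# -p makes the command a no-op if the directory already exists.
mkdir -p results
```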
To correct the batch effects and generate the t-SNE plots and silhouette boxplots for the pancreas data sets (Fig. 4 and Suppl. Fig. 5), run the source file PancreasCorrectionComparison.R in the Pancreas folder.
This will also generate the pancreas PCA plots and the entropy of mixing boxplots in the supplement (Suppl. Fig. 5).
To compare the performance of MNN with locally variable batch effects versus a global batch effect setting (Suppl. Fig. 6), run the source file local_global_batchvect.R in the Pancreas folder.
Differential expression testing figures can be generated by running the R markdown document PancreasDE_analysis.Rmd, contained in the PancreasDE directory.
A static version of the R notebook is also available as an HTML document that can be opened in any web browser.
The scripts to download and normalize the 10X droplet data can be found in Droplet/, specifically pbmc_normalisation.R for the 68,000 PBMCs and tcell_4K_normalisation.R for the 4,000 T cells.
Please note that normalising 68,000 cells on a local machine requires a lot of resources (memory and CPU). It is recommended that the scripts in Droplet/ be executed on an appropriate high-performance computing cluster.
tSNE and cluster assignment using community detection on the uncorrected data can be performed by running uncorrected_68k_tSNE.R and assign_cell_types_68kPBMC.R.
To perform the equivalent tasks to generate the panels of Figure 5, run combine_10X.R, pbmc68k_tSNE.R, PBMC_68k_plotting.R, assign_cell_types_68kPBMC_corrected.R and Corrected_PBMC_68K_assignCellLabels.R.