Friday, 6th of September (2019)
The nCounter® platform provides a simple and cost-effective solution for multiplex analysis of up to 800 RNA, DNA, or protein targets from your precious samples.
nSolver 4.0 resources:
NACHO (NAnostring quality Control dasHbOard) is developed for NanoString nCounter data.
NACHO can load, visualise and normalise exported NanoString nCounter data, and helps the user perform quality control.
NACHO does this by visualising in an interactive web application:
RCC files are summarised and visualised using two functions:
The summarise() function is used to preprocess the data.
The visualise() function initiates a Shiny-based dashboard that visualises all relevant QC plots.
NACHO also includes a function normalise(), which (re)calculates sample-specific size factors and normalises the data.
The normalise() function creates a list in which your settings, the raw counts and the normalised counts are stored.
In this example we use a miRNA dataset from the study of Bruce et al. (2015) with the GEO accession number GSE70970.
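library(GEOquery)

# Download the GEO series and extract the sample annotations.
gse <- getGEO("GSE70970")
targets <- pData(phenoData(gse[[1]]))

# Download the supplementary archive and unpack the RCC files.
getGEOSuppFiles(GEO = "GSE70970", baseDir = ".")
untar(
  tarfile = "./GSE70970/GSE70970_RAW.tar",
  exdir = "./GSE70970/Data"
)

# Record the extracted file names in the sample sheet.
targets$IDFILE <- list.files("./GSE70970/Data")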
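Since the file names are assigned to the sample sheet by position, it can help to list only the RCC files and check that their number matches the number of samples before assigning. The sketch below shows the idea on a temporary directory with made-up file names. The `.RCC` / `.RCC.gz` extension pattern and the `toy_targets` sample sheet are assumptions for illustration, not part of the original example.

```r
# Toy directory standing in for "./GSE70970/Data" (file names are made up).
data_dir <- file.path(tempdir(), "rcc_demo")
dir.create(data_dir, showWarnings = FALSE)
invisible(file.create(
  file.path(data_dir, c("GSM0001_a.RCC.gz", "GSM0002_b.RCC.gz", "notes.txt"))
))

# Keep only RCC files, optionally gzipped (assumed naming convention).
rcc_files <- list.files(path = data_dir, pattern = "\\.RCC(\\.gz)?$")

# Hypothetical sample sheet with one row per sample.
toy_targets <- data.frame(sample = c("a", "b"))

# Fail early if the number of files does not match the number of samples.
stopifnot(length(rcc_files) == nrow(toy_targets))
toy_targets$IDFILE <- rcc_files
```

A matching count does not guarantee a matching order, so it is still worth checking that each file name corresponds to the sample on its row.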
summarise()
The housekeeping_genes and normalisation_method arguments indicate, respectively, which housekeeping genes and which normalisation method should be used.
library(NACHO)

GSE70970_sum <- summarise(
  data_directory = "./GSE70970/Data",
  ssheet_csv = targets,
  id_colname = "IDFILE",
  housekeeping_genes = NULL,
  housekeeping_predict = TRUE,
  normalisation_method = "GEO",
  n_comp = 5
)
summarise()
GSE70970_sum
#> List of 13
#>  $ access              : chr "IDFILE"
#>  $ housekeeping_genes  : chr [1:5] "hsa-miR-103" "hsa-let-7e" "hsa-miR-1260" "hsa-miR-500+hsa-miR-501-5p" ...
#>  $ housekeeping_predict: logi TRUE
#>  $ housekeeping_norm   : logi TRUE
#>  $ normalisation_method: chr "GEO"
#>  $ remove_outliers     : logi FALSE
#>  $ n_comp              : num 5
#>  $ data_directory      : chr "/home/travis/build/mcanouil/NACHO_slides/GSE70970/Data"
#>  $ pc_sum              :'data.frame': 5 obs. of 4 variables:
#>  $ nacho               :'data.frame': 198170 obs. of 112 variables:
#>  $ outliers_thresholds :List of 6
#>  $ raw_counts          :'data.frame': 792 obs. of 265 variables:
#>  $ normalised_counts   :'data.frame': 792 obs. of 265 variables:
#>  - attr(*, "RCC_type")= chr "n1"
#>  - attr(*, "class")= chr "nacho"
normalise()
NACHO allows the discovery of housekeeping genes within your own dataset.
NACHO finds the five most suitable housekeeping genes. However, one of these five genes might still not be suitable.
The discovered housekeeping genes are saved in a global variable named predicted_housekeeping.
GSE70970_sum[["housekeeping_genes"]]
#> [1] "hsa-miR-103"                "hsa-let-7e"
#> [3] "hsa-miR-1260"               "hsa-miR-500+hsa-miR-501-5p"
#> [5] "hsa-miR-1274b"
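To give an intuition for what "suitable" can mean for a housekeeping gene, the sketch below ranks genes in a toy count matrix by their coefficient of variation across samples, so that the most stable genes come first. This is a generic stability heuristic on made-up numbers, shown only for illustration. It is not claimed to be the procedure NACHO uses, and the gene names are hypothetical.

```r
# Toy count matrix: rows are genes, columns are samples (values are made up).
counts <- rbind(
  geneA = c(100, 102, 98, 101),
  geneB = c(50, 200, 10, 400),
  geneC = c(500, 510, 495, 505),
  geneD = c(20, 80, 5, 60)
)

# Coefficient of variation per gene: standard deviation divided by mean.
cv <- apply(counts, 1, function(x) sd(x) / mean(x))

# The two genes with the lowest CV are the most stable candidates.
candidates <- names(sort(cv))[1:2]
candidates
```

A gene that looks stable by such a criterion can still be a poor reference for biological reasons, which is why inspecting the predicted genes remains useful.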
normalise()
Let’s say "GEO" is not the best normalisation for our dataset and we want to use "GLM" instead.
GSE70970_norm <- normalise(
  nacho_object = GSE70970_sum,
  normalisation_method = "GLM",
  remove_outliers = TRUE
)
normalise()
GSE70970_norm
#> List of 13
#>  $ access              : chr "IDFILE"
#>  $ housekeeping_genes  : chr [1:5] "hsa-let-7e" "hsa-miR-1260" "hsa-miR-1274b" "hsa-miR-103" ...
#>  $ housekeeping_predict: logi TRUE
#>  $ housekeeping_norm   : logi TRUE
#>  $ normalisation_method: chr "GLM"
#>  $ remove_outliers     : logi TRUE
#>  $ n_comp              : num 5
#>  $ data_directory      : chr "/home/travis/build/mcanouil/NACHO_slides/GSE70970/Data"
#>  $ pc_sum              :'data.frame': 5 obs. of 4 variables:
#>  $ nacho               :'data.frame': 115271 obs. of 112 variables:
#>  $ raw_counts          :'data.frame': 792 obs. of 155 variables:
#>  $ normalised_counts   :'data.frame': 792 obs. of 155 variables:
#>  $ outliers_thresholds :List of 6
#>  - attr(*, "RCC_type")= chr "n1"
#>  - attr(*, "class")= chr "nacho"
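The notion of a sample-specific size factor can be illustrated with a toy geometric-mean scaling in base R. This is a minimal sketch of the general idea on made-up housekeeping counts. It is not NACHO's internal code, and it does not cover the "GLM" method.

```r
# Toy housekeeping counts: rows are genes, columns are samples (made-up values).
hk_counts <- rbind(
  hk1 = c(s1 = 100, s2 = 200, s3 = 50),
  hk2 = c(s1 = 400, s2 = 800, s3 = 200)
)

# Geometric mean of the housekeeping counts within each sample.
geo_mean <- apply(hk_counts, 2, function(x) exp(mean(log(x))))

# Size factor: average geometric mean across samples divided by each sample's own.
size_factor <- mean(geo_mean) / geo_mean

# Scale every sample (column) by its size factor.
normalised <- sweep(hk_counts, 2, size_factor, `*`)
```

In this toy case the samples differ only by a constant scaling, so after normalisation each gene has the same value in all three samples.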
visualise()
visualise(GSE70970_sum)
#> [NACHO] Custom "outliers_thresholds" can be loaded for later use with:
#> outliers_thresholds <- readRDS("outliers_thresholds.rds")
NACHO includes two (three) additional functions:
The render() function renders a full quality-control report (HTML) based on the results of a call to summarise() or normalise() (using print() in an R Markdown chunk).
The autoplot() function draws any quality-control metric from visualise() and render().
autoplot() function
The autoplot() function provides an easy way to plot any quality-control metric from the visualise() function.
"BD" (Binding Density)
"FoV" (Imaging)
"PC" (Positive Control Linearity)
"LoD" (Limit of Detection)
"Positive" (Positive Controls)
"Negative" (Negative Controls)
"Housekeeping" (Housekeeping Genes)
"PN" (Positive Controls vs. Negative Controls)
"ACBD" (Average Counts vs. Binding Density)
"ACMC" (Average Counts vs. Median Counts)
"PCA12" (Principal Component 1 vs. 2)
"PCAi" (Principal Component scree plot)
"PCA" (Principal Components planes)
"PFNF" (Positive Factor vs. Negative Factor)
"HF" (Housekeeping Factor)
"NORM" (Normalisation Factor)
autoplot() function
autoplot(GSE70970_sum, x = "BD")
autoplot() function
autoplot(GSE70970_sum, x = "Positive")
autoplot() function
autoplot(GSE70970_sum, x = "Housekeeping")
autoplot() function
autoplot(GSE70970_sum, x = "NORM")
render() function
The render() function renders (using print(..., echo = TRUE)) a comprehensive HTML report which includes all quality-control metrics and a description of those metrics.
render(
  nacho_object = GSE70970_sum,
  colour = "CartridgeID",
  output_file = "NACHO_QC.html",
  output_dir = "./GSE70970/",
  size = 0.5,
  show_legend = TRUE,
  clean = TRUE
)
print() function
The underlying print() function can be used directly within any R Markdown chunk by setting the parameter echo = TRUE.
print(
  x = GSE70970_sum,
  colour = "CartridgeID",
  size = 0.5,
  show_legend = TRUE,
  echo = TRUE,
  title_level = 3
)
Code rewrite and optimisation
Normalisation method template
CARoT (Centralised and Automated Reporting Tools) is a set of quality-control reporting tools and other functions, currently under development.
Currently CARoT includes the following main functions:
estimate_ethnicity(): Compute the genomic component (ethnicity)
pca_report(): Compute a principal component analysis
qc_idats(): QC of methylation array
qc_plink(): QC of genotyping array
qc_impute(): QC of imputed genotyping array

Bruce, J. P., Hui, A. B. Y., Shi, W., Perez-Ordonez, B., Weinreb, I., Xu, W., … Liu, F.-F. (2015). Identification of a microRNA signature associated with risk of distant metastasis in nasopharyngeal carcinoma. Oncotarget, 6(6), 4537–4550. https://doi.org/10.18632/oncotarget.3005