Loads and preprocesses data from NanoString nCounter RCC files.
Usage
load_rcc(
  data_directory,
  ssheet_csv,
  id_colname = NULL,
  housekeeping_genes = NULL,
  housekeeping_predict = FALSE,
  housekeeping_norm = TRUE,
  normalisation_method = "GEO",
  n_comp = 10
)
Arguments
- data_directory
[character] A character string of the directory where the data are stored.
- ssheet_csv
[character] or [data.frame] Either a string with the name of the CSV of the sample sheet or the sample sheet as a data.frame. Should contain a column that matches the file names in the folder.
- id_colname
[character] Character string of the column in ssheet_csv that matches the file names in data_directory.
- housekeeping_genes
[character] A vector of names of the miRNAs/mRNAs that should be used as housekeeping genes. Default is NULL.
- housekeeping_predict
[logical] Boolean to indicate whether the housekeeping genes should be predicted (TRUE) or not (FALSE). Default is FALSE.
- housekeeping_norm
[logical] Boolean to indicate whether the housekeeping normalisation should be performed. Default is TRUE.
- normalisation_method
[character] Either "GEO" or "GLM". Character string to indicate normalisation using the geometric mean ("GEO") or a generalized linear model ("GLM"). Default is "GEO".
- n_comp
[numeric] Number of principal components to compute. Cannot exceed n - 1, where n is the number of samples. Default is 10.
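To make the "GEO" option more concrete, the following is a minimal base R sketch of housekeeping normalisation with a geometric mean. It is an illustration under stated assumptions, not NACHO's internal code: the count matrix, the gene names (HK1, HK2, GeneA) and the helper geo_mean() are all made up for the example, and the exact scaling that load_rcc() applies may differ.

```r
# Toy count matrix: rows are genes, columns are samples (made-up values).
counts <- matrix(
  c(100, 200, 400,
     50, 100, 200,
     10,  40,  30),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("HK1", "HK2", "GeneA"), c("S1", "S2", "S3"))
)
housekeeping <- c("HK1", "HK2") # hypothetical housekeeping genes

# Hypothetical helper: geometric mean of strictly positive values.
geo_mean <- function(x) exp(mean(log(x)))

# Geometric mean of the housekeeping counts within each sample.
hk_geo <- apply(counts[housekeeping, , drop = FALSE], 2, geo_mean)

# Per-sample scaling factor: average across samples divided by the sample's own value.
scaling <- mean(hk_geo) / hk_geo

# Multiply every gene in a sample by that sample's scaling factor.
normalised <- sweep(counts, 2, scaling, `*`)
```

After scaling, the housekeeping geometric mean is the same in every sample, which is the property this kind of normalisation aims for.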
Value
[list] A list object of class "nacho":
- access
[character] Value passed to load_rcc() in id_colname.
- housekeeping_genes
[character] Value passed to load_rcc().
- housekeeping_predict
[logical] Value passed to load_rcc().
- housekeeping_norm
[logical] Value passed to load_rcc().
- normalisation_method
[character] Value passed to load_rcc().
- remove_outliers
[logical] FALSE.
- n_comp
[numeric] Value passed to load_rcc().
- data_directory
[character] Value passed to load_rcc().
- pc_sum
[data.frame] A data.frame with n_comp rows and four columns: "Standard deviation", "Proportion of Variance", "Cumulative Proportion" and "PC".
- nacho
[data.frame] A data.frame with all columns from the sample sheet ssheet_csv and all computed columns, i.e., quality-control metrics and counts, with one sample per row.
- outliers_thresholds
[list] A list of the (default) quality-control thresholds used.
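The four columns of pc_sum match the layout of a transposed stats::prcomp() summary. The sketch below builds a table of that shape from random toy data. Whether load_rcc() derives pc_sum this way is an assumption; the matrix x and the chosen n_comp are purely illustrative.

```r
set.seed(1)

# Toy matrix: 6 samples (rows) by 20 genes (columns) of random values.
x <- matrix(rnorm(6 * 20), nrow = 6)
n_comp <- 3 # kept at or below nrow(x) - 1, mirroring the n_comp constraint

# Principal component analysis on the samples.
pca <- stats::prcomp(x)

# summary()$importance has one row per statistic and one column per component.
importance <- summary(pca)$importance

# Keep the first n_comp components and transpose so each row is a component.
pc_sum <- as.data.frame(t(importance[, seq_len(n_comp), drop = FALSE]))
pc_sum$PC <- rownames(pc_sum)
```

With n centred samples, at most n - 1 components carry variance, which is consistent with the documented upper bound on n_comp.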
Examples
if (interactive()) {
  library(GEOquery)
  library(NACHO)

  # Import data from GEO
  gse <- GEOquery::getGEO(GEO = "GSE74821")
  targets <- Biobase::pData(Biobase::phenoData(gse[[1]]))

  # Download and unpack the raw RCC archive into a temporary directory
  GEOquery::getGEOSuppFiles(GEO = "GSE74821", baseDir = tempdir())
  utils::untar(
    tarfile = file.path(tempdir(), "GSE74821", "GSE74821_RAW.tar"),
    exdir = file.path(tempdir(), "GSE74821")
  )

  # Add the RCC file names to the sample sheet (dots escaped to match literally)
  targets$IDFILE <- list.files(
    path = file.path(tempdir(), "GSE74821"),
    pattern = "\\.RCC\\.gz$"
  )

  # Convert all columns to ASCII before writing the sample sheet
  targets[] <- lapply(X = targets, FUN = iconv, from = "latin1", to = "ASCII")
  utils::write.csv(
    x = targets,
    file = file.path(tempdir(), "GSE74821", "Samplesheet.csv")
  )

  # Read RCC files and format
  nacho <- load_rcc(
    data_directory = file.path(tempdir(), "GSE74821"),
    ssheet_csv = file.path(tempdir(), "GSE74821", "Samplesheet.csv"),
    id_colname = "IDFILE"
  )
}