This function creates a list in which your settings, the raw counts and the normalised counts are stored, using the result from a call to load_rcc().
Usage
normalise(
  nacho_object,
  housekeeping_genes = nacho_object[["housekeeping_genes"]],
  housekeeping_predict = nacho_object[["housekeeping_predict"]],
  housekeeping_norm = nacho_object[["housekeeping_norm"]],
  normalisation_method = nacho_object[["normalisation_method"]],
  n_comp = nacho_object[["n_comp"]],
  remove_outliers = nacho_object[["remove_outliers"]],
  outliers_thresholds = nacho_object[["outliers_thresholds"]]
)
Arguments
- nacho_object
  [list] A list object of class "nacho" obtained from load_rcc() or normalise().
- housekeeping_genes
  [character] A vector of names of the miRNAs/mRNAs that should be used as housekeeping genes. Default is NULL.
- housekeeping_predict
  [logical] Boolean to indicate whether the housekeeping genes should be predicted (TRUE) or not (FALSE). Default is FALSE.
- housekeeping_norm
  [logical] Boolean to indicate whether the housekeeping normalisation should be performed. Default is TRUE.
- normalisation_method
  [character] Either "GEO" or "GLM". Character string to indicate normalisation using the geometric mean ("GEO") or a generalized linear model ("GLM"). Default is "GEO".
- n_comp
  [numeric] Number indicating the number of principal components to compute. Cannot be more than n-1 samples. Default is 10.
- remove_outliers
  [logical] A boolean to indicate if outliers should be excluded.
- outliers_thresholds
  [list] List of thresholds to exclude outliers.
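To make the "GEO" option more concrete, here is a minimal base R sketch of geometric-mean scaling on a toy matrix of control-probe counts. It illustrates the general idea only and is not NACHO's internal code: the helpers `geometric_mean()` and `geo_factors()`, the toy counts and the choice of the across-sample average as the reference are all assumptions made for this example.

```r
# Geometric mean of strictly positive values, computed on the log scale.
geometric_mean <- function(x) exp(mean(log(x)))

# Hypothetical helper: one scaling factor per sample (column), chosen so that
# every sample's control geometric mean matches the average across samples.
geo_factors <- function(control_counts) {
  sample_geo <- apply(control_counts, 2, geometric_mean)
  mean(sample_geo) / sample_geo
}

# Toy control-probe counts: 3 probes (rows) x 3 samples (columns).
controls <- matrix(
  c(100, 200, 400,
    50, 100, 200,
    200, 400, 800),
  nrow = 3,
  dimnames = list(paste0("POS_", 1:3), paste0("sample_", 1:3))
)

factors <- geo_factors(controls)

# Multiply each column by its own factor.
scaled <- sweep(controls, 2, factors, `*`)
```

After scaling, all samples share the same control geometric mean, which is the property this kind of factor is designed to enforce.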
Value
[list] A list containing parameters and data.
- access
  [character] Value passed to load_rcc() in id_colname.
- housekeeping_genes
  [character] Value passed to load_rcc() or normalise().
- housekeeping_predict
  [logical] Value passed to load_rcc().
- housekeeping_norm
  [logical] Value passed to load_rcc() or normalise().
- normalisation_method
  [character] Value passed to load_rcc() or normalise().
- remove_outliers
  [logical] Value passed to normalise().
- n_comp
  [numeric] Value passed to load_rcc().
- data_directory
  [character] Value passed to load_rcc().
- pc_sum
  [data.frame] A data.frame with n_comp rows and four columns: "Standard deviation", "Proportion of Variance", "Cumulative Proportion" and "PC".
- nacho
  [data.frame] A data.frame with all columns from the sample sheet ssheet_csv and all computed columns, i.e., quality-control metrics and counts, with one sample per row.
- outliers_thresholds
  [list] A list of the quality-control thresholds used.
- raw_counts
  [data.frame] Raw counts with probes as rows and samples as columns. The first column, "CodeClass", gives the type of each probe and the second column, "Name", gives its name.
- normalised_counts
  [data.frame] Normalised counts with probes as rows and samples as columns. The first column, "CodeClass", gives the type of each probe and the second column, "Name", gives its name.
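The shape of pc_sum described above matches what base R reports for a principal component analysis. The sketch below builds a table with the same four columns from stats::prcomp() on simulated data. It is an illustration under assumptions, not a description of how NACHO computes pc_sum: the toy matrix, its orientation (samples as rows) and the object names are invented for the example.

```r
set.seed(1)

# Toy log-scale counts: 6 samples (rows) x 20 probes (columns).
log_counts <- matrix(rnorm(6 * 20), nrow = 6)

# Number of components to keep; it must not exceed the number of samples minus one.
n_comp <- 3
stopifnot(n_comp <= nrow(log_counts) - 1)

pca <- stats::prcomp(log_counts)

# summary() returns a 3-row "importance" matrix with one column per component.
importance <- summary(pca)$importance

# Keep the first n_comp components and put one component per row.
pc_sum <- as.data.frame(t(importance[, seq_len(n_comp), drop = FALSE]))
pc_sum$PC <- rownames(pc_sum)
```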
Details
Outliers definition (remove_outliers = TRUE):
- Binding Density (BD) < 0.1
- Binding Density (BD) > 2.25
- Field of View (FoV) < 75
- Positive Control Linearity (PCL) < 0.95
- Limit of Detection (LoD) < 2
- Positive normalisation factor (Positive_factor) < 0.25
- Positive normalisation factor (Positive_factor) > 4
- Housekeeping normalisation factor (house_factor) < 1/11
- Housekeeping normalisation factor (house_factor) > 11
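The rules above can be sketched as a simple filter in base R. This is an illustrative example on a toy quality-control table, not the package's own implementation: the helper `flag_outliers()`, the `sample` column and the toy values are assumptions, and the metric column names are taken from the labels listed above.

```r
# Toy QC table with one sample per row.
qc <- data.frame(
  sample = c("s1", "s2", "s3"),
  BD = c(0.8, 2.5, 1.2),
  FoV = c(98, 99, 60),
  PCL = c(0.99, 0.97, 0.98),
  LoD = c(5, 4, 6),
  Positive_factor = c(1.1, 0.9, 1.3),
  house_factor = c(1.0, 2.0, 0.5)
)

# Hypothetical helper: TRUE when at least one metric breaches its threshold.
flag_outliers <- function(qc) {
  qc$BD < 0.1 | qc$BD > 2.25 |
    qc$FoV < 75 |
    qc$PCL < 0.95 |
    qc$LoD < 2 |
    qc$Positive_factor < 0.25 | qc$Positive_factor > 4 |
    qc$house_factor < 1 / 11 | qc$house_factor > 11
}

qc$is_outlier <- flag_outliers(qc)

# Samples that would be kept under these rules.
kept <- qc[!qc$is_outlier, "sample"]
```

In this toy table, s2 is flagged for a binding density above 2.25 and s3 for a field of view below 75, so only s1 is kept.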
Examples
data(GSE74821)
GSE74821_norm <- normalise(
  nacho_object = GSE74821,
  housekeeping_norm = TRUE,
  normalisation_method = "GEO",
  remove_outliers = TRUE
)
#> [NACHO] Normalising "GSE74821" with new value for parameters:
#> - normalisation_method = TRUE
#> - remove_outliers = TRUE
#> [NACHO] Computing normalisation factors using "GEO" method.
#> [NACHO] Returning a list.
#> $ access : character
#> $ housekeeping_genes : character
#> $ housekeeping_predict: logical
#> $ housekeeping_norm : logical
#> $ normalisation_method: character
#> $ remove_outliers : logical
#> $ n_comp : numeric
#> $ data_directory : character
#> $ pc_sum : data.frame
#> $ nacho : data.frame
#> $ outliers_thresholds : list
if (interactive()) {
  library(GEOquery)
  library(NACHO)
  # Import data from GEO
  gse <- GEOquery::getGEO(GEO = "GSE74821")
  targets <- Biobase::pData(Biobase::phenoData(gse[[1]]))
  GEOquery::getGEOSuppFiles(GEO = "GSE74821", baseDir = tempdir())
  utils::untar(
    tarfile = file.path(tempdir(), "GSE74821", "GSE74821_RAW.tar"),
    exdir = file.path(tempdir(), "GSE74821")
  )
  targets$IDFILE <- list.files(
    path = file.path(tempdir(), "GSE74821"),
    pattern = ".RCC.gz$"
  )
  targets[] <- lapply(X = targets, FUN = iconv, from = "latin1", to = "ASCII")
  utils::write.csv(
    x = targets,
    file = file.path(tempdir(), "GSE74821", "Samplesheet.csv")
  )
  # Read RCC files and format
  nacho <- load_rcc(
    data_directory = file.path(tempdir(), "GSE74821"),
    ssheet_csv = file.path(tempdir(), "GSE74821", "Samplesheet.csv"),
    id_colname = "IDFILE"
  )
  # (re)Normalise data by removing outliers
  nacho_norm <- normalise(
    nacho_object = nacho,
    remove_outliers = TRUE
  )
  # (re)Normalise data with "GLM" method and removing outliers
  nacho_norm <- normalise(
    nacho_object = nacho,
    normalisation_method = "GLM",
    remove_outliers = TRUE
  )
}