Format VCF file(s) by filtering out all variants that do not satisfy "--min-alleles 2 --max-alleles 2 --types snps" and by setting variant IDs to "%CHROM:%POS:%REF:%ALT" when no VEP annotation file is provided (see https://samtools.github.io/bcftools/). The GWAS is then performed on the formatted VCF file(s) with PLINK2 (https://www.cog-genomics.org/plink/2.0).
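The formatting step described above can be approximated by hand. The following is a minimal sketch, not the package's actual implementation: it only assembles the BCFtools command line in R and prints it. The file names input.vcf.gz and formatted.vcf.gz are hypothetical, and chaining "bcftools view" into "bcftools annotate" through a pipe is an assumption about how the two operations could be combined.

```r
# Hypothetical input/output paths, for illustration only.
input_vcf <- "input.vcf.gz"
output_vcf <- "formatted.vcf.gz"
bcftools <- "/usr/bin/bcftools"

# Keep biallelic SNPs only (the filter quoted in the description).
view_cmd <- paste(
  bcftools, "view",
  "--min-alleles 2 --max-alleles 2 --types snps",
  shQuote(input_vcf)
)

# Set variant IDs to CHROM:POS:REF:ALT, reading from stdin ("-")
# and writing a compressed VCF.
annotate_cmd <- paste(
  bcftools, "annotate",
  "--set-id", shQuote("%CHROM:%POS:%REF:%ALT"),
  "--output-type z",
  "--output", shQuote(output_vcf),
  "-"
)

# Assumption: the two steps are chained with a shell pipe.
format_cmd <- paste(view_cmd, "|", annotate_cmd)
cat(format_cmd, "\n")
# system(format_cmd)  # uncomment to run; requires BCFtools to be installed
```

Printing the command before running it makes it easy to check the filters against the BCFtools manual.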
Usage
run_eggla_gwas(
  data,
  results,
  id_column,
  traits = c("slope_.*", "auc_.*", "^AP_.*", "^AR_.*"),
  covariates,
  vcfs,
  working_directory,
  vep_file = NULL,
  use_info = TRUE,
  bin_path = list(bcftools = "/usr/bin/bcftools", plink2 = "/usr/bin/plink2"),
  bcftools_view_options = NULL,
  build = "38",
  strand = "+",
  info_type = "IMPUTE2 info score via 'bcftools +impute-info'",
  threads = 1,
  quiet = FALSE,
  clean = TRUE
)
Arguments
- data
Path to the phenotypes stored as a CSV file.
- results
Paths to the zip archives or directories generated by run_eggla_lmm() (vector of length two, one male and one female path).
- id_column
Name of the column where sample/individual IDs are stored.
- traits
One or multiple traits, i.e., column names from data, to be analysed separately.
- covariates
One or several covariates, i.e., column names from data, to be used. Binary traits should be coded as '1' and '2'; sex must be coded as '1' = male, '2' = female, 'NA'/'0' = missing.
- vcfs
Path to the "raw" VCF file(s) containing the genotypes of the individuals to be analysed.
- working_directory
Directory in which computation will occur and where output files will be saved.
- vep_file
Path to the VEP annotation file used to set variant RSIDs and add gene SYMBOL, etc.
- use_info
A logical indicating whether to extract all information stored in the "INFO" field.
- bin_path
A named list containing the paths to the PLINK2 and BCFtools binaries. For PLINK2, a URL to the binary can be provided instead (see https://www.cog-genomics.org/plink/2.0).
- bcftools_view_options
A string or a vector of strings (which will be passed to paste()) containing BCFtools view parameters, e.g., "--min-af 0.05", "--exclude 'INFO/INFO < 0.8'", and/or "--min-alleles 2 --max-alleles 2 --types snps".
- build
Build of the genome on which the SNP is orientated. Default is "38".
- strand
Orientation of the site to the human genome strand used. Should be "+" (default).
- info_type
Type of information provided in the INFO column, e.g., "IMPUTE2 info score via 'bcftools +impute-info'".
- threads
Number of threads to be used by some BCFtools and PLINK2 commands.
- quiet
A logical indicating whether to suppress the output.
- clean
A logical indicating whether to clean intermediary files or not.
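As an illustration of the covariates and bcftools_view_options arguments, here is a small sketch on made-up data. The toy table and its values are hypothetical, and collapsing the option vector with a single space is an assumption about how paste() combines the strings; the function's internals may differ.

```r
# Toy phenotype table; column names and values are made up for illustration.
phenotypes <- data.frame(
  ID = c("001", "002", "003"),
  sex = c("male", "female", NA),
  stringsAsFactors = FALSE
)

# Recode sex as 1 = male, 2 = female, keeping NA for missing values.
sex_codes <- c(male = 1L, female = 2L)
phenotypes$sex <- unname(sex_codes[phenotypes$sex])

# Several BCFtools view filters supplied as a character vector.
view_options <- c("--min-af 0.05", "--exclude 'INFO/INFO < 0.8'")

# Assumption: the vector is collapsed into one option string with a space.
view_options_string <- paste(view_options, collapse = " ")
print(phenotypes)
print(view_options_string)
```

Recoding before writing the CSV keeps the phenotype file consistent with the sex coding described for covariates.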
Examples
if (interactive()) {
  library(data.table) # provides fread() and fwrite()
  library(eggla)

  # Write the example dataset to a temporary CSV file.
  data("bmigrowth")
  bmigrowth_csv <- file.path(tempdir(), "bmigrowth.csv")
  fwrite(
    x = bmigrowth,
    file = bmigrowth_csv
  )

  # Fit the growth models; the returned paths are reused as 'results' below.
  results_archives <- run_eggla_lmm(
    data = fread(
      file = bmigrowth_csv,
      colClasses = list(character = "ID")
    ),
    id_variable = "ID",
    age_days_variable = NULL,
    age_years_variable = "age",
    weight_kilograms_variable = "weight",
    height_centimetres_variable = "height",
    sex_variable = "sex",
    covariates = NULL,
    male_coded_zero = FALSE,
    random_complexity = 1,
    parallel = FALSE,
    parallel_n_chunks = 1,
    working_directory = tempdir()
  )

  # Run the GWAS on the VCF files shipped with the package.
  run_eggla_gwas(
    data = fread(
      file = bmigrowth_csv,
      colClasses = list(character = "ID")
    ),
    results = results_archives,
    id_column = "ID",
    traits = c("slope_.*", "auc_.*", "^AP_.*", "^AR_.*"),
    covariates = c("sex"),
    vcfs = list.files(
      path = system.file("vcf", package = "eggla"),
      pattern = "\\.vcf$|\\.vcf\\.gz$",
      full.names = TRUE
    ),
    working_directory = tempdir(),
    vep_file = NULL,
    bin_path = list(
      bcftools = "/usr/bin/bcftools",
      plink2 = "/usr/bin/plink2"
    ),
    threads = 1
  )
}
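The example hard-codes /usr/bin/bcftools and /usr/bin/plink2, which may not match every installation. As a hedged sketch, the binaries could instead be looked up on the PATH with base R's Sys.which() before building bin_path; the variable name missing_tools is hypothetical and this check is not part of the package.

```r
# Look up the binaries on the PATH; Sys.which() returns "" when a tool is not found.
bin_path <- list(
  bcftools = unname(Sys.which("bcftools")),
  plink2 = unname(Sys.which("plink2"))
)

# Report any tool that could not be located (hypothetical helper variable).
missing_tools <- names(bin_path)[!nzchar(unlist(bin_path))]
if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
}
```

The resulting list has the same shape as the bin_path argument, so it can be passed as-is once both entries are non-empty.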