Perform GWAS using PLINK2 (and BCFtools) — run_eggla

Format VCF file(s) by filtering out all variants that do not satisfy "--min-alleles 2 --max-alleles 2 --types snps" and by setting variant IDs to "%CHROM:%POS:%REF:%ALT" when no VEP annotation file is provided (see https://samtools.github.io/bcftools/). The GWAS is then performed on the formatted VCF file(s) with PLINK2 (https://www.cog-genomics.org/plink/2.0).
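
The formatting step can be pictured as two BCFtools calls chained together. The sketch below only builds the command string in R and does not run it. The file names are hypothetical, and the exact commands that run_eggla_gwas() issues internally may differ.

```r
# Hypothetical paths, for illustration only.
bcftools <- "/usr/bin/bcftools"
vcf_in <- "raw.vcf.gz"
vcf_out <- "formatted.vcf.gz"

# Step 1: keep biallelic SNPs only.
view_cmd <- paste(
  bcftools, "view",
  "--min-alleles 2 --max-alleles 2 --types snps",
  vcf_in
)

# Step 2: set variant IDs to CHROM:POS:REF:ALT
# (the behaviour described when no VEP file is provided).
annotate_cmd <- paste(
  bcftools, "annotate",
  "--set-id", shQuote("%CHROM:%POS:%REF:%ALT"),
  "--output-type z",
  "--output", vcf_out,
  "-" # read the filtered records from standard input
)

# Pipe the filtered records into the annotation step.
full_cmd <- paste(view_cmd, "|", annotate_cmd)
cat(full_cmd, "\n")
```

Building the string first makes it easy to inspect the command before handing it to system() or a shell.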

Usage

run_eggla_gwas(
  data,
  results,
  id_column,
  traits = c("slope_.*", "auc_.*", "^AP_.*", "^AR_.*"),
  covariates,
  vcfs,
  working_directory,
  vep_file = NULL,
  use_info = TRUE,
  bin_path = list(bcftools = "/usr/bin/bcftools", plink2 = "/usr/bin/plink2"),
  bcftools_view_options = NULL,
  build = "38",
  strand = "+",
  info_type = "IMPUTE2 info score via 'bcftools +impute-info'",
  threads = 1,
  quiet = FALSE,
  clean = TRUE
)

Arguments

data: Path to the phenotypes stored as a CSV file.
results: Paths to the zip archives or directories generated by run_eggla_lmm() (vector of length two, one male and one female path).
id_column: Name of the column where sample/individual IDs are stored.
traits: One or more traits, i.e., column names from data, to be analysed separately.
covariates: One or more covariates, i.e., column names from data, to be used. Binary traits should be coded as '1' and '2'; sex must be coded as '1' = male, '2' = female, 'NA'/'0' = missing.
vcfs: Path to the "raw" VCF file(s) containing the genotypes of the individuals to be analysed.
working_directory: Directory in which computation will occur and where output files will be saved.
vep_file: Path to the VEP annotation file to be used to set variant RSIDs and add gene SYMBOL, etc.
use_info: A logical indicating whether to extract all information stored in the "INFO" field.
bin_path: A named list containing the paths to the PLINK2 and BCFtools binaries. For PLINK2, a URL to the binary can be provided (see https://www.cog-genomics.org/plink/2.0).
bcftools_view_options: A string or a vector of strings (which will be passed to paste()) containing BCFtools view parameters, e.g., "--min-af 0.05", "--exclude 'INFO/INFO < 0.8'", and/or "--min-alleles 2 --max-alleles 2 --types snps".
build: Build of the genome on which the SNP is orientated. Default is "38".
strand: Orientation of the site to the human genome strand used. Should be "+" (default).
info_type: Type of information provided in the INFO column, e.g., "IMPUTE2 info score via 'bcftools +impute-info'".
threads: Number of threads to be used by some BCFtools and PLINK2 commands.
quiet: A logical indicating whether to suppress the output.
clean: A logical indicating whether to clean intermediary files or not.
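
As an illustration of the coding expected for covariates and of the form bcftools_view_options can take, the sketch below recodes a sex column to the '1'/'2' scheme and collapses several BCFtools options into one string. The toy data frame, its 0/1 input coding, and the single space used as separator are assumptions made for the example, not details taken from the package.

```r
# Toy phenotypes with sex coded 0 = male, 1 = female (assumed input coding).
pheno <- data.frame(
  ID = c("001", "002", "003"),
  sex = c(0, 1, NA)
)

# Recode to the documented scheme: 1 = male, 2 = female.
# Adding 1 leaves missing values as NA.
pheno$sex <- pheno$sex + 1

# Several BCFtools view options given as a character vector...
view_options <- c(
  "--min-af 0.05",
  "--exclude 'INFO/INFO < 0.8'"
)
# ...collapsed into a single string (assumed separator: one space).
view_options_string <- paste(view_options, collapse = " ")

print(pheno)
print(view_options_string)
```

Either the vector or the collapsed string should be an acceptable value for bcftools_view_options according to the argument description above.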

Value

Path to results file.

Examples

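if (interactive()) {
  # fread() and fwrite() are provided by data.table
  library(data.table)
  library(eggla)
  data("bmigrowth")
  # Write the example phenotypes to a temporary CSV file
  bmigrowth_csv <- file.path(tempdir(), "bmigrowth.csv")
  fwrite(
    x = bmigrowth,
    file = bmigrowth_csv
  )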
  results_archives <- run_eggla_lmm(
    data = fread(
      file = file.path(tempdir(), "bmigrowth.csv"),
      colClasses = list(character = "ID")
    ),
    id_variable = "ID",
    age_days_variable = NULL,
    age_years_variable = "age",
    weight_kilograms_variable = "weight",
    height_centimetres_variable = "height",
    sex_variable = "sex",
    covariates = NULL,
    male_coded_zero = FALSE,
    random_complexity = 1,
    parallel = FALSE,
    parallel_n_chunks = 1,
    working_directory = tempdir()
  )
  run_eggla_gwas(
    data = fread(
      file = file.path(tempdir(), "bmigrowth.csv"),
      colClasses = list(character = "ID")
    ),
    results = results_archives,
    id_column = "ID",
    traits = c("slope_.*", "auc_.*", "^AP_.*", "^AR_.*"),
    covariates = c("sex"),
    vcfs = list.files(
      path = system.file("vcf", package = "eggla"),
      pattern = "\\.vcf$|\\.vcf\\.gz$",
      full.names = TRUE
    ),
    working_directory = tempdir(),
    vep_file = NULL,
    bin_path = list(
      bcftools = "/usr/bin/bcftools",
      plink2 = "/usr/bin/plink2"
    ),
    threads = 1
  )
}