Format VCF file(s) by filtering out all variants that do not satisfy "--min-alleles 2 --max-alleles 2 --types snps" (i.e., keeping biallelic SNPs only) and, if no VEP annotation file is provided, setting variant IDs to "%CHROM:%POS:%REF:%ALT" (see https://samtools.github.io/bcftools/). The GWAS is then performed on the formatted VCF file(s) with PLINK2 (https://www.cog-genomics.org/plink/2.0).
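
As a rough illustration of the formatting step, the sketch below assembles comparable BCFtools calls from R. The file names and variable names are hypothetical, and the commands eggla actually runs internally may differ; only the filtering flags and the ID format are taken from the description above.

```r
# Hypothetical paths, for illustration only.
bcftools <- "/usr/bin/bcftools"
input_vcf <- "input.vcf.gz"
output_vcf <- "formatted.vcf.gz"

# Step 1: keep biallelic SNPs only and stream uncompressed BCF to stdout.
view_args <- c(
  "view",
  "--min-alleles", "2", "--max-alleles", "2", "--types", "snps",
  "--output-type", "u", input_vcf
)

# Step 2: read from stdin ("-") and set variant IDs to CHROM:POS:REF:ALT.
annotate_args <- c(
  "annotate",
  "--set-id", shQuote("%CHROM:%POS:%REF:%ALT", type = "sh"),
  "--output-type", "z", "--output", output_vcf, "-"
)

# Join both steps into a single shell pipeline.
cmd <- paste(
  paste(c(bcftools, view_args), collapse = " "),
  "|",
  paste(c(bcftools, annotate_args), collapse = " ")
)
cat(cmd, "\n")

# Only run the pipeline when the binary and the input file are both present.
if (file.exists(bcftools) && file.exists(input_vcf)) {
  system(cmd)
}
```

Building the command as a string first makes it easy to inspect before anything is executed.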

Usage

run_eggla_gwas(
  data,
  results,
  id_column,
  traits = c("slope_.*", "auc_.*", "^AP_.*", "^AR_.*"),
  covariates,
  vcfs,
  working_directory,
  vep_file = NULL,
  use_info = TRUE,
  bin_path = list(bcftools = "/usr/bin/bcftools", plink2 = "/usr/bin/plink2"),
  bcftools_view_options = NULL,
  build = "38",
  strand = "+",
  info_type = "IMPUTE2 info score via 'bcftools +impute-info'",
  threads = 1,
  quiet = FALSE,
  clean = TRUE
)

Arguments

data

Path to the phenotypes stored as a CSV file.

results

Paths to the zip archives or directories generated by run_eggla_lmm() (vector of length two, one male and one female path).

id_column

Name of the column where sample/individual IDs are stored.

traits

One or multiple traits, i.e., column names from data, to be analysed separately.
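
The default values of traits look like regular expressions. Assuming they are matched against column names, the sketch below shows which columns such patterns would select. The column names are made up for illustration and are not taken from actual run_eggla_lmm() output.

```r
# Hypothetical column names, for illustration only.
column_names <- c(
  "ID", "sex", "slope_0--0.5", "auc_0--0.5",
  "AP_ageyears", "AR_bmi", "weight"
)

# Default patterns from the usage section.
trait_patterns <- c("slope_.*", "auc_.*", "^AP_.*", "^AR_.*")

# Keep every column matched by at least one pattern.
selected <- grep(
  pattern = paste(trait_patterns, collapse = "|"),
  x = column_names,
  value = TRUE
)
print(selected)
```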

covariates

One or several covariates, i.e., column names from data, to be used. Binary covariates should be coded as '1' and '2'; sex must be coded as '1' = male, '2' = female, 'NA'/'0' = missing.
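
A minimal sketch of recoding a sex column to this scheme before writing the phenotype file. The data frame and its text labels are hypothetical.

```r
# Hypothetical phenotype table with sex stored as text.
phenotypes <- data.frame(
  ID = c("001", "002", "003", "004"),
  sex = c("male", "female", NA, "female"),
  stringsAsFactors = FALSE
)

# Lookup table for the documented coding: 1 = male, 2 = female.
sex_codes <- c(male = 1L, female = 2L)

# Missing labels stay NA after the lookup.
phenotypes$sex <- unname(sex_codes[phenotypes$sex])
print(phenotypes)
```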

vcfs

Path to the "raw" VCF file(s) containing the genotypes of the individuals to be analysed.

working_directory

Directory in which computation will occur and where output files will be saved.

vep_file

Path to the VEP annotation file used to set variant RSIDs and to add annotations such as gene SYMBOL.

use_info

A logical indicating whether to extract all information stored in the "INFO" field.

bin_path

A named list containing the paths to the PLINK2 and BCFtools binaries. For PLINK2, a URL to the binary can be provided (see https://www.cog-genomics.org/plink/2.0).

bcftools_view_options

A string or a vector of strings (which will be passed to paste()) containing BCFtools view parameters, e.g., "--min-af 0.05", "--exclude 'INFO/INFO < 0.8'", and/or "--min-alleles 2 --max-alleles 2 --types snps".
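
For illustration, a vector of options can be collapsed into a single string with paste(). Joining with single spaces is an assumption here; the exact paste() call used inside the function is not shown on this page.

```r
# Options taken from the examples in the argument description.
view_options <- c(
  "--min-af 0.05",
  "--exclude 'INFO/INFO < 0.8'",
  "--min-alleles 2 --max-alleles 2 --types snps"
)

# Assumption: options are joined with single spaces.
view_string <- paste(view_options, collapse = " ")
print(view_string)
```

Either view_options or view_string could then be supplied as bcftools_view_options.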

build

Build of the genome on which the SNP is orientated. Default is "38".

strand

Orientation of the site to the human genome strand used. Should be "+" (default).

info_type

Type of information provided in the INFO column, e.g., "IMPUTE2 info score via 'bcftools +impute-info'".

threads

Number of threads to be used by some BCFtools and PLINK2 commands.

quiet

A logical indicating whether to suppress the output.

clean

A logical indicating whether to remove intermediate files.

Value

Path to results file.

Examples

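if (interactive()) {
  library(eggla)
  library(data.table) # provides fread() and fwrite()
  # Write the example dataset shipped with eggla to a temporary CSV file.
  data("bmigrowth")
  bmigrowth_csv <- file.path(tempdir(), "bmigrowth.csv")
  fwrite(
    x = bmigrowth,
    file = bmigrowth_csv
  )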
  results_archives <- run_eggla_lmm(
    data = fread(
      file = file.path(tempdir(), "bmigrowth.csv"),
      colClasses = list(character = "ID")
    ),
    id_variable = "ID",
    age_days_variable = NULL,
    age_years_variable = "age",
    weight_kilograms_variable = "weight",
    height_centimetres_variable = "height",
    sex_variable = "sex",
    covariates = NULL,
    male_coded_zero = FALSE,
    random_complexity = 1,
    parallel = FALSE,
    parallel_n_chunks = 1,
    working_directory = tempdir()
  )
  run_eggla_gwas(
    data = fread(
      file = file.path(tempdir(), "bmigrowth.csv"),
      colClasses = list(character = "ID")
    ),
    results = results_archives,
    id_column = "ID",
    traits = c("slope_.*", "auc_.*", "^AP_.*", "^AR_.*"),
    covariates = c("sex"),
    vcfs = list.files(
      path = system.file("vcf", package = "eggla"),
      pattern = "\\.vcf$|\\.vcf\\.gz$",
      full.names = TRUE
    ),
    working_directory = tempdir(),
    vep_file = NULL,
    bin_path = list(
      bcftools = "/usr/bin/bcftools",
      plink2 = "/usr/bin/plink2"
    ),
    threads = 1
  )
}