Note: ~ in the following is /home/rhapsody
In order to retrieve the latest version of the main script RHAPSODY_WP3_PreDiab.Rmd and utils/* scripts, you can copy/paste the git commands below.
This will clone (i.e., download) the scripts from GitHub.
Scripts will be downloaded into your home directory within a directory named WP3.
git clone https://github.com/mcanouil/RHAPSODY.git ~/WP3/scripts
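Re-running `git clone` into a directory that already holds the scripts fails, because the destination is not empty. A minimal sketch, assuming a POSIX shell, that prints the command to use in each case (clone on first use, pull afterwards); the helper name `rhapsody_sync_cmd` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the RHAPSODY repository):
# print the git command needed to obtain or refresh the scripts.
# Assumes the target path contains no spaces.
rhapsody_sync_cmd() {
  target="$1"
  if [ -d "$target/.git" ]; then
    # Already cloned: update in place.
    printf 'git -C %s pull origin master\n' "$target"
  else
    # First use: clone from GitHub.
    printf 'git clone https://github.com/mcanouil/RHAPSODY.git %s\n' "$target"
  fi
}

# Print the command for the default location; pipe the output to `sh` to run it.
rhapsody_sync_cmd "$HOME/WP3/scripts"
```

Printing the command rather than running it keeps the sketch safe to try before touching an existing checkout.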
Once the download is complete, you should get the following directory tree in your home (`cd ~`).
root2
 °--WP3
     °--scripts
         ¦--docker
         ¦   ¦--Dockerfile
         ¦   ¦--Dockerfile_update
         ¦   ¦--login.html
         ¦   ¦--logo.png
         ¦   ¦--markdown_packages.R
         ¦   °--r_packages.R
         ¦--docker_analysis
         ¦   ¦--opal_credentials.txt
         ¦   ¦--rhapsody.R
         ¦   °--rhapsody.sh
         ¦--docs
         ¦   ¦--howto.Rmd
         ¦   ¦--index.html
         ¦   °--RHAPSODY_Logo_WEB_Color.png
         ¦--opal_credentials.txt
         ¦--README.md
         ¦--RHAPSODY_WP3_PreDiab.Rmd
         °--utils
             ¦--ethnicity_template.csv
             ¦--ethnicity_template.R
             ¦--handleVCF.sh
             ¦--Install_Rpackages.R
             ¦--RHAPSODY_WP3_PreDiab_DEBUG.R
             ¦--RHAPSODY_WP3_PreDiab_OPAL_DEBUG.R
             °--RunAnalysis.R
Files:
* ~/WP3/scripts/README.pdf
* ~/WP3/scripts/RHAPSODY_WP3_PreDiab.Rmd
* ~/WP3/scripts/opal_credentials.txt
* ~/WP3/scripts/utils/RHAPSODY_WP3_PreDiab_DEBUG.R
* ~/WP3/scripts/utils/RunAnalysis.R (`Rscript ~/WP3/scripts/utils/RunAnalysis.R`)
* ~/WP3/scripts/utils/handleVCF.sh (format_vcfs must be set to FALSE)
* ~/WP3/scripts/utils/Install_Rpackages.R
* ~/WP3/scripts/utils/check_analysis.Rmd
* ~/WP3/scripts/utils/README.Rmd (README.pdf)
* ~/WP3/scripts/utils/howto.Rmd (howto.html)

Before running anything, you can check the current settings of your cluster/grid/computer/laptop.
For reproducibility:
| package | version | package | version | package | version | package | version |
|---|---|---|---|---|---|---|---|
| broom | 0.5.0 | data.tree | 0.7.8 | kableExtra | 0.9.0 | parallel | 3.4.2 |
| scales | 1.0.0 | devtools | 1.13.6 | knitr | 1.20 | readxl | 1.1.0 |
| cowplot | 0.9.3 | grid | 3.4.2 | lme4 | 1.1-18-1 | writexl | 1.0 |
| viridis | 0.5.1 | Hmisc | 4.1-1 | lmerTest | 3.0-1 | tidyverse | 1.2.1 |
The required R packages can be installed with ~/WP3/scripts/utils/Install_Rpackages.R.
Interactively using R
source("~/WP3/scripts/utils/Install_Rpackages.R")
Non-interactively using bash/shell
Rscript ~/WP3/scripts/utils/Install_Rpackages.R
If VCFtools is not available on your machine, please install it following the instructions on https://vcftools.github.io/.
To avoid hard-coding the OPAL database credentials in the script, create a text file containing the server, username and password used to access your local OPAL database (i.e., the RHAPSODY node hosting the phenotype data).
You can also modify the default file ~/WP3/scripts/opal_credentials.txt.
http://localhost:8080
administrator
password
NOTE: The username provided in that file should have “download” rights.
The data are downloaded locally into the R session for the duration of the analysis; all local data are then deleted (i.e., the data do not leave the R session in any way).
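The credentials file described above holds three lines: server, username and password. A minimal sketch, assuming a POSIX shell, that writes such a file and restricts it to its owner; the helper name `write_opal_credentials` and the permission choice are assumptions, not something the RHAPSODY scripts require.

```shell
#!/bin/sh
# Hypothetical helper: write the three-line credentials file
# (server URL, username, password), readable by the owner only.
# The body runs in a subshell so the umask change does not leak out.
write_opal_credentials() (
  file="$1"; server="$2"; user="$3"; password="$4"
  umask 077
  printf '%s\n%s\n%s\n' "$server" "$user" "$password" > "$file"
  chmod 600 "$file"
)

# Example call with the placeholder values shown above:
# write_opal_credentials /media/credentials/opal_credentials.txt \
#   http://localhost:8080 administrator password
```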
Please check where the imputation quality is stored in your VCF files.
Depending on where (e.g., locally, Sanger Imputation Server, Michigan Imputation Server) and with which software (e.g., impute2, PBWT) you imputed your genetic data, this information might be stored as INFO or R2 (the most frequently used names, but it might be something else).
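One way to see which tags a VCF declares is to read the `##INFO` lines of its header. A minimal sketch, assuming a POSIX shell with awk and a header that follows the usual `##INFO=<ID=...,` layout; the function name `list_info_tags` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read an uncompressed VCF on stdin and print
# the ID of every ##INFO header line, stopping at the #CHROM line.
list_info_tags() {
  awk '
    /^#CHROM/ { exit }
    /^##INFO=<ID=/ {
      sub(/^##INFO=<ID=/, "")   # drop the prefix
      sub(/,.*/, "")            # drop everything after the ID
      print
    }'
}

# Example call on a compressed file:
# gzip -dc /media/vcf/chr22.vcf.gz | list_info_tags
```

If the output contains INFO or R2, that name is a candidate for the imputation quality tag; the header descriptions should confirm which one holds the imputation quality.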
A template script (i.e., ~/WP3/scripts/utils/RunAnalysis.R) is available to help run the analysis as a bash/shell command.
Set the output directory.
By default, outputs are generated in the current directory (./).
working_directory <- "/media/rhapsody_output"
Set a cohort name.
It is an identifier for a cohort and array. For example, “DESIR_Metabochip” for the results from the Metabochip Array in the DESIR cohort.
cohort_name <- "Cohort_Name"
Set the author name. This is just to know who performed the analysis, in case there are questions at a later point.
author_name <- "Firstname LASTNAME"
Set the number of CPUs you are going to use to run the Linear Mixed Model.
n_cpu <- 2
Set the path to your file opal_credentials.txt. For example, ~/WP3/scripts/opal_credentials.txt, if you modified the default file (it is better to use an absolute path, i.e., without ~/).
opal_credentials <- "/media/credentials/opal_credentials.txt"
Set the path to your (RAW) VCF files.
vcf_directory <- "/media/vcf"
Set the imputation quality tag for your VCF.
imputation_quality_tag <- "INFO" # To be set according to VCF (could also be "R2")
Set the path to VCFtools.
The default location of the software is usually /usr/local/bin.
vcftools_binary_path <- "/usr/local/bin"
Define if you want to format/split the VCF files.
The default is to format and split the available VCF files within the script.
format_vcfs <- TRUE
Set the analysis step.
analysis_step <- 7
Define if genetic variants analyses should be performed.
variants_analysis <- TRUE
Define if genomic components should be used.
genomic_component <- NULL # A csv file with SUBJID as first column and PC01 to PCXX
Template: ~/WP3/scripts/utils/RunAnalysis.R
# set the output directory or leave as is;
# output will be generated where the Rmarkdown file is
working_directory <- "/media/rhapsody_output"
cohort_name <- "Cohort_Name"
author_name <- "Firstname LASTNAME"
n_cpu <- 2
opal_credentials <- "/media/credentials/opal_credentials.txt"
vcf_directory <- "/media/vcf"
imputation_quality_tag <- "INFO" # To be set according to VCF (could also be "R2")
vcftools_binary_path <- "/usr/local/bin"
format_vcfs <- TRUE
analysis_step <- 7
variants_analysis <- TRUE
genomic_component <- NULL # A csv file with SUBJID as first column and PC01 to PCXX
# Run the analysis
dir.create(path = working_directory, showWarnings = FALSE, mode = "0777")
rmarkdown::render(
input = "/home/rhapsody/WP3/scripts/RHAPSODY_WP3_PreDiab.Rmd",
output_format = "html_document",
output_file = paste0(
"RHAPSODY_WP3_PreDiab_",
cohort_name, "_step",
analysis_step, ".html"
),
output_dir = working_directory,
params = list(
cohort_name = cohort_name,
author_name = author_name,
opal_credentials = opal_credentials,
vcf_input_directory = vcf_directory,
imputation_quality_tag = imputation_quality_tag,
vcftools_binary_path = vcftools_binary_path,
output_directory = working_directory,
analysis_step = analysis_step,
format_vcfs = format_vcfs,
variants_analysis = variants_analysis,
chunk_size = 1000,
exclude_X = TRUE,
genomic_component = genomic_component,
n_cpu = n_cpu,
echo = FALSE, # Should R code be printed in the report
warning = FALSE, # Should warnings be printed in the report
message = FALSE, # Should messages be printed in the report
debug = FALSE
),
encoding = "UTF-8"
)
Once you have created an R script using the template (i.e., ~/WP3/scripts/utils/RunAnalysis.R), you can start the analysis with a single command.
Rscript ~/WP3/scripts/utils/RunAnalysis.R
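When the command is launched on a remote machine, it can help to keep everything it prints. A minimal sketch, assuming a POSIX shell, of a wrapper that stores the output and the exit status of any command in a log file; the function name `run_with_log` and the log location are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run a command, write its stdout and stderr
# to a log file, append the exit status, and return that status.
run_with_log() {
  log_file="$1"
  shift
  "$@" > "$log_file" 2>&1
  status=$?
  printf 'exit status: %s\n' "$status" >> "$log_file"
  return "$status"
}

# Example call (the log path is an assumption):
# run_with_log /media/rhapsody_output/RunAnalysis.log \
#   Rscript ~/WP3/scripts/utils/RunAnalysis.R
```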
If you want to format your VCF files before running the script, the best way to proceed is to set the parameters as shown below.
format_vcfs <- FALSE
analysis_step <- 4
variants_analysis <- FALSE
Run the script within bash/shell
Rscript ~/WP3/scripts/utils/RunAnalysis.R
This will create, in the output directory you set up:
* ./RHAPSODY_WP3_PreDiab_Cohort_Name_step4.html
* ./vcffilespath.txt
The list of all available VCF files to format using ./handleVCF_Cohort_Name.sh. The files are also listed in the report in Section 6.2.1 Check available VCF files.
/media/vcf/chr1.vcf.gz
/media/vcf/chr2.vcf.gz
...
/media/vcf/chr*.vcf.gz
...
/media/vcf/chr22.vcf.gz
* ./handleVCF_Cohort_Name.sh
The command (with the correct parameters) to format the VCF using ~/WP3/scripts/utils/handleVCF.sh. This is also stated in the report in Section 6.2.2.3.1 Manually format VCFs.
#!/bin/sh
./handleVCF.sh Cohort_Name /media/vcf INFO /usr/local/bin
Run the script to format the VCF
sh ./handleVCF_Cohort_Name.sh
Run the whole analysis with the proper parameters
format_vcfs <- FALSE # keep this to FALSE
analysis_step <- 7
variants_analysis <- TRUE
Rscript ~/WP3/scripts/utils/RunAnalysis.R
If Docker is not installed on your cluster/grid/computer/laptop, you can install it following the instructions on docs.docker.com.
You can download the Docker image from the GitHub Container Registry:
docker pull ghcr.io/mcanouil/rhapsody:latest
To pull a particular version (i.e., not the latest), you can add the version tag.
Tags are listed in the repository.
docker pull ghcr.io/mcanouil/rhapsody:1.3.0
The easiest way to start the Docker container using the ghcr.io/mcanouil/rhapsody image is to run the following Docker command.
docker run \
--name rhapsody \
--detach \
ghcr.io/mcanouil/rhapsody:latest
With those settings, the Docker container will not have access to any data stored on your cluster/grid/computer/laptop.
To allow the Docker container to see some data, you have to use the --volume from:to argument.
docker run \
--name rhapsody \
--detach \
--volume /path/to/vcf:/media/vcf \
ghcr.io/mcanouil/rhapsody:latest
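When the host path given to `--volume` does not exist, Docker creates it as an empty directory, so a typo in the path yields a container that sees no data. A minimal sketch, assuming a POSIX shell, that builds the argument only when the host directory exists; the helper name `volume_arg` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "--volume from:to" if the host directory
# exists, otherwise report the problem on stderr and return 1.
volume_arg() {
  host_dir="$1"
  container_dir="$2"
  if [ ! -d "$host_dir" ]; then
    echo "missing host directory: $host_dir" >&2
    return 1
  fi
  printf '%s %s:%s\n' '--volume' "$host_dir" "$container_dir"
}

# Example call:
# volume_arg /path/to/vcf /media/vcf
```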
If you want to configure the Docker container more precisely (CPUs, memory, etc.), you can find all the parameters at the following address: https://docs.docker.com/engine/reference/commandline/run/
Open a web browser and go to http://localhost:8787.
You will see an authentication box with Username and Password:
* Username: rhapsody
* Password: wp3
In order to retrieve the latest version of the main script RHAPSODY_WP3_PreDiab.Rmd and utils/* scripts, you can copy/paste the git commands below.
This will pull (i.e., download) the scripts from GitHub.
sudo git -C /home/rhapsody/WP3/scripts pull origin master
Once the download is complete, you should get the following directory tree in your home (`cd ~`).
home
 °--rhapsody
     °--WP3
         °--scripts
             ¦--docker
             ¦   ¦--Dockerfile
             ¦   ¦--Dockerfile_update
             ¦   ¦--login.html
             ¦   ¦--logo.png
             ¦   ¦--markdown_packages.R
             ¦   °--r_packages.R
             ¦--docker_analysis
             ¦   ¦--opal_credentials.txt
             ¦   ¦--rhapsody.R
             ¦   °--rhapsody.sh
             ¦--docs
             ¦   ¦--howto.Rmd
             ¦   ¦--index.html
             ¦   °--RHAPSODY_Logo_WEB_Color.png
             ¦--opal_credentials.txt
             ¦--README.md
             ¦--RHAPSODY_WP3_PreDiab.Rmd
             °--utils
                 ¦--ethnicity_template.csv
                 ¦--ethnicity_template.R
                 ¦--handleVCF.sh
                 ¦--Install_Rpackages.R
                 ¦--RHAPSODY_WP3_PreDiab_DEBUG.R
                 ¦--RHAPSODY_WP3_PreDiab_OPAL_DEBUG.R
                 °--RunAnalysis.R
Files:
* /home/rhapsody/WP3/scripts/README.pdf
* /home/rhapsody/WP3/scripts/RHAPSODY_WP3_PreDiab.Rmd
* /home/rhapsody/WP3/scripts/opal_credentials.txt
* /home/rhapsody/WP3/scripts/utils/RHAPSODY_WP3_PreDiab_DEBUG.R
* /home/rhapsody/WP3/scripts/utils/RunAnalysis.R (`Rscript ~/WP3/scripts/utils/RunAnalysis.R`)
* /home/rhapsody/WP3/scripts/utils/handleVCF.sh (format_vcfs must be set to FALSE)
* /home/rhapsody/WP3/scripts/utils/Install_Rpackages.R
* /home/rhapsody/WP3/scripts/utils/check_analysis.Rmd
* /home/rhapsody/WP3/scripts/utils/README.Rmd (README.pdf)
* /home/rhapsody/WP3/scripts/utils/howto.Rmd (howto.html)

To avoid hard-coding the OPAL database credentials in the script, create a text file containing the server, username and password used to access your local OPAL database (i.e., the RHAPSODY node hosting the phenotype data).
You can also modify the default file /home/rhapsody/WP3/scripts/opal_credentials.txt.
http://localhost:8080
administrator
password
NOTE: The username provided in that file should have “download” rights.
The data are downloaded locally into the R session for the duration of the analysis; all local data are then deleted (i.e., the data do not leave the R session in any way).
Please check where the imputation quality is stored in your VCF files.
Depending on where (e.g., locally, Sanger Imputation Server, Michigan Imputation Server) and with which software (e.g., impute2, PBWT) you imputed your genetic data, this information might be stored as INFO or R2 (the most frequently used names, but it might be something else).
A template script (i.e., /home/rhapsody/WP3/scripts/utils/RunAnalysis.R) is available to help run the analysis as a bash/shell command.
Set the output directory.
By default, outputs are generated in the current directory (./).
working_directory <- "/media/rhapsody_output"
Set a cohort name.
It is an identifier for a cohort and array. For example, “DESIR_Metabochip” for the results from the Metabochip Array in the DESIR cohort.
cohort_name <- "Cohort_Name"
Set the author name. This is just to know who performed the analysis, in case there are questions at a later point.
author_name <- "Firstname LASTNAME"
Set the number of CPUs you are going to use to run the Linear Mixed Model.
n_cpu <- 2
Set the path to your file opal_credentials.txt. For example, ~/WP3/scripts/opal_credentials.txt, if you modified the default file (it is better to use an absolute path, i.e., without ~/).
opal_credentials <- "/media/credentials/opal_credentials.txt"
Set the path to your (RAW) VCF files.
vcf_directory <- "/media/vcf"
Set the imputation quality tag for your VCF.
imputation_quality_tag <- "INFO" # To be set according to VCF (could also be "R2")
Set the path to VCFtools.
The default location of the software is usually /usr/local/bin.
vcftools_binary_path <- "/usr/local/bin"
Define if you want to format/split the VCF files.
The default is to format and split the available VCF files within the script.
format_vcfs <- TRUE
Set the analysis step.
analysis_step <- 7
Define if genetic variants analyses should be performed.
variants_analysis <- TRUE
Define if genomic components should be used.
genomic_component <- NULL # A csv file with SUBJID as first column and PC01 to PCXX
Template: ~/WP3/scripts/utils/RunAnalysis.R
# set the output directory or leave as is;
# output will be generated where the Rmarkdown file is
working_directory <- "/media/rhapsody_output"
cohort_name <- "Cohort_Name"
author_name <- "Firstname LASTNAME"
n_cpu <- 2
opal_credentials <- "/media/credentials/opal_credentials.txt"
vcf_directory <- "/media/vcf"
imputation_quality_tag <- "INFO" # To be set according to VCF (could also be "R2")
vcftools_binary_path <- "/usr/local/bin"
format_vcfs <- TRUE
analysis_step <- 7
variants_analysis <- TRUE
genomic_component <- NULL # A csv file with SUBJID as first column and PC01 to PCXX
# Run the analysis
dir.create(path = working_directory, showWarnings = FALSE, mode = "0777")
rmarkdown::render(
input = "/home/rhapsody/WP3/scripts/RHAPSODY_WP3_PreDiab.Rmd",
output_format = "html_document",
output_file = paste0(
"RHAPSODY_WP3_PreDiab_",
cohort_name, "_step",
analysis_step, ".html"
),
output_dir = working_directory,
params = list(
cohort_name = cohort_name,
author_name = author_name,
opal_credentials = opal_credentials,
vcf_input_directory = vcf_directory,
imputation_quality_tag = imputation_quality_tag,
vcftools_binary_path = vcftools_binary_path,
output_directory = working_directory,
analysis_step = analysis_step,
format_vcfs = format_vcfs,
variants_analysis = variants_analysis,
chunk_size = 1000,
exclude_X = TRUE,
genomic_component = genomic_component,
n_cpu = n_cpu,
echo = FALSE, # Should R code be printed in the report
warning = FALSE, # Should warnings be printed in the report
message = FALSE, # Should messages be printed in the report
debug = FALSE
),
encoding = "UTF-8"
)
Once you have created an R script using the template (i.e., /home/rhapsody/WP3/scripts/utils/RunAnalysis.R), you can start the analysis with a single command.
Rscript /home/rhapsody/WP3/scripts/utils/RunAnalysis.R
If you want to format your VCF files before running the script, the best way to proceed is to set the parameters as shown below.
format_vcfs <- FALSE
analysis_step <- 4
variants_analysis <- FALSE
Run the script within bash/shell
Rscript /home/rhapsody/WP3/scripts/utils/RunAnalysis.R
This will create, in the output directory you set up:
* ./RHAPSODY_WP3_PreDiab_Cohort_Name_step4.html
* ./vcffilespath.txt
The list of all available VCF files to format using ./handleVCF_Cohort_Name.sh. The files are also listed in the report in Section 6.2.1 Check available VCF files.
/media/vcf/chr1.vcf.gz
/media/vcf/chr2.vcf.gz
...
/media/vcf/chr*.vcf.gz
...
/media/vcf/chr22.vcf.gz
* ./handleVCF_Cohort_Name.sh
The command (with the correct parameters) to format the VCF using /home/rhapsody/WP3/scripts/utils/handleVCF.sh. This is also stated in the report in Section 6.2.2.3.1 Manually format VCFs.
#!/bin/sh
./handleVCF.sh Cohort_Name /media/vcf INFO /usr/local/bin
Run the script to format the VCF
sh ./handleVCF_Cohort_Name.sh
Run the whole analysis with the proper parameters
format_vcfs <- FALSE # keep this to FALSE
analysis_step <- 7
variants_analysis <- TRUE
Rscript /home/rhapsody/WP3/scripts/utils/RunAnalysis.R
If Docker is not installed on your cluster/grid/computer/laptop, you can install it following the instructions on docs.docker.com.
In order to retrieve the latest version of the main script RHAPSODY_WP3_PreDiab.Rmd and utils/* scripts, you can copy/paste the git commands below.
This will pull (i.e., download) the scripts from GitHub.
sudo git -C /home/rhapsody/WP3/scripts pull origin master
Once the download is complete, you should get the following directory tree in your home (`cd ~`).
home
 °--rhapsody
     °--WP3
         °--scripts
             ¦--docker
             ¦   ¦--Dockerfile
             ¦   ¦--Dockerfile_update
             ¦   ¦--login.html
             ¦   ¦--logo.png
             ¦   ¦--markdown_packages.R
             ¦   °--r_packages.R
             ¦--docker_analysis
             ¦   ¦--opal_credentials.txt
             ¦   ¦--rhapsody.R
             ¦   °--rhapsody.sh
             ¦--docs
             ¦   ¦--howto.Rmd
             ¦   ¦--index.html
             ¦   °--RHAPSODY_Logo_WEB_Color.png
             ¦--opal_credentials.txt
             ¦--README.md
             ¦--RHAPSODY_WP3_PreDiab.Rmd
             °--utils
                 ¦--ethnicity_template.csv
                 ¦--ethnicity_template.R
                 ¦--handleVCF.sh
                 ¦--Install_Rpackages.R
                 ¦--RHAPSODY_WP3_PreDiab_DEBUG.R
                 ¦--RHAPSODY_WP3_PreDiab_OPAL_DEBUG.R
                 °--RunAnalysis.R
Files:
* /home/rhapsody/WP3/scripts/README.pdf
* /home/rhapsody/WP3/scripts/RHAPSODY_WP3_PreDiab.Rmd
* /home/rhapsody/WP3/scripts/opal_credentials.txt
* /home/rhapsody/WP3/scripts/utils/RHAPSODY_WP3_PreDiab_DEBUG.R
* /home/rhapsody/WP3/scripts/utils/RunAnalysis.R (`Rscript ~/WP3/scripts/utils/RunAnalysis.R`)
* /home/rhapsody/WP3/scripts/utils/handleVCF.sh (format_vcfs must be set to FALSE)
* /home/rhapsody/WP3/scripts/utils/Install_Rpackages.R
* /home/rhapsody/WP3/scripts/utils/check_analysis.Rmd
* /home/rhapsody/WP3/scripts/utils/README.Rmd (README.pdf)
* /home/rhapsody/WP3/scripts/utils/howto.Rmd (howto.html)

To avoid hard-coding the OPAL database credentials in the script, create a text file containing the server, username and password used to access your local OPAL database (i.e., the RHAPSODY node hosting the phenotype data).
You can also modify the default file /home/rhapsody/WP3/scripts/opal_credentials.txt.
http://localhost:8080
administrator
password
NOTE: The username provided in that file should have “download” rights.
The data are downloaded locally into the R session for the duration of the analysis; all local data are then deleted (i.e., the data do not leave the R session in any way).
Please check where the imputation quality is stored in your VCF files.
Depending on where (e.g., locally, Sanger Imputation Server, Michigan Imputation Server) and with which software (e.g., impute2, PBWT) you imputed your genetic data, this information might be stored as INFO or R2 (the most frequently used names, but it might be something else).
A template script (i.e., /home/rhapsody/WP3/scripts/utils/RunAnalysis.R) is available to help run the analysis as a bash/shell command.
Set the output directory.
By default, outputs are generated in the current directory (./).
working_directory <- "/media/rhapsody_output"
Set a cohort name.
It is an identifier for a cohort and array. For example, “DESIR_Metabochip” for the results from the Metabochip Array in the DESIR cohort.
cohort_name <- "Cohort_Name"
Set the author name. This is just to know who performed the analysis, in case there are questions at a later point.
author_name <- "Firstname LASTNAME"
Set the number of CPUs you are going to use to run the Linear Mixed Model.
n_cpu <- 2
Set the path to your file opal_credentials.txt. For example, /home/rhapsody/WP3/scripts/opal_credentials.txt, if you modified the default file (it is better to use an absolute path, i.e., without ~/).
opal_credentials <- "/media/credentials/opal_credentials.txt"
Set the path to your (RAW) VCF files.
vcf_directory <- "/media/vcf"
Set the imputation quality tag for your VCF.
imputation_quality_tag <- "INFO" # To be set according to VCF (could also be "R2")
Set the path to VCFtools.
The default location of the software is usually /usr/local/bin.
vcftools_binary_path <- "/usr/local/bin"
Define if you want to format/split the VCF files.
The default is to format and split the available VCF files within the script.
format_vcfs <- TRUE
Set the analysis step.
analysis_step <- 7
Define if genetic variants analyses should be performed.
variants_analysis <- TRUE
Define if genomic components should be used.
genomic_component <- NULL # A csv file with SUBJID as first column and PC01 to PCXX
Template: /home/rhapsody/WP3/scripts/utils/RunAnalysis.R
# set the output directory or leave as is;
# output will be generated where the Rmarkdown file is
working_directory <- "/media/rhapsody_output"
cohort_name <- "Cohort_Name"
author_name <- "Firstname LASTNAME"
n_cpu <- 2
opal_credentials <- "/media/credentials/opal_credentials.txt"
vcf_directory <- "/media/vcf"
imputation_quality_tag <- "INFO" # To be set according to VCF (could also be "R2")
vcftools_binary_path <- "/usr/local/bin"
format_vcfs <- TRUE
analysis_step <- 7
variants_analysis <- TRUE
genomic_component <- NULL # A csv file with SUBJID as first column and PC01 to PCXX
# Run the analysis
dir.create(path = working_directory, showWarnings = FALSE, mode = "0777")
rmarkdown::render(
input = "/home/rhapsody/WP3/scripts/RHAPSODY_WP3_PreDiab.Rmd",
output_format = "html_document",
output_file = paste0(
"RHAPSODY_WP3_PreDiab_",
cohort_name, "_step",
analysis_step, ".html"
),
output_dir = working_directory,
params = list(
cohort_name = cohort_name,
author_name = author_name,
opal_credentials = opal_credentials,
vcf_input_directory = vcf_directory,
imputation_quality_tag = imputation_quality_tag,
vcftools_binary_path = vcftools_binary_path,
output_directory = working_directory,
analysis_step = analysis_step,
format_vcfs = format_vcfs,
variants_analysis = variants_analysis,
chunk_size = 1000,
exclude_X = TRUE,
genomic_component = genomic_component,
n_cpu = n_cpu,
echo = FALSE, # Should R code be printed in the report
warning = FALSE, # Should warnings be printed in the report
message = FALSE, # Should messages be printed in the report
debug = FALSE
),
encoding = "UTF-8"
)
Once you have created an R script using the template (i.e., /home/rhapsody/WP3/scripts/utils/RunAnalysis.R), you can start the analysis with a single command.
docker run \
--name rhapsody \
--detach \
--volume /path/to/vcf/:/media/vcf \
--volume /path/to/opal_credentials:/media/credentials \
--volume /path/to/RunAnalysis:/media/RunAnalysis \
--volume /path/to/rhapsody_output:/media/rhapsody_output \
--rm \
ghcr.io/mcanouil/rhapsody:latest Rscript /media/RunAnalysis/RunAnalysis.R
In that configuration, the script /home/rhapsody/WP3/scripts/utils/RunAnalysis.R will be as follows:
# set the output directory or leave as is;
# output will be generated where the Rmarkdown file is
working_directory <- "/media/rhapsody_output"
cohort_name <- "Cohort_Name"
author_name <- "Firstname LASTNAME"
n_cpu <- 2
opal_credentials <- "/media/credentials/opal_credentials.txt"
vcf_directory <- "/media/vcf"
imputation_quality_tag <- "INFO" # To be set according to VCF (could also be "R2")
vcftools_binary_path <- "/usr/local/bin"
format_vcfs <- TRUE
analysis_step <- 7
variants_analysis <- TRUE
genomic_component <- NULL # A csv file with SUBJID as first column and PC01 to PCXX
# Run the analysis
dir.create(path = working_directory, showWarnings = FALSE, mode = "0777")
rmarkdown::render(
input = "/home/rhapsody/WP3/scripts/RHAPSODY_WP3_PreDiab.Rmd",
output_format = "html_document",
output_file = paste0(
"RHAPSODY_WP3_PreDiab_",
cohort_name, "_step",
analysis_step, ".html"
),
output_dir = working_directory,
params = list(
cohort_name = cohort_name,
author_name = author_name,
opal_credentials = opal_credentials,
vcf_input_directory = vcf_directory,
imputation_quality_tag = imputation_quality_tag,
vcftools_binary_path = vcftools_binary_path,
output_directory = working_directory,
analysis_step = analysis_step,
format_vcfs = format_vcfs,
variants_analysis = variants_analysis,
chunk_size = 1000,
exclude_X = TRUE,
genomic_component = genomic_component,
n_cpu = n_cpu,
echo = FALSE, # Should R code be printed in the report
warning = FALSE, # Should warnings be printed in the report
message = FALSE, # Should messages be printed in the report
debug = FALSE
),
encoding = "UTF-8"
)
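Before launching the container with the volumes shown above, it may be worth checking that the host directories hold the files the script expects. A minimal sketch, assuming a POSIX shell and VCF files named `*.vcf.gz` as in the examples of this guide; the function name `check_inputs` is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the host directories that will be
# mounted as /media/credentials, /media/RunAnalysis and /media/vcf.
check_inputs() {
  credentials_dir="$1"
  script_dir="$2"
  vcf_dir="$3"
  status=0
  if [ ! -f "$credentials_dir/opal_credentials.txt" ]; then
    echo "missing: $credentials_dir/opal_credentials.txt" >&2
    status=1
  fi
  if [ ! -f "$script_dir/RunAnalysis.R" ]; then
    echo "missing: $script_dir/RunAnalysis.R" >&2
    status=1
  fi
  # Expand the glob; if nothing matches, "$1" is not an existing file.
  set -- "$vcf_dir"/*.vcf.gz
  if [ ! -e "$1" ]; then
    echo "no *.vcf.gz file in $vcf_dir" >&2
    status=1
  fi
  return "$status"
}

# Example call:
# check_inputs /path/to/opal_credentials /path/to/RunAnalysis /path/to/vcf
```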
If you want to format your VCF files before running the script, the best way to proceed is to set the parameters as shown below.
format_vcfs <- FALSE
analysis_step <- 4
variants_analysis <- FALSE
Run the script within bash/shell
docker run \
--name rhapsody \
--detach \
--volume /path/to/vcf/:/media/vcf \
--volume /path/to/opal_credentials:/media/credentials \
--volume /path/to/RunAnalysis:/media/RunAnalysis \
--volume /path/to/rhapsody_output:/media/rhapsody_output \
--rm \
ghcr.io/mcanouil/rhapsody:latest Rscript /media/RunAnalysis/RunAnalysis.R
In that configuration, the script /home/rhapsody/WP3/scripts/utils/RunAnalysis.R will be as follows:
# set the output directory or leave as is;
# output will be generated where the Rmarkdown file is
working_directory <- "/media/rhapsody_output"
cohort_name <- "Cohort_Name"
author_name <- "Firstname LASTNAME"
n_cpu <- 2
opal_credentials <- "/media/credentials/opal_credentials.txt"
vcf_directory <- "/media/vcf"
imputation_quality_tag <- "INFO" # To be set according to VCF (could also be "R2")
vcftools_binary_path <- "/usr/local/bin"
format_vcfs <- FALSE
analysis_step <- 4
variants_analysis <- FALSE
genomic_component <- NULL # A csv file with SUBJID as first column and PC01 to PCXX
# Run the analysis
dir.create(path = working_directory, showWarnings = FALSE, mode = "0777")
rmarkdown::render(
input = "/home/rhapsody/WP3/scripts/RHAPSODY_WP3_PreDiab.Rmd",
output_format = "html_document",
output_file = paste0(
"RHAPSODY_WP3_PreDiab_",
cohort_name, "_step",
analysis_step, ".html"
),
output_dir = working_directory,
params = list(
cohort_name = cohort_name,
author_name = author_name,
opal_credentials = opal_credentials,
vcf_input_directory = vcf_directory,
imputation_quality_tag = imputation_quality_tag,
vcftools_binary_path = vcftools_binary_path,
output_directory = working_directory,
analysis_step = analysis_step,
format_vcfs = format_vcfs,
variants_analysis = variants_analysis,
chunk_size = 1000,
exclude_X = TRUE,
genomic_component = genomic_component,
n_cpu = n_cpu,
echo = FALSE, # Should R code be printed in the report
warning = FALSE, # Should warnings be printed in the report
message = FALSE, # Should messages be printed in the report
debug = FALSE
),
encoding = "UTF-8"
)
This will create, in the output directory you set up:
* /media/rhapsody_output/RHAPSODY_WP3_PreDiab_Cohort_Name_step4.html
* /media/rhapsody_output/vcffilespath.txt
The list of all available VCF files to format using ./handleVCF_Cohort_Name.sh. The files are also listed in the report in Section 6.2.1 Check available VCF files.
/media/vcf/chr1.vcf.gz
/media/vcf/chr2.vcf.gz
...
/media/vcf/chr*.vcf.gz
...
/media/vcf/chr22.vcf.gz
* /media/rhapsody_output/handleVCF_Cohort_Name.sh
The command (with the correct parameters) to format the VCF using ~/WP3/scripts/utils/handleVCF.sh. This is also stated in the report in Section 6.2.2.3.1 Manually format VCFs.
#!/bin/sh
./handleVCF.sh Cohort_Name /media/vcf INFO /usr/local/bin
Run the script to format the VCF
docker run \
--name rhapsody \
--detach \
--volume /path/to/vcf/:/media/vcf \
--volume /path/to/opal_credentials:/media/credentials \
--volume /path/to/RunAnalysis:/media/RunAnalysis \
--volume /path/to/rhapsody_output:/media/rhapsody_output \
--rm \
ghcr.io/mcanouil/rhapsody:latest sh /media/rhapsody_output/handleVCF_Cohort_Name.sh
Run the whole analysis with the proper parameters
format_vcfs <- FALSE # keep this to FALSE
analysis_step <- 7
variants_analysis <- TRUE
docker run \
--name rhapsody \
--detach \
--volume /path/to/vcf/:/media/vcf \
--volume /path/to/opal_credentials:/media/credentials \
--volume /path/to/RunAnalysis:/media/RunAnalysis \
--volume /path/to/rhapsody_output:/media/rhapsody_output \
--rm \
ghcr.io/mcanouil/rhapsody:latest Rscript /media/RunAnalysis/RunAnalysis.R
In that configuration, the script ~/WP3/scripts/utils/RunAnalysis.R will be as follows:
# set the output directory or leave as is;
# output will be generated where the Rmarkdown file is
working_directory <- "/media/rhapsody_output"
cohort_name <- "Cohort_Name"
author_name <- "Firstname LASTNAME"
n_cpu <- 2
opal_credentials <- "/media/credentials/opal_credentials.txt"
vcf_directory <- "/media/vcf"
imputation_quality_tag <- "INFO" # To be set according to VCF (could also be "R2")
vcftools_binary_path <- "/usr/local/bin"
format_vcfs <- FALSE # keep this to FALSE
analysis_step <- 7
variants_analysis <- TRUE
genomic_component <- NULL # A csv file with SUBJID as first column and PC01 to PCXX
# Run the analysis
dir.create(path = working_directory, showWarnings = FALSE, mode = "0777")
rmarkdown::render(
input = "/home/rhapsody/WP3/scripts/RHAPSODY_WP3_PreDiab.Rmd",
output_format = "html_document",
output_file = paste0(
"RHAPSODY_WP3_PreDiab_",
cohort_name, "_step",
analysis_step, ".html"
),
output_dir = working_directory,
params = list(
cohort_name = cohort_name,
author_name = author_name,
opal_credentials = opal_credentials,
vcf_input_directory = vcf_directory,
imputation_quality_tag = imputation_quality_tag,
vcftools_binary_path = vcftools_binary_path,
output_directory = working_directory,
analysis_step = analysis_step,
format_vcfs = format_vcfs,
variants_analysis = variants_analysis,
chunk_size = 1000,
exclude_X = TRUE,
genomic_component = genomic_component,
n_cpu = n_cpu,
echo = FALSE, # Should R code be printed in the report
warning = FALSE, # Should warnings be printed in the report
message = FALSE, # Should messages be printed in the report
debug = FALSE
),
encoding = "UTF-8"
)
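The three stages above (step 4 run, VCF formatting, step 7 run) could be chained in one script. A minimal sketch, assuming a POSIX shell, that only prints the commands (a dry run). It omits `--detach` so that each stage would finish before the next one starts, and it assumes two copies of the template, named `RunAnalysis_step4.R` and `RunAnalysis_step7.R` here; those file names and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: prints the docker commands instead of running them.
IMAGE="ghcr.io/mcanouil/rhapsody:latest"

# Print one blocking `docker run` (no --detach) executing "$1" in the container.
stage_cmd() {
  printf 'docker run --rm'
  # The format is reused for each argument, giving one --volume per mount.
  printf ' --volume %s' \
    "/path/to/vcf/:/media/vcf" \
    "/path/to/opal_credentials:/media/credentials" \
    "/path/to/RunAnalysis:/media/RunAnalysis" \
    "/path/to/rhapsody_output:/media/rhapsody_output"
  printf ' %s %s\n' "$IMAGE" "$1"
}

print_pipeline() {
  # Hypothetical copy of the template with analysis_step <- 4
  stage_cmd "Rscript /media/RunAnalysis/RunAnalysis_step4.R"
  # Script generated by the step 4 run
  stage_cmd "sh /media/rhapsody_output/handleVCF_Cohort_Name.sh"
  # Hypothetical copy of the template with analysis_step <- 7
  stage_cmd "Rscript /media/RunAnalysis/RunAnalysis_step7.R"
}

print_pipeline
```

Piping the output to `sh -e` would run the stages in order and stop at the first failure, once the placeholder host paths have been replaced.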