Example code for running LDAK-KVIK
It is possible to run LDAK-KVIK directly on your own data, or to first try it out on simulated data. In case you wish to directly apply LDAK-KVIK to your own data, please jump to Running LDAK-KVIK.
Simulating genotypes and phenotypes
Genotype data can be generated using the command line:
./ldak6.linux --make-snps data --num-samples 10000 --num-snps 50000
This generates the files data.bed
, data.bim
and data.fam
, containing SNP data for 10000 individuals and 50000 genetic variants. The generated SNPs are evenly distributed across 22 chromosomes, with minor allele frequencies randomly sampled from the [0, 0.5] interval. Note that SNPs are generated in linkage equilibrium. The --make-snps
command also generates the toy covariate file data.covar
.
Next, a quantitative phenotype can be generated using the command line:
./ldak6.linux --make-phenos pheno --bfile data --power -0.25 --her 0.5 --num-phenos 1 --num-causals 1000
This command generates one phenotype using the previously generated SNP data. The phenotype is simulated under a SNP heritability of 0.5, based on 1000 randomly selected causals SNPs. The power parameter is set to -0.25, indicating that common SNPs (with higher MAF) tend to explain more phenotypic variance (a phenomenon observed in human traits). SNP effect sizes are sampled from a normal distribution, and scaled to match the heritability.
To generate binary phenotypes, the user should add a flag indicating the prevalence:
./ldak6.linux --make-phenos pheno --bfile data --power -0.25 --her 0.5 --num-phenos 1 --num-causals 1000 --prevalence 0.2
This will generate a phenotype data with 20% cases, and a liability heritability of 50%.
Running LDAK-KVIK
An explanation of the input options of LDAK-KVIK can be found in the LDAK-KVIK steps. Here, we demonstrate the basic command lines used to run LDAK-KVIK.
LDAK-KVIK is run in three steps. In Step 1, the Leave-One-Chromosome-Out (LOCO) PRS are computed using an Elastic Net model. Step 2 runs the single-SNP analysis, and Step 3 runs the gene-based associatiòn analysis. These steps are consecutively run using the following commands:
./ldak6.linux --kvik-step1 kvik --bfile data --pheno pheno.pheno --covar data.covar --max-threads 2
./ldak6.linux --kvik-step2 kvik --bfile data --pheno pheno.pheno --covar data.covar --max-threads 2
./ldak6.linux --kvik-step3 kvik --bfile data --genefile <gene annotation file> --max-threads 2
In case a binary trait is analysed, the user should add the --binary YES
flag in Step 1. Instructions for downloading the gene annotation file are described in the gene-based analysis section. An explanation of the generated output files is included in the LDAK-KVIK output page.