LDAK-KVIK output

Below, we discuss the output files generated by LDAK-KVIK when running the following command lines:

./ldak6.linux --kvik-step1 kvik --bfile data --pheno phenofile --covar covfile --max-threads 4
./ldak6.linux --kvik-step2 kvik --bfile data --pheno phenofile --covar covfile --max-threads 4
./ldak6.linux --kvik-step3 kvik --bfile data --genefile genefile --max-threads 4
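To run the three steps from a script, a minimal Python sketch follows. It only assembles the command lines shown above, using the same flags, and runs them in order with subprocess. The values ./ldak6.linux, kvik, data, phenofile, covfile and genefile are the placeholders from the example. The helper names kvik_commands and run_kvik are hypothetical and not part of LDAK.

```python
import subprocess
from typing import List


def kvik_commands(prefix: str, bfile: str, pheno: str, covar: str,
                  genefile: str, threads: int = 4,
                  binary: str = "./ldak6.linux") -> List[List[str]]:
    """Return the three LDAK-KVIK command lines shown above as argument lists."""
    common = ["--bfile", bfile]
    return [
        # Step 1: elastic net / LOCO PRS
        [binary, "--kvik-step1", prefix, *common, "--pheno", pheno,
         "--covar", covar, "--max-threads", str(threads)],
        # Step 2: single-SNP analysis
        [binary, "--kvik-step2", prefix, *common, "--pheno", pheno,
         "--covar", covar, "--max-threads", str(threads)],
        # Step 3: gene-based analysis
        [binary, "--kvik-step3", prefix, *common, "--genefile", genefile,
         "--max-threads", str(threads)],
    ]


def run_kvik(commands: List[List[str]]) -> None:
    """Run the steps in order; check=True raises at the first step that fails."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, run_kvik(kvik_commands("kvik", "data", "phenofile", "covfile", "genefile")) reproduces the three command lines above. The sketch passes argument lists rather than a single string, which avoids shell quoting problems with file names.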

Step 1

Step 1 of LDAK-KVIK generates five files: kvik.step1.progress, kvik.step1.root, kvik.step1.loco.details, kvik.step1.loco.prs and kvik.step1.effects. The kvik.step1.progress file contains the screen output of Step 1. The kvik.step1.root file records the input arguments, which are passed on to the next steps. The kvik.step1.loco.details file contains the scaling estimate, power parameter and heritability estimate, and kvik.step1.loco.prs contains the leave-one-chromosome-out (LOCO) predictors constructed using the elastic net. Finally, the kvik.step1.effects file contains the SNP effects of the full elastic net model based on all chromosomes.
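Because the later steps use the information stored by Step 1, it can help to confirm that Step 1 produced all of its output before continuing. The sketch below only checks that the five files listed above exist for a given prefix. It does not inspect their contents, whose formats are not described here. The function name missing_step1_files is hypothetical.

```python
from pathlib import Path
from typing import List

# Suffixes of the five Step 1 output files (prefix "kvik" in the example).
STEP1_SUFFIXES = [
    ".step1.progress",
    ".step1.root",
    ".step1.loco.details",
    ".step1.loco.prs",
    ".step1.effects",
]


def missing_step1_files(prefix: str) -> List[str]:
    """Return the expected Step 1 files that do not exist for this prefix."""
    return [prefix + s for s in STEP1_SUFFIXES if not Path(prefix + s).is_file()]
```

An empty list from missing_step1_files("kvik") means all five files are present.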

Step 2

The second step of LDAK-KVIK performs single-SNP analysis, using the PRS from Step 1 as an offset. The fitted regression coefficients of the covariates are saved in kvik.step2.coeff, and the results of the single-SNP analysis are saved in kvik.step2.pvalues, kvik.step2.summaries and kvik.step2.assoc. The kvik.step2.progress file contains the screen output of --kvik-step2.
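To illustrate what using the PRS as an offset means, the sketch below takes a quantitative phenotype and fits phenotype - PRS = a + b * SNP by ordinary least squares. It returns the effect, its standard error, the Wald statistic and a two-sided normal P value. This is a simplified illustration that assumes a continuous trait and ignores covariates and the saddlepoint approximation. It is not the LDAK-KVIK implementation, and the function name offset_wald_test is hypothetical.

```python
import math
from typing import List, Tuple


def offset_wald_test(snp: List[float], pheno: List[float],
                     prs: List[float]) -> Tuple[float, float, float, float]:
    """Simplified single-SNP test with the LOCO PRS used as an offset.

    Fits (pheno - prs) = a + b * snp by ordinary least squares and returns
    (effect, sd, wald_stat, wald_p). No covariates, no saddlepoint step.
    """
    n = len(snp)
    resid = [y - p for y, p in zip(pheno, prs)]  # apply the offset (coefficient fixed at 1)
    mx = sum(snp) / n
    mr = sum(resid) / n
    sxx = sum((x - mx) ** 2 for x in snp)
    sxr = sum((x - mx) * (r - mr) for x, r in zip(snp, resid))
    effect = sxr / sxx
    intercept = mr - effect * mx
    rss = sum((r - intercept - effect * x) ** 2 for x, r in zip(snp, resid))
    sd = math.sqrt(rss / (n - 2) / sxx)  # standard error of the effect
    wald_stat = effect / sd
    wald_p = math.erfc(abs(wald_stat) / math.sqrt(2))  # two-sided normal P value
    return effect, sd, wald_stat, wald_p
```

For example, with genotypes [0, 1, 2, 0, 1, 2], PRS values [1, 2, 3, 4, 5, 6] and a phenotype equal to PRS + 0.5 * SNP plus small noise [0.1, -0.1, 0, -0.1, 0.1, 0], the sketch returns an effect of 0.5 with SD 0.05 (Wald statistic 10). The four returned values play the roles of the Effect, SD, Wald_Stat and Wald_P columns of kvik.step2.assoc.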

The files kvik.step2.pvalues and kvik.step2.summaries contain information on P values and Z-scores per SNP, and kvik.step2.assoc contains the complete summary statistics. An example header of kvik.step2.assoc is:

Chromosome Predictor Basepair A1 A2 Wald_Stat Wald_P Effect SD A1_mean MAF SPA_Status
1 SNP1 10000 A C -0.4717 6.3715e-01 -6.9038e-03 1.4636e-02 0.241400 0.120700 NOT_USED
1 SNP2 20000 A C 0.0488 9.6105e-01 4.7483e-04 9.7227e-03 0.843100 0.421550 NOT_USED
1 SNP3 30000 A C 1.0593 2.8945e-01 1.0828e-02 1.0222e-02 0.636000 0.318000 NOT_USED
1 SNP4 40000 A C 0.4545 6.4948e-01 5.0009e-03 1.1003e-02 0.502900 0.251450 NOT_USED
1 SNP5 50000 A C -0.5006 6.1664e-01 -5.5770e-03 1.1140e-02 0.475200 0.237600 NOT_USED
...

The regression coefficients and standard errors are respectively given in the Effect and SD columns. Wald_Stat contains the Z-scores, and Wald_P contains the associated P values. SPA_Status indicates whether the saddlepoint approximation has been applied.
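The assoc file can be loaded with a few lines of Python. The sketch below assumes, as in the example, that kvik.step2.assoc is whitespace-delimited with a single header row. It also checks two relationships that the example rows satisfy: Wald_Stat equals Effect divided by SD, and, when SPA_Status is NOT_USED, Wald_P matches the two-sided normal P value of Wald_Stat. These checks reflect our reading of the example rather than documented guarantees, and the function names read_table, wald_consistent and below_threshold are hypothetical.

```python
import math
from typing import Dict, List


def read_table(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-delimited text with a header row into one dict per row."""
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[0]
    return [dict(zip(header, fields)) for fields in rows[1:]]


def wald_consistent(row: Dict[str, str], tol: float = 1e-3) -> bool:
    """Check Wald_Stat against Effect / SD and, when SPA was not used,
    Wald_P against the two-sided normal P value of Wald_Stat."""
    z = float(row["Effect"]) / float(row["SD"])
    ok = abs(z - float(row["Wald_Stat"])) < tol
    if row["SPA_Status"] == "NOT_USED":
        p_normal = math.erfc(abs(float(row["Wald_Stat"])) / math.sqrt(2))
        ok = ok and abs(p_normal - float(row["Wald_P"])) < tol
    return ok


def below_threshold(rows: List[Dict[str, str]], alpha: float) -> List[str]:
    """Return the Predictor names whose Wald_P is below alpha."""
    return [r["Predictor"] for r in rows if float(r["Wald_P"]) < alpha]
```

For example, rows = read_table(Path("kvik.step2.assoc").read_text()) followed by below_threshold(rows, 5e-8) lists the SNPs passing a genome-wide significance threshold (Path comes from pathlib).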

Step 3

The third step of LDAK-KVIK generates results from gene-based association analysis. The summary statistics are saved in kvik.step3.remls.all, for example:

Gene_Name Gene_Chr Gene_Start Gene_End Length Heritability SD Null_Likelihood Alt_Likelihood LRT_Stat LRT_P_Raw LRT_P_Perm
OR4F5 1 69091 70008 1 0.000075 0.000210 -19830.5503 -19830.3884 0.3239 2.8465e-01 1.7152e-01
LOC100996442 1 142447 174392 3 0.000001 NA -19830.5503 -19830.5503 0.0000 7.5000e-01 6.8280e-01
SAMD11 1 859993 879961 2 0.000001 NA -19830.5503 -19830.5503 0.0000 7.5000e-01 6.8280e-01
NOC2L 1 879583 894679 2 0.000001 NA -19830.5503 -19830.5503 0.0000 7.5000e-01 6.8280e-01
KLHL17 1 895967 901099 1 0.000185 0.000366 -19830.5503 -19829.9048 1.2912 1.2792e-01 7.4051e-02
PLEKHN1 1 901872 910488 1 0.000001 NA -19830.5503 -19830.5503 0.0000 7.5000e-01 6.8280e-01

The column Gene_Name indicates the gene names, and P values are stored in the LRT_P_Perm column. For an overview of the gene-based association method, we refer to the LDAK-GBAT publication.
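In the same way, the sketch below reads kvik.step3.remls.all, assuming it is whitespace-delimited with one header row as in the example. It ranks genes by LRT_P_Perm and checks that LRT_Stat equals twice the difference between Alt_Likelihood and Null_Likelihood, which holds for the example rows up to rounding. Entries such as NA in the SD column are kept as strings. The function names read_table, top_genes and lrt_consistent are hypothetical.

```python
from typing import Dict, List


def read_table(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-delimited text with a header row into one dict per row."""
    rows = [line.split() for line in text.strip().splitlines()]
    return [dict(zip(rows[0], fields)) for fields in rows[1:]]


def top_genes(rows: List[Dict[str, str]], n: int) -> List[str]:
    """Return the names of the n genes with the smallest LRT_P_Perm."""
    ranked = sorted(rows, key=lambda r: float(r["LRT_P_Perm"]))
    return [r["Gene_Name"] for r in ranked[:n]]


def lrt_consistent(row: Dict[str, str], tol: float = 1e-3) -> bool:
    """Check LRT_Stat against 2 * (Alt_Likelihood - Null_Likelihood)."""
    diff = float(row["Alt_Likelihood"]) - float(row["Null_Likelihood"])
    return abs(2 * diff - float(row["LRT_Stat"])) < tol
```

For the example rows above, top_genes(rows, 2) returns ["KLHL17", "OR4F5"].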