WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

Gene-level Association with Population Stratification

Available statistics

Population stratification invalidates the results of rare-variant association analysis, so statistics that assume the absence of population stratification cannot be applied.

A set file listing the variants that belong to the same gene must be provided; its format is described on the gene-level analysis page.

Statistical power is affected by several factors, including the definition of the set and the homogeneity of the effects of the rare variants on the phenotype. The most efficient statistic depends on these factors, so several statistics should be considered at the same time.

Summary of available statistics:

  1. Methods that are efficient when the effects of rare variants are homogeneous
    • CMC test (fast): can be applied to dichotomous phenotypes. May be efficient if the presence or absence of rare alleles (not the number of rare alleles) is related to disease risk.
    • PEDCMC test (fast): an extension of the CMC test for correlated samples. May be more efficient than the CMC test with genomic control under population stratification.
    • Collapsing test (fast): can be applied to dichotomous and quantitative phenotypes. May be efficient if the number of rare alleles is related to disease risk.
  2. Method that is less sensitive to the definition of rare variants
    • FamVT test (moderate): can be applied to quantitative phenotypes. May be efficient if rarer variants have stronger effects on disease.
  3. Method that is efficient when rare variants with positive and negative effects on disease are grouped into a single set
    • SKAT test (fast): can be applied to dichotomous and quantitative phenotypes. May be efficient if rare variants with positive and negative effects on the phenotype are grouped into a set.
  4. Methods that combine the collapsing test and the SKAT test, and are robust to heterogeneity in the effects of rare variants
    • SKAT-o test (moderate): can be applied to dichotomous and quantitative phenotypes. May be efficient if rare variants with positive and negative effects on the phenotype are grouped into a set.
    • FARVAT (fast): an extension of MQLS to SKAT-o, useful for case-control designs. Can be applied to dichotomous and quantitative phenotypes.
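
The difference between the presence/absence coding (CMC-type tests) and the allele-count coding (collapsing test) in the summary above can be illustrated with a small sketch. This is not WISARD code: the function names and the genotype layout (one row per sample, one column per rare variant in the set, entries giving the number of rare alleles) are assumptions made for illustration only.

```python
def collapse_indicator(genotypes):
    """Presence/absence coding: 1 if a sample carries any rare allele in the set."""
    return [int(any(count > 0 for count in row)) for row in genotypes]


def collapse_count(genotypes):
    """Count coding: total number of rare alleles a sample carries in the set."""
    return [sum(row) for row in genotypes]


# Hypothetical gene with three rare variants and four samples.
genotypes = [
    [0, 0, 0],  # no rare alleles
    [1, 0, 0],  # one rare allele
    [1, 1, 0],  # two rare alleles at different variants
    [0, 0, 2],  # homozygous for one rare allele
]

print(collapse_indicator(genotypes))  # [0, 1, 1, 1]
print(collapse_count(genotypes))      # [0, 1, 2, 2]
```

The last two samples are indistinguishable under the indicator coding but carry more weight than the second sample under the count coding, which is why the two families of tests can differ in power.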


CMC test

The Combined Multivariate and Collapsing (CMC) test was originally suggested for case-control designs in the absence of population stratification. Under population stratification, results from the CMC test are not valid and the genomic control approach should be applied. Fisher's exact method for the CMC test cannot be used under population stratification.

Example code

  • Calculating the CMC test with the --genetest option, adjusting for population stratification by genomic control
  • CMC test
    C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genoctrl --ibs --genetest --out res_collapsing
    It is recommended to run the CMC test with the --genoctrl option so that its p-values are adjusted by the genomic control method. If a gene contains too many variants, or the MAFs of some variants are large, the CMC test is not valid and those genes should be filtered out. Detailed information can be obtained with the --verbose option.
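
As background on what a genomic control adjustment does in general, the following is a minimal sketch of the textbook procedure: estimate an inflation factor from the median test statistic and deflate every statistic by it. It is an illustration under stated assumptions (1-df chi-square statistics, inflation factor floored at 1), not a description of how WISARD implements --genoctrl; the function name and the example p-values are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(p_values):
    """Return (lambda, adjusted p-values) for a list of p-values in (0, 1]."""
    normal = NormalDist()
    # Convert each p-value to the equivalent 1-df chi-square statistic.
    chi2 = [normal.inv_cdf(p / 2) ** 2 for p in p_values]
    # Inflation factor; values below 1 are conventionally set to 1.
    lam = max(1.0, median(chi2) / CHI2_1DF_MEDIAN)
    # Deflate the statistics and convert them back to p-values.
    adjusted = [math.erfc(math.sqrt(x / lam / 2)) for x in chi2]
    return lam, adjusted


# Made-up, inflated p-values for illustration only.
raw = [0.01, 0.02, 0.05, 0.1, 0.2]
lam, adjusted = genomic_control(raw)
print(round(lam, 2))
print([round(p, 3) for p in adjusted])
```

After adjustment the median p-value is 0.5 by construction, which is the value expected when there is no inflation.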


PEDCMC test

The PEDCMC test was suggested by Zhu and Xiong (Am J Hum Genet 2012). It is an extension of the CMC test for correlated samples and can be applied to dichotomous phenotypes. Covariate effects cannot be adjusted for. The test is robust against population stratification and may be more efficient than the CMC test with genomic control adjustment when population stratification is present.

Example code

  • Calculating the PEDCMC test with the --genetest and --pedcmc options, under population stratification
  • Analysis with PEDCMC
    C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --pedcmc --out res_pedcmc
    By default, WISARD adjusts for population stratification with the genetic relationship matrix (GRM), which was first proposed in GCTA. To change this, please see this section.


Collapsing-based test

The collapsing-based test was suggested by Morris and Zeggini (Genet Epi 2010) and can be applied to dichotomous and quantitative phenotypes. Under population stratification, for dichotomous phenotypes, PC scores should be calculated from the genetic relationship matrix and included as covariates in the logistic regression. For quantitative phenotypes, the linear mixed model should be applied with the variance-covariance matrix parameterized by the IBS matrix.

Example code

  • Dichotomous phenotypes
  • Collapsing test
    C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --ibs --genetest --out res_collapsing
  • Quantitative phenotypes


FamVT test

The VT test was suggested by Price et al (Am J Hum Genet 2010). The idea was applied in Scoreseq (Lin and Tang, Am J Hum Genet 2011) to consider various thresholds for gene-level analysis under population stratification. It is a score test and can be applied to quantitative phenotypes. Phenotypes are assumed to be normally distributed.
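
The variable-threshold idea can be sketched as follows: for every candidate MAF threshold, collapse the variants at or below that threshold, compute a standardized score against the centered phenotype, and keep the largest value. This is a simplified illustration only. It ignores relatedness and population stratification, uses a simple scaling chosen for readability, and is not the FamVT statistic as implemented in WISARD; all names are hypothetical.

```python
import math


def vt_statistic(genotypes, mafs, phenotype):
    """Return (best_threshold, max_z) over all observed MAF thresholds."""
    mean = sum(phenotype) / len(phenotype)
    centered = [y - mean for y in phenotype]
    best = None
    for threshold in sorted(set(mafs)):
        # Variants whose MAF does not exceed the current threshold.
        keep = [j for j, maf in enumerate(mafs) if maf <= threshold]
        burden = [sum(row[j] for j in keep) for row in genotypes]
        norm = math.sqrt(sum(b * b for b in burden))
        if norm == 0:
            continue  # no sample carries a rare allele at this threshold
        z = sum(b * c for b, c in zip(burden, centered)) / norm
        if best is None or z > best[1]:
            best = (threshold, z)
    return best


# Hypothetical data: four samples, two variants with MAFs 0.01 and 0.04.
genotypes = [[1, 0], [0, 1], [0, 0], [0, 1]]
mafs = [0.01, 0.04]
phenotype = [3.0, 1.0, 1.0, 1.0]

print(vt_statistic(genotypes, mafs, phenotype))  # (0.01, 1.5)
```

Because the maximum is taken over several thresholds, its null distribution is not standard, which is why a numerical algorithm or Monte Carlo simulation is needed for the final p-value.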

Final p-values for the VT test are calculated with a numerical algorithm. If the number of variants is too large, p-values are calculated by Monte Carlo simulation, and the number of iterations should be chosen according to the significance level. For instance, if you are interested in the 0.05 significance level, we suggest at least 1/0.05 × 10 = 200 iterations. Note that the maximum number of iterations is limited to 2^32 - 1.
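
The rule of thumb above (ten times the reciprocal of the significance level, capped at 2^32 - 1) can be written as a small helper for choosing a value to pass to --nperm. The helper name and its `factor` parameter are hypothetical; exact rational arithmetic is used only to avoid floating-point rounding in the division.

```python
import math
from fractions import Fraction

MAX_ITERATIONS = 2**32 - 1  # upper limit stated in the text


def min_iterations(alpha, factor=10):
    """Smallest suggested iteration count for significance level alpha."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    needed = math.ceil(factor / Fraction(str(alpha)))
    return min(needed, MAX_ITERATIONS)


print(min_iterations(0.05))    # 200
print(min_iterations(2.5e-6))  # 4000000
```

The second call corresponds to a gene-level threshold such as 0.05 divided by 20,000 genes, and shows how quickly the required number of iterations grows as the significance level becomes stricter.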

Example code

  • Calculating FamVT
  • Calculating p-values with the FamVT test
    C:\Users\WISARD> wisard --ped test_miss0.ped --pheno test_miss0_phen.txt --pname height --genetest --ibs --famvt --set test_gene.txt --nperm 10000 --out res_vt_10Kperm
    NOTE!
    To estimate IBS properly, more than 10,000 common (MAF>5%) variants are recommended!
    The "--nperm 10000" option makes WISARD calculate p-values with 10,000 iterations.


SKAT

The SNP-set/Sequence Kernel Association Test (SKAT) was suggested by Wu et al (Am J Hum Genet 2011) and can be applied to dichotomous and quantitative phenotypes. Under population stratification, the linear mixed model must be used for quantitative phenotypes, with the variance-covariance matrix parameterized by the IBS matrix. For dichotomous phenotypes, PC scores should be calculated from the genetic relationship matrix and included as covariates in the logistic regression.

Example code

  • Dichotomous phenotypes with PC adjustment
  • SKAT test
    C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --indep --pc2cov --pca --skat --out res_skato
  • Quantitative phenotypes with IBS estimation
  • SKAT test
    C:\Users\WISARD> wisard --bed test_miss0.bed --pheno test_miss0_phen.txt --pname height --set test_gene.txt --genetest --ibs --skat --out res_skato


SKAT-o

The SNP-set/Sequence Kernel Association Test-optimal (SKAT-o) is an extension of SKAT and was suggested by Lee et al (Biostatistics 2012). It is a mixture of a burden-type test and SKAT, and can be applied to dichotomous and quantitative phenotypes. Under population stratification, the linear mixed model must be used for quantitative phenotypes, with the variance-covariance matrix parameterized by the IBS matrix. For dichotomous phenotypes, PC scores should be calculated from the genetic relationship matrix and included as covariates in the logistic regression.
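
To make the "mixture of a burden-type test and SKAT" concrete, the sketch below computes the burden statistic, the SKAT statistic, and their weighted combination for a mixing parameter rho. It shows the statistics only: it assumes the phenotype residuals have already been adjusted for covariates and stratification, it does not compute p-values, and it does not search over rho as the full SKAT-o procedure does. Function names and the toy data are hypothetical, and this is not WISARD's implementation.

```python
def variant_scores(genotypes, residuals):
    """Per-variant score: sum over samples of genotype times residual."""
    n_variants = len(genotypes[0])
    return [
        sum(row[j] * r for row, r in zip(genotypes, residuals))
        for j in range(n_variants)
    ]


def burden_q(scores, weights):
    """Burden-type statistic: square of the weighted sum of scores."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2


def skat_q(scores, weights):
    """SKAT statistic: weighted sum of squared scores."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


def skato_q(scores, weights, rho):
    """Mixture statistic: rho = 0 gives SKAT, rho = 1 gives the burden test."""
    return (1 - rho) * skat_q(scores, weights) + rho * burden_q(scores, weights)


# Two variants with opposite effects on the phenotype.
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1]]
residuals = [1.0, 1.0, -1.0, -1.0]
weights = [1.0, 1.0]

scores = variant_scores(genotypes, residuals)
print(scores)                         # [2.0, -2.0]
print(burden_q(scores, weights))      # 0.0
print(skat_q(scores, weights))        # 8.0
print(skato_q(scores, weights, 0.5))  # 4.0
```

The opposite effects cancel in the burden statistic but not in the SKAT statistic, which is the situation in which SKAT-type tests are expected to be more efficient.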

Example code

  • Dichotomous phenotypes with PC adjustment
  • SKAT-o test for dichotomous phenotype
    C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --indep --pc2cov --pca --skato --out res_skato
    NOTE!
    The SKAT-o test cannot be calculated if a gene contains only a single variant.
  • Quantitative phenotypes with IBS estimation
  • SKAT-o test for continuous phenotype
    C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --ibs --skato --out res_skato


FARVAT

FARVAT was suggested by Choi et al (2014) and is an extension of MQLS for rare-variant association analysis. FARVAT is an optimized gene-level association test for dichotomous traits; under population stratification, the genetic relationship matrix should be incorporated as the genetic correlation matrix. If there are covariate effects, or the phenotypes are quantitative, FARVAT can be modified, but some power loss is expected. Its properties are similar to those of SKAT-o.

Example code

Dichotomous phenotype, using GRM

FARVAT for dichotomous phenotype
C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --farvat --prevalence 0.12 --out res_farvat_di
NOTE!
--prevalence is required when running FARVAT without covariate adjustment!

If a gene contains only a single variant, the optimal test of FARVAT is not calculated.

NOTE!
To estimate the GRM properly, more than 10,000 common (MAF>5%) variants are recommended!

Dichotomous phenotype with multi-ethnic population, using GRM

To run FARVAT on a multi-ethnic dataset without covariate adjustment, a prevalence must be assigned to each population. See the example below. Note that every population in the dataset must be included in the --prevalence assignment.

Dichotomous phenotype FARVAT with multi-ethnic dataset, with population-wise prevalence
C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --farvat --sampvar test_miss0_phen.txt --prevalence ASIA=0.12,EUROPE=0.08,AMERICA=0.15
NOTE!
Column name for population assignment must be POP_COUNT for this case!
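
Because every population must appear in the --prevalence assignment, it can help to check the argument against the population labels in the sample file before launching the analysis. The helper below is a hypothetical pre-flight check, not part of WISARD. It assumes the NAME=value,NAME=value format shown in the example above and that the population labels have already been read from the POP_COUNT column.

```python
def parse_prevalence(spec):
    """Parse 'POP=value,POP=value' into a dict of floats."""
    prevalence = {}
    for item in spec.split(","):
        name, sep, value = item.partition("=")
        name = name.strip()
        if not sep or not name:
            raise ValueError(f"expected POP=value, got {item!r}")
        rate = float(value)
        if not 0.0 < rate < 1.0:
            raise ValueError(f"prevalence for {name} must be between 0 and 1")
        prevalence[name] = rate
    return prevalence


def missing_populations(spec, populations):
    """Populations present in the data but absent from the prevalence spec."""
    return sorted(set(populations) - set(parse_prevalence(spec)))


spec = "ASIA=0.12,EUROPE=0.08,AMERICA=0.15"
print(parse_prevalence(spec))
# Labels as they might appear in the sample file (made up for illustration).
print(missing_populations(spec, ["ASIA", "EUROPE", "AFRICA"]))  # ['AFRICA']
```

An empty list from `missing_populations` means that every population in the data has a prevalence assigned.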

Covariate adjustment for dichotomous phenotypes

FARVAT with adjustment of covariate effect from age and height
C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --farvat --sampvar test_miss0_phen.txt --cname age,height --out res_farvat_dicov

Quantitative phenotypes, covariate adjustment with two-step

Step 1) Residual estimation
C:\Users\WISARD> wisard --bed test_miss0.bed --makeblup --sampvar test_miss0_phen.txt --pname sbp --cname age,height --out test_lmm
Step 2) FARVAT for quantitative phenotypes
C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --sampvar test_miss0_phen.txt --pname sbp --cname age,height --blup test_lmm.SD.blup --est test_lmm.poly.est.res --farvat --out res_farvat_qu




Last modified : 2017-09-09 13:47:20