WISARD [wɪzərd]: Workbench for Integrated Superfast Association study with Related Data
Population stratification invalidates results from rare-variant association analysis, so statistics that assume the absence of population stratification cannot be applied when it is present.
A set file listing the variants that belong to the same gene must be provided; its format is described on the gene-level analysis page.
Statistical power is affected by several factors, such as the definition of the set and the homogeneity of the effects of the rare variants on the phenotype. The most efficient statistic depends on these factors, so several statistics should be considered at the same time.
Summary of available statistics:
The Combined Multivariate and Collapsing (CMC) test was originally suggested for the case-control design in the absence of population stratification. Under population stratification, results from CMC are not valid and the genomic control approach should be applied. Fisher's exact method for the CMC test cannot be used under population stratification.
Example code
The PEDCMC test was suggested by Zhu and Xiong (Am J Hum Genet 2012). It is an extension of the CMC test for correlated samples and can be applied to dichotomous phenotypes. Covariate effects cannot be adjusted for. It is robust against population stratification and, under population stratification, may be more efficient than the CMC test with genomic control adjustment.
Example code
The collapsing-based test was suggested by Morris and Zeggini (Genet Epi 2010) and can be applied to dichotomous and quantitative phenotypes. Under population stratification, for dichotomous phenotypes, PC scores should be calculated from the genetic relationship matrix and included as covariates in the logistic regression. For quantitative phenotypes, the linear mixed model should be applied with the variance-covariance matrix parameterized with the IBS matrix.
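To make the collapsing idea concrete, the following minimal Python sketch computes two per-individual burden scores that are commonly used in collapsing-type tests: an indicator of carrying any rare minor allele in the gene, and the proportion of rare variants at which the individual carries a minor allele. It is an illustration only and is not taken from WISARD; the function name `collapse_scores` and the rare-variant cutoff `maf_cutoff=0.01` are assumptions made for this example.

```python
def collapse_scores(genotypes, mafs, maf_cutoff=0.01):
    """Return (indicator, proportion) collapsing scores for one individual.

    genotypes  : minor-allele counts (0, 1 or 2) for the variants in one gene
    mafs       : minor allele frequency of each variant, in the same order
    maf_cutoff : variants with MAF below this value are treated as rare
                 (hypothetical default, chosen only for illustration)
    """
    # Keep only the genotypes observed at rare variants.
    rare = [g for g, maf in zip(genotypes, mafs) if maf < maf_cutoff]
    if not rare:
        return 0, 0.0

    # Number of rare variants at which the individual carries a minor allele.
    carried = sum(1 for g in rare if g > 0)

    indicator = 1 if carried > 0 else 0   # carries any rare allele in the gene
    proportion = carried / len(rare)      # fraction of rare sites carried
    return indicator, proportion


# One individual, four variants; the third variant is common and is ignored.
indicator, proportion = collapse_scores([0, 1, 2, 0], [0.005, 0.002, 0.2, 0.008])
```

Either score can then serve as the single predictor for the gene in a regression model, alongside any covariates such as PC scores.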
Example code
The VT test was suggested by Price et al (Am J Hum Genet 2010). The idea was applied in Scoreseq (Lin and Tang, Am J Hum Genet 2011) to consider various thresholds for gene-level analysis under population stratification. It is a score test and can be applied to quantitative phenotypes, which are assumed to be normally distributed.
Final p-values for the VT test are calculated with a numerical algorithm. If the number of variants is too large, p-values are calculated with Monte Carlo simulation, and the number of iterations should be chosen according to the significance level. For instance, if you are interested in the 0.05 significance level, we suggest at least (1/0.05) × 10 = 200 iterations. Note that the maximum number of iterations is limited to 2^32 - 1.
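The rule of thumb above can be written as a small helper. This is a sketch in Python, not part of WISARD; the name `min_iterations` and the `factor` parameter are assumptions introduced here, while the factor of 10 and the 2^32 - 1 cap come from the text.

```python
import math
from fractions import Fraction

# Upper limit on the number of iterations stated in the text.
MAX_ITERATIONS = 2**32 - 1


def min_iterations(alpha, factor=10):
    """Suggested minimum number of Monte Carlo iterations for level alpha.

    Implements (1 / alpha) * factor, rounded up and capped at MAX_ITERATIONS.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    # Exact rational arithmetic avoids floating-point rounding in 1 / alpha.
    needed = math.ceil(factor / Fraction(str(alpha)))
    return min(needed, MAX_ITERATIONS)


print(min_iterations(0.05))   # 200
print(min_iterations(5e-8))   # 200000000
```

For very stringent levels the rule exceeds the cap, in which case the helper returns 2^32 - 1.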
Example codes
NOTE!
To estimate IBS properly, more than 10,000 common (MAF > 5%) variants are recommended!
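Whether a dataset meets this recommendation can be checked before running the analysis. The following Python sketch counts variants with MAF above 5% from genotypes coded as minor- or alternate-allele counts. It assumes diploid autosomal genotypes coded 0/1/2 with `None` for missing calls; the function names and the input layout are hypothetical and do not correspond to any WISARD file format.

```python
RECOMMENDED_MINIMUM = 10_000  # threshold quoted in the note above


def count_common_variants(genotype_columns, maf_threshold=0.05):
    """Count variants whose minor allele frequency exceeds maf_threshold.

    genotype_columns: iterable with one list per variant, holding the allele
    count (0, 1 or 2) of each individual, or None for a missing genotype.
    """
    n_common = 0
    for column in genotype_columns:
        observed = [g for g in column if g is not None]
        if not observed:
            continue  # variant with no called genotypes
        # Allele frequency among the 2 * n observed chromosomes.
        freq = sum(observed) / (2 * len(observed))
        maf = min(freq, 1 - freq)
        if maf > maf_threshold:
            n_common += 1
    return n_common


def meets_recommendation(genotype_columns):
    """True if more than RECOMMENDED_MINIMUM common variants are available."""
    return count_common_variants(genotype_columns) > RECOMMENDED_MINIMUM


# Three toy variants: the first and third are common, the second is monomorphic.
toy = [[0, 1, 2, 0], [0, 0, 0, None], [2, 2, 2, 1]]
n = count_common_variants(toy)
```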
The SNP-set/Sequence Kernel Association Test (SKAT) was suggested by Wu et al (Am J Hum Genet 2011) and can be applied to dichotomous and quantitative phenotypes. In the presence of population stratification, the linear mixed model must be used for quantitative phenotypes, with the variance-covariance matrix parameterized with the IBS matrix. For dichotomous phenotypes, PC scores should be calculated from the genetic relationship matrix and included as covariates in the logistic regression.
Example code
The SNP-set/Sequence Kernel Association Test-optimal (SKAT-o) is an extension of SKAT and was suggested by Lee et al (Biostatistics 2012). It is a mixture of a burden-type test and SKAT, and can be applied to dichotomous and quantitative phenotypes. In the presence of population stratification, the linear mixed model must be used for quantitative phenotypes, with the variance-covariance matrix parameterized with the IBS matrix. For dichotomous phenotypes, PC scores should be calculated from the genetic relationship matrix and included as covariates in the logistic regression.
Example code
NOTE!
The SKAT-o test cannot be calculated if a gene contains only a single variant.
FARVAT was suggested by Choi et al (2014) and is an extension of MQLS for rare-variant association analysis. It is an optimized gene-level association test for dichotomous traits; under population stratification, the genetic relationship matrix should be incorporated as the genetic correlation matrix. If covariate effects exist or the phenotypes are quantitative, FARVAT can be modified, but some power loss is expected. Its properties are similar to those of SKAT-o.
Example code
NOTE!
--prevalence is required when running FARVAT without covariate adjustment!
If a gene contains only a single variant, the optimal test of FARVAT is not calculated.
NOTE!
To estimate the GRM properly, more than 10,000 common (MAF > 5%) variants are recommended!
To run FARVAT on a multi-ethnic dataset without covariate adjustment, a prevalence must be assigned to each population. See the example below. Note that every ethnic group in the dataset must be included in the --prevalence assignment.
NOTE!
The column name for the population assignment must be POP_COUNT in this case!