WISARD[wɪzərd] Workbench for Integrated Superfast Association study with Related Data
This section describes the statistics that WISARD provides for association analysis in the presence of population stratification. The most efficient statistic depends on the ascertainment condition, the type of phenotype (binary or continuous), the presence of covariates, and whether population stratification is present. This section assumes that population stratification is present; to check whether it exists in your data, see the population stratification page.
Summary for available statistics:
Genomic control was suggested by Devlin and Roeder (Biometrics 1999). In the presence of population stratification, it can be applied to the Cochran-Armitage test, the genotype-based test, and logistic/linear regression (see the association analysis page for these statistics).
Genomic control is computationally fast and easy to calculate. If the level of population stratification is small, its performance may be almost the same as that of other approaches; if the level is substantial, other approaches should be considered.
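The idea behind genomic control can be sketched in a few lines of Python. This is an illustration only, not WISARD's implementation: the function name and the toy statistics are made up, and the sketch assumes 1-df chi-square statistics and the common median-based estimate of the inflation factor.

```python
from statistics import median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364

def genomic_control(chi2_stats):
    """Return (inflation factor, adjusted statistics) for 1-df chi-square statistics."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # By convention the inflation factor is not allowed to fall below 1,
    # so statistics are never inflated by the correction.
    lam = max(lam, 1.0)
    return lam, [s / lam for s in chi2_stats]

# Made-up test statistics, for illustration only.
stats = [0.2, 0.9, 1.4, 3.1, 12.5]
lam, adjusted = genomic_control(stats)
```

After the correction, the median of the adjusted statistics matches the median expected under the null hypothesis, which is why the method works best when the inflation is modest and roughly uniform across variants.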
Example codes
EIGENSTRAT was suggested by Price et al. (Nat Genet 2006). Although EIGENSTRAT was proposed for the case-control design, the same idea can be applied to logistic/linear regression: principal component analysis is applied to the genetic relationship matrix, and the resulting PC scores are included as covariates in the logistic/linear regression. The number of PC scores required to adjust for population stratification depends on the situation; 5 or 10 PC scores are often used.
The EIGENSTRAT approach is usually more efficient than genomic control, but if the level of population stratification is small, its performance may be almost the same as that of genomic control. It is computationally fast and easy to calculate. However, if the polygenic effect is expected to be large, a fixed number of PC scores cannot completely adjust for population stratification; in such a case, MQLS for dichotomous phenotypes and EMMA/GEMMA for quantitative phenotypes may be more efficient choices.
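The two steps described above (principal components of the genetic relationship matrix, then regression with PC scores as covariates) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not WISARD's or EIGENSTRAT's code: the function names, the simulated two-population data, and the use of ordinary least squares for a continuous phenotype are all choices made for this example.

```python
import numpy as np

def pc_scores(genotypes, n_pcs):
    """Top PC scores from an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0                    # allele frequency per SNP
    keep = (freq > 0) & (freq < 1)                 # drop monomorphic SNPs
    z = (g[:, keep] - 2 * freq[keep]) / np.sqrt(2 * freq[keep] * (1 - freq[keep]))
    grm = z @ z.T / z.shape[1]                     # genetic relationship matrix
    eigvals, eigvecs = np.linalg.eigh(grm)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_pcs]      # indices of the largest ones
    return eigvecs[:, order]

def adjusted_snp_effect(y, snp, pcs):
    """Least-squares SNP effect with an intercept and PC scores as covariates."""
    x = np.column_stack([np.ones(len(y)), snp, pcs])
    beta, *_ = np.linalg.lstsq(x, y, rcond=None)
    return beta[1]

rng = np.random.default_rng(0)
# Toy data: two subpopulations with different allele frequencies.
pop = np.repeat([0, 1], 50)
freqs = np.where(pop[:, None] == 0, 0.2, 0.7) * np.ones((100, 200))
geno = rng.binomial(2, freqs)
y = 1.0 * pop + rng.normal(size=100)   # phenotype differs by population only
pcs = pc_scores(geno, n_pcs=2)
adjusted = adjusted_snp_effect(y, geno[:, 0], pcs)
```

In this toy setting the first PC score essentially recovers the subpopulation label, so including it as a covariate removes the spurious association that the population difference would otherwise induce. For a dichotomous phenotype the same PC scores would be passed to a logistic regression instead.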
Example codes
MQLS was originally proposed for family-based samples (Thornton and McPeek, Am J Hum Genet 2007) and was later extended to adjust for population stratification (Thornton and McPeek, Am J Hum Genet 2010); the latter was called ROADTRIP in the paper. MQLS is an extension of the Cochran-Armitage test that is robust to population stratification, and it is a score test based on a quasi-likelihood.
For MQLS, affected and unaffected individuals are coded as 1 and 0 respectively, and individuals with a missing phenotype are coded as the prevalence. This scheme reflects that an individual with a missing phenotype is affected with probability equal to the prevalence. Individuals with missing genotypes are excluded from the analysis.
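The coding rule can be illustrated with a short Python sketch. The function name, the use of `None` for missing values, and the prevalence of 0.1 are assumptions made for this example; WISARD's own input handling is not shown here.

```python
def code_mqls_phenotypes(status, genotypes, prevalence):
    """Return (coded phenotypes, genotypes) for individuals with observed genotypes.

    status:    1 = affected, 0 = unaffected, None = missing phenotype
    genotypes: allele count (0/1/2), or None when the genotype is missing
    """
    coded, kept = [], []
    for s, g in zip(status, genotypes):
        if g is None:
            continue  # individuals with missing genotypes are excluded
        # A missing phenotype is replaced by the prevalence.
        coded.append(prevalence if s is None else float(s))
        kept.append(g)
    return coded, kept

# Four toy individuals: the third has a missing phenotype,
# the fourth has a missing genotype.
coded, kept = code_mqls_phenotypes([1, 0, None, 1], [2, 0, 1, None], prevalence=0.1)
```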
Because MQLS is an extension of the Cochran-Armitage test, it is efficient for the case-control design. However, if covariates need to be adjusted for or samples are randomly selected, it cannot be applied directly and some modification is necessary. In such a case, the residuals from the linear mixed model can be used as the response even though the phenotypes are dichotomous (Won and Lange, Stat Med 2013).
Example codes
GEMMA, suggested by Zhou and Stephens (Nat Genet 2012), is based on a linear mixed model in which the variance-covariance matrix is parameterized with the IBS matrix.
GEMMA is known to be the most efficient approach for quantitative phenotypes, and if the polygenic effect is substantial (e.g. height), the improvement can be considerable. If it is not clear whether the polygenic effect is large, the heritability of the quantitative phenotype can be used as a guide instead. Although parameter estimation in a linear mixed model is usually computationally intensive, the computationally efficient algorithm that GEMMA provides enables genome-wide association analysis in a short time. By default, WISARD calculates GEMMA.
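For orientation, a linear mixed model of this kind is commonly written as below. The notation is an assumption introduced here rather than taken from the WISARD documentation: y is the phenotype vector, X holds the covariates and the tested variant, K is the relationship (here IBS) matrix, and h^2 is the heritability, which measures how large the polygenic component is relative to the total variance.

```latex
\begin{aligned}
y &= X\beta + g + \varepsilon, \\
g &\sim N(0, \sigma_g^2 K), \qquad \varepsilon \sim N(0, \sigma_e^2 I), \\
\operatorname{Var}(y) &= \sigma_g^2 K + \sigma_e^2 I, \qquad
h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{aligned}
```

A large h^2 means that a large share of the phenotypic variance follows the relationship matrix, which is the situation in which modelling g explicitly is expected to help most.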
Example codes
WISARD supports a generalized score test for the linear mixed model. The linear mixed models for GEMMA and the generalized score test are the same. GEMMA is a Wald test, and Wald tests are known to be statistically more efficient than score tests. However, GEMMA is more sensitive to departures from normality than the generalized score test, so if non-normality is expected, the generalized score test may be a reasonable choice.
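The difference between the two kinds of test can be made concrete with a small NumPy sketch. It is a simplification and not WISARD's or GEMMA's algorithm: the function name and toy data are made up, and the covariance matrix is treated as known. Under that assumption the Wald and score statistics coincide numerically; in practice they differ because the Wald test estimates the variance components under the full model, whereas the score test estimates them once under the null model.

```python
import numpy as np

def wald_and_score(y, x, v):
    """Wald and score statistics for beta in y = mu + x*beta + e, with Var(e) = v."""
    v_inv = np.linalg.inv(v)
    ones = np.ones_like(y)

    # Wald: generalized least squares fit of the full model (intercept + variant).
    design = np.column_stack([ones, x])
    info = design.T @ v_inv @ design
    beta = np.linalg.solve(info, design.T @ v_inv @ y)
    wald = beta[1] ** 2 / np.linalg.inv(info)[1, 1]

    # Score: fit only the null model (intercept), then test the variant
    # against the null-model residuals.
    mu = (ones @ v_inv @ y) / (ones @ v_inv @ ones)
    resid = y - mu
    u = x @ v_inv @ resid
    var_u = x @ v_inv @ x - (x @ v_inv @ ones) ** 2 / (ones @ v_inv @ ones)
    score = u ** 2 / var_u
    return wald, score

rng = np.random.default_rng(1)
n = 30
a = rng.normal(size=(n, n))
k = a @ a.T / n                     # toy positive semi-definite relationship matrix
v = 0.5 * k + 0.5 * np.eye(n)       # variance components assumed known here
x = rng.binomial(2, 0.3, size=n).astype(float)
y = rng.normal(size=n)
wald, score = wald_and_score(y, x, v)
```

The score computation needs only one null-model fit for all variants, which is the usual reason score tests are cheap to run genome-wide.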
Example codes
MFQLS is an extension of MQLS to multiple phenotypes and multiple variants. It can be applied to datasets with multiple phenotypes or with multiple variants, such as a gene set, and WISARD supports MFQLS for such analyses.
NOTE!
When WISARD is executed with --fqls and --mqls concurrently, multiple phenotypes/variants cannot be used!
Example codes