WISARD official site


Association analysis under the presence of population stratification

This section describes:

Available statistics
Genomic control
EIGENSTRAT
MQLS
GEMMA
Generalized score test
Ordinary QLS
MFQLS

Available statistics

This page describes the statistics that WISARD provides for association analysis in the presence of population stratification. The most efficient statistic depends on the ascertainment condition, the type of phenotype (binary or continuous), the presence of covariates, and whether population stratification is present. This section assumes that population stratification is present; to check whether it exists in your data, see the population stratification page.

Summary of available statistics:

Genomic control (fast): proposed by Devlin and Roeder (Biometrics 1999); it can be applied to the Cochran-Armitage test, the genotype-based test, and logistic/linear regression.
EIGENSTRAT (fast): proposed by Price et al. (Nat Genet 2006); it can be applied to logistic/linear regression.
MQLS, also called ROADTRIP (fast): an extended Cochran-Armitage test that is robust to population stratification (Thornton et al. Am J Hum Genet 2010).
EMMA/GEMMA (fast): Wald tests for a linear mixed model in which the variance-covariance matrix is parameterized by the IBS matrix.
Generalized score test (fast): a generalized score test for the same linear mixed model used by GEMMA.
Ordinary QLS (fast): the ordinary QLS test with a linear mixed model.
MFQLS (fast): an extension of MQLS to multiple phenotypes and variants.

Genomic control

Genomic control was proposed by Devlin and Roeder (Biometrics 1999). In the presence of population stratification, it can be applied to the Cochran-Armitage test, the genotype-based test, and logistic/linear regression (see the association analysis page for these statistics).

If the level of population stratification is substantial, other approaches should be considered; if it is small, genomic control may perform almost as well as the other approaches. Genomic control is computationally fast and easy to calculate.

Example codes

Genomic control for the Cochran-Armitage test and genotype-based test

C:\Users\WISARD> wisard --bed test_miss0.bed --trend --genoctrl --out res_trend_genoctrl

Genomic control for logistic/linear regression

C:\Users\WISARD> wisard --bed test_miss0.bed --regression --genoctrl --out res_regr_genoctrl
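To make the adjustment concrete, the following is a minimal Python sketch of the genomic control idea, not WISARD's implementation. It assumes 1-degree-of-freedom chi-square statistics, estimates the inflation factor as the median statistic divided by the median of a 1-df chi-square distribution (about 0.4549), and assumes the common convention of not adjusting when the estimated factor is below 1. The function name and the example statistics are made up for illustration.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (inflation factor, adjusted statistics) for 1-df chi-square tests."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: never inflate statistics when lambda is below 1.
    lam = max(lam, 1.0)
    return lam, [s / lam for s in chi2_stats]


# Made-up test statistics, one per variant.
stats = [0.2, 0.5, 0.9, 1.4, 2.1, 3.8, 12.0]
lam, adjusted = genomic_control(stats)
```

Every statistic is divided by the same factor, which is why the method is fast but can be too crude when stratification is strong and affects variants unequally.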

EIGENSTRAT

EIGENSTRAT was proposed by Price et al. (Nat Genet 2006). Although it was proposed for case-control designs, the same idea can be applied to logistic/linear regression: principal component analysis is applied to the genetic relationship matrix, and the resulting PC scores are included as covariates in the regression. The number of PC scores required to adjust for population stratification depends on the situation; 5 or 10 PC scores are often used.

EIGENSTRAT is usually more efficient than genomic control, although the two may perform almost identically when the level of population stratification is small. It is computationally fast and easy to calculate. However, if the polygenic effect is expected to be large, a fixed number of PC scores may not fully adjust for population stratification; in that case, MQLS for dichotomous phenotypes and EMMA/GEMMA for quantitative phenotypes may be better choices.

Example codes

EIGENSTRAT for the Cochran-Armitage test, two-step approach

Step 1: compute 5 PC scores from the dataset

C:\Users\WISARD> wisard --ped test_miss0.ped --pca --out test_pca

Step 2: run the trend test (--trend) with the 5 PC scores included as covariates

C:\Users\WISARD> wisard --ped test_miss0.ped --trend --sampvar test_pca.pca.res --cname PC1-PC5 --out res_regr_with_5pc

EIGENSTRAT for logistic/linear regression, single-step approach

Run a regression analysis with the 5 PC scores included as covariates

C:\Users\WISARD> wisard --ped test_miss0.ped --regression --pca --pc2cov --out res_regr_with_5pc
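The PCA step behind these commands can be sketched in plain Python. This is an illustration of the general idea only and does not reproduce WISARD's computation: it assumes genotypes coded 0/1/2 with no missing values, one common standardization of each SNP by its allele frequency, and simple power iteration to extract only the first PC. All function names and the toy genotypes are hypothetical.

```python
import math


def standardize(genotypes):
    """Center and scale each SNP column of an individuals-by-SNPs 0/1/2 matrix."""
    n, m = len(genotypes), len(genotypes[0])
    z = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(row[j] for row in genotypes) / (2.0 * n)  # allele frequency
        sd = math.sqrt(2.0 * p * (1.0 - p))
        if sd == 0.0:
            continue  # a monomorphic SNP carries no information
        for i in range(n):
            z[i][j] = (genotypes[i][j] - 2.0 * p) / sd
    return z


def relationship_matrix(z):
    """Genetic relationship matrix K = Z Z^T / (number of SNPs)."""
    n, m = len(z), len(z[0])
    return [[sum(z[i][k] * z[j][k] for k in range(m)) / m for j in range(n)]
            for i in range(n)]


def top_principal_component(k, iters=200):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    n = len(k)
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data: the first three individuals come from one population and the
# last three from another, with different allele frequencies.
genotypes = [
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 1, 0],
    [1, 2, 2, 0, 0, 1],
    [0, 0, 1, 2, 2, 1],
    [0, 1, 0, 2, 1, 2],
    [1, 0, 0, 1, 2, 2],
]
grm = relationship_matrix(standardize(genotypes))
pc1 = top_principal_component(grm)  # one PC score per individual
```

In this toy example the sign of the first PC score separates the two populations, which is what makes the scores useful as regression covariates.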

See also the population stratification page.

MQLS

MQLS was originally proposed for family-based samples (Thornton and McPeek Am J Hum Genet 2007) and was later extended to adjust for population stratification (Thornton et al. Am J Hum Genet 2010); the extended version is called ROADTRIP in that paper. MQLS is an extended Cochran-Armitage test that is robust to population stratification, and it is a score test based on a quasi-likelihood.

For MQLS, affected and unaffected individuals are coded as 1 and 0, respectively, and individuals with a missing phenotype are coded as the prevalence. This coding reflects that an individual with a missing phenotype may be affected with probability equal to the prevalence. Individuals with missing genotypes are excluded from the analysis.
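The coding scheme above can be written as a short Python sketch. Representing a missing phenotype as None and the function name code_phenotypes are assumptions made for illustration; they do not reflect WISARD's input format.

```python
def code_phenotypes(status, prevalence):
    """Code affection status for MQLS.

    status: list of 1 (affected), 0 (unaffected) or None (missing phenotype).
    Missing phenotypes are replaced by the disease prevalence.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return [prevalence if s is None else float(s) for s in status]


# Three individuals: affected, unaffected, phenotype missing; prevalence 5%.
coded = code_phenotypes([1, 0, None], 0.05)
```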

Because MQLS is an extension of the Cochran-Armitage test, it is efficient for case-control designs. However, if covariates need to be adjusted for or the samples are randomly selected, it cannot be applied directly and some modification is necessary. In such cases, the residuals from a linear mixed model can be used as the response even though the phenotypes are dichotomous (Won and Lange Stat in Med 2013).

Example codes

Running MQLS

Run MQLS with the prevalence set to 5%

C:\Users\WISARD> wisard --ped test_miss0.ped --mqls --kinship --prevalence 0.05

Related options: --mqls, --prevalence

Adjusting covariate effects in MQLS

GEMMA

GEMMA is based on a linear mixed model in which the variance-covariance matrix is parameterized by the IBS matrix; it was proposed by Zhou and Stephens (Nat Genet 2012).

GEMMA is known to be the most efficient approach for quantitative phenotypes, and if the polygenic effect is large (e.g. height), the improvement can be substantial. If it is not clear whether the polygenic effect is large, the heritability of the quantitative phenotype can be used as a guide instead. Although parameter estimation in a linear mixed model is usually computationally intensive, the efficient algorithms proposed for EMMA and GEMMA enable genome-wide association analysis in a short time. GEMMA provides a much more computationally efficient strategy, and WISARD uses GEMMA by default.

Example codes

Run GEMMA analysis

C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height --gemma --out res_gemma
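The IBS matrix that parameterizes the variance-covariance matrix can be illustrated with a small Python sketch. How WISARD computes its IBS matrix is not described on this page; the sketch assumes one common definition, the average proportion of alleles shared identical by state between two individuals, with genotypes coded 0/1/2 and no missing values. The function name ibs_matrix and the toy genotypes are hypothetical.

```python
def ibs_matrix(genotypes):
    """Pairwise IBS similarity for an individuals-by-SNPs 0/1/2 matrix.

    For each pair, similarity = 1 - sum(|g_i - g_j|) / (2 * number of SNPs),
    so identical genotypes give 1 and opposite homozygotes at every SNP give 0.
    """
    n, m = len(genotypes), len(genotypes[0])
    ibs = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            diff = sum(abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            ibs[i][j] = ibs[j][i] = 1.0 - diff / (2.0 * m)
    return ibs


# Toy genotypes: individuals 0 and 2 are identical, individual 1 differs.
genotypes = [
    [0, 1, 2],
    [2, 1, 0],
    [0, 1, 2],
]
ibs = ibs_matrix(genotypes)
```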

Generalized score test

WISARD supports a generalized score test for the linear mixed model. GEMMA and the generalized score test use the same linear mixed model. GEMMA is a Wald test, and Wald tests are known to be statistically more efficient than score tests. However, GEMMA is more sensitive to departures from normality than the generalized score test, so if non-normality is expected, the generalized score test may be a reasonable choice.

Example codes

Generalized score test

Perform the generalized score test

C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height --scoretest --out res_genscore

Ordinary QLS

Example codes

Perform ordinary QLS

C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height --qls --out res_qls

QLS with covariate adjustment

Perform QLS with covariate adjustment

C:\Users\WISARD> wisard --bed test_miss0.bed --qls --sampvar test_miss0_phen.txt --pname height --cname age,weight --out res_qls_cov

QLS with covariate adjustment and without imputation

Perform QLS with covariate adjustment and without genotype imputation

C:\Users\WISARD> wisard --bed test_miss2.bed --qls --avail --sampvar test_miss0_phen.txt --pname height --cname age,weight --out res_qls_cov_woimp

MFQLS

MFQLS is an extension of MQLS to multiple phenotypes and variants. With this extension, MQLS can be applied to datasets with multiple phenotypes or multiple variants, such as a gene set, and WISARD provides functionality for such analyses.

NOTE!

When WISARD is executed with --fqls and --mqls at the same time, multiple phenotypes/variants cannot be used!

Example codes

MFQLS

Perform MFQLS

C:\Users\WISARD> wisard --bed test_miss0.bed --mfqls --out res_mfqls

Adjusting covariate effects in MFQLS

Perform MFQLS with covariate adjustment

C:\Users\WISARD> wisard --bed test_miss0.bed --mfqls --sampvar test_miss0_phen.txt --cname age --out res_mfqls_cov

Last modified : 2017-09-11 16:00:25