WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

Association analysis in the presence of population stratification

This section describes:

  • Available statistics
  • Genomic control
  • EIGENSTRAT
  • MQLS
  • GEMMA
  • Generalized score test
  • Ordinary QLS
  • MFQLS

                  Available statistics

                  This page describes the statistics that WISARD provides for association analysis in the presence of population stratification. The most efficient statistic depends on the ascertainment condition, the type of phenotype (binary or continuous), the presence of covariates, and whether population stratification is present. This section assumes that population stratification is present; to check whether it exists in your data, see the population stratification page.

                  Summary for available statistics:

                  • Genomic control (fast): suggested by Devlin and Roeder (Biometrics 1999); it can be applied to the Cochran-Armitage test, the genotype-based test, and logistic/linear regression.
                  • EIGENSTRAT (fast): suggested by Price et al (Nat Genet 2006); it can be applied to logistic/linear regression.
                  • MQLS (also called ROADTRIP) (fast): an extended Cochran-Armitage test that is robust to population stratification (Thornton et al Am J Hum Genet 2010).
                  • EMMA/GEMMA (fast): Wald tests for a linear mixed model in which the variance-covariance matrix is parameterized by the IBS matrix.
                  • Generalized score test (fast): a generalized score test for the same linear mixed model as GEMMA.
                  • Ordinary QLS (fast): the ordinary QLS test with a linear mixed model.
                  • MFQLS (fast): an extension of MQLS to multiple phenotypes and variants.


                  Genomic control

                  Genomic control was suggested by Devlin and Roeder (Biometrics 1999). In the presence of population stratification, it can be applied to the Cochran-Armitage test, the genotype-based test, and logistic/linear regression (see the association analysis page for these statistics).

                  If the level of population stratification is substantial, other approaches should be considered; if it is small, genomic control may perform almost as well as the other approaches. It is computationally fast and simple to calculate.
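
The adjustment itself is simple to state. The sketch below is a minimal Python illustration, not WISARD's implementation: it estimates the inflation factor as the median of the observed 1-df chi-square statistics divided by the median of the chi-square distribution with 1 degree of freedom (about 0.4549), then divides every statistic by that factor. The function name, the example statistics, and the choice to floor the factor at 1 are assumptions made for illustration.

```python
from statistics import median

# Approximate median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control(chi2_stats):
    """Return (inflation factor, adjusted statistics) for 1-df chi-square tests."""
    lam = median(chi2_stats) / CHI2_1DF_MEDIAN
    # Assumed convention: never inflate the statistics, so floor lambda at 1.
    lam = max(lam, 1.0)
    adjusted = [s / lam for s in chi2_stats]
    return lam, adjusted


# Made-up test statistics, for illustration only.
stats = [0.2, 0.9, 1.4, 3.0, 12.5]
lam, adjusted = genomic_control(stats)
print("lambda:", lam)
print("adjusted:", adjusted)
```

In practice the factor would be estimated from genome-wide statistics rather than a handful of values, since the median is only a stable estimate when most variants are not associated with the phenotype.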

                  Example codes

                  Genomic control for the Cochran-Armitage test and genotype-based test:
                  C:\Users\WISARD> wisard --bed test_miss0.bed --trend --genoctrl --out res_trend_genoctrl
                  Genomic control for logistic/linear regression:
                  C:\Users\WISARD> wisard --bed test_miss0.bed --regression --genoctrl --out res_regr_genoctrl


                  EIGENSTRAT

                  EIGENSTRAT was suggested by Price et al (Nat Genet 2006). Although it was proposed for case-control designs, the same idea can be applied to logistic/linear regression: principal component analysis is applied to the genetic relationship matrix, and the resulting PC scores are included as covariates in the regression. The number of PC scores required to adjust for population stratification depends on the situation; 5 or 10 PC scores are often used.

                  EIGENSTRAT is usually more efficient than genomic control, although if the level of population stratification is small, the two may perform almost the same. It is computationally fast and simple to calculate. However, if the polygenic effect is expected to be large, a fixed number of PC scores cannot completely adjust for population stratification; in such a case, MQLS for dichotomous phenotypes and EMMA/GEMMA for quantitative phenotypes may be better choices.
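
To make the idea of adjusting for PC scores concrete, the following Python sketch removes the linear effect of one PC score from both the genotype and the phenotype and then computes (N - K - 1) times the squared correlation of the residuals, a residual-based statistic of the kind described for EIGENSTRAT. It is an illustration under stated assumptions, not WISARD's implementation: the function names, the toy data, and the restriction to a single PC (K = 1) are all hypothetical.

```python
def residualize(values, covariate):
    """Residuals from a least-squares regression of values on one covariate (with intercept)."""
    n = len(values)
    mean_v = sum(values) / n
    mean_c = sum(covariate) / n
    sxx = sum((c - mean_c) ** 2 for c in covariate)
    sxy = sum((c - mean_c) * (v - mean_v) for c, v in zip(covariate, values))
    slope = sxy / sxx
    return [(v - mean_v) - slope * (c - mean_c) for v, c in zip(values, covariate)]


def pc_adjusted_statistic(genotypes, phenotypes, pc_scores):
    """(N - K - 1) * r^2 between PC-adjusted genotype and phenotype, with K = 1 PC."""
    g = residualize(genotypes, pc_scores)
    y = residualize(phenotypes, pc_scores)
    sgy = sum(a * b for a, b in zip(g, y))
    sgg = sum(a * a for a in g)
    syy = sum(b * b for b in y)
    r2 = sgy ** 2 / (sgg * syy)
    k = 1  # number of PC scores adjusted for in this sketch
    return (len(g) - k - 1) * r2


# Toy data: two ancestry groups separated along a single hypothetical PC.
pc1 = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
genotypes = [0, 0, 1, 1, 2, 2]    # minor-allele counts
phenotypes = [0, 1, 0, 1, 0, 1]   # 1 = affected, 0 = unaffected
stat = pc_adjusted_statistic(genotypes, phenotypes, pc1)
print("PC-adjusted statistic:", stat)
```

With several PC scores the residuals would come from a multiple regression on all of them, which is what including the PC scores as covariates achieves in a regression-based analysis.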

                  Example codes

                  • EIGENSTRAT for the Cochran-Armitage test, two-step approach
                    Step 1: compute 5 PC scores from the dataset:
                    C:\Users\WISARD> wisard --ped test_miss0.ped --pca --out test_pca
                    Step 2: run the trend test with the 5 PC scores as covariates:
                    C:\Users\WISARD> wisard --ped test_miss0.ped --trend --sampvar test_pca.pca.res --cname PC1-PC5 --out res_regr_with_5pc
                  • EIGENSTRAT for logistic/linear regression, single-step approach
                    Run a regression analysis with the 5 PC scores as covariates:
                    C:\Users\WISARD> wisard --ped test_miss0.ped --regression --pca --pc2cov --out res_regr_with_5pc
                    Further details are given on the population stratification page.


                  MQLS

                  MQLS was originally suggested for family-based samples (Thornton and McPeek Am J Hum Genet 2007) and was later extended to adjust for population stratification (Thornton et al Am J Hum Genet 2010); the extension was called ROADTRIP in that paper. MQLS is an extended Cochran-Armitage test that is robust to population stratification, and it is a score test based on quasi-likelihood.

                  For MQLS, affected and unaffected individuals are coded as 1 and 0 respectively, and individuals with a missing phenotype are coded as the prevalence. This scheme reflects that an individual with a missing phenotype may be affected with probability equal to the prevalence. Individuals with missing genotypes are excluded from the analysis.
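
The coding scheme can be written down in a few lines. The Python sketch below is only an illustration of the rule stated above; the function name and the use of True/False/None to represent affected, unaffected, and missing phenotypes are assumptions, not WISARD's input format.

```python
def code_mqls_phenotypes(affected_flags, prevalence):
    """Code phenotypes as 1 (affected), 0 (unaffected), or the prevalence (missing)."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    coded = []
    for flag in affected_flags:
        if flag is None:
            # Missing phenotype: affected with probability equal to the prevalence.
            coded.append(prevalence)
        else:
            coded.append(1.0 if flag else 0.0)
    return coded


# Hypothetical sample: affected, unaffected, missing, affected.
coded = code_mqls_phenotypes([True, False, None, True], prevalence=0.05)
print(coded)
```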

                  Because MQLS is an extension of the Cochran-Armitage test, it is efficient for case-control designs. However, if covariates need to be adjusted for or samples are randomly selected, it cannot be applied directly and some modification is necessary. In such a case, the residuals from a linear mixed model can be used as the response even though the phenotypes are dichotomous (Won and Lange Stat in Med 2013).

                  Example codes

                  • Running MQLS
                    Run MQLS with the prevalence set to 5%:
                    C:\Users\WISARD> wisard --ped test_miss0.ped --mqls --kinship --prevalence 0.05
                    WISARD performs MQLS with the --mqls option. The MQLS test requires the prevalence, which is set with the --prevalence option. The choice of prevalence does not affect the type-1 error rate, but an accurate prevalence minimizes false negative findings.
                  • Adjusting covariate effects in MQLS


                  GEMMA

                  GEMMA is based on a linear mixed model in which the variance-covariance matrix is parameterized with the IBS matrix; it was suggested by Zhou and Stephens (Nat Genet 2012).

                  GEMMA is known to be the most efficient approach for quantitative phenotypes, and if the polygenic effect is large (e.g. height), the improvement can be substantial. If it is not clear whether the polygenic effect is large, the heritability of the quantitative phenotype can be used as a guide instead. Although parameter estimation in a linear mixed model is usually computationally intensive, the efficient algorithms proposed for EMMA and GEMMA make genome-wide association analysis possible in a short time. GEMMA provides much more computationally efficient strategies, and WISARD uses GEMMA by default.
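
For orientation, a generic linear mixed model of this kind can be written as follows. The notation is an assumption introduced here for illustration (it is not taken from the WISARD documentation): y is the phenotype vector, X holds the covariates, g is the genotype vector of the tested variant, and K is the IBS-based relationship matrix.

```latex
% y: n x 1 phenotype vector; X: covariates including the intercept;
% g: genotype vector of the tested variant; K: IBS-based relationship matrix.
\begin{aligned}
y &= X\beta + g\gamma + u + \varepsilon, \\
u &\sim N(0, \sigma_g^2 K), \qquad \varepsilon \sim N(0, \sigma_e^2 I_n), \\
\operatorname{Var}(y) &= \sigma_g^2 K + \sigma_e^2 I_n, \\
h^2 &= \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
\end{aligned}
```

Under this formulation the Wald test examines the null hypothesis that the variant effect is zero, and the ratio h^2 is the usual heritability-type summary of how large the polygenic component is relative to the total variance.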

                  Example codes

                  Run GEMMA analysis:
                  C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height --gemma --out res_gemma


                  Generalized score test

                  WISARD supports a generalized score test for the linear mixed model. GEMMA and the generalized score test use the same linear mixed model. GEMMA is a Wald test, and Wald tests are known to be statistically more efficient than score tests. However, GEMMA is more sensitive to departures from normality than the generalized score test, so if non-normality is expected, the generalized score test may be a reasonable choice.

                  Example codes

                  • Generalized score test
                    Perform the generalized score test:
                    C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height --scoretest --out res_genscore


                  Ordinary QLS

                  Example codes

                  • QLS
                    Perform ordinary QLS:
                    C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height --qls --out res_qls
                  • QLS with covariate adjustment
                    Perform QLS with covariate adjustment:
                    C:\Users\WISARD> wisard --bed test_miss0.bed --qls --sampvar test_miss0_phen.txt --pname height --cname age,weight --out res_qls_cov
                  • QLS with covariate adjustment and without imputation
                    Perform QLS with covariate adjustment and without genotype imputation:
                    C:\Users\WISARD> wisard --bed test_miss2.bed --qls --avail --sampvar test_miss0_phen.txt --pname height --cname age,weight --out res_qls_cov_woimp

                  MFQLS

                  MFQLS is an extension of MQLS to multiple phenotypes and variants. It can be applied to datasets with multiple phenotypes or multiple variants, such as a gene set, and WISARD supports applying MQLS to such analyses through it.

                  NOTE!
                  When WISARD is executed with --fqls and --mqls together, multiple phenotypes/variants cannot be used!

                  Example codes

                  • MFQLS
                    Perform MFQLS:
                    C:\Users\WISARD> wisard --bed test_miss0.bed --mfqls --out res_mfqls
                  • Adjusting covariate effects in MFQLS
                    Perform MFQLS with covariate adjustment:
                    C:\Users\WISARD> wisard --bed test_miss0.bed --mfqls --sampvar test_miss0_phen.txt --cname age --out res_mfqls_cov


                  Last modified : 2017-09-11 16:00:25