WISARD [wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

Family-based association tests

This section describes the following topics:

  • Available statistics
  • Transmission Disequilibrium Test (TDT)
  • Sibship TDT (SDT)
  • MQLS
  • FQLS
    • Using proband information
    • Example codes
  • GEMMA
  • Generalized score test
  • MFQLS

                Available statistics

                The most efficient statistic depends on how the samples were ascertained, whether the phenotype is dichotomous or quantitative, whether covariates must be adjusted, and whether population stratification is present. This section describes the statistics available for family-based samples.

                Summary of available statistics:

                Test name | Speed | Phenotype | Covariates | Description
                Transmission Disequilibrium Test (TDT) | fast | Dichotomous | Cannot be adjusted | Always robust against population stratification, but because founders' genotypes are used only to determine which alleles were transmitted, it is often statistically inefficient.
                Sibship TDT (SDT) | fast | Dichotomous | Cannot be adjusted | The original TDT cannot be applied when parental genotypes are unknown; SDT overcomes this by comparing affected sibs with their unaffected sibs (Spielman and Ewens Am J Hum Genet 1998).
                MQLS | slow | Dichotomous | Cannot be adjusted | An extended Cochran-Armitage test for family-based samples (Thornton and McPeek Am J Hum Genet 2007), usually efficient for ascertained families. Covariate effects cannot be adjusted directly, but a modification (using residuals as the response) enables covariate adjustment.
                Family QLS (FQLS) | moderate | Dichotomous/continuous | Cannot be adjusted | An extension of MQLS that uses a liability model to adjust for the heterogeneous ascertainment bias of families ascertained through probands; more efficient than MQLS in that setting.
                EMMAX/GEMMA | fast | Continuous | Can be adjusted | Wald tests for a linear mixed model whose variance-covariance matrix is parameterized by the kinship coefficient matrix. They are for quantitative phenotypes and usually efficient for randomly selected samples.
                Generalized score test | fast | Continuous | Can be adjusted | Generalized score test for the linear mixed model used by EMMAX/GEMMA. It applies to quantitative phenotypes and is usually efficient for randomly selected samples.
                MFQLS | fast | Continuous, multivariate | Can be adjusted | An extended MQLS for joint analysis of multiple phenotypes and multiple variants.


                Transmission Disequilibrium Test (TDT)

                TDT (Spielman et al Am J Hum Genet 1993) is an association test for family-based samples. It tests whether the alleles transmitted from heterozygous parents to affected children differ from the alleles that are not transmitted.

                TDT is always robust against population stratification. For large-scale genetic data, methods such as genomic control and EIGENSTRAT are also robust against population stratification, but they rely on a sufficiently large number of variants; when the number of variants is small, they are no longer robust, whereas TDT still is. For this reason, TDT is often used for candidate gene analysis. However, because TDT uses founders' genotypes only to determine which alleles were transmitted, it is often statistically inefficient.
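                To make concrete what TDT computes, the sketch below implements the classic McNemar-type statistic of Spielman et al. (1993) from transmission counts. It assumes that b and c have already been counted from heterozygous parents of affected children (b: transmissions of the tested allele, c: transmissions of the other allele). The function name tdt is hypothetical and is not part of WISARD, whose internal implementation may differ.

```python
import math


def tdt(b: int, c: int) -> tuple[float, float]:
    """Classic TDT statistic and its 1-df chi-square p-value.

    b: transmissions of the tested allele from heterozygous parents to affected children
    c: transmissions of the other allele from heterozygous parents to affected children
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) transmissions")
    stat = (b - c) ** 2 / (b + c)
    # Upper tail of chi-square(1): P(X > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2))
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p


stat, p = tdt(b=60, c=40)
print(f"TDT = {stat:.2f}, p = {p:.4f}")  # TDT = 4.00, p = 0.0455
```

                Only heterozygous parents contribute to b and c; homozygous parents carry no transmission information.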

                WISARD performs TDT with the --tdt option.

                • Example code
                  Analysis with TDT
                  C:\Users\WISARD> wisard --bed test_miss0.bed --tdt --out res_tdt
                • Output file
                  tdt.res: result of the Transmission Disequilibrium Test (TSV)
                  Column | Format | Modifier | Description
                  CHR | integer | NONE | Chromosome of the variant
                  VARIANT | string | NONE | Name of the variant
                  POS | integer | NONE | Position of the variant
                  ALT | string | NONE | Alternative allele of the variant
                  ANNOT | string | NONE | Annotation of the variant
                  PHENO | string | --sampvar,--pname | Tested phenotype
                  STAT | real | NONE | Statistic from TDT
                  P_TDT | real | NONE | p-value from TDT


                Sibship TDT (SDT)

                TDT requires the genotypes of both parents and the child, but parental genotypes are sometimes unavailable. SDT overcomes this by comparing the genotypes of affected sibs with those of their unaffected sibs (Spielman and Ewens Am J Hum Genet 1998).

                Its general statistical properties are similar to those of TDT. WISARD performs SDT with the --sdt option.
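                WISARD's exact SDT formula is not given on this page. As a sketch of the idea, the code below follows the sib-TDT of Spielman and Ewens (1998): within each sibship that has at least one affected and one unaffected sib, the number of tested-allele copies carried by the affected sibs is compared with its expectation when affection status is randomly permuted among the sibs, and the sums over sibships give a z statistic (normal approximation, no continuity correction). The function name sib_tdt and the input layout are hypothetical.

```python
import math


def sib_tdt(sibships):
    """Sib-TDT z statistic and two-sided p-value (normal approximation).

    sibships: list of sibships; each sibship is a list of (allele_count, affected)
    pairs, with allele_count in {0, 1, 2} and affected True/False.
    """
    obs_minus_exp = 0.0
    variance = 0.0
    for sibs in sibships:
        t = len(sibs)
        r = sum(1 for _, aff in sibs if aff)   # number of affected sibs
        s = t - r                              # number of unaffected sibs
        if r == 0 or s == 0:
            continue                           # uninformative sibship
        counts = [x for x, _ in sibs]
        total = sum(counts)
        y = sum(x for x, aff in sibs if aff)   # allele copies among affected sibs
        # Under H0, the r affected sibs are a random subset of the t sibs, so y is
        # a sum of r values drawn without replacement from counts.
        expected = r * total / t
        ss = sum(x * x for x in counts) - total * total / t
        obs_minus_exp += y - expected
        variance += r * s * ss / (t * (t - 1))
    if variance == 0:
        raise ValueError("no informative sibships")
    z = obs_minus_exp / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2.0))     # two-sided p-value
    return z, p


sibships = [
    [(2, True), (1, False)],
    [(1, True), (0, False), (1, False)],
    [(2, True), (1, True), (0, False)],
]
z, p = sib_tdt(sibships)
print(f"z = {z:.3f}, p = {p:.3f}")
```

                z squared can be compared with a chi-square distribution with 1 degree of freedom, in the same way as the TDT statistic.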

                • Example code
                  Analysis with SDT
                  C:\Users\WISARD> wisard --bed test_miss0.bed --sdt --out res_sdt
                • Output file
                  sdt.res: result of the Sibship Disequilibrium Test (TSV)
                  Column | Format | Modifier | Description
                  CHR | integer | NONE | Chromosome of the variant
                  VARIANT | string | NONE | Name of the variant
                  POS | integer | NONE | Position of the variant
                  ALT | string | NONE | Alternative allele of the variant
                  ANNOT | string | NONE | Annotation of the variant
                  PHENO | string | --sampvar,--pname | Tested phenotype
                  STAT | real | NONE | Statistic from SDT
                  P_TDT | real | NONE | p-value from SDT


                MQLS

                MQLS is an extended Cochran-Armitage test proposed for family-based samples (Thornton and McPeek Am J Hum Genet 2007). It is a score test based on quasi-likelihood. MQLS is the same with and without population stratification except for the choice of relationship matrix: under population stratification, a genetic relationship matrix should be used, and in its absence, the kinship coefficient matrix should be used.
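                The entries of the kinship coefficient matrix are the probabilities that an allele drawn at random from one individual and an allele drawn at random from another are identical by descent. The sketch below computes them with the standard recursive pedigree algorithm, only to show what the matrix contains; it is not WISARD's implementation. It assumes a hypothetical input layout of (id, father, mother) tuples with unknown parents set to None and parents listed before their children, and the function name kinship_matrix is hypothetical.

```python
def kinship_matrix(pedigree):
    """Kinship coefficients by the standard recursive algorithm.

    pedigree: list of (id, father, mother); unknown parents are None and
    parents must appear before their children.
    Returns a dict mapping (id_i, id_j) to the kinship coefficient.
    """
    ids = [i for i, _, _ in pedigree]
    parents = {i: (f, m) for i, f, m in pedigree}
    phi = {}

    def get(a, b):
        if a is None or b is None:
            return 0.0  # unknown parent: treated as an unrelated founder
        return phi[(a, b)]

    for k, i in enumerate(ids):
        f, m = parents[i]
        # Self-kinship is (1 + F) / 2, where F (inbreeding) is the parents' kinship
        phi[(i, i)] = 0.5 * (1.0 + get(f, m))
        for j in ids[:k]:
            # j is listed earlier, so j is not a descendant of i
            phi[(i, j)] = phi[(j, i)] = 0.5 * (get(f, j) + get(m, j))
    return phi


pedigree = [
    ("father", None, None),
    ("mother", None, None),
    ("child1", "father", "mother"),
    ("child2", "father", "mother"),
]
phi = kinship_matrix(pedigree)
print(phi[("child1", "child2")])  # 0.25 for full sibs
print(phi[("father", "mother")])  # 0.0 for unrelated founders
```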

                For MQLS, affected and unaffected individuals are coded as 1 and 0 respectively, and individuals with missing phenotypes are coded as the prevalence. This reflects the assumption that an individual with a missing phenotype is affected with probability equal to the prevalence. Individuals with missing genotypes are either excluded from the analysis, or their missing genotypes are replaced with 2×MAF.
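                The coding rules above can be written down directly. A minimal sketch, assuming phenotypes are given as True (affected), False (unaffected) or None (missing) and genotypes as minor-allele counts with None for missing. The function names are hypothetical, and estimating the allele frequency from founders follows the description of FQLS below rather than anything stated for MQLS.

```python
def code_mqls_phenotypes(status, prevalence):
    """Affected -> 1, unaffected -> 0, missing (None) -> prevalence."""
    return [prevalence if s is None else (1.0 if s else 0.0) for s in status]


def counted_allele_freq(genotypes, is_founder):
    """Frequency of the counted allele among founders with observed genotypes."""
    observed = [g for g, f in zip(genotypes, is_founder) if f and g is not None]
    if not observed:
        raise ValueError("no genotyped founders")
    return sum(observed) / (2.0 * len(observed))


def impute_genotypes(genotypes, freq):
    """Replace missing genotypes (None) by the expected allele count 2 * freq.

    When genotypes count copies of the minor allele, this is the 2 x MAF rule."""
    return [2.0 * freq if g is None else float(g) for g in genotypes]


status = [True, False, None, True]
print(code_mqls_phenotypes(status, prevalence=0.05))  # [1.0, 0.0, 0.05, 1.0]

genotypes = [0, 1, None, 2]           # copies of the minor allele, None = missing
founders = [True, True, False, False]
freq = counted_allele_freq(genotypes, founders)       # (0 + 1) / 4 = 0.25
print(impute_genotypes(genotypes, freq))              # [0.0, 1.0, 0.5, 2.0]
```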

                Because MQLS extends the Cochran-Armitage test to family-based designs, it is efficient for case-control designs. However, if covariates need to be adjusted or samples are randomly selected, some modification is necessary. In such cases, the residuals from a linear mixed model can be used as the response even though the phenotype is dichotomous (Won and Lange Stat in Med 2013).

                Example codes

                • Running MQLS
                  Run MQLS with a prevalence of 5%
                  C:\Users\WISARD> wisard --ped test_miss0.ped --mqls --kinship --prevalence 0.05
                  WISARD performs MQLS with the --mqls option. MQLS requires the disease prevalence, which is set with the --prevalence option. The prevalence does not affect the type-1 error rate, but an accurate prevalence reduces false negative findings.
                • Randomly selected samples and covariate effect adjustment
                  Under population stratification, MQLS analyses for population-based and family-based designs are identical; see the association analysis under population stratification for example code.


                FQLS

                The family-based quasi-likelihood score (FQLS) test extends MQLS and is more efficient than MQLS when each family is ascertained through one or more probands. In that case, the ascertainment bias of each family member depends on his or her relationship to the probands, and is therefore heterogeneous. This heterogeneity is substantial in large families, and FQLS adjusts for it using a liability model. The power improvement is substantial when heritability is large.
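                Why the relationship to the proband matters can be illustrated with a standard liability-threshold model. This is an illustration of the idea only, not WISARD's offset computation. Assume each individual has a standard normal liability and is affected when it exceeds the (1 - K) quantile, where K is the prevalence, and that the liabilities of two relatives with kinship coefficient phi have correlation 2·phi·h² (additive genetic effects only). The probability that a relative of an affected proband is affected is then P(both liabilities exceed the threshold)/K, which increases with both kinship and heritability. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def both_exceed(t, rho, n=2000, width=10.0):
    """P(L1 > t, L2 > t) for standard bivariate normal liabilities with
    correlation rho (|rho| < 1), via Simpson's rule over L1 in [t, t + width]."""
    sd = math.sqrt(1.0 - rho * rho)

    def integrand(x):
        # density of L1 at x times P(L2 > t | L1 = x)
        return _STD_NORMAL.pdf(x) * (1.0 - _STD_NORMAL.cdf((t - rho * x) / sd))

    h = width / n  # n must be even for Simpson's rule
    total = integrand(t) + integrand(t + width)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(t + k * h)
    return total * h / 3.0


def relative_risk_given_proband(prevalence, heritability, kinship):
    """P(relative affected | proband affected) under the liability model."""
    t = _STD_NORMAL.inv_cdf(1.0 - prevalence)
    rho = 2.0 * kinship * heritability
    return both_exceed(t, rho) / prevalence


for phi in (0.0, 0.125, 0.25):
    risk = relative_risk_given_proband(0.05, 0.8, phi)
    print(f"kinship {phi}: P(affected | affected proband) = {risk:.3f}")
```

                For unrelated individuals (kinship 0) the conditional probability equals the prevalence; for closer relatives it rises, so family members are ascertained with different degrees of bias.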

                FQLS can be applied to both dichotomous and continuous phenotypes, but some modification is necessary if covariate effects must be adjusted or the phenotype is quantitative.

                FQLS is performed in WISARD with the --fqls option.
                By default, --fqls requires the additional options described below, and missing genotypes are imputed as 2×MAF, where MAF is the minor allele frequency of the variant estimated from founders.

                By default, FQLS computes an offset from the following factors: pedigree structure, proband status, heritability and prevalence. Prevalence should be omitted when the phenotype is not dichotomous.

                Using proband information

                WISARD provides two ways of estimating the offset: assuming that every family member is a potential proband, or using exact information on who the probands are. If proband status is known for each individual, it can be used in the FQLS analysis. To do so, a sample variable file (--sampvar) is required. As described in the sample variable section, several column names are 'reserved' for special use, and proband status is one of them. By default, if the sample variable file contains a column named 'PROBAND', WISARD automatically detects it and reads it as the proband status of each sample. To assign proband status correctly, the following conditions must be met.

                • The column for proband status must be named PROBAND (without quotes), and the name is case-sensitive. If the column has a different name, use --probandcol to tell WISARD which column to use.
                • Proband status must be an integer, either 0 (not a proband) or 1 (proband). Other values will be misinterpreted.
                • Each family must have at least one proband. Otherwise, an error is raised.
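                The three conditions above can be checked before running WISARD. A minimal sketch, assuming a whitespace-delimited sample variable file whose header contains FID and IID columns (this layout is an assumption; see the sample variable section for the exact format). The function name check_probands is hypothetical, and the column argument plays the role of --probandcol.

```python
def check_probands(text, column="PROBAND"):
    """Validate proband status in a whitespace-delimited sample variable table.

    Assumes (hypothetically) a header row with 'FID' and 'IID' columns.
    Returns a dict FID -> list of proband IIDs, or raises ValueError.
    """
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    if column not in header:  # exact, case-sensitive match
        raise ValueError(f"column {column!r} not found; use --probandcol")
    fid_i, iid_i = header.index("FID"), header.index("IID")
    col_i = header.index(column)
    probands = {}
    for row in rows:
        value = row[col_i]
        if value not in ("0", "1"):
            raise ValueError(f"{row[iid_i]}: proband status must be 0 or 1, got {value!r}")
        probands.setdefault(row[fid_i], [])
        if value == "1":
            probands[row[fid_i]].append(row[iid_i])
    missing = [fid for fid, members in probands.items() if not members]
    if missing:
        raise ValueError(f"families without a proband: {missing}")
    return probands


table = """FID IID PROBAND
F1 1 1
F1 2 0
F2 3 0
F2 4 1
"""
print(check_probands(table))  # {'F1': ['1'], 'F2': ['4']}
```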

                Example codes

                • Setting prevalence of a dichotomous disease and heritability
                  Calculate FQLS with prevalence 5% and heritability 80%
                  C:\Users\WISARD> wisard --ped test_miss0.ped --fqls --kinship --prevalence 0.05 --heri 0.8

                  For family-based association analysis, the kinship coefficient matrix should be used as the relationship matrix by specifying the --kinship option. If population stratification exists, a genetic relationship matrix must be incorporated instead.

                • Setting heritability only for a continuous phenotype
                  Calculate FQLS for a continuous phenotype with heritability 80%
                  C:\Users\WISARD> wisard --ped test_miss0.ped --fqls --kinship --heri 0.8 --sampvar test_miss0_phen.txt --pname height
                  Unlike FQLS with a dichotomous phenotype, FQLS with a continuous phenotype does not require the prevalence. Only heritability is required, but a linear mixed model (LMM) is fitted to compute the offset.
                • Assigning proband information with a dichotomous phenotype
                  Perform FQLS analysis with proband information
                  C:\Users\WISARD> wisard --ped testdat.ped --fqls --kinship --prevalence 0.05 --heri 0.8 --sampvar proband.txt
                  Perform FQLS analysis with proband information and a non-default proband column name
                  C:\Users\WISARD> wisard --ped testdat.ped --fqls --kinship --prevalence 0.05 --heri 0.8 --sampvar proband.txt --probandcol T2D_PROBAND
                • Missing genotype handling
                  Calculate FQLS with prevalence 5% and heritability 80%, re-testing variants with p-values below 10^-7
                  C:\Users\WISARD> wisard --ped testdat.ped --fqls --retestthr 1e-7 --kinship --prevalence 0.05 --heri 0.8
                  The way missing genotypes are handled affects the computational cost, and WISARD provides two options. By default, missing genotypes are replaced with 2×MAF before the statistic is calculated; with the --availonly option, individuals with missing genotypes are excluded from the analysis. The latter is computationally intensive. With the "--retestthr threshold" option, the latter is applied only to variants whose p-values are below the threshold, and the former to all other variants.
                  NOTE!
                  --retestthr and --availonly cannot be used simultaneously!
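                The control flow of the re-testing strategy can be sketched as follows. This is an illustration of the logic described above, not WISARD's code: the two test functions are supplied by the caller, the function name two_stage_pvalues is hypothetical, and the p-values in the usage example are made-up toy numbers.

```python
def two_stage_pvalues(variants, fast_test, exact_test, threshold):
    """Illustrates the --retestthr strategy with caller-supplied test functions.

    fast_test(v): p-value with missing genotypes imputed (cheap)
    exact_test(v): p-value using only individuals with observed genotypes (costly)
    Variants whose fast p-value is below threshold are re-tested with exact_test.
    """
    results = {}
    retested = []
    for v in variants:
        p = fast_test(v)
        if p < threshold:
            p = exact_test(v)
            retested.append(v)
        results[v] = p
    return results, retested


# Toy stand-ins for the two tests (made-up p-values, for illustration only)
fast = {"var1": 3e-9, "var2": 0.42, "var3": 2e-7}
exact = {"var1": 8e-9, "var2": 0.40, "var3": 5e-7}
results, retested = two_stage_pvalues(fast, fast.get, exact.get, threshold=1e-7)
print(retested)  # ['var1']
```

                The costly computation is thus paid only for the few variants that reach the threshold, which is why this option is useful for genome-wide scans.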


                GEMMA

                For family-based designs, GEMMA can be used both with and without population stratification. Under population stratification, the statistics and WISARD code are exactly the same as for GEMMA with population-based samples under population stratification (see the association analysis under population stratification for example code). The only difference between the two situations is the choice of relationship matrix: without population stratification, the kinship coefficient matrix should be used instead of the IBS matrix.

                EMMAX and GEMMA are known to be among the most efficient approaches for quantitative phenotypes (Kang et al Nat Genet 2010), and if the polygenic effect is large (e.g. height), the improvement they provide can be substantial. If it is unclear whether the polygenic effect is large, the heritability of the quantitative phenotype can be used as a guide instead. Although parameter estimation in a linear mixed model is usually computationally intensive, the efficient algorithms proposed by both approaches make genome-wide association analysis feasible in a short time. GEMMA provides the more computationally efficient strategy, and WISARD calculates GEMMA by default.
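                The core of such a test is a generalized least squares fit. A minimal sketch of the Wald test for one variant, assuming Var(y) = 2·σg²·Φ + σe²·I, where Φ is the kinship coefficient matrix, and assuming the variance components σg² and σe² are already known. GEMMA itself estimates these components for each variant (EMMAX estimates them once under the null model); that step is not shown. Covariates would enter as extra columns of the design matrix; only an intercept is used here. All function names are hypothetical, and a small dense solver is included so the sketch runs without external libraries.

```python
import math


def solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, w):
    return sum(a * b for a, b in zip(u, w))


def lmm_wald(y, g, kin, sigma_g2, sigma_e2):
    """Wald test for beta in y = mu + beta * g + u + e, where
    Var(y) = 2 * sigma_g2 * kin + sigma_e2 * I with known variance components."""
    n = len(y)
    V = [[2.0 * sigma_g2 * kin[i][j] + (sigma_e2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    ones = [1.0] * n
    vi1, vig, viy = solve(V, ones), solve(V, g), solve(V, y)  # V^-1 times 1, g, y
    # Normal equations (X' V^-1 X) b = X' V^-1 y for the design X = [1, g]
    a11, a12, a22 = dot(ones, vi1), dot(ones, vig), dot(g, vig)
    b1, b2 = dot(ones, viy), dot(g, viy)
    det = a11 * a22 - a12 * a12
    beta = (a11 * b2 - a12 * b1) / det
    var_beta = a11 / det                  # entry of (X' V^-1 X)^-1 for g
    wald = beta * beta / var_beta
    p = math.erfc(math.sqrt(wald / 2.0))  # chi-square(1) upper tail
    return beta, wald, p


# Two nuclear families (two parents + one child each); self-kinship is 0.5
kin = [[0.5 if i == j else 0.0 for j in range(6)] for i in range(6)]
for parent, child in ((0, 2), (1, 2), (3, 5), (4, 5)):
    kin[parent][child] = kin[child][parent] = 0.25
g = [0, 1, 1, 1, 2, 2]                    # minor-allele counts
y = [1.0, 2.9, 3.2, 3.1, 5.0, 4.8]        # toy quantitative phenotype
beta, wald, p = lmm_wald(y, g, kin, sigma_g2=0.5, sigma_e2=0.5)
print(f"beta = {beta:.3f}, Wald = {wald:.2f}, p = {p:.3g}")
```

                With σg² = 0 the covariance is proportional to the identity and the estimate reduces to the ordinary least squares slope.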

                Example codes

                • GEMMA
                  Perform GEMMA
                  C:\Users\WISARD> wisard --bed test_miss0.bed --gemma --out res_gemma
                • GEMMA with covariates
                  Perform GEMMA with covariate adjustment
                  C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --cname age,height --gemma --out res_gemma_cov


                Generalized score test

                For family-based designs, the generalized score test can be used both with and without population stratification. Under population stratification, the statistics and WISARD code are exactly the same as for population-based samples under population stratification (see the association analysis under population stratification for example code). The only difference between the two situations is the choice of relationship matrix: without population stratification, the kinship coefficient matrix should be used instead of the IBS matrix.

                WISARD supports the generalized score test for the linear mixed model. The generalized score test uses the same linear mixed model as EMMAX/GEMMA. EMMAX/GEMMA use Wald tests, which are generally regarded as statistically more efficient than score tests. However, EMMAX/GEMMA are more sensitive to departures from normality than the generalized score test, so if non-normality is expected, the generalized score test may be a reasonable choice.
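                A practical property of score tests is that the linear mixed model is fitted only once, under the null hypothesis of no variant effect, and the fit is reused for every variant, whereas GEMMA re-estimates the variance components for each variant. A minimal sketch of a plain score test, assuming the same covariance parameterization 2·σg²·Φ + σe²·I with variance components already fixed and an intercept-only null model. Whether WISARD's generalized score test differs from this plain score test is not described on this page, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def dot(u, w):
    return sum(a * b for a, b in zip(u, w))


def covariance(kin, sigma_g2, sigma_e2):
    n = len(kin)
    return [[2.0 * sigma_g2 * kin[i][j] + (sigma_e2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


def fit_null(y, V):
    """Fit the null model y = mu + u + e once; the result is reused for every variant."""
    vi1 = solve(V, [1.0] * len(y))
    viy = solve(V, y)
    s11 = sum(vi1)                                  # 1' V^-1 1
    mu = sum(viy) / s11                             # GLS estimate of the intercept
    vir = [a - mu * b for a, b in zip(viy, vi1)]    # V^-1 (y - mu)
    return vi1, vir, s11


def lmm_score(g, V, null):
    vi1, vir, s11 = null
    u = dot(g, vir)                                 # score for the genotype effect
    vig = solve(V, g)
    var_u = dot(g, vig) - dot(g, vi1) ** 2 / s11    # variance of the score under H0
    stat = u * u / var_u
    p = math.erfc(math.sqrt(stat / 2.0))            # chi-square(1) upper tail
    return stat, p


kin = [[0.5 if i == j else 0.0 for j in range(6)] for i in range(6)]
for parent, child in ((0, 2), (1, 2), (3, 5), (4, 5)):
    kin[parent][child] = kin[child][parent] = 0.25
y = [1.0, 2.9, 3.2, 3.1, 5.0, 4.8]        # toy quantitative phenotype
V = covariance(kin, sigma_g2=0.5, sigma_e2=0.5)
null = fit_null(y, V)                     # fitted once
for g in ([0, 1, 1, 1, 2, 2], [1, 0, 1, 2, 1, 1]):
    stat, p = lmm_score(g, V, null)
    print(f"score = {stat:.2f}, p = {p:.3g}")
```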

                Example codes

                • Generalized score test
                  Perform the generalized score test for a family-based dataset
                  C:\Users\WISARD> wisard --bed test_miss0.bed --scoretest --kinship
                  NOTE!
                  If the dataset is not family-based, the generalized score test will not be performed!


                MFQLS

                For family-based designs, MFQLS can be used both with and without population stratification. Under population stratification, the statistics and WISARD code are exactly the same as for population-based samples under population stratification (see the association analysis under population stratification for example code). The only difference between the two situations is the choice of relationship matrix: without population stratification, the kinship coefficient matrix should be used instead of the IBS matrix.

                MFQLS is an extended MQLS for multiple phenotypes and multiple variants. MQLS can be extended to datasets with multiple phenotypes or multiple variants, such as a gene set, and WISARD supports applying it to such analyses.

                NOTE!
                When WISARD is run with --fqls and --mqls together, multiple phenotypes/variants cannot be used!

                Example codes

                • MFQLS
                  Perform MFQLS
                  C:\Users\WISARD> wisard --bed test_miss0.bed --mfqls --out res_mfqls
                • Adjusting covariate effects in MFQLS
                  Perform MFQLS with covariate adjustment
                  C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --cname age,height --mfqls --out res_mfqls_cov
                • MFQLS with multiple phenotypes
                  Perform MFQLS with multiple phenotypes
                  C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname sbp,dbp --mfqls --out res_mfqls_multi


                Last modified : 2017-09-11 15:57:23