WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

Association Analysis

This section describes:

  • Available statistics
  • Quantitative/dichotomous phenotype
  • Cochran-Armitage trend test and genotype-based test
  • Fisher's exact test
  • Logistic/linear regression
    • Gene-environment interaction for logistic/linear regression
    • Weighting samples using Weighted Least Squares
  • LASSO regression

            Available statistics

            The statistics described on this page assume the absence of population stratification. The most efficient statistic depends on the ascertainment condition, the type of phenotype (binary or continuous), the presence of covariates, and the presence or absence of population stratification. To check whether population stratification exists, see the population stratification page; statistics appropriate in the presence of population substructure are described on the page for association analysis under population stratification.

            Summary for available statistics:

            • Cochran-Armitage trend test (fast): useful for case-control designs when covariate adjustment is not required. It is most efficient under an additive disease mode of inheritance.
            • Genotype-based test (fast): useful for case-control designs when covariate adjustment is not required. If the disease mode of inheritance is unclear, consider both the genotype-based test and the Cochran-Armitage trend test.
            • Fisher's exact test (fast): useful for case-control designs when covariate adjustment is not required and the sample size is relatively small.
            • Logistic regression (fast): useful for case-control designs when covariate adjustment is required and the sample size is relatively large.
            • Linear regression (fast): useful for quantitative phenotypes in random samples when covariate adjustment is required.
            • LASSO regression (moderate): can be used for variable selection.


            Quantitative/dichotomous phenotype

            WISARD automatically determines whether each phenotype is quantitative or dichotomous. By default, if only the values 1 and 2 are observed, the phenotype is treated as dichotomous; otherwise it is treated as quantitative. The following options allow phenotypes coded with other values to be treated as dichotomous.

            • --1case: treats a phenotype whose observed values are 0 or 1 as dichotomous.
            • --cact value1,value2: treats a phenotype whose observed values are value1 or value2 as dichotomous. For instance, "--cact 1,0" has the same meaning as "--1case".
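The detection rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated on this page, not WISARD's implementation: the function name `phenotype_type` and its `binary_codes` parameter are hypothetical, and missing values are assumed to have been removed beforehand.

```python
def phenotype_type(values, binary_codes=(1, 2)):
    """Classify a phenotype as 'dichotomous' or 'quantitative'.

    values: non-empty list of observed (non-missing) phenotype values.
    binary_codes: the two values accepted as a dichotomous coding;
    (1, 2) mirrors the default rule, (1, 0) mirrors --1case / --cact 1,0.
    """
    observed = set(values)
    # Dichotomous only if every observed value is one of the two codes.
    if observed <= set(binary_codes):
        return "dichotomous"
    return "quantitative"


print(phenotype_type([1, 2, 2, 1]))          # default 1/2 coding
print(phenotype_type([0, 1, 1, 0]))          # 0/1 coding without any option
print(phenotype_type([0, 1, 1, 0], (1, 0)))  # 0/1 coding, as with --1case
```

Under this sketch a 0/1-coded phenotype is treated as quantitative unless the coding is declared, which is the situation the two options above address.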

            Cochran-Armitage trend test and genotype-based test

            The Cochran-Armitage trend test and the genotype-based test are used for case-control designs (dichotomous phenotypes and ascertained samples). The weights for the Cochran-Armitage trend test are chosen for an additive disease mode of inheritance, and the statistic follows the chi-square distribution with one degree of freedom under the null hypothesis. The genotype-based test is a Pearson chi-square test; because genotypes fall into three categories (e.g. AA/AT/TT), its statistic follows the chi-square distribution with 2 degrees of freedom under the null hypothesis. The Cochran-Armitage trend test is therefore optimized for an additive disease mode of inheritance, whereas the genotype-based test can be more efficient when the mode of inheritance departs from additivity. Because the additive mode of inheritance is common, we recommend considering both statistics if the disease mode of inheritance is unclear.

            Note that even for dichotomous phenotypes, neither test is efficient if samples are randomly selected or if covariates need to be adjusted for; in those cases an alternative test should be considered.
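To make the two statistics concrete, here is a minimal Python sketch using the textbook formulas on a 2x3 table of genotype counts. It is not taken from WISARD's source: the function names, the additive scores (0, 1, 2), and the example counts are assumptions for illustration, and every genotype class is assumed to have a non-zero total count.

```python
from math import erfc, exp, sqrt

ADDITIVE_WEIGHTS = (0, 1, 2)  # assumed scores for 0, 1, 2 copies of an allele


def trend_test(cases, controls, weights=ADDITIVE_WEIGHTS):
    """Cochran-Armitage trend test; returns (chi2, p) with 1 df."""
    totals = [r + s for r, s in zip(cases, controls)]
    n, n_case = sum(totals), sum(cases)
    n_ctrl = n - n_case
    sum_tr = sum(t * r for t, r in zip(weights, cases))
    sum_tn = sum(t * m for t, m in zip(weights, totals))
    sum_t2n = sum(t * t * m for t, m in zip(weights, totals))
    numerator = n * (n * sum_tr - n_case * sum_tn) ** 2
    denominator = n_case * n_ctrl * (n * sum_t2n - sum_tn ** 2)
    chi2 = numerator / denominator
    return chi2, erfc(sqrt(chi2 / 2))  # upper tail of chi-square, 1 df


def genotype_test(cases, controls):
    """Pearson chi-square test on the 2x3 table; returns (chi2, p) with 2 df."""
    totals = [r + s for r, s in zip(cases, controls)]
    n = sum(totals)
    chi2 = 0.0
    for row in (cases, controls):
        row_total = sum(row)
        for observed, col_total in zip(row, totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, exp(-chi2 / 2)  # upper tail of chi-square, 2 df


# Hypothetical genotype counts (AA, AT, TT) for cases and controls.
cases, controls = (10, 40, 50), (30, 50, 20)
print(trend_test(cases, controls))
print(genotype_test(cases, controls))
```

The trend test spends its single degree of freedom on a linear dose effect, while the genotype-based test spreads two degrees of freedom over any pattern of differences, which is why their relative efficiency depends on the mode of inheritance.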

            Both tests can be conducted with WISARD by using --trend option.

            • Example code
            • Cochran-Armitage trend test and genotype-based test
              C:\Users\WISARD> wisard --bed test_miss0.bed --trend --out res_2x3
              Here the phenotype in the sixth column of the ped/bed file is assumed to be the affection status; a different phenotype can be used as the affection status with --pname. Running WISARD with the above command generates the res_2x3.trend.res file.
            • Output file
              trend.res: result of the Cochran-Armitage trend test and the allelic-based test for association in a case/control dataset (TSV)
              Column    Format   Modifier           Description
              CHR       string   NONE               Chromosome of the variant
              VARIANT   string   NONE               Tested variant name
              POS       integer  NONE               Position of the variant
              ALT       string   NONE               Alternative allele of the variant
              ANNOT     string   NONE               Annotation for the variant
              PHENO     string   --sampvar,--pname  Tested phenotype
              P_TREND   real     NONE               Asymptotic p-value of Cochran-Armitage trend test
              P_ABT     real     NONE               Asymptotic p-value of allelic-based test


            Fisher's exact test

            Although the Cochran-Armitage trend test and the genotype-based test are usually efficient for case-control designs, their asymptotic p-values may not be valid when the sample size is small. Fisher's exact test is therefore often used for case-control designs with relatively small samples. The threshold for a small sample size is not clear-cut, but it depends on the MAF of each variant and the significance level $\alpha$. Our suggestion is to consider Fisher's exact test for variants with MAF below 0.1 or when the sample size is below 1,000. WISARD performs Fisher's exact test with the --fisher option.
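As an illustration of why the test is called exact, the following Python sketch computes a two-sided p-value by enumerating every 2x2 table with the same margins. The 2x2 layout (for example, allele counts in cases and controls) and the function name are assumptions; this page does not state which table WISARD builds for --fisher.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum all tables at least as unlikely as the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(low, high + 1))
                if p <= p_observed * (1 + 1e-7))
    return min(1.0, total)


# Hypothetical counts: rows are cases/controls, columns are the two alleles.
print(fisher_exact_2x2(8, 2, 1, 5))
```

No large-sample approximation is involved, so the p-value remains valid for sparse tables such as those produced by low-MAF variants.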

            • Example code for Fisher's exact test
            • Fisher's exact test
              C:\Users\WISARD> wisard --bed test_miss0.bed --fisher --out res_fisher
              Here the phenotype in the sixth column of the ped/bed file is assumed to be the affection status for Fisher's exact test; a different phenotype can be used as the affection status with --pname.
            • Output file
              fisher.res: result of Fisher's exact test for association in a case/control dataset (TSV)
              Column    Format   Modifier           Description
              CHR       string   NONE               Chromosome of the variant
              VARIANT   string   NONE               Tested variant name
              POS       integer  NONE               Position of the variant
              ALT       string   NONE               Alternative allele of the variant
              ANNOT     string   NONE               Annotation for the variant
              PHENO     string   --sampvar,--pname  Tested phenotype
              P_FISHER  real     NONE               Exact p-value of Fisher's exact test


            Logistic/linear regression

            For a dichotomous phenotype, the presence of covariates makes the Cochran-Armitage trend test and the genotype-based test inefficient; in this case, logistic regression is an alternative. For a quantitative phenotype, linear regression is efficient for random samples. If individuals with extreme phenotypes are selected, the phenotypes are often not normally distributed and different statistics should be considered.

            WISARD can perform logistic/linear regression analysis. If phenotypes are dichotomous, logistic regression is applied and if phenotypes are continuous, linear regression is applied.

            Logistic/linear regression can be performed with WISARD by using --regression option.

            • Example code for logistic/linear regression
            • Logistic/linear regression analysis using default phenotype
              C:\Users\WISARD> wisard --bed test_miss0.bed --regression --out res_regr
              By default, the phenotype in the sixth column of the ped/bed file is used as the response variable and no covariates are included in the logistic/linear regression.
            • Example code for specifying covariates with --cname option
            • Specifying covariates for logistic/linear regression
              C:\Users\WISARD> wisard --bed test_miss0.bed --regression --sampvar test_miss0_phen.txt --cname age,height --out res_regr_cov
              By default, the phenotype in the sixth column of the ped/bed file is used as the response variable, and age and height are included as covariates in the logistic/linear regression.
            • Example code for specifying response with --pname option
            • Specifying response variable and covariates for logistic/linear regression
              C:\Users\WISARD> wisard --bed test_miss0.bed --regression --sampvar test_miss0_phen.txt --pname sbp --cname age,height --out res_regr_height_cov
              The sbp variable in the test_miss0_phen.txt file is used as the response variable, and age and height are included as covariates in the logistic/linear regression.
            • Output file
              linear.regr.res: result of regression analysis with a continuous phenotype (TSV)
              Column                      Format   Modifier                                      Description
              VARIANT                     string   continuous phenotype                          Tested variant name
              ANNOT                       string   continuous phenotype,--annogene               Annotation for the variant
              NMISS                       integer  continuous phenotype,--avail                  Number of missing genotypes for the variant
              NIMP                        integer  continuous phenotype,w/o --avail              Number of imputed genotypes for the variant
              BETA                        real     continuous phenotype                          Beta coefficient for the variant
              STAT                        real     continuous phenotype                          Wald statistic of the beta coefficient for the variant
              PVAL                        real     continuous phenotype                          p-value from the Wald statistic of the variant
              GINV                        0/1      continuous phenotype                          Generalized inversion used (1) or not (0) in the test
              BETA_(covariate name)       real     continuous phenotype,--sampvar,--cname        Beta coefficient for the covariate
              STAT_(covariate name)       real     continuous phenotype,--sampvar,--cname        Wald statistic of the beta coefficient for the covariate
              PVAL_(covariate name)       real     continuous phenotype,--sampvar,--cname        p-value from the Wald statistic of the covariate
              BETA_(covariate name)*GENO  real     continuous phenotype,--sampvar,--cname,--gxe  Beta coefficient for the interaction between covariate and variant
              STAT_(covariate name)*GENO  real     continuous phenotype,--sampvar,--cname,--gxe  Wald statistic of the beta coefficient for the interaction between covariate and variant
              PVAL_(covariate name)*GENO  real     continuous phenotype,--sampvar,--cname,--gxe  p-value from the Wald statistic of the interaction between covariate and variant

            Gene-environment interaction for logistic/linear regression

            WISARD can include the gene-environment interaction for logistic/linear regression by using --gxe and --gxecovs options.

            • Example code for gene-environment interaction for logistic/linear regression
            • Performing gene-environment interaction analysis
              C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --cname age,height --regression --gxe --out res_regr_gxe_all
              The --gxe option without --gxecovs makes WISARD include the gene-environment interaction for all covariates. In this example, age and height are used as covariates, and the interaction terms age*variant and height*variant are included by default.
            • Example code for specifying certain gene-environment interactions for logistic/linear regression
            • Specifying certain gene-environment interactions
              C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --cname age,height --regression --gxe --gxecovs age --out res_regr_gxe_age
              The --gxecovs option must be used together with --gxe; it specifies which covariates enter the gene-environment interaction. In this example, age and height are used as covariates through the --cname option, and only the age*variant interaction is included through the --gxecovs option.
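As a sketch of the model implied by the second example in the linear case, the regression could be written as below. The symbol $G$ for the coded genotype and the coefficient names are notation introduced here for illustration; the genotype coding itself is not specified on this page.

```latex
y = \beta_0 + \beta_G\, G
  + \beta_{\mathrm{age}}\, \mathrm{age}
  + \beta_{\mathrm{height}}\, \mathrm{height}
  + \beta_{\mathrm{age} \times G}\, (\mathrm{age} \cdot G)
  + \varepsilon
```

With --gxe alone, a corresponding $\mathrm{height} \cdot G$ term would be added as well. For a dichotomous phenotype, the same linear predictor would presumably enter through the logit link of the logistic regression.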

            Weighting samples using Weighted Least Squares

            For the statistical properties of Weighted Least Squares, please refer to the Wikipedia page. Sample-wise weights can be supplied to linear regression via the --sampleweight option.

            Perform linear regression with HEIGHT using WLS
            C:\Users\WISARD> wisard --bed sample.bed --sampvar pheno.txt --pname HEIGHT --regression --sampleweight weight.txt

            As shown in the above example, the --sampleweight option requires an input file containing sample-wise weights. The file should consist of one line per sample, each with three columns: FID, IID, and weight, in that order.

            NOTE!
            The weights read from the file are inverted before use. For example, a weight of 0.7 is applied as 1/0.7 ≈ 1.43.
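The file format and the inversion rule can be illustrated with a short Python sketch. Several points are assumptions rather than documented behavior: the file is taken to be whitespace-delimited without a header line, the function names are hypothetical, and the inverted weight is assumed to multiply each sample's squared residual, as in textbook weighted least squares. A single predictor is used to keep the example short.

```python
def read_sample_weights(lines):
    """Parse 'FID IID weight' lines and return {(FID, IID): 1 / weight}."""
    weights = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        fid, iid, raw = fields
        weights[(fid, iid)] = 1.0 / float(raw)  # inversion described in the NOTE
    return weights


def wls_line(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical contents of weight.txt.
weights = read_sample_weights(["FAM1 IND1 0.7", "FAM2 IND2 1.0", "FAM3 IND3 2.0"])
print(round(weights[("FAM1", "IND1")], 2))  # 1.43

w = list(weights.values())
print(wls_line([0.0, 1.0, 2.0], [1.0, 3.0, 5.0], w))
```

Because of the inversion, samples with a small value in the file receive a large applied weight, so the file is best read as holding something like a per-sample variance rather than a precision.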

            LASSO regression

            WISARD provides LASSO regression for variable selection, with sets defined by the --set option. For example, LASSO regression can be used to select markers within each gene with a command like the one below.

            Perform very basic LASSO regression
            C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --regression --lasso

            The above command produces the [prefix].lasso.res file.

            As with other regression analyses, covariates can be incorporated into the model using the --sampvar and --cname options.

            Perform LASSO regression with an adjustment of covariates age and height
            C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --regression --lasso --sampvar test_miss0_phen.txt --cname age,height

            By default, the lambda threshold for reporting results is 1/10. To change this value, use the --lassolambda option.

            Set the reporting threshold of lambda in LASSO regression to 0.05, instead of its default threshold
            C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --regression --lasso --lassolambda 0.05

            If reporting only the markers that meet a specific lambda threshold is not desirable, --lassoall forces WISARD to report all results regardless of their lambda values.

            NOTE!
            Since --lassoall is exactly equivalent to --lassolambda 0, --lassoall and --lassolambda cannot be used simultaneously!
            Force reporting of all results in LASSO regression
            C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --regression --lasso --lassoall


            Last modified : 2017-09-11 16:04:32