WISARD [wɪzərd]: Workbench for Integrated Superfast Association study with Related Data
This section describes the statistics available in WISARD when population stratification is absent. The most efficient statistic depends on the ascertainment condition, the type of phenotype (binary/continuous), the presence of covariates, and the presence or absence of population stratification. Throughout this section we assume that there is no population stratification. To check whether population stratification exists, see the population stratification page; statistics appropriate under population substructure are described on the association analysis under the population stratification page.
Summary of available statistics:
WISARD automatically determines whether each phenotype is quantitative or dichotomous. By default, if only the values 1 or 2 are observed for a phenotype, WISARD treats it as dichotomous; otherwise, it is treated as quantitative. Some options allow phenotypes coded with other values to be treated as dichotomous.
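The default classification rule can be expressed as a one-line set check. The sketch below is illustrative only: `phenotype_type` is a hypothetical helper, and it assumes missing values have already been removed from the input.

```python
def phenotype_type(values):
    """Classify a phenotype as 'dichotomous' or 'quantitative' (sketch of the default rule).

    Assumes missing values were removed beforehand (an assumption of this sketch).
    """
    observed = set(values)
    # Default rule: if only the codes 1 and/or 2 are observed, treat as dichotomous.
    if observed and observed <= {1, 2}:
        return "dichotomous"
    return "quantitative"
```

For example, `phenotype_type([1, 2, 2, 1])` gives `"dichotomous"`, while a 0/1 coding such as `phenotype_type([0, 1])` falls outside the default rule and gives `"quantitative"`.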
The Cochran-Armitage trend test and the genotype-based test are used for case-control designs (dichotomous phenotypes and ascertained samples). The weights of the Cochran-Armitage trend test are chosen for an additive disease mode of inheritance, and under the null hypothesis the statistic follows a chi-square distribution with one degree of freedom. The genotype-based test is a Pearson chi-square test on the case/control-by-genotype table; because there are three genotype categories (e.g. AA/AT/TT), it follows a chi-square distribution with 2 degrees of freedom under the null hypothesis. The Cochran-Armitage trend test is therefore tailored to an additive mode of inheritance, whereas the genotype-based test, which assumes no particular mode, can be more efficient when the true mode of inheritance is not additive. Because the additive mode is the most commonly assumed, we recommend considering both statistics when the mode of inheritance is unclear.
Note that even for dichotomous phenotypes, neither test is efficient if samples are randomly selected or if covariates need to be adjusted for; in such cases an alternative test, such as logistic regression, should be considered.
Both tests can be performed in WISARD with the --trend option.
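To make the two statistics concrete, here is a minimal sketch of the Cochran-Armitage trend test with additive weights (0, 1, 2) and the 2-df Pearson genotype test, computed from case and control genotype counts. It is not WISARD's implementation; the function names are hypothetical, and it assumes every genotype class is observed at least once. The p-values use the closed-form chi-square tails for 1 df (`erfc(sqrt(x/2))`) and 2 df (`exp(-x/2)`).

```python
import math

def ca_trend(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test; counts are ordered as (AA, AT, TT)."""
    R, S = sum(cases), sum(controls)          # case and control totals
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]   # genotype column totals
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))
    k = len(weights)
    var = sum(weights[i] ** 2 * n[i] * (N - n[i]) for i in range(k))
    var -= 2 * sum(weights[i] * weights[j] * n[i] * n[j]
                   for i in range(k) for j in range(i + 1, k))
    var *= R * S / N
    stat = T * T / var
    # Chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

def genotype_test(cases, controls):
    """Pearson chi-square test on the 2x3 (case/control by genotype) table."""
    R, S = sum(cases), sum(controls)
    N = R + S
    stat = 0.0
    for r, s in zip(cases, controls):
        col = r + s    # assumed > 0: every genotype class is observed
        for observed, row_total in ((r, R), (s, S)):
            expected = row_total * col / N
            stat += (observed - expected) ** 2 / expected
    # Chi-square with 2 df: P(X > x) = exp(-x / 2).
    return stat, math.exp(-stat / 2.0)
```

With cases (10, 20, 30) and controls (30, 20, 10), both statistics equal 20.0, but the trend test gives a smaller p-value because it spends only one degree of freedom.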
trend.res: result of the Cochran-Armitage trend test and the allelic based test for association in a case/control dataset (TSV)

Column | Format | Modifier | Description
---|---|---|---
CHR | integer | NONE | Chromosome of the variant
VARIANT | string | NONE | Tested variant name
POS | integer | NONE | Position of the variant
ALT | string | NONE | Alternative allele of the variant
ANNOT | string | NONE | Annotation for the variant
PHENO | string | --sampvar,--pname | Tested phenotype
P_TREND | real | NONE | Asymptotic p-value of Cochran-Armitage trend test
P_ABT | real | NONE | Asymptotic p-value of allelic based test
Even though the Cochran-Armitage trend test and the genotype-based test are usually efficient for case-control designs, their p-values rely on large-sample (asymptotic) approximations and may not be valid under the null hypothesis when the sample size is small. Fisher's exact test is often used for case-control designs with relatively small samples. There is no clear threshold for a small sample size; it depends on the MAF of each variant and on the significance level $\alpha$. We therefore suggest considering Fisher's exact test for variants with MAF below 0.1 or when the sample size is below 1,000. WISARD computes Fisher's exact test with the --fisher option.
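The exactness comes from conditioning on the table margins: the top-left cell then follows a hypergeometric distribution, and the p-value sums the probabilities of all tables no more probable than the observed one. The sketch below shows this for a 2x2 table (for example, allele counts in cases and controls); the table layout WISARD uses is not stated here, so treat `fisher_exact_2x2` as a hypothetical illustration of the technique.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d   # row totals (e.g. cases, controls)
    c1 = a + c              # first column total
    n = r1 + r2

    # Unnormalized hypergeometric probability of top-left cell x given the margins;
    # every table shares the denominator comb(n, c1), so integers stay exact.
    def weight(x):
        return comb(r1, x) * comb(r2, c1 - x)

    lo, hi = max(0, c1 - r2), min(r1, c1)
    observed = weight(a)
    # Sum over all tables that are no more probable than the observed one.
    tail = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return tail / comb(n, c1)
```

For the table [[3, 1], [1, 3]] this returns 34/70 ≈ 0.486. Comparing integer numerators avoids floating-point ties between equally probable tables.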
fisher.res: result of Fisher's exact test for association in a case/control dataset (TSV)

Column | Format | Modifier | Description
---|---|---|---
CHR | integer | NONE | Chromosome of the variant
VARIANT | string | NONE | Tested variant name
POS | integer | NONE | Position of the variant
ALT | string | NONE | Alternative allele of the variant
ANNOT | string | NONE | Annotation for the variant
PHENO | string | --sampvar,--pname | Tested phenotype
P_FISHER | real | NONE | Exact p-value of Fisher's exact test
For dichotomous phenotypes, the presence of covariates makes the Cochran-Armitage trend test and the genotype-based test inefficient; in this case, logistic regression is an alternative. For quantitative phenotypes, linear regression is efficient for random samples. If individuals with extreme phenotypes are selected, the phenotypes are often not normally distributed and other statistics should be considered.
WISARD performs logistic or linear regression analysis with the --regression option: logistic regression is applied to dichotomous phenotypes and linear regression to continuous phenotypes.
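The BETA, STAT and PVAL columns of the linear regression output correspond to an estimated coefficient, its Wald statistic, and a p-value. The sketch below computes these quantities for the simplest model `y = a + b*GENO` without covariates. It is a hypothetical illustration, not WISARD's routine; in particular, it uses a standard normal reference distribution, whereas WISARD may use a different reference distribution.

```python
import math

def linear_wald(g, y):
    """OLS fit of y = a + b*g with a Wald test for b (normal approximation)."""
    n = len(g)
    gbar, ybar = sum(g) / n, sum(y) / n
    sxx = sum((gi - gbar) ** 2 for gi in g)
    sxy = sum((gi - gbar) * (yi - ybar) for gi, yi in zip(g, y))
    beta = sxy / sxx
    alpha = ybar - beta * gbar
    rss = sum((yi - alpha - beta * gi) ** 2 for gi, yi in zip(g, y))
    se = math.sqrt(rss / (n - 2) / sxx)            # residual variance on n - 2 df
    stat = beta / se                               # Wald statistic for b
    pval = math.erfc(abs(stat) / math.sqrt(2.0))   # two-sided, standard normal
    return beta, stat, pval
```

With genotypes `[0, 1, 2, 0, 1, 2]` and phenotypes `[1.0, 2.1, 2.9, 1.2, 1.9, 3.1]`, the estimated effect is 0.95 per allele.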
linear.regr.res: result of regression analysis with a continuous phenotype (TSV)

Column | Format | Modifier | Description
---|---|---|---
VARIANT | string | continuous phenotype | Tested variant name
ANNOT | string | continuous phenotype,--annogene | Annotation for the variant
NMISS | integer | continuous phenotype,--avail | Number of missing genotypes for the variant
NIMP | integer | continuous phenotype,w/o --avail | Number of imputed genotypes for the variant
BETA | real | continuous phenotype | Beta coefficient for the variant
STAT | real | continuous phenotype | Wald statistic of the beta coefficient for the variant
PVAL | real | continuous phenotype | p-value from the Wald statistic of the variant
GINV | 0/1 | continuous phenotype | Generalized inversion used (1) or not (0) in the test
BETA_(covariate name) | real | continuous phenotype,--sampvar,--cname | Beta coefficient for the covariates
STAT_(covariate name) | real | continuous phenotype,--sampvar,--cname | Wald statistic of the beta coefficient for the covariates
PVAL_(covariate name) | real | continuous phenotype,--sampvar,--cname | p-value from the Wald statistic of the covariates
BETA_(covariate name)*GENO | real | continuous phenotype,--sampvar,--cname,--gxe | Beta coefficient for the interaction between covariate and variant
STAT_(covariate name)*GENO | real | continuous phenotype,--sampvar,--cname,--gxe | Wald statistic of the beta coefficient for the interaction between covariate and variant
PVAL_(covariate name)*GENO | real | continuous phenotype,--sampvar,--cname,--gxe | p-value from the Wald statistic of the interaction between covariate and variant
WISARD can include gene-environment interaction terms in logistic/linear regression with the --gxe and --gxecovs options.
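In the usual gene-environment model, the interaction enters as the product of genotype and covariate, `y ~ 1 + GENO + COV + COV*GENO`, which matches the `(covariate name)*GENO` columns of the output. The sketch below only builds such design rows for one covariate; `gxe_design` and the `"(intercept)"` label are hypothetical, and how WISARD assembles its model internally is not shown here.

```python
def gxe_design(genotypes, covariates, cov_name):
    """Build design rows [1, GENO, COV, COV*GENO] for a single-covariate GxE model."""
    names = ["(intercept)", "GENO", cov_name, cov_name + "*GENO"]
    # The interaction column is the element-wise product of genotype and covariate.
    rows = [[1.0, g, e, g * e] for g, e in zip(genotypes, covariates)]
    return names, rows
```

For a sample with genotype 2 and covariate value 3.5, the interaction column is 7.0; any linear or logistic fitting routine can then estimate the four coefficients from these rows.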
Sample-wise weights can be given to linear regression (weighted least squares) with the --sampleweight option. For the statistical properties of weighted least squares, refer to the Wikipedia page.
The --sampleweight option requires an input file containing sample-wise weights. The file should consist of multiple lines with three columns each: FID, IID, and weight, in that order.
NOTE!
Retrieved weights are inverted before use. For example, a weight of 0.7 is applied as 1/0.7 ≈ 1.43.
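The following sketch reads a three-column weight file, inverts each weight as described in the note, and uses the inverted values as weights in a weighted least squares slope. It assumes whitespace-separated columns and that the inverted value plays the role of `w_i` in `sum w_i (y_i - a - b*g_i)^2`; `read_sample_weights` and `wls_slope` are hypothetical names.

```python
def read_sample_weights(lines):
    """Parse 'FID IID weight' lines; return inverted weights keyed by (FID, IID)."""
    weights = {}
    for line in lines:
        fields = line.split()          # assumption: whitespace-separated columns
        if not fields:
            continue                   # skip blank lines
        fid, iid, w = fields[0], fields[1], float(fields[2])
        weights[(fid, iid)] = 1.0 / w  # retrieved weights are inverted before use
    return weights

def wls_slope(g, y, w):
    """Slope of y = a + b*g minimizing sum w_i * (y_i - a - b*g_i)^2."""
    sw = sum(w)
    gbar = sum(wi * gi for wi, gi in zip(w, g)) / sw   # weighted means
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (gi - gbar) * (yi - ybar) for wi, gi, yi in zip(w, g, y))
    sxx = sum(wi * (gi - gbar) ** 2 for wi, gi in zip(w, g))
    return sxy / sxx
```

A line `F1 I1 0.7` therefore yields the weight 1/0.7 ≈ 1.43 for sample (F1, I1). If the data lie exactly on a line, the weighted slope equals the unweighted one; weights only matter when residuals are non-zero.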
WISARD provides LASSO regression for variable selection within the sets defined by the --set option. For example, LASSO regression can be used to choose a subset of markers for each gene.
This analysis produces [prefix].lasso.res.
As with the other regression analyses, covariates can be incorporated into the model with the --sampvar and --cname options.
By default, the lambda threshold for reporting results is 1/10; use the --lassolambda option to adjust this value.
To report all results regardless of their lambda value, rather than only the markers meeting the lambda threshold, use --lassoall.
NOTE!
Since --lassoall is exactly equivalent to --lassolambda 0, --lassoall and --lassolambda cannot be used together!
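To illustrate how lambda controls which markers survive, here is a minimal coordinate-descent sketch of LASSO for the objective `(1/(2n))||y - Xb||^2 + lam*||b||_1`, without an intercept (it assumes a centered phenotype). Markers whose coefficients stay non-zero at a given `lam` are the selected ones, and `lam = 0` reduces to ordinary least squares. How WISARD maps --lassolambda onto its internal penalty is not stated here, so `lasso_cd` and its parameters are hypothetical.

```python
def soft_threshold(z, lam):
    """S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Coordinate descent for (1/(2n))||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)                                   # r = y - Xb with b = 0
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # rho: inner product of x_j with the partial residual that excludes x_j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta       # keep residuals in sync with b
                b[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b
```

With two orthogonal markers whose least-squares effects are 3.0 and 0.5, a penalty of 1.0 shrinks the first to 2.0 and sets the second to zero, so only the first marker would be selected.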