WISARD official site

Case tutorial

Association Analysis

This section describes:

Available statistics
Quantitative/dichotomous phenotype
Cochran-Armitage trend test and genotype based test
Fisher's exact test
Logistic/linear regression
- Gene-environment interaction for logistic/linear regression
- Weighting samples using Weighted Least Squares
LASSO regression

Available statistics [top]

The statistics described on this page assume the absence of population stratification. The most efficient statistic depends on the ascertainment condition, the type of phenotype (binary/continuous), the presence of covariates, and the absence/presence of population stratification. To check whether population stratification exists, please see the population stratification page; appropriate statistics in the presence of population substructure are described on the association analysis under population stratification page.

Summary for available statistics:

Cochran-Armitage trend test (fast): useful for case-control designs when covariate adjustment is not required. It is the most efficient test under an additive disease mode of inheritance.
Genotype-based test (fast): useful for case-control designs when covariate adjustment is not required. It is better to consider both the genotype-based test and the Cochran-Armitage trend test if the disease mode of inheritance is unclear.
Fisher's exact test (fast): useful for case-control designs when covariate adjustment is not required and the sample size is relatively small.
Logistic regression (fast): useful for case-control designs when covariate adjustment is required and the sample size is relatively large.
Linear regression (fast): useful for quantitative phenotypes in random samples when covariate adjustment is required.
LASSO regression (moderate): can be used for variable selection.
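
The summary above can be read as a small decision rule. The following Python sketch is only an illustration of that reading; the function name and its arguments are hypothetical and are not part of WISARD, and "small sample" is left to the user's judgment.

```python
def suggest_statistics(dichotomous, has_covariates, small_sample=False):
    """Return the statistics the summary list points to (hypothetical helper)."""
    if not dichotomous:
        # Quantitative phenotype in a random sample.
        return ["linear regression"]
    if has_covariates:
        # Case-control design where covariate adjustment is required.
        return ["logistic regression"]
    if small_sample:
        # Case-control design, no covariates, relatively small sample.
        return ["Fisher's exact test"]
    # Case-control design, no covariates: consider both tests if the
    # mode of inheritance is unclear.
    return ["Cochran-Armitage trend test", "genotype-based test"]


print(suggest_statistics(dichotomous=True, has_covariates=False))
```

LASSO regression is left out of this sketch because it serves variable selection rather than single-variant testing.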

Quantitative/dichotomous phenotype [top]

WISARD automatically determines whether each phenotype is quantitative or dichotomous. By default, if only the values 1 and 2 are observed, WISARD treats the phenotype as dichotomous; otherwise it is treated as quantitative. With the following options, phenotypes coded with other values can be treated as dichotomous.

--1case: treats a phenotype whose observed values are 0 or 1 as dichotomous.
--cact value1,value2: treats a phenotype whose observed values are value1 or value2 as dichotomous. For instance, "--cact 1,0" has the same meaning as "--1case".
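
The detection rule described above can be sketched in a few lines of Python. This is not WISARD's code: the function name is hypothetical, values are compared as strings, and the handling of missing phenotypes is not covered here because this page does not describe it.

```python
def is_dichotomous(values, codes=("1", "2")):
    """Return True if every observed value is one of the two codes.

    codes=("1", "2") mimics the default rule; codes=("0", "1") mimics
    --1case; codes=(value1, value2) mimics --cact value1,value2.
    """
    observed = {str(v) for v in values}
    return bool(observed) and observed <= set(codes)


print(is_dichotomous([1, 2, 2, 1]))               # default coding
print(is_dichotomous([0, 1, 1], codes=("0", "1")))  # as with --1case
print(is_dichotomous([120.5, 98.0, 110.2]))        # quantitative values
```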

Cochran-Armitage trend test and genotype based test [top]

The Cochran-Armitage trend test and the genotype-based test are used for case-control designs (dichotomous phenotypes and ascertained samples). Weights for the Cochran-Armitage trend test are chosen for an additive disease mode of inheritance, and the statistic follows the chi-square distribution with one degree of freedom under the null hypothesis. The genotype-based test is a Pearson chi-square test; because genotypes fall into three categories (e.g. AA/AT/TT), it follows the chi-square distribution with 2 degrees of freedom under the null hypothesis. The Cochran-Armitage trend test is therefore most efficient when the mode of inheritance is additive, whereas the genotype-based test can be more efficient when the mode of inheritance departs from additivity. Because the additive mode of inheritance is common, we recommend considering both statistics if the disease mode of inheritance is unclear.

It should be noted that, even for dichotomous phenotypes, neither test is efficient if samples are randomly selected or if there are covariates that need to be adjusted for; in such cases an alternative test should be considered.
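
To make the two statistics concrete, here is a minimal Python sketch that computes both from a 2x3 table of genotype counts. It is a textbook formulation, not WISARD's implementation: the additive weights (0, 1, 2), the function names and the example counts are assumptions, and every genotype category is assumed to be observed at least once.

```python
import math


def trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend statistic and its 1-df chi-square p-value."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    swr = sum(w * r for w, r in zip(weights, cases))
    swn = sum(w * t for w, t in zip(weights, totals))
    sw2n = sum(w * w * t for w, t in zip(weights, totals))
    stat = n * (n * swr - n_case * swn) ** 2 / (
        n_case * n_ctrl * (n * sw2n - swn ** 2)
    )
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    return stat, math.erfc(math.sqrt(stat / 2))


def genotype_test(cases, controls):
    """Pearson chi-square statistic on the 2x3 table and its 2-df p-value."""
    n_case, n_ctrl = sum(cases), sum(controls)
    n = n_case + n_ctrl
    stat = 0.0
    for r, s in zip(cases, controls):
        column = r + s
        for observed, row in ((r, n_case), (s, n_ctrl)):
            expected = row * column / n
            stat += (observed - expected) ** 2 / expected
    # Upper tail of the chi-square distribution with 2 degrees of freedom.
    return stat, math.exp(-stat / 2)


# Made-up counts for genotypes AA/AT/TT with a perfectly additive pattern.
cases, controls = (10, 20, 30), (30, 20, 10)
print(trend_test(cases, controls))
print(genotype_test(cases, controls))
```

With these counts the two statistics coincide, so the trend test gives the smaller p-value because it spends one degree of freedom instead of two; with a non-additive pattern the genotype-based statistic can be the larger one.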

Both tests can be conducted with WISARD by using --trend option.

Example code

Cochran-Armitage trend test and genotype-based test
C:\Users\WISARD> wisard --bed test_miss0.bed --trend --out res_2x3

Output file

Column	Format	Modifier	Description
*trend.res* is...			Result of the Cochran-Armitage trend test and the allelic based test for association in a case/control dataset (TSV)
CHR	integer	NONE	Chromosome of the variant
VARIANT	string	NONE	Tested variant name
POS	integer	NONE	Position of the variant
ALT	string	NONE	Alternative allele of the variant
ANNOT	string	NONE	Annotation for the variant
PHENO	string	--sampvar,--pname	Tested phenotype
P_TREND	real	NONE	Asymptotic p-value of Cochran-Armitage trend test
P_ABT	real	NONE	Asymptotic p-value of allelic based test

Fisher's exact test [top]

Even though the Cochran-Armitage trend test and the genotype-based test are usually efficient for case-control designs, their asymptotic p-values may not be valid under the null hypothesis when the sample size is small. Fisher's exact test is often used for case-control designs if the sample size is relatively small. The threshold for a small sample size is not clear-cut, but it depends on the MAF of each variant and the significance level $\alpha$. Our suggestion is therefore to consider Fisher's exact test for variants whose MAF is less than 0.1 or if the sample size is less than 1,000. WISARD can perform Fisher's exact test by using the --fisher option.

Example code for Fisher's exact test

Fisher's exact test
C:\Users\WISARD> wisard --bed test_miss0.bed --fisher --out res_fisher
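
For readers who want to see what an exact p-value means, the following Python sketch computes a two-sided Fisher's exact test on a 2x2 table by summing hypergeometric probabilities. The table layout WISARD uses internally is not described on this page, so the 2x2 layout, the function name and the example counts are assumptions made for illustration.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell,
        # with all margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table that is no more probable than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        p
        for p in (prob(k) for k in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )


# Made-up counts: rows are cases/controls, columns are the two alleles.
print(fisher_exact_2x2(8, 2, 1, 5))
```

No asymptotic approximation is involved, which is why this test remains usable when cell counts are small.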

Output file

Column	Format	Modifier	Description
*fisher.res* is...			Result of Fisher's exact test for association in a case/control dataset (TSV)
CHR	integer	NONE	Chromosome of the variant
VARIANT	string	NONE	Tested variant name
POS	integer	NONE	Position of the variant
ALT	string	NONE	Alternative allele of the variant
ANNOT	string	NONE	Annotation for the variant
PHENO	string	--sampvar,--pname	Tested phenotype
P_FISHER	real	NONE	Exact p-value of Fisher's exact test

Logistic/linear regression [top]

For dichotomous phenotypes, the presence of covariates makes the Cochran-Armitage trend test and the genotype-based test inefficient, and in this case logistic regression is an alternative choice. For quantitative phenotypes, linear regression is efficient for random samples. If individuals with extreme phenotypes are selected, phenotypes are often not normally distributed and different statistics should be considered.

WISARD can perform logistic/linear regression analysis. If phenotypes are dichotomous, logistic regression is applied and if phenotypes are continuous, linear regression is applied.

Logistic/linear regression can be performed with WISARD by using --regression option.

Example code for logistic/linear regression

Logistic/linear regression analysis using default phenotype
C:\Users\WISARD> wisard --bed test_miss0.bed --regression --out res_regr

Example code for specifying covariates with --cname option

Specifying covariates for logistic/linear regression
C:\Users\WISARD> wisard --bed test_miss0.bed --regression --sampvar test_miss0_phen.txt --cname age,height --out res_regr_cov

Example code for specifying response with --pname option

Specifying response variable and covariates for logistic/linear regression
C:\Users\WISARD> wisard --bed test_miss0.bed --regression --sampvar test_miss0_phen.txt --pname sbp --cname age,height --out res_regr_height_cov

test_miss0_phen.txt

Output file

Column	Format	Modifier	Description
*linear.regr.res* is...			A result of regression analysis with continuous phenotype (TSV)
VARIANT	string	continuous phenotype	Tested variant name
ANNOT	string	continuous phenotype,--annogene	Annotation for the variant
NMISS	integer	continuous phenotype,--avail	Number of missing genotypes for the variant
NIMP	integer	continuous phenotype,w/o --avail	Number of imputed genotypes for the variant
BETA	real	continuous phenotype	Beta coefficient for the variant
STAT	real	continuous phenotype	Wald statistic of the beta coefficient for the variant
PVAL	real	continuous phenotype	p-value from Wald statistic of the variant
GINV	0/1	continuous phenotype	Generalized inverse used (1) or not (0) in the test
BETA_(covariate name)	real	continuous phenotype,--sampvar,--cname	Beta coefficient for the covariates
STAT_(covariate name)	real	continuous phenotype,--sampvar,--cname	Wald statistic of the beta coefficient for the covariates
PVAL_(covariate name)	real	continuous phenotype,--sampvar,--cname	p-value from Wald statistic of the covariates
BETA_(covariate name)*GENO	real	continuous phenotype,--sampvar,--cname,--gxe	Beta coefficient for the interaction between covariate and variant
STAT_(covariate name)*GENO	real	continuous phenotype,--sampvar,--cname,--gxe	Wald statistic of the beta coefficient for the interaction between covariate and variant
PVAL_(covariate name)*GENO	real	continuous phenotype,--sampvar,--cname,--gxe	p-value from Wald statistic of the interaction between covariate and variant
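
Because the result file is tab-separated, it can be post-processed with a few lines of Python. The sketch below uses only the column names listed in the table above (VARIANT, BETA, PVAL); the function name, the sample rows and the use of "NA" for an unavailable value are assumptions made for illustration.

```python
import csv
import io


def significant_variants(handle, alpha=0.05):
    """Return (VARIANT, BETA, PVAL) tuples with PVAL below alpha, sorted by PVAL."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            pval = float(row["PVAL"])
            beta = float(row["BETA"])
        except (KeyError, TypeError, ValueError):
            # Skip rows whose values are missing or not numeric.
            continue
        if pval < alpha:
            hits.append((row["VARIANT"], beta, pval))
    return sorted(hits, key=lambda hit: hit[2])


# Made-up rows standing in for a real result file.
sample = (
    "VARIANT\tBETA\tSTAT\tPVAL\n"
    "rs1\t0.50\t3.1\t0.002\n"
    "rs2\t-0.10\t-0.4\t0.69\n"
    "rs3\tNA\tNA\tNA\n"
)
hits = significant_variants(io.StringIO(sample))
print(hits)
```

To read an actual result file, pass an open file object instead of the in-memory sample.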

Gene-environment interaction for logistic/linear regression

WISARD can include the gene-environment interaction for logistic/linear regression by using --gxe and --gxecovs options.

Example code for gene-environment interaction for logistic/linear regression

Performing gene-environment interaction analysis
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --cname age,height --regression --gxe --out res_regr_gxe_all

Example code for specifying certain gene-environment interactions for logistic/linear regression

Specifying certain gene-environment interactions
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --cname age,height --regression --gxe --gxecovs age --out res_regr_gxe_age
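
The effect of the two options on the regression model can be pictured as extra product terms in the design row of each sample. The Python sketch below is an illustration only: the helper is hypothetical, the term names follow the "(covariate name)*GENO" pattern of the output columns, and it assumes that --gxe without --gxecovs adds an interaction for every covariate.

```python
def design_row(geno, covariates, gxe=False, gxecovs=None):
    """Build one sample's regression terms as a dict (hypothetical helper).

    geno       : genotype value of the tested variant
    covariates : dict of covariate name -> value (as named with --cname)
    gxe        : add covariate-by-genotype interaction terms
    gxecovs    : restrict interactions to these covariate names
    """
    row = {"GENO": geno, **covariates}
    if gxe:
        names = covariates if gxecovs is None else gxecovs
        for name in names:
            row[f"{name}*GENO"] = covariates[name] * geno
    return row


sample_covariates = {"age": 40, "height": 170}
print(design_row(2, sample_covariates, gxe=True))
print(design_row(2, sample_covariates, gxe=True, gxecovs=["age"]))
```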

Weighting samples using Weighted Least Squares

For the statistical properties of Weighted Least Squares, please refer to the Wikipedia page. Sample-wise weights can be supplied to linear regression via the --sampleweight option.

Perform linear regression with HEIGHT using WLS
C:\Users\WISARD> wisard --bed sample.bed --sampvar pheno.txt --pname HEIGHT --regression --sampleweight weight.txt

As shown in the above example, the --sampleweight option requires an input file containing sample-wise weights. Each line of the file should consist of three columns: FID, IID and weight, in that order.

NOTE!

The weights read from the file will be inverted. For example, a weight of 0.7 will be applied as 1/0.7=1.43.
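
The inversion can be illustrated with a short Python sketch that parses weight lines and fits a weighted least squares line with a single predictor. It is not WISARD's code: the function names are hypothetical, columns are assumed to be whitespace-separated with no header line, all weights are assumed to be non-zero, and the data are made up.

```python
def read_weights(lines):
    """Parse 'FID IID weight' lines and return {(FID, IID): 1 / weight}."""
    weights = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        fid, iid, raw = fields
        weights[(fid, iid)] = 1.0 / float(raw)  # the weight is inverted
    return weights


def wls_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


applied = read_weights(["F1 I1 0.7", "F2 I2 2", "F3 I3 1", "F4 I4 0.5"])
print(round(applied[("F1", "I1")], 2))  # 0.7 is applied as 1/0.7
print(wls_fit([0, 1, 2, 3], [1, 3, 5, 7], list(applied.values())))
```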

LASSO regression [top]

WISARD provides LASSO regression for variable selection, with the set definition given by the --set option. For example, LASSO regression can be used to choose markers within each gene, using a command like the one below.

Perform very basic LASSO regression
C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --regression --lasso

The above command will produce [prefix].lasso.res.

Similar to other regression analyses, covariates can be incorporated into the model using the --sampvar and --cname options.

Perform LASSO regression with an adjustment of covariates age and height
C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --regression --lasso --sampvar test_miss0_phen.txt --cname age,height

By default, the lambda threshold for reporting results is 1/10. To adjust this value, use the --lassolambda option.

Set the reporting threshold of lambda in LASSO regression to 0.05, instead of its default threshold
C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --regression --lasso --lassolambda 0.05

If reporting only the markers that meet a specific lambda threshold is not desirable, --lassoall forces all results to be reported regardless of their lambda values.

NOTE!

Since --lassoall is exactly equivalent to --lassolambda 0, --lassoall and --lassolambda cannot be used simultaneously!

Force reporting of all results in LASSO regression
C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --regression --lasso --lassoall
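
As background on why LASSO performs variable selection, the following Python sketch fits a LASSO model by coordinate descent with soft-thresholding. It is a generic textbook formulation and makes no claim about how WISARD fits the model or uses its lambda threshold; the function names, the objective (1/2n)·RSS + lambda·|beta|, the absence of an intercept (centered data) and the toy data are all assumptions.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n) * ||y - X beta||^2 + lam * sum(|beta_j|)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of column j with the partial residual
            z = 0.0    # squared norm of column j (assumed non-zero)
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
    return beta


# Toy data: y = 2.0 * marker1 + 0.05 * marker2, with centered columns.
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.05, -1.95, 1.95, -2.05]
beta = lasso_coordinate_descent(X, y, lam=0.1)
print(beta)
```

In this toy example the weak second marker is shrunk exactly to zero while the strong first marker is kept with a slightly reduced coefficient, which is the selection behaviour LASSO is used for.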

Last modified : 2017-09-11 16:04:32