WISARD official site

Case tutorial

Gene-level Association Analysis

Available statistics

While the effect of a common variant on a phenotype of interest can be detected by testing its marginal effect, single-variant tests of rare variants suffer from a high false-negative rate, so the statistical methods for common variants cannot be applied directly. To improve the statistical efficiency of rare variant association analysis, a set of rare variants is tested jointly for association. A set file listing the variants that belong to the same gene must therefore be provided. A set containing only a single variant does not yield a valid result, and such sets must be filtered out of the analysis.

For rare variant analysis, statistical power is affected by several factors, including the definition of the set and the homogeneity of the effects of the rare variants on the phenotype. The most efficient statistic differs depending on these factors, so several statistics should be considered at the same time. WISARD provides various statistics for rare variant analysis.

Detailed power comparisons for various situations can be found in Ladouceur et al. (PLoS Genetics 2012).

Methods which are efficient when effects of rare variants are homogeneous

CMC test: can be applied to dichotomous phenotypes. May be efficient if the presence of rare alleles is related to the disease risk.
WST test: can be applied to dichotomous phenotypes. May be efficient if the number of rare alleles is related to the disease risk.
Collapsing test: can be applied to dichotomous and quantitative phenotypes. May be efficient if the number of rare alleles is related to the disease risk.

Method which is less sensitive to the definition of group

Variable-Threshold test: can be applied to dichotomous and quantitative phenotypes. May be efficient if rarer variants have strong effects on disease.

Method which is efficient when there is multi-variant joint action

KBAC test: can be applied to dichotomous phenotypes. May be efficient if there is a joint interaction between rare variants.

Method which is efficient if rare variants with both positive and negative effect on disease are grouped to a single set

C-alpha test: can be applied to dichotomous phenotypes. It tests the heterogeneity of variance between cases and controls.
SKAT method: can be applied to dichotomous and quantitative phenotypes. May be efficient if rare variants with positive and negative effects on the phenotype are grouped as a set.

Method which combines collapsing test and SKAT method, and is robust to the heterogeneity of effects of rare variants

SKAT-o method: can be applied to dichotomous and quantitative phenotypes. May be efficient if rare variants with positive and negative effects on the phenotype are grouped as a set.
Q-test: can be applied to quantitative phenotypes. Similar to the SKAT-o statistic, but more efficient than SKAT-o if the number of rare variants is not very large.

Quantitative/dichotomous phenotype

WISARD automatically determines whether each phenotype is quantitative or dichotomous. By default, a phenotype is treated as dichotomous if only the values 1 and 2 are observed, and as quantitative otherwise. The following options allow phenotypes coded with other values to be treated as dichotomous.

--1case: treats a phenotype whose observed values are 0 or 1 as dichotomous.
--cact value1,value2: treats a phenotype whose observed values are value1 or value2 as dichotomous. For instance, "--cact 1,0" has the same meaning as "--1case".
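
The rule above can be illustrated with a small sketch. This is not WISARD's own code; the function name is_dichotomous and its codes parameter are hypothetical, and missing values are assumed to have been removed beforehand.

```python
def is_dichotomous(values, codes=(1, 2)):
    """Return True when every observed value is one of the two given codes.

    codes=(1, 2) mirrors the documented default; codes=(1, 0) mimics
    --1case, and any other pair mimics --cact value1,value2.
    (Illustrative only: this is an assumption about the rule, not WISARD code.)
    """
    observed = set(values)
    return len(observed) > 0 and observed <= set(codes)


print(is_dichotomous([1, 2, 2, 1]))                 # True: default coding
print(is_dichotomous([0, 1, 1, 0]))                 # False under the default
print(is_dichotomous([0, 1, 1, 0], codes=(1, 0)))   # True, as with --1case
print(is_dichotomous([120.5, 131.0, 118.2]))        # False: quantitative
```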

Because single-variant tests of rare variants have a high false-negative rate, rare variant association is tested on a set of rare variants simultaneously, and thus the sets must be defined. WISARD supports five types of set file format (Type-I to Type-V, described below), and the set file is given with the --set option.

NOTE!

The --set option is mandatory for running gene-level analysis!

Type-I file format

For the type-I format, each line consists of two columns, the gene set name (e.g. SET_A) and the variant name, separated by whitespace (space or tab). The gene set name may be a gene name.

Example 1 : Type-I set file format

SET_A	rs172
SET_A	rs29445
SET_A	SNP_A-1924825
SET_B	rs2851
SET_B	SNP_A-124
SET_B	rs38985

NOTE!

Variants which belong to the same gene should be contiguously placed!
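
As a rough illustration of the Type-I layout and the contiguity requirement, the following sketch reads such a file into a dictionary. The function name read_type1 is hypothetical, and rejecting non-contiguous sets with an error is an assumption made for the example, not a description of how WISARD itself reacts.

```python
def read_type1(lines):
    """Parse Type-I lines ("SET<whitespace>VARIANT") into {set name: [variants]}."""
    sets = {}
    previous = None
    for number, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:                      # skip blank lines
            continue
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 columns, got {len(fields)}")
        name, variant = fields
        if name != previous and name in sets:
            # The set name reappears after another set: not contiguous.
            raise ValueError(f"line {number}: set {name} is not contiguous")
        sets.setdefault(name, []).append(variant)
        previous = name
    return sets


example = [
    "SET_A\trs172", "SET_A\trs29445", "SET_A\tSNP_A-1924825",
    "SET_B\trs2851", "SET_B\tSNP_A-124", "SET_B\trs38985",
]
print(read_type1(example))
```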

Type-II file format

The type-II file format is identical to the set definition used in PLINK. Each set must start with a set name, which cannot contain any spaces. The name is followed by a list of the variants in that gene set, and the keyword END marks the end of that set, as in the example below:

Example 2 : Type-II set file format

SET_A
rs172
rs29445
SNP_A-1924825
END

SET_B
rs2851
SNP_A-124
rs38985
END

NOTE!

Do not use END as a name of variant!
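
The note about END follows directly from how such a file has to be read: a reader cannot tell a variant called END from the terminator. A minimal sketch of a reader, with the hypothetical name read_type2 (not part of WISARD or PLINK), shows this:

```python
def read_type2(lines):
    """Parse Type-II (PLINK-style) blocks: set name, variants, then END."""
    sets = {}
    current = None                  # name of the set being read, if any
    for raw in lines:
        token = raw.strip()
        if not token:               # blank lines between sets are ignored
            continue
        if current is None:
            current = token         # first token of a block is the set name
            sets[current] = []
        elif token == "END":        # END always closes the block, so it
            current = None          # can never be stored as a variant name
        else:
            sets[current].append(token)
    if current is not None:
        raise ValueError(f"set {current} is not terminated by END")
    return sets


example = ["SET_A", "rs172", "rs29445", "SNP_A-1924825", "END", "",
           "SET_B", "rs2851", "SNP_A-124", "rs38985", "END"]
print(read_type2(example))
```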

Type-III file format

The type-III file format is similar to the type-I definition, but all variants of each set are listed on a single line. The type-III file format is identical to the set definition used in EPACTS.

Example 3 : Type-III set file format

SET_A	rs172	rs29445	SNP_A-1924825
SET_B	rs2851	SNP_A-124	rs38985
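
Since Type-I and Type-III carry the same information, one can be rewritten as the other. The sketch below is an illustration only; type3_to_type1 is a hypothetical helper name, and lines with a set name but no variants are simply skipped, which is an assumption.

```python
def type3_to_type1(lines):
    """Expand Type-III lines (set name + all variants) into Type-I lines."""
    out = []
    for raw in lines:
        fields = raw.split()
        if len(fields) < 2:         # blank line or a set without variants
            continue
        name, variants = fields[0], fields[1:]
        out.extend(f"{name}\t{variant}" for variant in variants)
    return out


example = [
    "SET_A\trs172\trs29445\tSNP_A-1924825",
    "SET_B\trs2851\tSNP_A-124\trs38985",
]
for line in type3_to_type1(example):
    print(line)
```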

Type-IV file format

The type-IV file format differs from the other types. It defines a set of multiple variants by allocating a specific region to each set. Sets may overlap one another, and a variant located in an overlapping region is assigned to every set that covers that region.
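
The assignment rule for overlapping regions can be sketched as follows. The column layout of a Type-IV file is not shown on this page, so the tuples used here (set name, chromosome, start, end) and the inclusive boundaries are assumptions made purely for illustration, as are the function name and the example coordinates.

```python
def assign_by_region(regions, variants):
    """Assign each variant to every region-based set that covers it.

    regions:  (set_name, chromosome, start, end) tuples, boundaries inclusive
              (hypothetical layout, not the actual Type-IV file format)
    variants: (variant_name, chromosome, position) tuples
    """
    sets = {name: [] for name, _, _, _ in regions}
    for variant, chrom, pos in variants:
        for name, region_chrom, start, end in regions:
            if chrom == region_chrom and start <= pos <= end:
                sets[name].append(variant)   # may happen for several sets
    return sets


regions = [("SET_A", "1", 1000, 5000), ("SET_B", "1", 4000, 9000)]
variants = [("rs172", "1", 1500), ("rs29445", "1", 4500), ("rs2851", "1", 8000)]
print(assign_by_region(regions, variants))
```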

Type-V file format (refGenes format)

Many analysis toolsets, such as Rvtests, use an existing format for representing gene information.

Weighting variants with a prior information

Rarer variants are often assumed to be functionally more important for phenotypes, and thus WISARD provides several ways to weight each variant by using MAF.

If we let $\small{p_k}$ be the MAF of variant $\small{k}$, $\small{1/\sqrt{p_k(1-p_k)}}$ is used as the weight by default.
--noweight: disables the default weight.
--betaweight: weights are obtained by transforming the MAF with the Beta distribution; they are generated from Beta(1,25).
--weight filename: makes WISARD use user-defined weights.
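
To give a feeling for how the two MAF-based weights behave, the sketch below evaluates both for a few MAF values. The function names are hypothetical, and taking the Beta(1,25) density at the MAF as the weight is an assumption borrowed from common practice in SKAT-type methods; this page does not spell out WISARD's exact transformation.

```python
import math


def default_weight(maf):
    """Default weight 1 / sqrt(p (1 - p)) for a variant with MAF p (0 < p < 1)."""
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at the MAF (assumed form of --betaweight)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    log_density = log_norm + (a - 1.0) * math.log(maf) + (b - 1.0) * math.log(1.0 - maf)
    return math.exp(log_density)


for maf in (0.001, 0.01, 0.05):
    print(maf, round(default_weight(maf), 3), round(beta_weight(maf), 3))
```

Both weights grow as the MAF shrinks, so rarer variants contribute more to the set-level statistic.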

User-defined weights can be loaded with the --weight option, which is often used to weight each variant by its importance in terms of protein structure. Software such as SIFT or PolyPhen can calculate such scores. In this file, each line should have two columns, where the first column is a variant name and the second column is a weight. The two columns should be separated by whitespace (space or tab).

Example 4 : An example of user-defined weight file

rs8238523	0.3833
rs92484829	0.9217
var_3_32883571	0.0042

NOTE!

The user-defined weight file should contain weights for all variants within the dataset to be analyzed!
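
Before running an analysis, it may be convenient to check that a weight file covers every variant. The following sketch does this under the two-column layout described above; read_weights and missing_weights are hypothetical helper names, not WISARD functions.

```python
def read_weights(lines):
    """Parse "VARIANT<whitespace>WEIGHT" lines into {variant: weight}."""
    weights = {}
    for number, raw in enumerate(lines, start=1):
        fields = raw.split()
        if not fields:
            continue
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 columns, got {len(fields)}")
        weights[fields[0]] = float(fields[1])
    return weights


def missing_weights(weights, variants):
    """Return the variants of the dataset that have no weight assigned."""
    return [variant for variant in variants if variant not in weights]


weight_lines = ["rs8238523\t0.3833", "rs92484829\t0.9217", "var_3_32883571\t0.0042"]
dataset_variants = ["rs8238523", "rs92484829", "var_3_32883571", "rs172"]
print(missing_weights(read_weights(weight_lines), dataset_variants))  # ['rs172']
```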

CMC test

The Combined Multivariate and Collapsing (CMC) test was suggested by Li and Leal (Am J Hum Genet 2008) and can be applied to dichotomous phenotypes. This approach is useful for case-control designs. It assumes that the presence of rare alleles increases or decreases the disease risk, and that the number of rare alleles is not important.
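
The collapsing idea (presence rather than number of rare alleles) can be sketched as below. This shows only the collapsing step and the resulting 2x2 table of carrier status by case status; it is not the full CMC statistic and not WISARD's implementation, and the function names are hypothetical.

```python
def carrier_indicator(genotypes):
    """1 if a sample carries at least one rare allele in the set, else 0.

    genotypes: one row per sample with rare-allele counts (0/1/2) per variant.
    """
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


def two_by_two(indicator, is_case):
    """Rows: cases, controls. Columns: carriers, non-carriers."""
    table = [[0, 0], [0, 0]]
    for carrier, case in zip(indicator, is_case):
        table[0 if case else 1][0 if carrier else 1] += 1
    return table


genotypes = [[0, 1, 0], [0, 0, 0], [2, 0, 1], [0, 0, 0]]
is_case = [True, True, False, False]
indicator = carrier_indicator(genotypes)
print(indicator)                       # [1, 0, 1, 0]
print(two_by_two(indicator, is_case))  # [[1, 1], [1, 1]]
```

Note that the third sample carries three rare alleles but is collapsed to the same value as the first sample, which carries one.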

Example code

Calculating CMC test with --genetest option

CMC test C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --indep --genoctrl --genetest --out res_collapsing

Obtaining the detailed information with --verbose option

CMC test C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --indep --genoctrl --genetest --verbose --out res_collapsing

Weighted-sum test

The weighted-sum test (WST) was suggested by Madsen and Browning (PLoS Genetics 2009) and can be applied to dichotomous phenotypes. WST tests whether weighted rare allele counts are associated with the phenotype and is efficient if the effects of all rare variants on the phenotype are in the same direction. Significance is calculated by comparing the weighted rare allele counts between cases and controls.

Example code

Calculating WST by using --genetest and --wsum options

Weighted-sum test C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --wsum --out res_wsum

KBAC test

The kernel-based adaptive cluster (KBAC) test was suggested by Liu and Leal and can be applied to dichotomous phenotypes. The KBAC test categorizes sets of rare variants according to the pattern of rare alleles, and it may be efficient if there is a joint interaction between rare variants.

Adjusting alpha level for KBAC test

Perform KBAC test with alpha level=0.1 C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --kbac --kbacalpha 0.1 --out res_kbac_0_1

Example code

Calculating KBAC test by using

Collapsing-based test

The collapsing-based test was suggested by Morris and Zeggini (Genet Epi 2010) and can be applied to dichotomous and quantitative phenotypes. It checks whether weighted rare allele counts are associated with the phenotype and is efficient if the effects of all rare variants on the phenotype are in the same direction.

The collapsing-based test incorporates the weighted rare allele count as a covariate; logistic regression is used for dichotomous phenotypes and linear regression for quantitative phenotypes.
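
A minimal sketch of this idea for a quantitative phenotype is given below: the weighted rare allele count is computed per sample and then used as the single covariate of a simple least-squares fit. The function names and the toy data are hypothetical, and the sketch leaves out everything else a real analysis would need (other covariates, the logistic model, p-values).

```python
def burden_scores(genotypes, weights):
    """Weighted rare allele count per sample: sum over variants of w_k * g_ik."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def least_squares_slope(x, y):
    """Slope of the simple linear regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


genotypes = [[1, 0], [0, 1], [1, 1], [0, 0]]   # toy rare-allele counts
weights = [2.0, 1.0]                           # toy per-variant weights
phenotype = [5.0, 3.0, 7.0, 1.0]               # toy quantitative phenotype

scores = burden_scores(genotypes, weights)
print(scores)                                  # [2.0, 1.0, 3.0, 0.0]
print(least_squares_slope(scores, phenotype))  # 2.0
```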

Example code

The collapsing test is performed by default when the --genetest option is given.

Collapsing test C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --indep --genetest --out res_collapsing

Variable-Threshold test

The variable-threshold (VT) test was suggested by Price et al. (Am J Hum Genet 2010). In rare variant analysis, the definition of a rare variant is unclear and different MAF thresholds are used. For this reason, the VT test selects the MAF threshold that maximizes the significance. The VT method can be applied to dichotomous and quantitative phenotypes, and it may be useful if rarer variants have strong effects on the phenotype.

The final p-value of the VT test is calculated by permutation, and the number of iterations should be chosen according to the significance level. For instance, if you are interested in the 0.05 significance level, we suggest at least 10 × (1/0.05) = 200 iterations.
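
The rule of thumb above (ten times the reciprocal of the significance level) can be written as a small helper to pick a value for --nperm. The function name min_permutations and the factor parameter are hypothetical.

```python
import math


def min_permutations(alpha, factor=10):
    """Smallest iteration count satisfying the 10 * (1 / alpha) rule of thumb."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Rounding first guards against floating-point noise such as 200.00000000000003.
    return math.ceil(round(factor / alpha, 6))


print(min_permutations(0.05))   # 200
```

Stricter significance levels, such as those used after multiple-testing correction, require proportionally more permutations.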

Example codes

Calculating VT test by using --genetest and --vt options
Performing VT test C:\Users\WISARD> wisard --ped test_miss0.ped --genetest --indep --vt --set test_gene.txt --out res_vt
Calculating VT test with 100,000 iterations for permutation
Performing VT test with specified number (100,000) of permutations C:\Users\WISARD> wisard --ped test_miss0.ped --genetest --indep --vt --set test_gene.txt --nperm 100000 --out res_vt_100Kperm

SKAT method

The SNP-set/Sequence kernel association test (SKAT) was suggested by Wu et al. (Am J Hum Genet 2011) and can be applied to dichotomous and quantitative phenotypes. SKAT is efficient if rare variants with positive and negative effects on the phenotype are grouped as a set, and its results are usually similar to those of the C-alpha test.

If the sample size is sufficiently large, the SKAT statistic approximately follows a mixture of chi-square distributions, and p-values for SKAT are calculated with the numerical algorithm of Liu et al. (Comput Stat Data Anal 2009).

Example codes

Calculating SKAT method by using --genetest and --skat options

SKAT method C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname sbp --set test_gene.txt --indep --genetest --skat --out res_skat

SKAT-o method

The SNP-set/Sequence Kernel Association Test-optimal (SKAT-o) is an extension of SKAT and was suggested by Lee et al. (Biostatistics 2012). It is a mixture of a burden-type test and SKAT, and can be applied to dichotomous and quantitative phenotypes. Its p-value comes from integrating the mixture probability distribution function of those statistics. SKAT-o is robust to the direction of the effects of rare variants on the phenotype.

Custom optimal weights

WISARD borrows the idea of optimal weight selection from the SKAT package for R. By default, the weights for optimal selection are $0$, $0.1^2$, $0.2^2$, $0.3^2$, $0.4^2$, $0.5^2$, $0.5$ and $1$. The weights can be altered in two ways: dividing the range from 0 to 1 into a given number of equal segments (--skatondiv), or assigning user-defined weights (--skatodivs).

Perform SKAT-o with ten equal weight division(0, 0.1, ... , 1) C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --indep --sampvar test_miss0_phen.txt --pname sbp --genetest --skato --skatondiv 10 --out res_skato_div10

Perform SKAT-o with three user-defined weights(0, 0.5 and 1) C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --indep --sampvar test_miss0_phen.txt --pname sbp --genetest --skato --skatodivs 0,0.5,1 --out res_skato_customdiv3
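
The three weight grids mentioned above (default, equal division, user-defined list) can be written out explicitly. The sketch below only reproduces the numbers given in the text; the function names are hypothetical, and requiring user-defined values to lie between 0 and 1 is an assumption.

```python
def default_grid():
    """Default SKAT-o weights as listed above."""
    return [0.0, 0.1 ** 2, 0.2 ** 2, 0.3 ** 2, 0.4 ** 2, 0.5 ** 2, 0.5, 1.0]


def equal_grid(n_div):
    """Divide [0, 1] into n_div equal segments, as described for --skatondiv."""
    if n_div < 1:
        raise ValueError("at least one segment is required")
    return [i / n_div for i in range(n_div + 1)]


def parse_divs(text):
    """Parse a comma-separated list such as the argument of --skatodivs."""
    values = [float(token) for token in text.split(",")]
    if any(value < 0.0 or value > 1.0 for value in values):
        raise ValueError("weights are assumed to lie between 0 and 1")
    return values


print(len(default_grid()))     # 8
print(equal_grid(10))          # eleven values from 0.0 to 1.0
print(parse_divs("0,0.5,1"))   # [0.0, 0.5, 1.0]
```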

Example codes

Calculating SKAT-o with --genetest and --skato options

SKAT-o test C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --sampvar test_miss0_phen.txt --pname sbp --indep --genetest --skato --out res_skato

NOTE!

The SKAT-o method is not performed on a gene that has only a single variant.

Q-test

The Q-test was suggested by Lee et al. and combines the collapsing-based test and SKAT. It can be applied to quantitative phenotypes. The Q-test has similar properties to SKAT-o, but it is a Wald-type test whereas SKAT-o is a score-type test. The Q-test may be more efficient if the number of rare variants is not large.

Example codes

Calculating Q-test with --qtest

Perform Q-test C:\Users\WISARD> wisard --qtest --bed test_miss0.bed --set test_gene.txt --sampvar test_miss0_phen.txt --pname height --out res_qtst

Q-test with a different MAF cutoff

Using rare variants of which MAFs are between 0 and 0.1 C:\Users\WISARD> wisard --qtest --bed test_miss0.bed --set test_gene.txt --qtestrange "(0,0.1)" --sampvar test_miss0_phen.txt --pname height --out res_qtst_10per

Calculating Q-test with covariate effect adjustment

Perform Q-test with covariate adjustment C:\Users\WISARD> wisard --qtest --sampvar test_miss0_phen.txt --pname height --cname age,weight --bed test_miss0.bed --set test_gene.txt --out res_qtst_cov

Gene set Q-test

Perform set level Q-test C:\Users\WISARD> wisard --qtest --bed test_miss0.bed --set test_gene.txt --geneset gs_def.txt --out res_qtst_geneset

Example 5 : Example of gene-set definition file

GENE_SET1	GENE1 GENE2 GENE3
GENE_SET2	GENE2 GENE4
GENE_SET3	GENE5 GENE6 GENE7 GENE8

Gene-level test with longitudinal data

WISARD can conduct gene-level tests with longitudinal data; this analysis is available only for SKAT and SKAT-o. In longitudinal data, the repeated measurements are correlated. The correlation matrix can be estimated with statistical software such as R or SAS under the assumption that genotypes have no effect on the phenotype. WISARD then loads the correlation matrix and calculates the score-type gene-level test.

Example codes

Specifying correlation matrix between repeated measurements by using --longitudinal option

Gene-level test with longitudinal data C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen2.txt --pname bmi,bmi2,bmi3 --longitudinal test_varcov.txt --genetest --skato --set test_gene.txt

Example 6 : Variance-covariance matrix file (namely test_varcov.txt)

1 0.5 0.25
0.5 1 0.5
0.25 0.5 1

NOTE!

Correlation matrix should be symmetric!

The phenotype file is specified with the --sampvar option.
The phenotype names of the repeated measurements are specified with the --pname option.
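
A matrix file estimated in R or SAS can be checked for symmetry before it is passed to --longitudinal. The sketch below assumes the whitespace-separated layout of Example 6; read_matrix and is_symmetric are hypothetical helper names, and the tolerance is an arbitrary choice.

```python
def read_matrix(lines):
    """Parse whitespace-separated rows of numbers into a list of lists."""
    return [[float(value) for value in raw.split()] for raw in lines if raw.strip()]


def is_symmetric(matrix, tol=1e-12):
    """True if the matrix is square and m[i][j] equals m[j][i] within tol."""
    size = len(matrix)
    if any(len(row) != size for row in matrix):
        return False
    return all(abs(matrix[i][j] - matrix[j][i]) <= tol
               for i in range(size) for j in range(i + 1, size))


varcov = read_matrix(["1 0.5 0.25", "0.5 1 0.5", "0.25 0.5 1"])
print(is_symmetric(varcov))   # True
```

With three phenotype names given to --pname, as in the command above, the matrix has three rows and three columns.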

Last modified : 2017-09-09 13:56:52