WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

Gene-level Association Analysis

Available statistics

While the effect of a common variant on a phenotype of interest can be detected by testing its marginal effect, tests of individual rare variants suffer from a high false-negative rate, and the statistical methods for common variants cannot be applied to them directly. To improve the statistical efficiency of rare variant association analysis, sets of rare variants are tested for association simultaneously. Therefore, a set file listing the variants that belong to the same gene must be provided. If a set contains only a single variant, its result is not valid, and such sets must be filtered out of the analysis.

For rare variant analysis, statistical power is affected by several factors, such as the definition of the set and the homogeneity of the effects of individual rare variants on the phenotype. Depending on these factors, a different statistic is the most efficient, so several statistics should be considered at the same time. WISARD provides various statistics for rare variant analysis.

Detailed information about power comparisons in various situations can be found in Ladouceur et al. (PLoS Genetics 2012).

  1. Methods that are efficient when the effects of rare variants are homogeneous
    • CMC test: can be applied to dichotomous phenotypes. May be efficient if the presence of rare alleles is associated with disease risk.
    • WST test: can be applied to dichotomous phenotypes. May be efficient if the number of rare alleles is associated with disease risk. Each marker is weighted based on the inverse of its MAF, and the weighted sums are compared between cases and controls.
    • Collapsing test: can be applied to dichotomous and quantitative phenotypes. May be efficient if the number of rare alleles is associated with the phenotype.
  2. Method that is less sensitive to the definition of the set
    • Variable-Threshold test: can be applied to dichotomous and quantitative phenotypes. May be efficient if rarer variants have stronger effects on the phenotype.
  3. Method that is efficient when multiple variants act jointly
    • KBAC test: can be applied to dichotomous phenotypes. May be efficient if there is joint interaction among rare variants.
  4. Methods that are efficient when rare variants with both positive and negative effects on disease are grouped into a single set
    • C-alpha test: can be applied to dichotomous phenotypes. It tests for heterogeneity of variance between cases and controls.
    • SKAT method: can be applied to dichotomous and quantitative phenotypes. May be efficient if rare variants with positive and negative effects on the phenotype are grouped into a set.
  5. Methods that combine the collapsing test and SKAT and are robust to heterogeneity in the effects of rare variants
    • SKAT-o method: can be applied to dichotomous and quantitative phenotypes. May be efficient if rare variants with positive and negative effects on the phenotype are grouped into a set.
    • Q-test: can be applied to quantitative phenotypes. Similar to SKAT-o, but more efficient than SKAT-o if the number of rare variants is not very large.


Quantitative/dichotomous phenotype

WISARD automatically determines whether each phenotype is quantitative or dichotomous. By default, a phenotype is treated as dichotomous if only the values 1 and 2 are observed; otherwise, it is treated as quantitative. The following options allow phenotypes coded with other values to be treated as dichotomous.

  • --1case: treats a phenotype whose observed values are 0 or 1 as dichotomous.
  • --cact value1,value2: treats a phenotype whose observed values are value1 or value2 as dichotomous. For instance, "--cact 1,0" has the same meaning as "--1case".
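The classification rule above can be sketched as a subset check on the observed values. This is a minimal illustration, not WISARD's code: the function name `phenotype_type` is hypothetical, and we assume missing values have already been removed.

```python
def phenotype_type(values, codes=(1, 2)):
    """Classify a phenotype as 'dichotomous' or 'quantitative'.

    codes: the two values that mark a dichotomous phenotype.
    (1, 2) mimics the default rule, (0, 1) mimics --1case, and any pair
    mimics --cact value1,value2. Missing values are assumed to be removed.
    """
    observed = {float(v) for v in values}
    allowed = {float(c) for c in codes}
    # Dichotomous only if every observed value is one of the two codes.
    return "dichotomous" if observed <= allowed else "quantitative"


print(phenotype_type([1, 2, 2, 1]))              # dichotomous
print(phenotype_type([0, 1, 1]))                 # quantitative (default codes are 1 and 2)
print(phenotype_type([0, 1, 1], codes=(1, 0)))   # dichotomous, like --1case
print(phenotype_type([120.5, 131.0, 118.2]))     # quantitative
```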



Set file format

Because tests of individual rare variants have a high false-negative rate, rare variant association is tested with a set of rare variants simultaneously, so the sets of rare variants must be defined. WISARD supports the set file formats described below, and the set file is given with the --set option.

NOTE!
The --set option is mandatory for gene-level analysis!

Type-I file format

In the type-I format, each line consists of two columns, the gene set name (e.g. SET_A) and the variant name, separated by whitespace (space or tab). The gene set name may be a gene name.

Example 1 : Type-I set file format
SET_A rs172
SET_A rs29445
SET_A SNP_A-1924825
SET_B rs2851
SET_B SNP_A-124
SET_B rs38985
NOTE!
Variants that belong to the same gene should be placed contiguously!
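A type-I file can be read line by line while checking the contiguity rule and flagging single-variant sets, which the introduction says must be filtered out. This is a minimal sketch, not WISARD's parser; `parse_type1` and `single_variant_sets` are hypothetical names, and skipping blank lines is an assumption.

```python
def parse_type1(lines):
    """Parse type-I set lines ('SET_NAME VARIANT') into {set name: [variants]}.

    Raises ValueError when a line does not have two columns or when the
    lines of one set are not contiguous.
    """
    sets = {}
    previous = None
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any whitespace (space or tab)
        if not fields:
            continue  # blank lines are skipped (an assumption)
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        name, variant = fields
        if name in sets and name != previous:
            raise ValueError(f"line {lineno}: set {name} is not contiguous")
        sets.setdefault(name, []).append(variant)
        previous = name
    return sets


def single_variant_sets(sets):
    """Names of sets with only one variant, which should be filtered out."""
    return [name for name, variants in sets.items() if len(variants) == 1]


example = """SET_A rs172
SET_A rs29445
SET_A SNP_A-1924825
SET_B rs2851
SET_B SNP_A-124
SET_B rs38985""".splitlines()
sets = parse_type1(example)
print(sets["SET_B"])              # ['rs2851', 'SNP_A-124', 'rs38985']
print(single_variant_sets(sets))  # []
```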

Type-II file format

The type-II file format is the same as the set definition used in PLINK (see the PLINK documentation). Each set must start with a set name, which cannot contain spaces. The name is followed by a list of the variants in that set, and the keyword END marks the end of that set. See the example below:

Example 2 : Type-II set file format
SET_A
rs172
rs29445
SNP_A-1924825
END
SET_B
rs2851
SNP_A-124
rs38985
END
NOTE!
Do not use END as a variant name!
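The block structure of the type-II format maps onto a small state machine: outside a set the next token is a set name, and inside a set every token is a variant until END. The sketch also shows why a variant called END would break a file: it would close the set early. `parse_type2` is a hypothetical name, and one variant per line is assumed, as in the example.

```python
def parse_type2(lines):
    """Parse PLINK-style (type-II) set lines into {set name: [variants]}."""
    sets = {}
    current = None  # name of the set being read, or None between sets
    for lineno, line in enumerate(lines, start=1):
        token = line.strip()
        if not token:
            continue
        if current is None:
            # The first token of a block is the set name; it may not contain spaces.
            if len(token.split()) != 1:
                raise ValueError(f"line {lineno}: set name contains whitespace")
            if token in sets:
                raise ValueError(f"line {lineno}: duplicate set {token}")
            current = token
            sets[current] = []
        elif token == "END":
            current = None  # END closes the current set
        else:
            sets[current].append(token)  # one variant per line (assumption)
    if current is not None:
        raise ValueError(f"set {current} has no END keyword")
    return sets


example = ["SET_A", "rs172", "rs29445", "SNP_A-1924825", "END",
           "SET_B", "rs2851", "SNP_A-124", "rs38985", "END"]
print(parse_type2(example)["SET_A"])  # ['rs172', 'rs29445', 'SNP_A-1924825']
```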

Type-III file format

The type-III file format is similar to type-I, but all variants of each set are listed on a single line. The type-III format is the same as the set definition used in EPACTS.

Example 3 : Type-III set file format
SET_A rs172 rs29445 SNP_A-1924825
SET_B rs2851 SNP_A-124 rs38985
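Because type-III is type-I with each set collapsed onto one line, converting between them is mechanical. A minimal sketch, with the hypothetical helper `type1_to_type3`, that turns Example 1 into Example 3:

```python
def type1_to_type3(lines):
    """Convert type-I lines ('SET VARIANT') into type-III lines ('SET V1 V2 ...')."""
    sets = {}  # dicts keep insertion order, so sets stay in file order
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        name, variant = fields  # assumes exactly two columns per line
        sets.setdefault(name, []).append(variant)
    return [" ".join([name, *variants]) for name, variants in sets.items()]


type1 = ["SET_A rs172", "SET_A rs29445", "SET_A SNP_A-1924825",
         "SET_B rs2851", "SET_B SNP_A-124", "SET_B rs38985"]
for line in type1_to_type3(type1):
    print(line)
# SET_A rs172 rs29445 SNP_A-1924825
# SET_B rs2851 SNP_A-124 rs38985
```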

Type-IV file format

The type-IV file format differs from the other types. It defines a set of variants by assigning a genomic region to each set. Sets may overlap, and a variant located in an overlapping region is assigned to every set that covers that region.
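The region-based assignment rule, including the overlap case, can be sketched as follows. The column layout of a type-IV file is not given on this page, so the `Region` and `Variant` records (chromosome, start, end, position) and the inclusive bounds are assumptions for illustration only.

```python
from collections import namedtuple

# Hypothetical record layouts; the actual type-IV column order is not given here.
Region = namedtuple("Region", "name chrom start end")
Variant = namedtuple("Variant", "name chrom pos")


def assign_by_region(regions, variants):
    """Map each set name to the variants lying in its region.

    Bounds are treated as inclusive (an assumption). A variant in an
    overlapping region is appended to every set whose region covers it.
    """
    sets = {r.name: [] for r in regions}
    for v in variants:
        for r in regions:
            if v.chrom == r.chrom and r.start <= v.pos <= r.end:
                sets[r.name].append(v.name)
    return sets


regions = [Region("SET_A", "1", 100, 200), Region("SET_B", "1", 150, 300)]
variants = [Variant("rs1", "1", 120), Variant("rs2", "1", 160),
            Variant("rs3", "1", 250), Variant("rs4", "2", 160)]
print(assign_by_region(regions, variants))
# {'SET_A': ['rs1', 'rs2'], 'SET_B': ['rs2', 'rs3']}
```

The nested loop is quadratic; for genome-wide data a sorted sweep or an interval tree would be the usual choice.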

Type-V file format (refGenes format)

Many analysis toolsets, such as Rvtests, use this existing format to represent gene information.



Weighting variants with a prior information

Rarer variants are often assumed to be functionally more important for phenotypes, so WISARD provides several ways to weight each variant according to its MAF.

  • If we let $\small{p_k}$ be the MAF of variant $\small{k}$, then $\small{1/\sqrt{p_k(1-p_k)}}$ is used as the weight by default.
  • --noweight: disables the default weight.
  • --betaweight: each variant's MAF is transformed with the Beta(1,25) density, and the resulting values are used as weights.
  • --weight filename: uses user-defined weights.
  • User-defined weights are loaded with the --weight option. This option is often used to weight each variant by its predicted importance in terms of protein structure; software such as SIFT or PolyPhen can calculate such scores. In this file, each line has two columns, the variant name and its weight, separated by whitespace (space or tab).

    Example 4 : An example of user-defined weight file
    rs8238523 0.3833
    rs92484829 0.9217
    var_3_32883571 0.0042
    NOTE!
    The user-defined weight file should contain weights for all variants in the dataset being analyzed!
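The three weighting schemes can be sketched directly from the descriptions above: the default $1/\sqrt{p(1-p)}$, the Beta(1,25) density evaluated at the MAF (our reading of --betaweight), and a reader for the two-column weight file with a check that every variant has a weight. Function names are hypothetical, and the MAF is assumed to lie strictly between 0 and 1.

```python
import math


def default_weight(maf):
    """Default weight described above: 1 / sqrt(p * (1 - p)), for 0 < p < 1."""
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density evaluated at the MAF; a=1, b=25 as for --betaweight."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(maf) + (b - 1.0) * math.log1p(-maf))


def read_weight_file(lines):
    """Read 'VARIANT WEIGHT' lines (whitespace-separated) into a dict."""
    weights = {}
    for line in lines:
        fields = line.split()
        if fields:
            weights[fields[0]] = float(fields[1])
    return weights


def variants_without_weight(weights, variants):
    """Variants lacking a user-defined weight; every analyzed variant needs one."""
    return [v for v in variants if v not in weights]


print(default_weight(0.5))  # 2.0
print(beta_weight(0.01))    # equals 25 * 0.99**24, about 19.6
```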


CMC test

The Combined Multivariate and Collapsing (CMC) test was proposed by Li and Leal (Am J Hum Genet 2008) and can be applied to dichotomous phenotypes. This approach is useful for case-control designs and assumes that the presence of rare alleles increases or decreases disease risk, whereas the number of rare alleles is not important.

Example code

  • Calculating the CMC test with the --genetest option
  • CMC test C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --indep --genoctrl --genetest --out res_collapsing
    It is recommended to perform the CMC test with the --genoctrl option so that its p-value is adjusted by the genomic control method.
  • Obtaining detailed information with the --verbose option
  • CMC test C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --indep --genoctrl --genetest --verbose --out res_collapsing
    If a gene contains too many variants, or some variants have relatively large MAFs, the probability that an individual carries at least one rare allele becomes large, and the CMC statistic no longer follows a chi-square distribution under the null hypothesis. In such cases the CMC test should not be used, and Fisher's exact test should be applied instead.
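When only rare variants are in the set, the collapsing step of CMC reduces each individual to a carrier indicator, giving a 2x2 table of carriers by case status. The sketch below shows that table with a Pearson chi-square test (1 df) and the two-sided Fisher's exact test recommended above. It is a simplification, not WISARD's implementation: Li and Leal's full CMC also combines the collapsed rare variants with common variants in a Hotelling T² test, and the genomic control adjustment is not shown. Function names are hypothetical.

```python
import math


def carrier_table(genotypes, is_case):
    """Return (a, b, c, d) = (case carriers, case non-carriers,
    control carriers, control non-carriers); a carrier has at least
    one rare allele anywhere in the set."""
    a = b = c = d = 0
    for row, case in zip(genotypes, is_case):
        carrier = any(g > 0 for g in row)
        if case and carrier:
            a += 1
        elif case:
            b += 1
        elif carrier:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square_test(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) and its p-value."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a margin of the 2x2 table is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    return stat, math.erfc(math.sqrt(stat / 2.0))  # P(chi2_1 > stat)


def fisher_exact_test(a, b, c, d):
    """Two-sided Fisher's exact p-value: sum over tables with the observed
    margins whose probability does not exceed that of the observed table."""
    n, cases, carriers = a + b + c + d, a + b, a + c

    def prob(x):  # hypergeometric probability of x carriers among cases
        return math.comb(carriers, x) * math.comb(n - carriers, cases - x) / math.comb(n, cases)

    p_obs = prob(a)
    lo, hi = max(0, cases + carriers - n), min(cases, carriers)
    return min(1.0, sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)))


genotypes = [[0, 1], [1, 0], [0, 0], [2, 0], [0, 0], [0, 0]]
is_case = [1, 1, 1, 0, 0, 0]
print(carrier_table(genotypes, is_case))  # (2, 1, 1, 2)
print(chi_square_test(3, 1, 1, 3))        # (2.0, ~0.157)
print(fisher_exact_test(3, 1, 1, 3))      # ~0.486
```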


Weighted-sum test

The weighted-sum test (WST) was proposed by Madsen and Browning (PLoS Genetics 2009) and can be applied to dichotomous phenotypes. WST tests whether weighted rare allele counts are associated with the phenotype and is efficient if the effects of all rare variants on the phenotype are in the same direction. Significance is assessed by comparing the weighted rare allele counts between cases and controls.
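The comparison of weighted counts can be sketched as follows, using the weights as we understand Madsen and Browning's published definition: each variant is weighted by $\sqrt{n\,q_i(1-q_i)}$, with $q_i$ a smoothed rare-allele frequency in controls, each individual's score is the weighted count, and the case rank sum is assessed by permutation (weights recomputed per permutation, one-sided for risk-increasing alleles). WISARD's internal weighting and p-value calculation may differ; all names are hypothetical.

```python
import math
import random


def mb_weights(genotypes, is_case):
    """w_i = sqrt(n * q_i * (1 - q_i)), q_i = (m_i + 1) / (2 * n_U + 2),
    where m_i counts rare alleles of variant i among the n_U controls."""
    controls = [row for row, case in zip(genotypes, is_case) if not case]
    n, n_u = len(genotypes), len(controls)
    weights = []
    for i in range(len(genotypes[0])):
        m_i = sum(row[i] for row in controls)
        q = (m_i + 1) / (2 * n_u + 2)
        weights.append(math.sqrt(n * q * (1 - q)))
    return weights


def average_ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def wst_statistic(genotypes, is_case):
    """Sum of the ranks of the cases' weighted rare-allele scores."""
    w = mb_weights(genotypes, is_case)
    scores = [sum(g / wi for g, wi in zip(row, w)) for row in genotypes]
    return sum(r for r, case in zip(average_ranks(scores), is_case) if case)


def wst_permutation_p(genotypes, is_case, n_perm=1000, seed=1):
    """One-sided permutation p-value; weights are recomputed for each labelling."""
    observed = wst_statistic(genotypes, is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if wst_statistic(genotypes, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


genotypes = [[1, 0], [0, 1], [0, 0], [0, 0]]
is_case = [1, 1, 0, 0]
print(wst_statistic(genotypes, is_case))  # 7.0 (cases hold ranks 3.5 and 3.5)
```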

Example code

  • Calculating the WST with the --genetest and --wsum options
  • Weighted-sum test C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --wsum --out res_wsum


KBAC test

The kernel-based adaptive cluster (KBAC) test was proposed by Liu and Leal and can be applied to dichotomous phenotypes. The KBAC test groups individuals according to their multi-locus patterns of rare alleles, and it may be efficient if there is joint interaction among rare variants.
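The pattern-grouping idea can be sketched with a hypergeometric kernel, following our reading of Liu and Leal: each non-wild-type genotype pattern $i$ contributes $(A_i/A - U_i/U)\,K_i$, where $A_i$ and $U_i$ are its case and control counts and $K_i$ is the null probability of seeing at most $A_i$ cases among the pattern's carriers; the statistic is the square of the sum. This is a sketch, not WISARD's implementation: significance (by permutation) and the role of --kbacalpha are not modelled, and the names are hypothetical.

```python
import math
from collections import Counter


def hypergeom_cdf(k, n_total, n_cases, n_drawn):
    """P(X <= k), X = number of cases among n_drawn individuals sampled
    without replacement from n_total individuals of whom n_cases are cases."""
    denom = math.comb(n_total, n_drawn)
    return sum(
        math.comb(n_cases, x) * math.comb(n_total - n_cases, n_drawn - x)
        for x in range(k + 1)
    ) / denom


def kbac_statistic(genotypes, is_case):
    """KBAC statistic with a hypergeometric kernel (sketch)."""
    n = len(genotypes)
    n_a = sum(1 for c in is_case if c)
    n_u = n - n_a
    # Group individuals by their multi-locus genotype pattern.
    total = Counter(tuple(row) for row in genotypes)
    in_cases = Counter(tuple(row) for row, c in zip(genotypes, is_case) if c)
    s = 0.0
    for pattern, n_i in total.items():
        if not any(pattern):
            continue  # the all-wild-type pattern is left out of the sum
        a_i = in_cases[pattern]
        u_i = n_i - a_i
        # Kernel weight: null probability of at most a_i cases in this pattern.
        weight = hypergeom_cdf(a_i, n, n_a, n_i)
        s += (a_i / n_a - u_i / n_u) * weight
    return s ** 2


genotypes = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
is_case = [1, 1, 1, 0, 0, 0]
print(kbac_statistic(genotypes, is_case))  # 4/9, about 0.444
```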

Adjusting the alpha level for the KBAC test

Perform KBAC test with alpha level=0.1 C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --genetest --kbac --kbacalpha 0.1 --out res_kbac_0_1

Example code

  • Calculating KBAC test by using


Collapsing-based test

The collapsing-based test was proposed by Morris and Zeggini (Genet Epidemiol 2010) and can be applied to dichotomous and quantitative phenotypes. It tests whether weighted rare allele counts are associated with the phenotype and is efficient if the effects of all rare variants on the phenotype are in the same direction.

The collapsing-based test uses the weighted rare allele count as a covariate in a regression model: logistic regression for dichotomous phenotypes and linear regression for quantitative phenotypes.
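For a quantitative phenotype, the regression described above can be sketched as a simple linear regression of the phenotype on each individual's weighted rare-allele count, using the default weight $1/\sqrt{p_k(1-p_k)}$ with $p_k$ estimated from the sample. This is an illustrative sketch, not WISARD's code: it uses a normal approximation for the Wald p-value (a t distribution is more accurate in small samples), omits covariates and the logistic model for dichotomous phenotypes, and the function names are hypothetical.

```python
import math


def burden_scores(genotypes):
    """Per-individual weighted rare-allele count sum_k w_k * g_jk, with
    w_k = 1/sqrt(p_k(1 - p_k)) and p_k estimated from the data.
    Monomorphic variants (p_k = 0 or 1) get weight 0 here (an assumption)."""
    n = len(genotypes)
    weights = []
    for k in range(len(genotypes[0])):
        p = sum(row[k] for row in genotypes) / (2 * n)
        weights.append(0.0 if p in (0.0, 1.0) else 1.0 / math.sqrt(p * (1 - p)))
    scores = [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
    return scores, weights


def linear_burden_test(scores, y):
    """Fit y = a + b * score by least squares; return (b, z, p), where z is
    the Wald statistic b / se(b) and p uses a normal approximation (large n)."""
    n = len(y)
    mx, my = sum(scores) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in scores)  # must be > 0
    sxy = sum((x - mx) * (v - my) for x, v in zip(scores, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((v - a - b * x) ** 2 for x, v in zip(scores, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = b / se
    return b, z, math.erfc(abs(z) / math.sqrt(2))


scores, weights = burden_scores([[1, 0], [0, 1], [0, 0], [2, 0]])
b, z, p = linear_burden_test([0, 1, 2, 3], [1, 3, 5, 8])
print(b)  # ~2.3
```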

Example code

  • The collapsing test is performed by WISARD by default when the --genetest option is given.
  • Collapsing test C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --indep --genetest --out res_collapsing


Variable-Threshold test

The variable-threshold (VT) test was proposed by Price et al. (Am J Hum Genet 2010). In rare variant analysis, the definition of a rare variant is unclear and different MAF thresholds are used. For this reason, the VT test selects the MAF threshold that maximizes the test statistic. The VT method can be applied to dichotomous and quantitative phenotypes, and it may be useful if rarer variants have stronger effects on the phenotype.

Final p-values for the VT test are calculated by permutation, and the number of iterations should be chosen according to the significance level. For instance, if you are interested in the 0.05 significance level, we suggest at least 1/0.05 × 10 = 200 iterations.
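The threshold search and the permutation step can be sketched as follows, based on our reading of Price et al.: for each candidate threshold $T$ (each distinct MAF), $z(T) = \sum_{i: f_i \le T}\sum_j g_{ji}(y_j - \bar{y}) / \sqrt{\sum_{i: f_i \le T}\sum_j g_{ji}^2}$, the statistic is the maximum over $T$, and the p-value comes from permuting phenotypes. With the $(\text{hits}+1)/(n_{perm}+1)$ estimate used here, the smallest attainable p-value is $1/(n_{perm}+1)$, about 0.005 for 200 permutations, which is why the number of iterations must grow as the significance level shrinks. This sketch looks for positive association only and is not WISARD's implementation; names are hypothetical and MAFs are supplied by the caller.

```python
import random


def vt_zmax(genotypes, mafs, y):
    """Return (z_max, best threshold) over thresholds T = each distinct MAF."""
    ybar = sum(y) / len(y)
    resid = [v - ybar for v in y]
    best = (float("-inf"), None)
    for t in sorted(set(mafs)):
        keep = [i for i, f in enumerate(mafs) if f <= t]  # variants at or below T
        num = sum(row[i] * r for row, r in zip(genotypes, resid) for i in keep)
        den = sum(row[i] ** 2 for row in genotypes for i in keep) ** 0.5
        if den > 0 and num / den > best[0]:
            best = (num / den, t)
    return best


def vt_permutation_p(genotypes, mafs, y, n_perm=1000, seed=1):
    """Permutation p-value for z_max; the threshold is re-optimized each time."""
    observed, _ = vt_zmax(genotypes, mafs, y)
    rng = random.Random(seed)
    perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if vt_zmax(genotypes, mafs, perm)[0] >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


genotypes = [[1, 0], [0, 1], [0, 0], [0, 0]]
mafs = [0.01, 0.2]
y = [1.0, 0.0, 0.0, 0.0]
print(vt_zmax(genotypes, mafs, y))  # (0.75, 0.01)
```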

Example codes

  • Calculating the VT test with the --genetest and --vt options
    Performing VT test C:\Users\WISARD> wisard --ped test_miss0.ped --genetest --indep --vt --set test_gene.txt --out res_vt
  • Calculating the VT test with 100,000 permutation iterations
    Performing VT test with specified number (100,000) of permutations C:\Users\WISARD> wisard --ped test_miss0.ped --genetest --indep --vt --set test_gene.txt --nperm 100000 --out res_vt_100Kperm
  • The default number of iterations is 1,000, and the maximum is 2^32 - 1.


SKAT method

The SNP-set (Sequence) Kernel Association Test (SKAT) was proposed by Wu et al. (Am J Hum Genet 2011) and can be applied to dichotomous and quantitative phenotypes. SKAT is efficient if rare variants with positive and negative effects on the phenotype are grouped into a set, and its results are usually similar to those of the C-alpha test.

Under the null hypothesis, the SKAT statistic approximately follows a mixture of chi-square distributions if the sample size is sufficiently large, and SKAT p-values are calculated with the numerical algorithm of Liu et al. (Comput Stat Data Anal 2009).
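The SKAT score statistic for a quantitative trait without covariates, with the weighted linear kernel, can be sketched as $Q = \sum_k (w_k S_k)^2$ with $S_k = \sum_j g_{jk}(y_j - \bar{y})$, using the Beta(1,25) density at the MAF as the default weight. To stay self-contained, this sketch swaps in a permutation p-value in place of the Liu et al. mixture chi-square approximation that WISARD uses; it is an illustration, not WISARD's implementation, and the names are hypothetical.

```python
import random


def beta_1_25(maf):
    """Default SKAT variant weight: the Beta(1, 25) density at the MAF."""
    return 25.0 * (1.0 - maf) ** 24


def skat_q(genotypes, y, weights):
    """Q = sum_k (w_k * S_k)^2 with S_k = sum_j g_jk * (y_j - ybar), the SKAT
    statistic for a quantitative trait with no covariates (up to a constant)."""
    ybar = sum(y) / len(y)
    resid = [v - ybar for v in y]
    q = 0.0
    for k, w in enumerate(weights):
        s_k = sum(row[k] * r for row, r in zip(genotypes, resid))
        q += (w * s_k) ** 2
    return q


def skat_permutation_p(genotypes, y, weights, n_perm=1000, seed=1):
    """Permutation p-value for Q (used here instead of Liu et al.'s method)."""
    observed = skat_q(genotypes, y, weights)
    rng = random.Random(seed)
    perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if skat_q(genotypes, perm, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


genotypes = [[1, 0], [0, 1], [0, 0], [0, 0]]
y = [1.0, 0.0, 0.0, 0.0]
print(skat_q(genotypes, y, [1.0, 1.0]))  # 0.625 = 0.75**2 + (-0.25)**2
```

Because the score contributions are squared, variants with positive and negative effects do not cancel, which is why SKAT suits mixed-direction sets.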

Example codes

  • Calculating SKAT with the --genetest and --skat options
  • SKAT method C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname sbp --set test_gene.txt --indep --genetest --skat --out res_skat


SKAT-o method

The SNP-set (Sequence) Kernel Association Test-optimal (SKAT-o) is an extension of SKAT proposed by Lee et al. (Biostatistics 2012). It combines a burden-type test and SKAT and can be applied to dichotomous and quantitative phenotypes. Its p-value is obtained by numerically integrating over the mixture distribution of the component statistics. SKAT-o is robust to the direction of the effects of rare variants on the phenotype.

Custom optimal weights

WISARD borrows the idea of optimal selection weights from the SKAT package for R. By default, the weights for optimal selection are $0$, $0.1^2$, $0.2^2$, $0.3^2$, $0.4^2$, $0.5^2$, $0.5$ and $1$. The weights can be changed in two ways: dividing the range from 0 to 1 into a given number of equal segments (--skatondiv), or assigning user-defined weights (--skatodivs).

Perform SKAT-o with the range from 0 to 1 divided into ten equal segments (0, 0.1, ..., 1) C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --indep --sampvar test_miss0_phen.txt --pname sbp --genetest --skato --skatondiv 10 --out res_skato_div10
Perform SKAT-o with three user-defined weights (0, 0.5 and 1) C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --indep --sampvar test_miss0_phen.txt --pname sbp --genetest --skato --skatodivs 0,0.5,1 --out res_skato_customdiv3
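We assume these optimal selection weights play the role of $\rho$ in Lee et al., where each candidate statistic is $Q_\rho = (1-\rho)\,Q_{SKAT} + \rho\,Q_{burden}$, so $\rho = 0$ gives SKAT and $\rho = 1$ gives the burden test. The sketch below builds the default grid, the grids implied by the two options, and $Q_\rho$ from per-variant scores $S_k$. Selecting the minimum p-value over $\rho$ and the numerical integration that corrects for that selection are not shown; names are hypothetical.

```python
# Default grid quoted above (the "optimal selection weights").
DEFAULT_RHOS = [0.0, 0.1 ** 2, 0.2 ** 2, 0.3 ** 2, 0.4 ** 2, 0.5 ** 2, 0.5, 1.0]


def equal_division(n_segments):
    """Split [0, 1] into n equal segments, like --skatondiv n."""
    return [i / n_segments for i in range(n_segments + 1)]


def parse_divs(text):
    """Parse a comma-separated list such as '0,0.5,1', as given to --skatodivs."""
    return [float(x) for x in text.split(",")]


def q_rho(scores, weights, rho):
    """Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden, where
    Q_SKAT = sum_k (w_k S_k)^2 and Q_burden = (sum_k w_k S_k)^2."""
    weighted = [w * s for w, s in zip(weights, scores)]
    q_skat = sum(v * v for v in weighted)
    q_burden = sum(weighted) ** 2
    return (1.0 - rho) * q_skat + rho * q_burden


scores = [0.75, -0.25]  # hypothetical per-variant scores S_k
print([q_rho(scores, [1.0, 1.0], r) for r in parse_divs("0,0.5,1")])
# [0.625, 0.4375, 0.25]
```

With opposite-signed scores, $Q_{burden}$ is smaller than $Q_{SKAT}$ because the terms partly cancel inside the sum, which illustrates why the optimal mix depends on the direction of the effects.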

Example codes

  • Calculating SKAT-o with the --genetest and --skato options
  • SKAT-o test C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --sampvar test_miss0_phen.txt --pname sbp --indep --genetest --skato --out res_skato
    NOTE!
    SKAT-o will not be performed on genes with a single variant.


Q-test

The Q-test was proposed by Lee et al. and combines the collapsing-based test and SKAT. It can be applied to quantitative phenotypes. The Q-test has properties similar to SKAT-o, but it is a Wald-type test, whereas SKAT-o is a score-type test. The Q-test may be more efficient if the number of rare variants is not large.

Example codes

  • Calculating the Q-test with the --qtest option
  • Perform Q-test C:\Users\WISARD> wisard --qtest --bed test_miss0.bed --set test_gene.txt --sampvar test_miss0_phen.txt --pname height --out res_qtst
  • Q-test with a different MAF cutoff
  • Using rare variants whose MAFs are between 0 and 0.1 C:\Users\WISARD> wisard --qtest --bed test_miss0.bed --set test_gene.txt --qtestrange "(0,0.1)" --sampvar test_miss0_phen.txt --pname height --out res_qtst_10per
    The --qtestrange option makes WISARD use only rare variants whose MAFs are within the given range. By default, the Q-test implemented in WISARD uses variants whose MAFs are less than 0.05.
  • Calculating the Q-test with covariate adjustment
  • Perform Q-test with covariate adjustment C:\Users\WISARD> wisard --qtest --sampvar test_miss0_phen.txt --pname height --cname age,weight --bed test_miss0.bed --set test_gene.txt --out res_qtst_cov
    The covariate values are read from the file given with --sampvar (test_miss0_phen.txt), and age and weight are included as covariates via the --cname option.
  • Gene-set Q-test
  • Perform set level Q-test C:\Users\WISARD> wisard --qtest --bed test_miss0.bed --set test_gene.txt --geneset gs_def.txt --out res_qtst_geneset
    Gene sets are groups of genes that have something in common, such as membership in the same pathway, and the Q-test has been extended to test the association between gene sets and a phenotype at the set level. The gene-set definition file is specified with the --geneset option. Each line of the gene-set file starts with the gene-set name, followed by the list of genes belonging to that gene set.
    Example 5 : Example of gene-set definition file
    GENE_SET1 GENE1 GENE2 GENE3
    GENE_SET2 GENE2 GENE4
    GENE_SET3 GENE5 GENE6 GENE7 GENE8
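Before a gene-set test, the gene-set file has to be linked to the variants of each gene. The sketch below parses the format of Example 5 and collects, for each gene set, the union of its genes' variants, reporting genes that have no variant definition. It assumes, as a hypothetical convention not stated on this page, that gene names in the --geneset file match set names in the --set file; the mapping in the example is made up.

```python
def parse_gene_sets(lines):
    """Parse 'GENE_SET GENE1 GENE2 ...' lines into {gene set: [genes]}."""
    gene_sets = {}
    for line in lines:
        fields = line.split()
        if fields:
            gene_sets[fields[0]] = fields[1:]
    return gene_sets


def variants_per_gene_set(gene_sets, gene_to_variants):
    """Collect, for each gene set, the union of its member genes' variants
    (first-seen order), and report member genes with no variant definition."""
    result, missing = {}, []
    for name, genes in gene_sets.items():
        variants = []
        for gene in genes:
            if gene not in gene_to_variants:
                missing.append((name, gene))
                continue
            for v in gene_to_variants[gene]:
                if v not in variants:
                    variants.append(v)
        result[name] = variants
    return result, missing


gene_sets = parse_gene_sets(["GENE_SET1 GENE1 GENE2 GENE3",
                             "GENE_SET2 GENE2 GENE4",
                             "GENE_SET3 GENE5 GENE6 GENE7 GENE8"])
gene_to_variants = {"GENE1": ["rs1", "rs2"], "GENE2": ["rs2", "rs3"], "GENE4": ["rs4"]}
result, missing = variants_per_gene_set(gene_sets, gene_to_variants)
print(result["GENE_SET2"])  # ['rs2', 'rs3', 'rs4']
```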


Gene-level test with longitudinal data

WISARD can conduct gene-level tests with longitudinal data; this analysis is available only for SKAT and SKAT-o. In longitudinal data, repeated measurements on the same individual are correlated. The correlation matrix can be estimated with statistical software such as R or SAS under the assumption that genotypes have no effect on the phenotype. WISARD then loads this correlation matrix and calculates the score-type gene-level test.

Example codes

  • Specifying the correlation matrix between repeated measurements with the --longitudinal option
  • Gene-level test with longitudinal data C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen2.txt --pname bmi,bmi2,bmi3 --longitudinal test_varcov.txt --genetest --skato --set test_gene.txt
    The correlation matrix file has the following format:
    Example 6 : Variance-covariance (correlation) matrix file (test_varcov.txt)
    1 0.5 0.25
    0.5 1 0.5
    0.25 0.5 1
    NOTE!
    The correlation matrix should be symmetric!
  • The phenotype file is specified with the --sampvar option.
  • The phenotype names for the repeated measurements are specified with the --pname option.
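A few sanity checks can be run on the matrix file before calling WISARD: the matrix should be square, symmetric (as the NOTE requires), and, as we assume, have one row per phenotype named with --pname. Positive definiteness, tested here by attempting a Cholesky factorisation, is an extra check of ours that this page does not state as a requirement. Function names are hypothetical.

```python
import math


def read_matrix(lines):
    """Read whitespace-separated rows of numbers into a list of lists."""
    return [[float(x) for x in line.split()] for line in lines if line.strip()]


def is_positive_definite(m):
    """Attempt a Cholesky factorisation; it succeeds iff symmetric m is PD."""
    n = len(m)
    chol = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                chol[i][i] = math.sqrt(s)
            else:
                chol[i][j] = s / chol[j][j]
    return True


def check_matrix(m, n_measurements, tol=1e-8):
    """Return a list of problems; an empty list means all checks passed."""
    n = len(m)
    if any(len(row) != n for row in m):
        return ["matrix is not square"]
    problems = []
    if n != n_measurements:
        problems.append(f"matrix is {n}x{n} but {n_measurements} phenotype names were given")
    for i in range(n):
        for j in range(i + 1, n):
            if abs(m[i][j] - m[j][i]) > tol:
                problems.append(f"entries ({i}, {j}) and ({j}, {i}) differ")
    if not problems and not is_positive_definite(m):
        problems.append("matrix is not positive definite")
    return problems


example = read_matrix(["1 0.5 0.25", "0.5 1 0.5", "0.25 0.5 1"])
print(check_matrix(example, 3))  # [] for the matrix of Example 6 with bmi,bmi2,bmi3
```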


Last modified : 2017-09-09 13:56:52