WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

Pathway-level Association Analysis

Available statistics

Unlike other existing pathway-level association tests, the pathway-level association analysis (PHARAOH) implemented in WISARD supports three extra features.

  • Overall-type test: Whereas existing pathway-level analyses test the association between phenotype(s) and one pathway at a time, the PHARAOH analysis tests the associations between phenotype(s) and all pathways at once, using a single integrated model.
  • Hierarchical analysis: The PHARAOH analysis constructs a large hierarchy that reflects the natural structure of the human genome, running from individual rare variants to genes, and then through pathways to phenotypes.
  • Penalized analysis: Because pathways share genes, they are highly correlated, and this correlation must be taken into account to avoid an inflated false-positive rate. For this purpose, the PHARAOH analysis supports a penalization scheme that reduces the impact of the correlation between pathways.

  1. Methods for pathway-level analysis
    • PHARAOH test: applies to a single dichotomous or continuous phenotype.
    • PHARAOH-multi test: applies to multiple continuous phenotypes.
    • PHARAOH-GEE test: applies to multiple dichotomous or continuous phenotypes.
    • *NEW* HisCoM-Kernel test: supports kernel-type testing of a single dichotomous or continuous phenotype.


Quantitative/dichotomous phenotype

WISARD automatically determines whether each phenotype is quantitative or dichotomous. By default, a phenotype whose observed values are only 1 or 2 is treated as dichotomous; any other phenotype is treated as quantitative. The following options let phenotypes coded with other values be treated as dichotomous.

  • --1case: treats a phenotype whose observed values are only 0 or 1 as dichotomous.
  • --cact value1,value2: treats a phenotype whose observed values are only value1 or value2 as dichotomous. For instance, "--cact 1,0" has the same meaning as "--1case".
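
The detection rule and the two options above can be summarized in a short Python sketch. It only illustrates the rule as described here and is not WISARD's own implementation; the function name is_dichotomous and the use of None for missing values are assumptions made for this example.

```python
def is_dichotomous(values, codes=(1, 2)):
    """Return True when every observed value is one of the two given codes.

    `codes` defaults to (1, 2), mirroring the default rule described above;
    pass (0, 1) to mimic --1case, or any other pair to mimic --cact.
    Missing values are assumed to be given as None and are skipped.
    """
    observed = {v for v in values if v is not None}
    return len(observed) > 0 and observed <= set(codes)


print(is_dichotomous([1, 2, 2, 1]))            # default 1/2 coding
print(is_dichotomous([0, 1, 1, 0]))            # 0/1 is not the default coding
print(is_dichotomous([0, 1, 1, 0], (0, 1)))    # comparable to --1case
print(is_dichotomous([23.5, 31.0, 40.2]))      # treated as quantitative
```

Under this sketch, a 0/1-coded phenotype is classified as quantitative unless the coding is passed explicitly, which is the situation the --1case and --cact options address.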

Because testing rare variants one at a time has a high false-negative rate, rare variant association is tested on a set of rare variants simultaneously, and thus the sets of rare variants must be defined. WISARD supports the set file formats described below, and the set file is specified with the --set option.

NOTE!
--set option is mandatory for running gene-level analysis!

Type-I file format

For the type-I format, each line consists of two columns, the gene set name (e.g. SET_A) and the variant name, separated by whitespace (space or tab). The gene set name may be a gene name.

Example 1 : Type-I set file format
SET_A rs172
SET_A rs29445
SET_A SNP_A-1924825
SET_B rs2851
SET_B SNP_A-124
SET_B rs38985
NOTE!
Variants which belong to the same gene should be contiguously placed!
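
To check a type-I file before running an analysis, a small reader along the following lines may help. This is a sketch under the format rules stated above (two whitespace-separated columns, contiguous sets); the function name parse_type1 is hypothetical and is not part of WISARD.

```python
def parse_type1(lines):
    """Parse 'SET VARIANT' lines into {set_name: [variants]}.

    Raises ValueError when a line does not have two columns, or when a
    set name reappears after another set has started (not contiguous).
    """
    sets = {}
    current = None
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()          # splits on spaces or tabs
        if not fields:
            continue                   # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        name, variant = fields
        if name != current:
            if name in sets:
                raise ValueError(f"line {lineno}: set {name} is not contiguous")
            sets[name] = []
            current = name
        sets[name].append(variant)
    return sets


example = """SET_A rs172
SET_A rs29445
SET_A SNP_A-1924825
SET_B rs2851
SET_B SNP_A-124
SET_B rs38985"""
print(parse_type1(example.splitlines()))
```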

Type-II file format

The Type-II file format is identical to the set definition used in PLINK (see the PLINK documentation). Each set must start with a set name, which cannot contain any spaces. The name is followed by the list of variants in that gene set, and the keyword END marks the end of that set, as in the example below:

Example 2 : Type-II set file format
SET_A
rs172
rs29445
SNP_A-1924825
END
SET_B
rs2851
SNP_A-124
rs38985
END
NOTE!
Do not use END as a name of variant!
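
The END-terminated layout can be read with a small state machine, sketched below in Python. The function name parse_type2 is hypothetical, and the error handling (rejecting names with spaces and an unterminated final set) is an assumption about what a careful reader would want, not documented WISARD behavior.

```python
def parse_type2(lines):
    """Parse PLINK-style set blocks into {set_name: [variants]}."""
    sets = {}
    name = None                        # None means "expecting a set name"
    for lineno, line in enumerate(lines, start=1):
        token = line.strip()
        if not token:
            continue                   # skip blank lines
        if len(token.split()) != 1:
            raise ValueError(f"line {lineno}: names cannot contain spaces")
        if name is None:
            name = token               # first token of a block is the set name
            sets[name] = []
        elif token == "END":
            name = None                # close the current set
        else:
            sets[name].append(token)
    if name is not None:
        raise ValueError(f"set {name} is missing its END keyword")
    return sets


example = ["SET_A", "rs172", "rs29445", "SNP_A-1924825", "END",
           "SET_B", "rs2851", "SNP_A-124", "rs38985", "END"]
print(parse_type2(example))
```

The sketch also shows why END cannot be used as a variant name: it would close the set early, and the next variant would be read as a new set name.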

Type-III file format

The Type-III file format is similar to the type-I definition, but all variants of each set are listed on a single line after the set name. It is identical to the set definition used in EPACTS.

Example 3 : Type-III set file format
SET_A rs172 rs29445 SNP_A-1924825
SET_B rs2851 SNP_A-124 rs38985
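
Since the type-III layout carries the same information as the other list-based formats, converting to and from it is straightforward. The following Python sketch reads and writes type-III lines; the function names parse_type3 and format_type3 are hypothetical helpers, not WISARD features.

```python
def parse_type3(lines):
    """Parse 'SET VARIANT1 VARIANT2 ...' lines into {set_name: [variants]}."""
    sets = {}
    for line in lines:
        fields = line.split()          # splits on spaces or tabs
        if not fields:
            continue                   # skip blank lines
        sets[fields[0]] = fields[1:]
    return sets


def format_type3(sets):
    """Write {set_name: [variants]} back as one space-separated line per set."""
    return [" ".join([name, *variants]) for name, variants in sets.items()]


example = ["SET_A rs172 rs29445 SNP_A-1924825",
           "SET_B rs2851 SNP_A-124 rs38985"]
sets = parse_type3(example)
print(sets)
print(format_type3(sets))
```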

Type-IV file format

The Type-IV file format differs from the other three types. It defines each set by assigning it a specific region rather than by listing variants. Sets may overlap one another, and a variant located in an overlapped region is assigned to every set that covers that region.
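
The overlap rule can be illustrated with a short Python sketch. The layout of the Type-IV file itself is not shown here, so the sketch works on in-memory values only; the function name assign_by_region, the chromosome labels and positions, and the use of inclusive region bounds are all assumptions made for illustration.

```python
def assign_by_region(regions, variants):
    """Assign each variant to every set whose region contains it.

    regions:  {set_name: (chromosome, start, end)}, bounds assumed inclusive
    variants: {variant_name: (chromosome, position)}
    """
    assigned = {name: [] for name in regions}
    for var, (chrom, pos) in variants.items():
        for name, (r_chrom, start, end) in regions.items():
            if chrom == r_chrom and start <= pos <= end:
                assigned[name].append(var)
    return assigned


# Hypothetical coordinates: SET_A and SET_B overlap between 400 and 500.
regions = {"SET_A": ("1", 100, 500), "SET_B": ("1", 400, 900)}
variants = {
    "rs172": ("1", 150),
    "rs29445": ("1", 450),    # falls in the overlapped region
    "rs2851": ("1", 800),
    "rs38985": ("2", 450),    # other chromosome, assigned to no set
}
print(assign_by_region(regions, variants))
```

In this example rs29445 lies in the overlapped region and therefore appears in both SET_A and SET_B.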

Type-V file format (refGenes format)

Many analysis toolsets, such as Rvtests, use an existing format (the refGenes format) to represent gene information.



Pathway definition

As with the gene definition, pathways must be defined before pathway-level analysis can be performed, to declare the relationships between genes and pathways. A simple plain-text format is used to define this mapping.

WISARD provides several gene-set-level or pathway-level analyses, which investigate the association between pathway(s) and phenotype(s). These analyses require an extra input, similar to the gene-variant mapping, called the 'pathway-gene mapping'.

Since the pathway-gene mapping (or gene-set) file associates a pathway (or gene set) with multiple genes, its format is similar to that of the gene-variant mapping. Currently, WISARD supports the type-III format, which defines one pathway (or gene set) per line by listing the genes that belong to it.

Example 4 : Pathway file format
PATHWAY1 GENE1 GENE2 GENE5
PATHWAY2 GENE3 GENE4 GENE6 GENE7
PATHWAY3 GENE9 GENE10 GENE13
PATHWAY4 GENE13 GENE14 GENE15 GENE6

In the above example, four pathways are defined with their genes. Unlike in the gene-variant mapping, the genes in a pathway need not be contiguous or on the same chromosome.
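
Because pathways that share genes are correlated, it can be useful to list the overlaps in a pathway file before analysis. The Python sketch below parses the example above and reports which pathway pairs share genes; the function names parse_pathways and shared_genes are hypothetical helpers, not part of WISARD.

```python
from itertools import combinations


def parse_pathways(lines):
    """Parse 'PATHWAY GENE1 GENE2 ...' lines into {pathway: [genes]}."""
    pathways = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue                   # skip blank lines
        pathways[fields[0]] = fields[1:]
    return pathways


def shared_genes(pathways):
    """Return {(pathway_a, pathway_b): sorted shared genes} for overlapping pairs."""
    overlaps = {}
    for (a, genes_a), (b, genes_b) in combinations(pathways.items(), 2):
        common = set(genes_a) & set(genes_b)
        if common:
            overlaps[(a, b)] = sorted(common)
    return overlaps


example = ["PATHWAY1 GENE1 GENE2 GENE5",
           "PATHWAY2 GENE3 GENE4 GENE6 GENE7",
           "PATHWAY3 GENE9 GENE10 GENE13",
           "PATHWAY4 GENE13 GENE14 GENE15 GENE6"]
pathways = parse_pathways(example)
print(shared_genes(pathways))
```

For the example file, PATHWAY4 shares GENE6 with PATHWAY2 and GENE13 with PATHWAY3, while PATHWAY1 overlaps with no other pathway.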

Currently, the following analyses use the pathway-gene mapping information.

  • PHARAOH: (binary/continuous, genome) Single phenotype - multiple pathways (S Lee, S Choi et al., 2016)
  • PHARAOH-multi: (continuous, genome) Multiple phenotypes - multiple pathways (S Lee et al., 2018)
  • CPA-SEM: (continuous, transcriptome) Single phenotype - single pathway (with pathway structure) (S Choi, S Lee et al., 2015)


PHARAOH test

The Pathway-based approach using HierArchical components of collapsed RAre variants Of High-throughput sequencing data (PHARAOH) was proposed by Lee et al. (Bioinformatics 2016). It can be applied to both dichotomous and continuous phenotypes by using the structure of the Generalized Linear Model (GLM).

Example code

  • Perform PHARAOH test with --pharaoh option
  • Default run of PHARAOH analysis
    C:\Users\WISARD> wisard --bed test_miss0.bed --pharaoh --sampvar test_miss0_phen.txt --pname age --geneset test_pw.txt --set test_gene.txt --nperm 1000 --cv 5
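
For scripted runs, the same command line can be assembled in Python before it is passed to a process launcher. The sketch below uses only the options and values shown in the example above; the helper name build_pharaoh_command and its parameter names are hypothetical, and the sketch only builds the argument list without running WISARD.

```python
import shlex


def build_pharaoh_command(bed, sampvar, pname, geneset, set_file,
                          nperm=1000, cv=5):
    """Build the argument list for the default PHARAOH run shown above."""
    return [
        "wisard",
        "--bed", bed,
        "--pharaoh",
        "--sampvar", sampvar,
        "--pname", pname,
        "--geneset", geneset,    # pathway-gene mapping file
        "--set", set_file,       # gene-variant set file
        "--nperm", str(nperm),
        "--cv", str(cv),
    ]


argv = build_pharaoh_command(
    bed="test_miss0.bed",
    sampvar="test_miss0_phen.txt",
    pname="age",
    geneset="test_pw.txt",
    set_file="test_gene.txt",
)
print(shlex.join(argv))
```

Keeping the arguments as a list rather than a single string avoids quoting problems when file names contain spaces.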

PHARAOH-multi test

PHARAOH-multi is a multivariate extension of the PHARAOH test proposed by Lee et al. (BMC Bioinformatics 2018), and it can be applied to multiple continuous phenotypes.

Example code

  • Perform PHARAOH-multi test with the --pharaoh option and multiple phenotypes assigned to --pname
  • Default run of PHARAOH-multi analysis
    C:\Users\WISARD> wisard --bed test_miss0.bed --pharaoh --sampvar test_miss0_phen.txt --pname sbp,dbp --geneset test_pw.txt --set test_gene.txt --nperm 1000 --cv 5

PHARAOH-GEE test

PHARAOH-GEE is a Generalized Estimating Equations (GEE)-based multivariate extension of the PHARAOH test, and it can be applied to multiple dichotomous or continuous phenotypes.

Example code

  • Perform PHARAOH-GEE test with the --pharaohgee option and multiple phenotypes assigned to --pname
  • Default run of PHARAOH-GEE analysis
    C:\Users\WISARD> wisard --bed test_miss0.bed --pharaohgee --sampvar test_miss0_phen.txt --pname sbp,dbp --geneset test_pw.txt --set test_gene.txt --nperm 1000 --cv 5

HisCoM-Kernel test

HisCoM-Kernel is a kernel-type extension of the PHARAOH test, and it can be applied to either a dichotomous or a continuous phenotype.

Example code

  • Perform HisCoM-Kernel test with the --pharaoh and --hiscomkernel options and a single phenotype, using a continuous dataset (assigned by --expression) instead of genotype data.
  • Default run of HisCoM-Kernel analysis
    C:\Users\WISARD> wisard --expression test_miss0.txt --pharaoh --hiscomkernel --sampvar test_miss0_phen.txt --pname medi01 --set test_gene.txt --nperm 1000 --prolambda 500
NOTE!
HisCoM-Kernel does not fully support the detailed PHARAOH-related options and requires --prolambda for penalization.
NOTE!
HisCoM-Kernel is not compatible with genotype dataset. Use continuous dataset instead.


Last modified : 2022-01-05 02:46:12