WISARD[wɪzərd] Workbench for Integrated Superfast Association study with Related Data |
Unlike other existing pathway-level association tests, the pathway-level association analysis (PHARAOH) implemented in WISARD supports three extra features.
WISARD automatically determines whether each phenotype is quantitative or dichotomous. By default, if only 1 or 2 are observed as phenotypic values, WISARD treats the phenotype as dichotomous; otherwise it treats it as quantitative. With some options, phenotypes coded with other values can also be treated as dichotomous.
Rare variants are tested for association as a set rather than one at a time, because testing them individually gives a large false-negative rate; a set of rare variants must therefore be defined. WISARD supports four set file formats, which can be selected by using the --set option.
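The default rule described above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not WISARD's own code: the function name `infer_phenotype_type` and the missing-value code `"NA"` are assumptions made for the example.

```python
def infer_phenotype_type(values, missing=("NA",)):
    """Classify a phenotype column as 'dichotomous' or 'quantitative'.

    Sketch of the default rule only: a phenotype is dichotomous when
    every observed (non-missing) value is 1 or 2.
    The missing-value code "NA" is an assumption for this example.
    """
    observed = set()
    for value in values:
        if value in missing:
            continue  # skip missing observations
        observed.add(float(value))
    # A non-empty set containing nothing but 1 and/or 2 is dichotomous.
    if observed and observed <= {1.0, 2.0}:
        return "dichotomous"
    return "quantitative"


if __name__ == "__main__":
    print(infer_phenotype_type(["1", "2", "2", "NA"]))
    print(infer_phenotype_type(["0.7", "1", "2"]))
```

A column with no observed values falls through to "quantitative" in this sketch; how WISARD handles that case is not stated here.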
NOTE! |
--set option is mandatory for running gene-level analysis! |
For type-I format, each line consists of two columns, the gene set name (e.g. SET_A) and the variant name, separated by whitespace (space or tab). The gene set name may be a gene name.
NOTE! |
Variants which belong to the same gene should be contiguously placed! |
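As an illustration of the type-I layout and the contiguity requirement, the following Python sketch reads two-column lines and rejects a set name that reappears after another set has started. The function name `parse_type1` and the sample names (SET_A, rs1, ...) are hypothetical; this is not how WISARD itself reads the file.

```python
def parse_type1(lines):
    """Parse type-I set lines: '<set name> <variant name>' per line.

    Returns a dict mapping set name -> list of variant names.
    Raises ValueError if the variants of a set are not contiguous.
    """
    sets = {}
    current = None  # name of the set on the previous non-empty line
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # splits on spaces and tabs
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        name, variant = fields
        if name != current:
            if name in sets:
                # The set was seen earlier, then interrupted by another set.
                raise ValueError(f"line {lineno}: set {name!r} is not contiguous")
            sets[name] = []
            current = name
        sets[name].append(variant)
    return sets


if __name__ == "__main__":
    example = ["SET_A rs1", "SET_A\trs2", "SET_B rs3"]
    print(parse_type1(example))
```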
Type-II file format is identical to the set definition used in PLINK. Each set must start with a set name, which cannot contain any spaces. The name is followed by a list of the variants in that gene set, and the keyword END marks the end of that particular set.
NOTE! |
Do not use END as a variant name! |
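The type-II layout (set name, variant list, END) can be illustrated with a small token-based reader. This is a sketch under the assumption that tokens are separated by any whitespace, including line breaks; `parse_type2` and the sample names are hypothetical. It also shows why END cannot be used as a variant name: the reader would close the set at that point.

```python
def parse_type2(lines):
    """Parse PLINK-style set definitions: name, variants..., END.

    Returns a dict mapping set name -> list of variant names.
    """
    sets = {}
    name = None  # set currently being read, or None between sets
    for line in lines:
        for token in line.split():
            if name is None:
                if token == "END":
                    raise ValueError("END found where a set name was expected")
                name = token  # first token of a block is the set name
                sets.setdefault(name, [])
            elif token == "END":
                name = None  # close the current set
            else:
                sets[name].append(token)
    if name is not None:
        raise ValueError(f"set {name!r} is missing its END keyword")
    return sets


if __name__ == "__main__":
    example = ["SET_A", "rs1", "rs2", "END", "", "SET_B", "rs3", "END"]
    print(parse_type2(example))
```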
Type-III file format is similar to the type-I definition, but all variants of each set are enumerated on a single line. Type-III file format is identical to the set definition used in EPACTS.
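A minimal Python sketch of the one-set-per-line layout follows, assuming whitespace-separated fields with the set name first. The function name `parse_type3` and the sample names are hypothetical and serve only to illustrate the format.

```python
def parse_type3(lines):
    """Parse type-III set lines: '<set name> <variant> <variant> ...'.

    Returns a dict mapping set name -> list of variant names.
    """
    sets = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        name, variants = fields[0], fields[1:]
        if name in sets:
            raise ValueError(f"line {lineno}: set {name!r} is defined twice")
        sets[name] = variants
    return sets


if __name__ == "__main__":
    example = ["SET_A rs1 rs2 rs3", "SET_B\trs4\trs5"]
    print(parse_type3(example))
```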
Type-IV file format differs from the other three set types.
It defines each set by allocating a specific genomic region to it, rather than by listing variants.
Sets may overlap one another, and a variant located in an overlapping region
is assigned to every set that covers that region.
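The region-based assignment, including the overlap rule, can be sketched as below. The column layout of the type-IV file is not reproduced here, so the sketch works on already-parsed tuples; the tuple layouts, the inclusive region bounds, and the name `assign_by_region` are all assumptions made for illustration.

```python
from collections import defaultdict


def assign_by_region(regions, variants):
    """Assign variants to every region-defined set that contains them.

    regions:  iterable of (set_name, chromosome, start, end), bounds inclusive
              (inclusive bounds are an assumption of this sketch)
    variants: iterable of (variant_name, chromosome, position)
    Returns a dict mapping set name -> list of variant names.
    """
    members = defaultdict(list)
    for var_name, chrom, pos in variants:
        for set_name, r_chrom, start, end in regions:
            # A variant in an overlapping region matches more than one set.
            if chrom == r_chrom and start <= pos <= end:
                members[set_name].append(var_name)
    return dict(members)


if __name__ == "__main__":
    regions = [("GENE_A", "1", 100, 200), ("GENE_B", "1", 150, 300)]
    variants = [("rs1", "1", 120), ("rs2", "1", 180), ("rs3", "1", 250)]
    print(assign_by_region(regions, variants))
```

In this example rs2 lies in the overlap of the two regions and therefore appears in both sets.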
Many analysis toolsets, such as Rvtests, use an existing format for representing gene information.
As with the gene definition, pathways must be defined before pathway-level analysis can be performed, in order to declare the relationships between genes and pathways. A simple plain-text format can be used to define the pathway mapping.
WISARD provides several gene-set-level or pathway-level analyses, which investigate the association between pathway(s) and phenotype(s). To perform these types of analyses, an extra input similar to the gene-variant mapping, named 'pathway-gene mapping', is required.
Since the pathway-gene mapping (or gene-set) file associates a pathway (or a gene set) with multiple genes, its format is similar to that of the gene-variant mapping. Currently, WISARD supports the type-III format, which defines one pathway (or one gene set) per line by enumerating the genes that belong to it.
In the above example, three pathways are defined with their genes. Unlike in the gene-variant mapping, the genes in a pathway need not be contiguous or on the same chromosome.
Currently, the following analyses utilize the pathway-gene mapping information.
Pathway-based approach using HierArchical components of collapsed RAre variants Of High-throughput sequencing data (PHARAOH) was suggested by Lee et al. (Bioinformatics 2016), and it can be applied to both dichotomous and continuous phenotypes by using the structure of the Generalized Linear Model (GLM).
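To show how the two mappings fit together, the following Python sketch reads a pathway-gene mapping in the one-pathway-per-line layout and combines it with a gene-variant mapping into a pathway, gene, variant hierarchy. All names (`parse_pathway_map`, `build_hierarchy`, PATHWAY_1, GENE_A, ...) are hypothetical, and dropping genes that have no variants is a choice made for this sketch, not documented WISARD behavior.

```python
def parse_pathway_map(lines):
    """Parse '<pathway name> <gene> <gene> ...' lines into a dict."""
    pathways = {}
    for line in lines:
        fields = line.split()
        if fields:
            pathways[fields[0]] = fields[1:]
    return pathways


def build_hierarchy(pathway_genes, gene_variants):
    """Nest variants under genes under pathways.

    Genes without an entry in gene_variants are skipped
    (an assumption of this sketch).
    """
    hierarchy = {}
    for pathway, genes in pathway_genes.items():
        hierarchy[pathway] = {
            gene: gene_variants[gene] for gene in genes if gene in gene_variants
        }
    return hierarchy


if __name__ == "__main__":
    pathway_genes = parse_pathway_map(["PATHWAY_1 GENE_A GENE_B", "PATHWAY_2 GENE_B GENE_C"])
    gene_variants = {"GENE_A": ["rs1", "rs2"], "GENE_B": ["rs3"]}
    print(build_hierarchy(pathway_genes, gene_variants))
```

A gene may belong to several pathways, as GENE_B does here, so its variants appear under each of them.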
Example code
PHARAOH-multi is a multivariate extension of the PHARAOH test, suggested by Lee et al. (BMC Bioinformatics 2018), and it can be applied to multiple continuous phenotypes.
Example code
PHARAOH-GEE is a Generalized Estimating Equations (GEE)-based multivariate extension of the PHARAOH test, and it can be applied to multiple dichotomous or continuous phenotypes.
Example code
HisCoM-Kernel is a kernel-type extension of the PHARAOH test, and it can be applied to both dichotomous and continuous phenotypes.
Example code
NOTE! |
HisCoM-Kernel does not fully support detailed PHARAOH-related options and requires --prolambda for penalization |
NOTE! |
HisCoM-Kernel is not compatible with genotype datasets. Use a continuous dataset instead. |