WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

Genes and pathways

This section describes:

  • Gene-variant mapping(set) file format
    • Type-I file format
    • Type-II file format
    • Type-III file format
    • Type-IV file format
    • Type-V file format (refGenes format)
  • Summarizing information about variants in each gene
  • Pathway-Gene mapping(or gene-set) file format

      Gene-variant mapping(set) file format

      Rare-variant association is tested on a set of rare variants simultaneously, because testing each rare variant individually has a large false-negative rate. A set of rare variants must therefore be defined. WISARD supports five types of set file format, and the set file is given with the --set option.

      NOTE!
      --set option is mandatory for running gene-level analysis!

      Type-I file format

      For the type-I format, each line consists of two columns, the gene set name (e.g. SET_A) and the variant name, separated by whitespace (space or tab). The gene set name may be a gene name.

      Example 1 : Type-I set file format
      SET_A rs172
      SET_A rs29445
      SET_A SNP_A-1924825
      SET_B rs2851
      SET_B SNP_A-124
      SET_B rs38985
      NOTE!
      Variants which belong to the same gene should be contiguously placed!
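
      The type-I layout and its contiguity rule can be sketched in a few lines of Python. This is only an illustration under the description above, not WISARD's own parser; parse_type1 is a hypothetical helper name.

```python
def parse_type1(lines):
    """Parse type-I lines ("SET VARIANT") into a dict of set name -> variants."""
    sets = {}
    previous = None
    for number, line in enumerate(lines, start=1):
        fields = line.split()  # splits on spaces or tabs
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected 2 columns, got {len(fields)}")
        set_name, variant = fields
        if set_name != previous and set_name in sets:
            # The set appeared earlier and was interrupted by another set.
            raise ValueError(f"line {number}: set {set_name!r} is not contiguous")
        sets.setdefault(set_name, []).append(variant)
        previous = set_name
    return sets


example_type1 = """\
SET_A rs172
SET_A rs29445
SET_A SNP_A-1924825
SET_B rs2851
SET_B SNP_A-124
SET_B rs38985
""".splitlines()

type1_sets = parse_type1(example_type1)
print(type1_sets)
```

      A file in which SET_A lines reappear after SET_B lines would be rejected by this sketch with a ValueError.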

      Type-II file format

      The Type-II file format is identical to the set definition used in PLINK. Each set must start with a set name, which cannot contain spaces. The name is followed by a list of the variants in that gene set, and the keyword END marks the end of that set, as in the example below:

      Example 2 : Type-II set file format
      SET_A
      rs172
      rs29445
      SNP_A-1924825
      END
      SET_B
      rs2851
      SNP_A-124
      rs38985
      END
      NOTE!
      Do not use END as a variant name!
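
      A minimal sketch of reading the type-II layout, assuming one token per line as in Example 2. The helper name parse_type2 is hypothetical and the code is not taken from WISARD or PLINK.

```python
def parse_type2(lines):
    """Parse type-II lines (set name, variants, END) into a dict of set -> variants."""
    sets = {}
    current = None  # name of the set being read, or None between sets
    for number, line in enumerate(lines, start=1):
        token = line.strip()
        if not token:
            continue  # skip blank lines
        if len(token.split()) != 1:
            raise ValueError(f"line {number}: names must not contain spaces")
        if current is None:
            if token == "END":
                raise ValueError(f"line {number}: END found outside of a set")
            current = token  # first token of a block is the set name
            sets[current] = []
        elif token == "END":
            current = None  # close the current set
        else:
            sets[current].append(token)
    if current is not None:
        raise ValueError(f"set {current!r} is missing its END keyword")
    return sets


example_type2 = """\
SET_A
rs172
rs29445
SNP_A-1924825
END
SET_B
rs2851
SNP_A-124
rs38985
END
""".splitlines()

type2_sets = parse_type2(example_type2)
print(type2_sets)
```

      Because END always closes a set here, a variant named END could never be stored, which is why the name is reserved.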

      Type-III file format

      The Type-III file format is similar to the type-I definition, but all variants of each set are listed on a single line. It is identical to the set definition used in EPACTS.

      Example 3 : Type-III set file format
      SET_A rs172 rs29445 SNP_A-1924825
      SET_B rs2851 SNP_A-124 rs38985
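
      Since type-I and type-III carry the same information, one can be rewritten as the other. The following sketch reads type-III lines and emits the equivalent type-I lines; parse_type3 and to_type1 are hypothetical helper names, not WISARD functions.

```python
def parse_type3(lines):
    """Parse type-III lines ("SET VAR1 VAR2 ...") into a dict of set -> variants."""
    sets = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        sets[fields[0]] = fields[1:]  # first column is the set name
    return sets


def to_type1(sets):
    """Write each (set, variant) pair on its own line, keeping sets contiguous."""
    return [f"{name} {variant}" for name, variants in sets.items() for variant in variants]


example_type3 = [
    "SET_A rs172 rs29445 SNP_A-1924825",
    "SET_B rs2851 SNP_A-124 rs38985",
]

type3_sets = parse_type3(example_type3)
type1_lines = to_type1(type3_sets)
print("\n".join(type1_lines))
```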

      Type-IV file format

      The Type-IV file format differs from the other types of set. It defines a set of multiple variants by allocating a specific region to each set. Sets may overlap one another, and a variant located in an overlapping region is assigned to every set that occupies that region.
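
      The overlap rule can be illustrated with a short sketch. The column layout of a type-IV file is not given in this section, so the regions below are held in a plain Python dict; the coordinates, the inclusive boundaries, and the name assign_by_region are all assumptions made for the illustration.

```python
# Hypothetical regions: set name -> (chromosome, start, end), boundaries inclusive.
regions = {
    "SET_A": ("1", 1000, 5000),
    "SET_B": ("1", 4000, 9000),  # overlaps SET_A on 4000..5000
}

# Hypothetical variants: (name, chromosome, position). Positions are made up.
variants = [
    ("rs172", "1", 1500),
    ("rs29445", "1", 4500),  # lies in the overlapping region
    ("rs2851", "1", 8000),
    ("rs38985", "2", 4500),  # different chromosome, matches no region
]


def assign_by_region(regions, variants):
    """Assign each variant to every set whose region contains its position."""
    assigned = {name: [] for name in regions}
    for variant, chrom, pos in variants:
        for name, (region_chrom, start, end) in regions.items():
            if chrom == region_chrom and start <= pos <= end:
                assigned[name].append(variant)
    return assigned


region_sets = assign_by_region(regions, variants)
print(region_sets)
```

      In this made-up data, rs29445 ends up in both SET_A and SET_B, while rs38985 belongs to neither.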

      Type-V file format (refGenes format)

      Many analysis toolsets, such as Rvtests, use an existing format for representing gene information.

      Summarizing information about variants in each gene

      • --genesummary option
      • WISARD provides several summary measures for each gene, which can be computed with the --genesummary option.
      • --gmapsummary option
      • For a dichotomous phenotype, it is often useful to summarize the number of rare alleles in cases and controls, which can be done with the --gmapsummary option.
      • --makeset option
      • Monomorphic variants and variants whose MAFs are larger than the threshold are excluded from the analysis. The rare variants actually used in the statistic for each gene can be listed with the --makeset option. The output file can be written in the type-I, type-II, or type-III file format, chosen with the --settype option.
        Generate type III gene-variant definition
        C:\Users\WISARD> wisard --bed test_miss0.bed --set test_gene.txt --makeset --settype 3
        NOTE!
        The set generated by --makeset contains the variants that exist in the given dataset.
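
      The filtering that --makeset reports can be pictured with the sketch below. It is not WISARD's implementation: the genotype coding (0, 1, or 2 copies of one allele, None for missing), the threshold value 0.1, and the helper names are all assumptions chosen for the illustration.

```python
def minor_allele_frequency(genotypes):
    """MAF from allele counts per sample (0, 1, 2); None marks a missing call."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def keep_rare(sets, genotypes_by_variant, maf_threshold):
    """Keep variants that are in the dataset, polymorphic, and not above the threshold."""
    kept = {}
    for name, members in sets.items():
        rare = []
        for variant in members:
            if variant not in genotypes_by_variant:
                continue  # variant is absent from the dataset
            maf = minor_allele_frequency(genotypes_by_variant[variant])
            if 0 < maf <= maf_threshold:  # maf == 0 means monomorphic
                rare.append(variant)
        kept[name] = rare
    return kept


# Made-up genotypes for ten samples.
genotypes_by_variant = {
    "rs172": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],          # MAF 0.05, rare
    "rs29445": [1, 1, 2, 0, 1, 1, 0, 2, 1, 1],        # MAF 0.5, too common
    "SNP_A-1924825": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic
}
gene_sets = {"SET_A": ["rs172", "rs29445", "SNP_A-1924825", "rs_not_in_data"]}

rare_sets = keep_rare(gene_sets, genotypes_by_variant, maf_threshold=0.1)
print(rare_sets)
```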


      Pathway-Gene mapping(or gene-set) file format

      WISARD provides several gene-set- or pathway-level analyses, which investigate the association between pathway(s) and phenotype(s). To perform these types of analyses, an extra input similar to the gene-variant mapping, named 'pathway-gene mapping', is required.

      Since the pathway-gene mapping(or gene-set) file associates a pathway(or a gene-set) with multiple genes, its format is similar to the gene-variant mapping. Currently, WISARD supports the type-III format, which defines one pathway(or one gene-set) per line by enumerating the genes that belong to it.

      Example 4 : Pathway file format
      PATHWAY1 GENE1 GENE2 GENE5
      PATHWAY2 GENE3 GENE4 GENE6 GENE7
      PATHWAY3 GENE9 GENE10 GENE13
      PATHWAY4 GENE13 GENE14 GENE15 GENE6

      In the above example, four pathways are defined with their genes. Unlike in the gene-variant mapping, the genes in a pathway are not necessarily contiguous or on the same chromosome.
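
      As a rough illustration of the pathway file layout, the sketch below reads the lines of Example 4 and inverts the mapping to show which genes are shared between pathways. The helper names parse_pathways and genes_to_pathways are hypothetical and not part of WISARD.

```python
from collections import defaultdict


def parse_pathways(lines):
    """Parse "PATHWAY GENE1 GENE2 ..." lines into a dict of pathway -> genes."""
    pathways = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        pathways[fields[0]] = fields[1:]
    return pathways


def genes_to_pathways(pathways):
    """Invert the mapping: gene -> list of pathways that contain it."""
    membership = defaultdict(list)
    for pathway, genes in pathways.items():
        for gene in genes:
            membership[gene].append(pathway)
    return dict(membership)


example_pathways = [
    "PATHWAY1 GENE1 GENE2 GENE5",
    "PATHWAY2 GENE3 GENE4 GENE6 GENE7",
    "PATHWAY3 GENE9 GENE10 GENE13",
    "PATHWAY4 GENE13 GENE14 GENE15 GENE6",
]

pathways = parse_pathways(example_pathways)
membership = genes_to_pathways(pathways)
shared = {gene: names for gene, names in membership.items() if len(names) > 1}
print(shared)
```

      In Example 4, GENE6 and GENE13 each appear in two pathways, so a gene may contribute to more than one pathway.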

      Currently, the following analyses utilize the pathway-gene mapping information.

      • PHARAOH: (binary/continuous, genome) Single phenotype - multiple pathways (S Lee, S Choi et al., 2016)
      • PHARAOH-multi: (continuous, genome) Multiple phenotypes - multiple pathways (S Lee et al., 2018)
      • CPA-SEM: (continuous, transcriptome) Single phenotype - single pathway (with pathway structure) (S Choi, S Lee et al., 2015)


      Last modified : 2018-03-04 13:31:59