WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

Allele Frequency

This section covers:

  • Various allele frequencies
    • Founder-only
    • All-individuals
    • BLUE
  • Filtering variants with MAF

    Various allele frequencies

    For single-variant analysis, statistical power depends on the minor allele frequency (MAF): when MAF is small, genome-wide significance is attainable only with extremely large samples. In addition, most statistical methods rely on the central limit theorem, and when MAF is small the test statistics are rarely close to normally distributed. This problem can be serious in genome-wide association studies, so MAF is often used as a quality-control criterion for single-variant analysis. WISARD provides three methods for calculating the MAF of each variant.

    Note that all methods give the same estimates for population-based samples (such as a case-control design), whereas for family-based samples the estimates can differ substantially. For instance, consider the following example data, in which each row gives the family ID, the sample ID, the two parent IDs (0 for a founder), and two alleles for each of six variants, with 0 marking a missing allele:

    FAM_1 SAMP1_1 0 0 C A C A A A A A A A C A
    FAM_1 SAMP2_1 0 0 C C A A C A C A A A A A
    FAM_1 SAMP3_1 SAMP1_1 SAMP2_1 A A C A A A C A A A A A
    FAM_1 SAMP4_1 0 0 C A C A A A C A A A C A
    FAM_1 SAMP5_1 0 0 0 0 C C A A A A C A C A
    FAM_1 SAMP6_1 SAMP4_1 SAMP5_1 C A C A A A C A A A A A
    FAM_1 SAMP7_1 SAMP3_1 SAMP6_1 A A 0 0 A A C A A A A A
    FAM_1 SAMP8_1 SAMP3_1 SAMP6_1 C C 0 0 A A A A A A A A
    FAM_1 SAMP9_1 SAMP3_1 SAMP6_1 C C A A A A C C A A A A
    FAM_1 SAMP10_1 SAMP3_1 SAMP6_1 C C C A A A C C A A A A

    Founder-only

    Founder-only means that the MAF of each variant is estimated from founders only; WISARD computes it with the "--freq founder" option. In the example data, SAMP1_1, SAMP2_1, SAMP4_1 and SAMP5_1 are founders, so the MAFs of the six variants are 0.333 (2/6), 0.5 (4/8), 0.125 (1/8), 0.25 (2/8), 0.125 (1/8) and 0.375 (3/8). This approach is fast and simple to compute. However, if many founders have missing genotypes, it becomes inefficient. WISARD calculates MAF in this way by default, and the output file extension is "founders.maf".

    Getting founder-only MAF estimates
        C:\Users\WISARD> wisard --bed test_miss0.bed --freq founder --out res_maf_founder
    founders.maf: MAF computed using only founder samples (TSV)
    Column    Format    Modifier     Description
    VARIANT   string    NONE         Tested variant name
    ANNOT     string    --annogene   Annotation for the variant
    MAJOR     string    NONE         Major allele of the variant
    MINOR     string    NONE         Minor allele of the variant
    MAF       real      NONE         Minor allele frequency for the variant, under the chosen MAF computing criterion
    MAC       integer   NONE         Minor allele count for the variant, under the chosen MAF computing criterion
    NIND      integer   NONE         Number of samples used to compute MAF
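
    The founder-only counts quoted in this subsection can be checked with a short script. The sketch below is not part of WISARD; it only tallies alleles over the four founders of the example data, assuming that "0" marks a missing allele and that a genotype with a missing allele is skipped entirely. The names FOUNDER_GENOTYPES and minor_allele_counts are illustrative.

```python
from collections import Counter

# Genotypes of the four founders in the example pedigree: two alleles per
# variant, six variants per sample, "0" marking a missing allele.
FOUNDER_GENOTYPES = {
    "SAMP1_1": "C A C A A A A A A A C A",
    "SAMP2_1": "C C A A C A C A A A A A",
    "SAMP4_1": "C A C A A A C A A A C A",
    "SAMP5_1": "0 0 C C A A A A C A C A",
}


def minor_allele_counts(genotypes):
    """Return one (minor allele count, non-missing allele total) pair per variant."""
    rows = [alleles.split() for alleles in genotypes.values()]
    n_variants = len(rows[0]) // 2
    result = []
    for v in range(n_variants):
        counts = Counter()
        for alleles in rows:
            pair = alleles[2 * v: 2 * v + 2]
            if "0" in pair:
                continue  # assumption: skip genotypes with a missing allele
            counts.update(pair)
        total = sum(counts.values())
        # With a single observed allele the variant is monomorphic (count 0).
        mac = min(counts.values()) if len(counts) == 2 else 0
        result.append((mac, total))
    return result


for mac, total in minor_allele_counts(FOUNDER_GENOTYPES):
    print(f"{mac}/{total} = {mac / total:.3f}")
```

    For the example data the tallies are 2/6, 4/8, 1/8, 2/8, 1/8 and 3/8; the first variant has only six alleles because SAMP5_1 is missing there.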

    All-individuals

    All individuals are used to estimate MAF; in the example above, the MAFs are 0.389 (7/18), 0.438 (7/16), 0.05 (1/20), 0.45 (9/20), 0.05 (1/20) and 0.15 (3/20). This approach is also fast and simple to compute. However, nonfounders' genotypes add no information about MAF when the founders' genotypes are known. For population-based samples, the MAFs estimated from all individuals are identical to those estimated from founders only. If family sizes are heterogeneous, the MAF estimated from all individuals can be inefficient. To calculate MAF using all individuals, use the option "--freq all".

    Getting all-individuals MAF estimates
        C:\Users\WISARD> wisard --bed test_miss0.bed --freq all --out res_maf_all
    all.maf: MAF computed using all samples (TSV)
    Column    Format    Modifier     Description
    VARIANT   string    NONE         Tested variant name
    ANNOT     string    --annogene   Annotation for the variant
    MAJOR     string    NONE         Major allele of the variant
    MINOR     string    NONE         Minor allele of the variant
    MAF       real      NONE         Minor allele frequency for the variant, under the chosen MAF computing criterion
    MAC       integer   NONE         Minor allele count for the variant, under the chosen MAF computing criterion
    NIND      integer   NONE         Number of samples used to compute MAF
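
    The all-individuals figures can be reproduced in the same spirit. The sketch below is an illustration only, not WISARD's implementation: it pools the alleles of all ten samples in the example data, again assuming that "0" marks a missing allele and that such genotypes are dropped. ALL_GENOTYPES and pooled_minor_counts are hypothetical names.

```python
from collections import Counter

# Genotype columns of all ten samples in the example pedigree, in file order.
ALL_GENOTYPES = [
    "C A C A A A A A A A C A",  # SAMP1_1
    "C C A A C A C A A A A A",  # SAMP2_1
    "A A C A A A C A A A A A",  # SAMP3_1
    "C A C A A A C A A A C A",  # SAMP4_1
    "0 0 C C A A A A C A C A",  # SAMP5_1
    "C A C A A A C A A A A A",  # SAMP6_1
    "A A 0 0 A A C A A A A A",  # SAMP7_1
    "C C 0 0 A A A A A A A A",  # SAMP8_1
    "C C A A A A C C A A A A",  # SAMP9_1
    "C C C A A A C C A A A A",  # SAMP10_1
]


def pooled_minor_counts(genotype_rows):
    """Return (minor allele count, allele total) per variant over all rows."""
    rows = [row.split() for row in genotype_rows]
    result = []
    for v in range(len(rows[0]) // 2):
        counts = Counter()
        for alleles in rows:
            pair = alleles[2 * v: 2 * v + 2]
            if "0" not in pair:  # assumption: missing genotypes are dropped
                counts.update(pair)
        total = sum(counts.values())
        mac = min(counts.values()) if len(counts) == 2 else 0
        result.append((mac, total))
    return result


for mac, total in pooled_minor_counts(ALL_GENOTYPES):
    print(f"{mac}/{total} = {mac / total:.4f}")
```

    The pooled tallies are 7/18, 7/16, 1/20, 9/20, 1/20 and 3/20. They differ from the founder-only values because related nonfounders contribute correlated copies of the founders' alleles.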

    BLUE

    NOTE!
    This approach is applicable ONLY to family-based datasets!

    McPeek (Biometrics 2004) proposed the best linear unbiased estimator (BLUE) for MAF. Although this estimator is computationally intensive, it is more efficient than the other two approaches when many founders have missing genotypes or family sizes are heterogeneous. Let $\Phi$ be the familial relationship matrix, $X$ a genotype vector, and $1$ a column vector whose elements are all 1. The BLUE for MAF is then given by


    $\left( 1' \Phi^{-1} 1 \right)^{-1} 1' \Phi^{-1} X$, where each element of $\Phi$ is twice the kinship coefficient of the corresponding pair of individuals.
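
    To make the formula concrete, the following sketch evaluates it in plain Python for two tiny pedigrees. It is an illustration under stated assumptions, not WISARD's code: $X$ is assumed to be coded as half the number of copies of the counted allele (0, 0.5 or 1), individuals are assumed non-inbred so that the diagonal of $\Phi$ is 1, and the helper names solve and blue_frequency are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(a) for a in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("relationship matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - tail) / aug[i][i]
    return w


def blue_frequency(phi, x):
    """Evaluate (1' Phi^-1 1)^-1 1' Phi^-1 X for a symmetric matrix phi."""
    # w = Phi^-1 1; because Phi is symmetric, 1' Phi^-1 equals w'.
    w = solve(phi, [1.0] * len(x))
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)


# Trio: two unrelated parents and their child (parent-child kinship 1/4,
# so the corresponding entries of Phi are 0.5).
trio_phi = [[1.0, 0.0, 0.5],
            [0.0, 1.0, 0.5],
            [0.5, 0.5, 1.0]]
print(blue_frequency(trio_phi, [0.5, 0.0, 0.5]))

# Same family with one parent ungenotyped: only a parent and the child remain.
pair_phi = [[1.0, 0.5],
            [0.5, 1.0]]
print(blue_frequency(pair_phi, [0.5, 1.0]))
```

    In the complete trio the child receives weight zero, so the estimate equals the founder-only mean (0.25 here). Once a parent is missing, the child carries weight and the estimate for the remaining pair is 0.75. The result is the frequency of whichever allele $X$ counts; if it exceeds 0.5, the minor allele frequency is one minus that value.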

    The BLUE for MAF can be estimated in WISARD with the "--freq blue" option, as below.

    Getting MAF estimates with the BLUE method
        C:\Users\WISARD> wisard --bed test_miss0.bed --kinship --freq blue --out res_maf_blue
    A column description for the [blue.maf] output is currently not available.

    Filtering variants with MAF

    WISARD provides a simple option, --filfreq, to filter variants whose MAF falls within a given range.

    NOTE!
    This option accepts a range-type parameter.

    Exclude variants with minor allele frequency lower than 2%
        C:\Users\WISARD> wisard --filfreq "<0.02" --ped test_miss0.ped
    Include variants with minor allele frequency greater than or equal to 5%
        C:\Users\WISARD> wisard --incfreq [0.05,0.5] --ped test_miss0.ped
    Exclude variants with a minor allele count of 0 to 2
        C:\Users\WISARD> wisard --filmac [0,2] --ped test_miss0.ped
    Include variants with a minor allele count greater than 5
        C:\Users\WISARD> wisard --incmac ">5" --ped test_miss0.ped

    In these examples, the --freq option is not specified, so MAF is calculated by the default method, founder-only. To filter variants using the BLUE for MAF, add the "--freq blue" option.
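
    As a rough illustration of what range-based filtering does, the sketch below applies range strings of the forms shown above to the founder-only MAFs of the example data. It does not reproduce WISARD's parser: treating square brackets as inclusive at both ends is an assumption, and parse_range, exclude and include are hypothetical helper names.

```python
import re


def parse_range(spec):
    """Return a predicate for a range string such as "<0.02" or "[0.05,0.5]"."""
    spec = spec.strip()
    interval = re.fullmatch(r"\[\s*(.+?)\s*,\s*(.+?)\s*\]", spec)
    if interval:
        low, high = float(interval.group(1)), float(interval.group(2))
        # Assumption: square brackets include both end points.
        return lambda value: low <= value <= high
    one_sided = re.fullmatch(r"([<>])\s*(.+)", spec)
    if one_sided:
        bound = float(one_sided.group(2))
        if one_sided.group(1) == "<":
            return lambda value: value < bound
        return lambda value: value > bound
    raise ValueError(f"unrecognised range: {spec!r}")


def exclude(values, spec):
    """Indices of the values that survive when those inside the range are removed."""
    in_range = parse_range(spec)
    return [i for i, v in enumerate(values) if not in_range(v)]


def include(values, spec):
    """Indices of the values that lie inside the range."""
    in_range = parse_range(spec)
    return [i for i, v in enumerate(values) if in_range(v)]


# Founder-only MAFs of the six example variants.
FOUNDER_MAF = [2 / 6, 4 / 8, 1 / 8, 2 / 8, 1 / 8, 3 / 8]

print(exclude(FOUNDER_MAF, "<0.2"))
print(include(FOUNDER_MAF, "[0.3,0.5]"))
```

    With these inputs, excluding "<0.2" drops the third and fifth variants (MAF 0.125), and including "[0.3,0.5]" keeps the first, second and sixth.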



    Last modified : 2017-08-29 08:49:21