WISARD [wɪzərd]: Workbench for Integrated Superfast Association study with Related Data
This section describes how WISARD calculates minor allele frequency (MAF) and filters variants by it.
For single-variant analysis, statistical power depends on the minor allele frequency (MAF): when the MAF is small, genome-wide significance is attainable only with extremely large samples. In addition, most statistical methods rely on the central limit theorem, and when the MAF is small the normality of the test statistics is hardly met. This problem can be serious in genome-wide association studies, so MAF is often used as a quality-control criterion for single-variant analysis. WISARD provides three different methods to calculate MAF for each variant.
Note that all methods give the same estimates for population-based samples (such as a case-control design), whereas for family-based samples the estimates can differ substantially. For instance, consider the following example data:
FAM_1 SAMP1_1 0 0 C A C A A A A A A A C A
FAM_1 SAMP2_1 0 0 C C A A C A C A A A A A
FAM_1 SAMP3_1 SAMP1_1 SAMP2_2 A A C A A A C A A A A A
FAM_1 SAMP4_1 0 0 C A C A A A C A A A C A
FAM_1 SAMP5_1 0 0 0 0 C C A A A A C A C A
FAM_1 SAMP6_1 SAMP4_1 SAMP5_1 C A C A A A C A A A A A
FAM_1 SAMP7_1 SAMP3_1 SAMP6_2 A A 0 0 A A C A A A A A
FAM_1 SAMP8_1 SAMP3_1 SAMP6_1 C C 0 0 A A A A A A A A
FAM_1 SAMP9_1 SAMP3_1 SAMP6_2 C C A A A A C C A A A A
FAM_1 SAMP10_1 SAMP3_1 SAMP6_1 C C C A A A C C A A A A
Founder-only means that the MAF of each variant is estimated from founders only; WISARD calculates it with the --freq option. In the example data, SAMP1_1, SAMP2_1, SAMP4_1 and SAMP5_1 are founders, and the MAFs of the six variants are therefore 0.333 (2/6), 0.5 (4/8), 0.125 (1/8), 0.25 (2/8), 0.125 (1/8) and 0.375 (3/8). The denominator of the first variant is 6 rather than 8 because the genotype of SAMP5_1 is missing there. This approach is fast and easy to compute, but it becomes inefficient when many founders have missing genotypes. WISARD calculates MAF in this way by default, and the output file extension is "founders.maf".
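The founder-only counts quoted above can be checked by hand or with a short script. The following is a minimal Python sketch, not WISARD's implementation: it assumes the example data are laid out as family ID, sample ID, father ID, mother ID and then two alleles per variant, that a founder is a sample whose father and mother IDs are both 0, and that a genotype of "0 0" is missing. The function names are hypothetical.

```python
# Example pedigree copied from the text: family, sample, father, mother, six genotypes.
PED = """\
FAM_1 SAMP1_1 0 0 C A C A A A A A A A C A
FAM_1 SAMP2_1 0 0 C C A A C A C A A A A A
FAM_1 SAMP3_1 SAMP1_1 SAMP2_2 A A C A A A C A A A A A
FAM_1 SAMP4_1 0 0 C A C A A A C A A A C A
FAM_1 SAMP5_1 0 0 0 0 C C A A A A C A C A
FAM_1 SAMP6_1 SAMP4_1 SAMP5_1 C A C A A A C A A A A A
FAM_1 SAMP7_1 SAMP3_1 SAMP6_2 A A 0 0 A A C A A A A A
FAM_1 SAMP8_1 SAMP3_1 SAMP6_1 C C 0 0 A A A A A A A A
FAM_1 SAMP9_1 SAMP3_1 SAMP6_2 C C A A A A C C A A A A
FAM_1 SAMP10_1 SAMP3_1 SAMP6_1 C C C A A A C C A A A A
"""


def parse_ped(text):
    """Return a list of (sample_id, is_founder, genotypes) tuples."""
    records = []
    for line in text.strip().splitlines():
        fields = line.split()
        sample, father, mother = fields[1], fields[2], fields[3]
        alleles = fields[4:]
        genotypes = [(alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)]
        # Assumption: a founder has both parents coded as "0".
        records.append((sample, father == "0" and mother == "0", genotypes))
    return records


def minor_allele_freq(genotypes):
    """Return (maf, minor_count, n_alleles) for one variant, skipping missing calls."""
    counts = {}
    for first, second in genotypes:
        if first == "0" or second == "0":  # assumption: "0" marks a missing allele
            continue
        counts[first] = counts.get(first, 0) + 1
        counts[second] = counts.get(second, 0) + 1
    total = sum(counts.values())
    if total == 0:
        return None, 0, 0
    minor = min(counts.values()) if len(counts) > 1 else 0
    return minor / total, minor, total


records = parse_ped(PED)
n_variants = len(records[0][2])
founder_mafs = [
    minor_allele_freq([geno[i] for _, is_founder, geno in records if is_founder])
    for i in range(n_variants)
]

for index, (maf, minor, total) in enumerate(founder_mafs, start=1):
    print(f"variant {index}: MAF = {maf:.3f} ({minor}/{total})")
```

The minor allele is taken to be whichever allele is less frequent among the samples being counted, so a variant whose two alleles are equally frequent gets a MAF of 0.5.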
.founders.maf | MAF computed using only founder samples (TSV)

Column | Format | Modifier | Description
---|---|---|---
VARIANT | string | NONE | Tested variant name
ANNOT | string | --annogene | Annotation for the variant
MAJOR | string | --annogene | Major allele of the variant
MINOR | string | --annogene | Minor allele of the variant
MAF | real | NONE | Minor allele frequency for the variant, with given MAF computing criterion
MAC | integer | NONE | Minor allele count for the variant, with given MAF computing criterion
NIND | integer | NONE | Number of samples used to compute MAF
With this method, all individuals are used to estimate MAF; in the previous example, the MAFs are 0.389 (7/18), 0.438 (7/16), 0.05 (1/20), 0.45 (9/20), 0.05 (1/20) and 0.15 (3/20). This approach is also fast and easy to compute. However, nonfounders' genotypes carry no additional information about MAF when the founders' genotypes are known. For population-based samples, the MAFs estimated from all individuals are equivalent to those estimated from founders only. If family sizes are heterogeneous, the MAF estimated from all individuals can be inefficient. To calculate MAF using all individuals, use the option "--freq all".
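To see how far the two estimates diverge on the example family, the allele counts can be tallied under both criteria. This is an illustrative Python sketch under the same assumptions as the data layout above (parents coded 0 for founders, "0 0" for a missing genotype); the helper names are hypothetical and it does not reproduce WISARD's own code.

```python
# Example pedigree copied from the text, one record per list element.
PED_LINES = [
    "FAM_1 SAMP1_1 0 0 C A C A A A A A A A C A",
    "FAM_1 SAMP2_1 0 0 C C A A C A C A A A A A",
    "FAM_1 SAMP3_1 SAMP1_1 SAMP2_2 A A C A A A C A A A A A",
    "FAM_1 SAMP4_1 0 0 C A C A A A C A A A C A",
    "FAM_1 SAMP5_1 0 0 0 0 C C A A A A C A C A",
    "FAM_1 SAMP6_1 SAMP4_1 SAMP5_1 C A C A A A C A A A A A",
    "FAM_1 SAMP7_1 SAMP3_1 SAMP6_2 A A 0 0 A A C A A A A A",
    "FAM_1 SAMP8_1 SAMP3_1 SAMP6_1 C C 0 0 A A A A A A A A",
    "FAM_1 SAMP9_1 SAMP3_1 SAMP6_2 C C A A A A C C A A A A",
    "FAM_1 SAMP10_1 SAMP3_1 SAMP6_1 C C C A A A C C A A A A",
]
N_VARIANTS = 6


def minor_count(ped_lines, variant_index, founders_only=False):
    """Return (minor_allele_count, total_allele_count) for one variant."""
    counts = {}
    for line in ped_lines:
        fields = line.split()
        is_founder = fields[2] == "0" and fields[3] == "0"
        if founders_only and not is_founder:
            continue
        start = 4 + 2 * variant_index
        pair = fields[start:start + 2]
        if "0" in pair:  # assumption: "0" marks a missing genotype
            continue
        for allele in pair:
            counts[allele] = counts.get(allele, 0) + 1
    total = sum(counts.values())
    minor = min(counts.values()) if len(counts) > 1 else 0
    return minor, total


all_counts = [minor_count(PED_LINES, i) for i in range(N_VARIANTS)]
founder_counts = [minor_count(PED_LINES, i, founders_only=True) for i in range(N_VARIANTS)]

for i in range(N_VARIANTS):
    (mac_all, n_all), (mac_f, n_f) = all_counts[i], founder_counts[i]
    print(f"variant {i + 1}: all = {mac_all / n_all:.3f} ({mac_all}/{n_all}), "
          f"founders = {mac_f / n_f:.3f} ({mac_f}/{n_f})")
```

The fourth variant shows the largest gap in this family: the minor allele is common among the descendants, so counting everyone raises the estimate well above the founder-only value.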
all.maf | MAF computed using all samples (TSV)

Column | Format | Modifier | Description
---|---|---|---
VARIANT | string | NONE | Tested variant name
ANNOT | string | --annogene | Annotation for the variant
MAJOR | string | --annogene | Major allele of the variant
MINOR | string | --annogene | Minor allele of the variant
MAF | real | NONE | Minor allele frequency for the variant, with given MAF computing criterion
MAC | integer | NONE | Minor allele count for the variant, with given MAF computing criterion
NIND | integer | NONE | Number of samples used to compute MAF
NOTE!
This approach is ONLY applicable to family datasets!
McPeek (Biometrics 2004) suggested the best linear unbiased estimator (BLUE) for MAF. Even though this estimate requires intensive computation, it is more efficient than the other two approaches when many founders have missing genotypes or family sizes are heterogeneous. Let $\Phi$ be the familial relationship matrix and $X$ be a genotype vector, with each element taken here as the proportion of minor alleles in an individual's genotype (0, 0.5 or 1). If we let $1$ be a column vector whose elements are all 1, the BLUE for MAF is expressed by

$$\hat{p} = (1^{T} \Phi^{-1} 1)^{-1} \, 1^{T} \Phi^{-1} X$$
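The estimator is a weighted mean of the genotypes with weights $w = \Phi^{-1} 1$, since $1^{T} \Phi^{-1} X = w^{T} X$ for a symmetric $\Phi$. The following Python sketch illustrates this on a parent-parent-child trio. It is not WISARD's implementation, and it rests on several assumptions: $\Phi$ holds twice the kinship coefficients (1 on the diagonal for non-inbred individuals, 0.5 between parent and child), each element of $X$ is a minor-allele proportion (0, 0.5 or 1), and there are no missing genotypes. The function names are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix @ w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("relationship matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for r in reversed(range(n)):
        known = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - known) / aug[r][r]
    return w


def blue_weights(phi):
    """Return w = Phi^{-1} 1, the unnormalised BLUE weights."""
    return solve(phi, [1.0] * len(phi))


def blue_maf(phi, x):
    """Return (1' Phi^{-1} 1)^{-1} 1' Phi^{-1} x."""
    weights = blue_weights(phi)
    return sum(w * xi for w, xi in zip(weights, x)) / sum(weights)


# Assumed relationship matrix for father, mother, child (twice the kinship).
TRIO_PHI = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
# Minor-allele proportions: father heterozygous, mother homozygous major, child heterozygous.
TRIO_X = [0.5, 0.0, 0.5]

print("weights:", blue_weights(TRIO_PHI))
print("BLUE MAF:", blue_maf(TRIO_PHI, TRIO_X))
```

In this trio the child receives a weight of zero, which matches the earlier remark that nonfounders' genotypes are not informative when the founders' genotypes are known. Any constant scaling of $\Phi$ cancels between the numerator and the denominator, so the kinship matrix itself would give the same estimate.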
The BLUE for MAF can be estimated with WISARD by using the "--freq blue" option.
The output index for the extension [blue.maf] is currently not available.
WISARD provides a simple option, --filfreq, to filter variants whose MAF lies in a certain range.
NOTE!
The --filfreq option accepts a range-type parameter.
In these examples, the --freq option is not specified, so MAF is calculated by the default method, founder-only. To filter variants by the BLUE for MAF, the "--freq blue" option must be added.
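The same kind of range filter can also be applied after the fact to a MAF output file. The Python sketch below is a hypothetical post-processing step, not the implementation or the parameter syntax of --filfreq. It assumes the file is tab-separated with a header line naming the VARIANT and MAF columns listed in the tables above, and it treats both bounds as inclusive. The variant names var1 to var6 are placeholders, and the values are the founder-only results of the example data.

```python
import csv
import io

# Hypothetical file content: founder-only results of the example data.
SAMPLE_TSV = (
    "VARIANT\tMAF\tMAC\tNIND\n"
    "var1\t0.333\t2\t3\n"
    "var2\t0.5\t4\t4\n"
    "var3\t0.125\t1\t4\n"
    "var4\t0.25\t2\t4\n"
    "var5\t0.125\t1\t4\n"
    "var6\t0.375\t3\t4\n"
)


def filter_by_maf(tsv_text, low, high):
    """Return names of variants whose MAF lies in [low, high] (inclusive by assumption)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["VARIANT"] for row in reader if low <= float(row["MAF"]) <= high]


kept = filter_by_maf(SAMPLE_TSV, 0.2, 0.4)
print(kept)
```

To filter a real file, its contents can be read with `open(path).read()` and passed in place of `SAMPLE_TSV`, provided the header assumption holds.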