WISARD[wɪzərd] Workbench for Integrated Superfast Association study with Related Data
This section is about
WISARD defines 'missing' data in several ways:
To support a wider range of analyses in less time, WISARD provides several ways to treat each of the "missing" situations above.
By default, WISARD retains the entire pedigree information, regardless of whether some samples were pruned out by a filter, never existed in the dataset, or are absent for any other reason. As a result, some samples may have family information (father/mother/offspring) but no genotype data. This information can still give considerable insight into the pedigree structure, because the original pedigree can be reconstructed from those "missing samples".
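The idea of recovering "missing samples" from family information can be sketched in a few lines of Python. This is an illustration only, not WISARD's implementation: the `PedRecord` type, the `find_missing_samples` function, and the use of "0" for an unknown parent (a convention borrowed from PED-style files) are assumptions made for the example.

```python
from typing import NamedTuple


class PedRecord(NamedTuple):
    """Hypothetical pedigree record; "0" marks an unknown parent (assumed convention)."""
    fid: str     # family ID
    iid: str     # individual ID
    father: str  # father's individual ID, or "0"
    mother: str  # mother's individual ID, or "0"


def find_missing_samples(records):
    """Return (fid, iid) pairs named as a parent but lacking a record of their own."""
    present = {(r.fid, r.iid) for r in records}
    missing = set()
    for r in records:
        for parent in (r.father, r.mother):
            if parent != "0" and (r.fid, parent) not in present:
                missing.add((r.fid, parent))
    return missing


records = [
    PedRecord("F1", "child1", "dad", "mom"),
    PedRecord("F1", "child2", "dad", "mom"),
    PedRecord("F1", "mom", "0", "0"),
]
# "dad" has no record of his own, yet he still links child1 and child2 as full siblings.
missing = find_missing_samples(records)
```

Here the father never appears as a sample, but keeping him as a placeholder preserves the sibling relationship between the two children.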
Samples that have their own records are usually considered non-missing, but they can still be categorized as missing samples: a sample whose genotypes are entirely missing is categorized as a 'missing' sample. This can happen in a full dataset, where such a sample may be a link that connects two generations within the same family, or in a partial (i.e., chromosome-wise) dataset, where its genotypes may be unavailable because of an error, quality processing, or a genetic disorder.
NOTE!
If there are no variants in the dataset, all samples with their own records are considered non-missing!
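The rule for samples with their own records, including the no-variant case in the note, can be expressed as a short Python sketch. The function name `classify_missing` and the use of `None` to encode a missing genotype call are assumptions for illustration; WISARD's internal representation is not described here.

```python
def classify_missing(genotypes):
    """Map each sample ID to True if it counts as a 'missing' sample.

    genotypes: dict of sample ID -> list of calls, where each call is
    0, 1, 2, or None for a missing call (hypothetical encoding).
    """
    result = {}
    for sample, calls in genotypes.items():
        if len(calls) == 0:
            # No variants in the dataset: every recorded sample is non-missing.
            result[sample] = False
        else:
            # Missing only when every single genotype call is missing.
            result[sample] = all(call is None for call in calls)
    return result


with_variants = classify_missing({"s1": [0, None, 2], "s2": [None, None, None]})
no_variants = classify_missing({"s1": [], "s2": []})
```

With variants present, only `s2` is flagged as missing; with no variants at all, neither sample is.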
WISARD makes extensive use of missing samples in order to maximize the use of the given information. However, missing samples are omitted from most analysis reports, with a few exceptions. In addition, WISARD automatically constructs both the entire family structure and the nuclear-family-wise structure in order to utilize this information. In this process, WISARD detects the following 'abnormalities' of family structure:
WISARD currently supports three types of genotype imputation. Note that some imputation schemes have limitations, and an optimal imputation scheme is determined automatically.
Because imputed data are dosages rather than ordinary genotype values (0, 1, or 2), an imputation scheme requires a large amount of memory, roughly 1 megabyte per 125,000 genotypes.
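The stated ratio gives a rough way to estimate memory use before running imputation. The Python sketch below assumes 1 megabyte means 1,000,000 bytes, which works out to 8 bytes per genotype. That figure matches a double-precision value, though the storage type is not stated here. The function name and the example dataset size are hypothetical.

```python
GENOTYPES_PER_MEGABYTE = 125_000  # ratio stated in the text


def estimate_dosage_memory_mb(n_samples, n_variants):
    """Approximate memory in megabytes needed to hold imputed dosages."""
    return n_samples * n_variants / GENOTYPES_PER_MEGABYTE


# Assuming 1 MB = 1,000,000 bytes, this is the implied cost per genotype.
bytes_per_genotype = 1_000_000 / GENOTYPES_PER_MEGABYTE

# Hypothetical dataset: 1,000 samples by 500,000 variants.
needed_mb = estimate_dosage_memory_mb(1_000, 500_000)
```

For this hypothetical dataset the estimate is 4,000 megabytes, so a dosage matrix can be the dominant memory cost of a run.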