WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data
 

Missing treatment?

This section is about

  • Type of missing
  • Missing founders
  • Missing samples
  • Missing genotype

          Type of missing [top]

          WISARD defines 'missing' data in several ways:

          1. For a sample, genotype data either exist (non-missing) or do not exist (missing).
          2. For a sample, the phenotype(s) of interest and the covariates are either completely observed (non-missing) or have at least one missing value (missing).
          3. For a founder, genotype data either exist (available founder) or do not exist (missing founder).
          4. For a variant, either all genotypes are available (non-missing) or at least one genotype is unavailable (missing).

          To allow more flexible and faster analyses, WISARD provides various ways to treat all of the above "missing" situations.
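The definitions above can be illustrated with a minimal Python sketch. It is not WISARD's implementation: the function names and the use of `None` as the missing marker are assumptions made only for illustration. Definition 3 is not shown separately, since in this sketch it would be definition 1 applied to a founder.

```python
from typing import Optional, Sequence


def sample_genotype_missing(genotypes: Sequence[Optional[int]]) -> bool:
    """Definition 1: a sample is missing if none of its genotypes is available.

    A sample with zero variants is treated as non-missing, mirroring the
    page's note about datasets that contain no variants.
    """
    return len(genotypes) > 0 and all(g is None for g in genotypes)


def sample_phenotype_missing(values: Sequence[Optional[float]]) -> bool:
    """Definition 2: missing if any phenotype or covariate value is absent."""
    return any(v is None for v in values)


def variant_missing(column: Sequence[Optional[int]]) -> bool:
    """Definition 4: a variant is missing if at least one genotype is absent."""
    return any(g is None for g in column)
```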

          Missing founders [top]

          WISARD retains the entire pedigree information regardless of whether a sample was pruned out by a filter, never existed in the dataset in the first place, or is absent for any other reason. For this reason, some samples may lack genotype data even though they have family information (father/mother/offspring). This information can still give considerable insight into the pedigree structure, since the original pedigree can be reconstructed from those "missing samples".
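As a rough illustration of how such members can be recovered from family information alone, the sketch below lists every individual that is referenced as a parent but has no genotype data. The record layout (sample ID, father ID, mother ID), the "0" placeholder for an unknown parent, and the function name are assumptions for this example, not WISARD's actual data structures.

```python
from typing import Iterable, List, Set, Tuple

NO_PARENT = "0"  # assumed placeholder for an unknown parent


def parents_without_genotypes(
    pedigree: Iterable[Tuple[str, str, str]],
    genotyped: Set[str],
) -> List[str]:
    """Return IDs that are named as a father or mother but are not genotyped."""
    referenced: Set[str] = set()
    for _iid, pat, mat in pedigree:
        # Collect every parent ID that is actually specified.
        referenced.update(p for p in (pat, mat) if p != NO_PARENT)
    return sorted(referenced - set(genotyped))


# Hypothetical family: "dad" appears only as a parent of the two children.
family = [("kid1", "dad", "mom"), ("kid2", "dad", "mom"), ("mom", "0", "0")]
missing_members = parents_without_genotypes(family, {"kid1", "kid2", "mom"})
```

Here `missing_members` contains only "dad", who links the two children even though no genotype is available for him.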

          Missing samples [top]

          Samples that have their own records are usually considered non-missing, but they can still be categorized as missing samples: any sample whose genotypes are entirely missing is treated as a 'missing' sample. This can happen because the sample is only a link connecting two generations within the same family, or, in a partial (i.e., chromosome-wise) dataset, because its genotypes are unavailable owing to an error, quality-control processing, or a genetic disorder.

          NOTE!
          If there are no variants in the dataset, all samples that have records are considered non-missing!

          In WISARD, missing samples are used extensively in order to make the most of the given information. However, missing samples are omitted from several analysis reports, with some exceptions. In addition, WISARD automatically constructs the entire family structure and the nuclear-family-wise structure in order to use this information. In this process, WISARD detects the following 'abnormalities' of family structure:

          • Sex abnormalities
            As described in the data management section, the sex of a sample can be gathered if the sample appears in the PAT or MAT field of other samples. WISARD therefore warns when a sample lacks sex information although its sex can be inferred. Conflicting sex information can also arise for several reasons (e.g., a sample marked as male is someone's mother). When such a conflict occurs, WISARD halts execution and reports which sample causes the problem.
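The sex check described above can be sketched as follows. This is an illustration under stated assumptions, not WISARD's code: the tuple layout (sample ID, PAT, MAT, sex), the sex codes 1 = male, 2 = female, 0 = unknown, the "0" placeholder for an unknown parent, and the function name are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

MALE, FEMALE, UNKNOWN = 1, 2, 0  # assumed sex codes
NO_PARENT = "0"                  # assumed placeholder for an unknown parent


def check_sex(
    records: Iterable[Tuple[str, str, str, int]],
) -> Tuple[Dict[str, int], List[str]]:
    """Infer sex from parental roles; warn on gaps, raise on conflicts."""
    records = list(records)
    implied: Dict[str, int] = {}
    for _iid, pat, mat, _sex in records:
        for parent, role_sex in ((pat, MALE), (mat, FEMALE)):
            if parent == NO_PARENT:
                continue
            # A person cannot be a father in one record and a mother in another.
            if implied.setdefault(parent, role_sex) != role_sex:
                raise ValueError(f"{parent} is listed as both father and mother")

    inferable: List[str] = []  # samples lacking sex that could be inferred
    for iid, _pat, _mat, sex in records:
        if iid not in implied:
            continue
        if sex == UNKNOWN:
            inferable.append(iid)
        elif sex != implied[iid]:
            # Mirrors the described behaviour: stop and name the sample.
            raise ValueError(f"sex conflict for sample {iid}")
    return implied, inferable
```

Raising an exception stands in for halting execution, and the returned list stands in for the warnings about samples whose sex could have been inferred.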

          Missing genotype [top]

          WISARD currently supports three types of genotype imputation. Note that some imputation schemes have limitations, and the optimal scheme is determined automatically.

          • With the --impute option and an independent dataset (i.e., one without pedigree information), a simple imputation based on the minor allele frequency is applied.
          • With the --impute option and a family dataset, a more advanced imputation based on the pedigree structure is applied.
          • With the --fastimpute option, a more complex but fast imputation scheme, the same as PLINK's, is applied regardless of whether pedigree information exists.
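To give a feel for the simplest of these schemes, the sketch below fills missing genotypes of one variant with the expected allele count derived from the observed allele frequency. The page does not spell out WISARD's exact rule, so this is a generic frequency-based (mean) imputation under that assumption; the function name and the use of `None` for a missing genotype are hypothetical.

```python
from typing import List, Optional, Sequence


def impute_by_allele_frequency(
    genotypes: Sequence[Optional[int]],
) -> List[Optional[float]]:
    """Replace missing genotypes (None) of one variant with 2 * allele frequency.

    Genotypes are assumed to be coded as allele counts 0, 1, or 2.
    """
    observed = [g for g in genotypes if g is not None]
    if not observed:
        # No observed genotype, so no frequency can be estimated.
        return list(genotypes)
    # Each observed sample carries two alleles.
    freq = sum(observed) / (2 * len(observed))
    expected = 2 * freq  # expected allele count, i.e. a dosage
    return [expected if g is None else float(g) for g in genotypes]
```

The result is a list of floating-point dosages rather than integer genotype values, which is why imputed data need more storage than plain genotypes.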

          Because imputed data are dosages rather than ordinary genotype values (0, 1, or 2), imputation requires a large amount of memory: roughly 1 megabyte per 125,000 genotypes, which is about 8 bytes per genotype.
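The stated ratio makes it easy to estimate memory before running an imputation. The helper below is only a back-of-the-envelope sketch: the function name is hypothetical, and it assumes 1 megabyte means 1,000,000 bytes and that the ratio from the text holds uniformly.

```python
# 1 megabyte per 125,000 genotypes, taken from the text (assuming 1 MB = 10^6 bytes).
BYTES_PER_DOSAGE = 1_000_000 / 125_000


def dosage_memory_mb(n_samples: int, n_variants: int) -> float:
    """Estimate the memory (in MB) needed to hold imputed dosages."""
    return n_samples * n_variants * BYTES_PER_DOSAGE / 1_000_000


# Hypothetical dataset: 2,000 samples by 500,000 variants.
estimate_mb = dosage_memory_mb(2_000, 500_000)
```

For this hypothetical dataset the estimate is 8,000 MB, so a chromosome-wise run may be preferable on machines with limited memory.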



          Last modified : 2014-03-04 16:00:07