WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

Sample selection from pedigree

Prior to sequencing or genotyping, the samples to be sequenced or genotyped can be chosen using WISARD so as to maximize the efficiency of the sequencing or genotyping.

Sample selection theory

In WISARD, the sample selection procedure focuses on family members whose phenotypes are known. Let $X1$ and $X2$ be the genotype vectors of typed and untyped individuals, respectively, and let $n_{X1}$ be the number of genotyped individuals and $n_{X2}$ the number of ungenotyped individuals with known phenotypes. Their marginal and conditional variance matrices are approximately $var\left(X\right)=2p\left(1-p\right)\Phi_X$ and $var\left(X2|X1\right)=2p\left(1-p\right)\left\{\Phi_{X2X2}-\Phi_{X2X1}\Phi_{X1X1}^{-1}\Phi_{X1X2}\right\}$, where $p$ is the allele frequency and $\Phi$ is the relatedness matrix between individuals. Then the informativity of $X1$, $\lambda\left(X1\right)$, can be defined by the amount of RV as follows:

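$\lambda\left(X1\right)=\frac{tr\left\{var\left(X2|X1\right)\right\}}{tr\left\{var\left(X2\right)\right\}} = 1 - \frac{tr\left(\Phi_{X2X1}\Phi_{X1X1}^{-1}\Phi_{X1X2}\right)}{tr\left(\Phi_{X2X2}\right)} = 1 - \frac{1}{n_{X2}}tr\left(\Phi_{X2X1}\Phi_{X1X1}^{-1}\Phi_{X1X2}\right)$

The last equality uses $tr\left(\Phi_{X2X2}\right)=n_{X2}$, which holds when the diagonal entries of $\Phi$ are 1.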


Therefore, for a fixed $n_{X1}$, the most informative individuals can be selected by minimizing $tr\left(\Phi_{X2X1}\Phi_{X1X1}^{-1}\Phi_{X1X2}\right)$.
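
As a small worked illustration of the trace term (a hypothetical trio, not taken from WISARD itself), suppose the two unrelated parents are typed ($X1$) and their child is untyped ($X2$). This assumes that $\Phi$ has unit diagonal and off-diagonal entries equal to twice the kinship coefficient, which is what $var\left(X\right)=2p\left(1-p\right)\Phi_X$ implies for non-inbred individuals, so each parent-child entry is $1/2$:

$\Phi_{X1X1}=\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix},\quad \Phi_{X1X2}=\begin{pmatrix}1/2\\ 1/2\end{pmatrix},\quad \Phi_{X2X1}=\Phi_{X1X2}^{T}$

$tr\left(\Phi_{X2X1}\Phi_{X1X1}^{-1}\Phi_{X1X2}\right)=\frac{1}{4}+\frac{1}{4}=\frac{1}{2},\qquad 1-\frac{1}{n_{X2}}tr\left(\Phi_{X2X1}\Phi_{X1X1}^{-1}\Phi_{X1X2}\right)=1-\frac{1}{1}\cdot\frac{1}{2}=\frac{1}{2}$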

How the sample selection procedure works

In order to select m samples out of n samples, the following procedure is performed.

  • For each sample in the pedigree, a step-wise method that minimizes $tr\left(\Phi_{X2X1}\Phi_{X1X1}^{-1}\Phi_{X1X2}\right)$ is performed, with $X2$ = the selected samples.
  • If the above procedure finds exactly the requested number of samples, stop and report.
  • Otherwise, if the number of chosen samples is close to the requested number of samples, enumerate all combinations, find the best combination, and report it.
  • Otherwise, a specific number of random selections, based on the combination that gave the minimum variance, is performed, and the result is reported.
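
The step-wise and enumeration steps above can be sketched in Python as follows. This is a simplified illustration, not WISARD's actual implementation: the function names (`solve`, `criterion`, `score`, `stepwise`, `exhaustive`) and the example matrix are hypothetical, the greedy search always grows each seed to exactly m samples, and the random-selection fallback is omitted. Following the first step above, the sketch assumes that $X2$ is the selected set and $X1$ is everyone else; swapping the two arguments in `score` gives the opposite convention.

```python
from itertools import combinations


def solve(a, b):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n, k = len(a), len(b[0])
    # Augmented matrix [A | B]; copying keeps the inputs unmodified.
    m = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("relatedness block is singular (duplicated samples?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [[m[i][n + j] / m[i][i] for j in range(k)] for i in range(n)]


def criterion(phi, x1, x2):
    """tr(Phi_X2X1 Phi_X1X1^{-1} Phi_X1X2) for index lists x1 and x2."""
    if not x1 or not x2:
        return 0.0
    phi11 = [[phi[i][j] for j in x1] for i in x1]
    phi12 = [[phi[i][j] for j in x2] for i in x1]
    z = solve(phi11, phi12)  # Z = Phi_X1X1^{-1} Phi_X1X2
    # Phi is symmetric, so the trace is the element-wise sum of Phi_X1X2 * Z.
    return sum(p * q for prow, zrow in zip(phi12, z) for p, q in zip(prow, zrow))


def score(phi, selected):
    """Criterion with X2 = selected samples and X1 = the rest (an assumption)."""
    rest = [i for i in range(len(phi)) if i not in selected]
    return criterion(phi, rest, list(selected))


def stepwise(phi, m):
    """Greedy forward selection, restarted from every sample as the seed."""
    n = len(phi)
    if not 0 < m <= n:
        raise ValueError("m must be between 1 and the number of samples")
    best = None
    for seed in range(n):
        selected = [seed]
        while len(selected) < m:
            candidates = [i for i in range(n) if i not in selected]
            # Add the candidate that yields the smallest criterion value.
            selected.append(min(candidates, key=lambda i: score(phi, selected + [i])))
        value = score(phi, selected)
        if best is None or value < best[0]:
            best = (value, sorted(selected))
    return best


def exhaustive(phi, m):
    """Evaluate every combination of m samples and return the minimum."""
    return min((score(phi, c), list(c)) for c in combinations(range(len(phi)), m))


# Hypothetical nuclear family: father (0), mother (1), two children (2, 3).
# Entries are twice the kinship coefficient, with unit diagonal.
phi = [[1.0, 0.0, 0.5, 0.5],
       [0.0, 1.0, 0.5, 0.5],
       [0.5, 0.5, 1.0, 0.5],
       [0.5, 0.5, 0.5, 1.0]]
print(stepwise(phi, 2))
print(exhaustive(phi, 2))
```

The greedy search needs far fewer criterion evaluations than full enumeration, which grows combinatorially with n and m, so enumeration is only practical when few combinations remain to be checked.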

Sample selection from user-defined pedigree structure

In order to perform sample selection, the options below are required.

  • --sxa performs the sample selection procedure.
  • --kinship lets WISARD use the kinship coefficient as the measure of sample relatedness.
  • --fam specifies the pedigree structure from which samples are to be chosen.
  • --nsamp sets the number of samples to be chosen.

Note that if the number of samples to be chosen exceeds the number of samples in the pedigree, WISARD does nothing and exits.

Select ten samples from the given pedigree structure 'sample.fam':
C:\Users\WISARD> WISARD --sxa --fam sample.fam --nsamp 10

Additional options

The options below can be used together with the sample selection procedure.

  • --cor replaces the kinship coefficient with a user-defined sample relatedness matrix.
  • --remsamp/--selsamp can be used to filter out specific samples.
  • --usemf allows missing founders to be included in the analysis.

Output files

When the sample selection procedure is performed, the files below are generated by default.

sampvar.res is... the computed conditional variance for each sample (TSV)
Column     Format    Modifier    Description
FID        string    NONE        Family ID of the sample
IID        string    NONE        Individual ID of the sample
CONDVAR    real      NONE        Conditional variance given the sample
top.sampvar.res is... (TSV)
Column     Format              Modifier    Description
RANK       Positive integer    NONE        The rank of least conditional variance
FID        string              NONE        Family ID of the sample
IID        string              NONE        Individual ID of the sample
CONDVAR    real                NONE        Conditional variance given the sample
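
As a minimal sketch of how the ranked output might be consumed downstream, the Python snippet below reads top.sampvar.res and returns the best-ranked samples. It assumes the file is tab-separated with a header row holding the column names listed above, which should be checked against a real output file. The helper name `read_top_samples` and the demo values are hypothetical and were not produced by WISARD.

```python
import csv
import os
import tempfile


def read_top_samples(path, k=None):
    """Return (FID, IID, CONDVAR) tuples in rank order; keep the first k if given."""
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    rows.sort(key=lambda r: int(r["RANK"]))
    picked = [(r["FID"], r["IID"], float(r["CONDVAR"])) for r in rows]
    return picked if k is None else picked[:k]


# Made-up contents for demonstration only; real values come from a WISARD run.
demo = "RANK\tFID\tIID\tCONDVAR\n2\tF1\tI3\t0.75\n1\tF1\tI1\t0.5\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "top.sampvar.res")
    with open(path, "w", newline="") as handle:
        handle.write(demo)
    top = read_top_samples(path, k=1)
print(top)  # [('F1', 'I1', 0.5)]
```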
Output index for extension [top.sampvar2.res] is currently not available

Last modified : 2014-02-25 21:48:12