WISARD official site

Case tutorial

Sample selection from pedigree

Prior to sequencing or genotyping, WISARD can choose which samples should be sequenced or genotyped so as to maximize the efficiency of sequencing or genotyping.

Sample selection theory

In WISARD, the sample selection procedure focuses on family members whose phenotypes are known. Let $X1$ and $X2$ be the genotype vectors of the typed (selected) and untyped individuals, respectively, and let $n_{X1}$ and $n_{X2}$ be their sizes. For a variant with allele frequency $p$, the marginal and conditional variance matrices are approximately $var\left(X\right)=2p\left(1-p\right)\Phi_X$ and $var\left(X2|X1\right)=2p\left(1-p\right)\left\{\Phi_{X2X2}-\Phi_{X2X1}\Phi_{X1X1}^{-1}\Phi_{X1X2}\right\}$, where $\Phi$ is the sample relatedness matrix (twice the kinship coefficient matrix when kinship is used) and $\Phi_{X2X1}$ denotes its block between $X2$ and $X1$. The informativity of $X1$, $\lambda\left(X1\right)$, can then be defined as the proportion of the variance of $X2$ that is explained by $X1$:

$\lambda\left(X1\right)=\frac{tr\left\{var\left(X2\right)\right\} - tr\left\{var\left(X2|X1\right)\right\}}{tr\left\{var\left(X2\right)\right\}} = \frac{1}{n_{X2}}tr\left(\Phi_{X2X1}\Phi_{X1X1}^{-1}\Phi_{X1X2}\right)$

where the last equality assumes no inbreeding, so that $tr\left(\Phi_{X2X2}\right)=n_{X2}$.

Therefore, for a fixed $n_{X1}$, the most informative individuals are those that maximize $tr\left(\Phi_{X2X1}\Phi_{X1X1}^{-1}\Phi_{X1X2}\right)$, or equivalently minimize the total conditional variance $tr\left\{var\left(X2|X1\right)\right\}$.
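As a minimal sketch of the informativity formula, the Python below computes $\lambda\left(X1\right)$ from a relatedness matrix stored as nested lists, using a small Gauss-Jordan solver rather than an explicit inverse. The function names (`solve`, `informativity`) and the four-member family matrix are hypothetical illustrations, not part of WISARD. The denominator is $tr\left(\Phi_{X2X2}\right)$, which equals $n_{X2}$ only when there is no inbreeding.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Return Y with A Y = B, via Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented [A | B]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("relatedness block of X1 is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def informativity(phi: Matrix, selected: Sequence[int]) -> float:
    """lambda(X1) = tr(Phi_21 Phi_11^{-1} Phi_12) / tr(Phi_22),
    with X1 = the selected indices and X2 = all other samples."""
    sel = list(selected)
    rest = [k for k in range(len(phi)) if k not in sel]
    phi11 = [[phi[i][j] for j in sel] for i in sel]
    phi12 = [[phi[i][k] for k in rest] for i in sel]
    y = solve(phi11, phi12)  # Phi_11^{-1} Phi_12
    # tr(Phi_21 Y) = sum over i, k of Phi_12[i][k] * Y[i][k], since Phi is symmetric
    explained = sum(phi12[i][k] * y[i][k]
                    for i in range(len(sel)) for k in range(len(rest)))
    # tr(Phi_22) equals n_X2 when there is no inbreeding (unit diagonal)
    return explained / sum(phi[k][k] for k in rest)


# Hypothetical nuclear family: father, mother, child 1, child 2.
# Relatedness = 2 x kinship: 0.5 for parent-child and full sibs, 0 between spouses.
NUCLEAR_FAMILY = [
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
]

if __name__ == "__main__":
    for chosen in ([0, 1], [2, 3], [2]):
        print(chosen, round(informativity(NUCLEAR_FAMILY, chosen), 4))
```

Run as a script, it prints 0.5 for typing both parents, 0.3333 for both children and 0.25 for a single child: the two unrelated parents explain more of the untyped members' variance than two siblings, whose genotypes overlap.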

How the sample selection procedure works

To select m samples out of n, WISARD performs the following procedure:

For the samples in the pedigree, a step-wise method is performed that minimizes the total conditional variance $tr\left\{var\left(X2|X1\right)\right\}$, with X1 = the samples selected so far.
If this procedure finds exactly m samples, it stops and reports them.
Otherwise, if the number of chosen samples is close to m, all combinations are enumerated and the best combination is reported.
Otherwise, a fixed number of random selections, based on the combination that gave the minimum variance, are evaluated and the best result is reported.
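The procedure above can be sketched as follows, reading the step-wise step as forward greedy selection on the total conditional variance $tr\left\{var\left(X2|X1\right)\right\}$ (with the constant $2p\left(1-p\right)$ dropped) and using exhaustive enumeration when the number of m-subsets is small. The `max_combinations` threshold and all function names are assumptions for illustration; WISARD's exact stopping rules and its random-search stage are not reproduced here.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Return Y with A Y = B, via Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular relatedness block")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def cond_var(phi: Matrix, selected: Sequence[int]) -> float:
    """tr(Phi_22 - Phi_21 Phi_11^{-1} Phi_12): total conditional variance of the
    unselected samples, up to the constant factor 2p(1-p)."""
    sel = list(selected)
    rest = [k for k in range(len(phi)) if k not in sel]
    total = sum(phi[k][k] for k in rest)
    if not sel or not rest:
        return total
    phi11 = [[phi[i][j] for j in sel] for i in sel]
    phi12 = [[phi[i][k] for k in rest] for i in sel]
    y = solve(phi11, phi12)
    return total - sum(phi12[i][k] * y[i][k]
                       for i in range(len(sel)) for k in range(len(rest)))


def greedy_select(phi: Matrix, m: int) -> List[int]:
    """Forward step-wise selection: repeatedly add the sample that most reduces
    the total conditional variance of the remaining samples."""
    chosen: List[int] = []
    for _ in range(m):
        candidates = [i for i in range(len(phi)) if i not in chosen]
        chosen.append(min(candidates, key=lambda i: cond_var(phi, chosen + [i])))
    return chosen


def exhaustive_select(phi: Matrix, m: int) -> Tuple[int, ...]:
    """Evaluate every m-subset and keep the one with minimum conditional variance."""
    return min(combinations(range(len(phi)), m), key=lambda s: cond_var(phi, s))


def select(phi: Matrix, m: int, max_combinations: int = 10_000) -> List[int]:
    """Hypothetical driver: exhaustive search when C(n, m) is small enough
    (max_combinations is an assumed threshold), greedy search otherwise."""
    n = len(phi)
    if m > n:
        raise ValueError("cannot select more samples than the pedigree contains")
    if comb(n, m) <= max_combinations:
        return sorted(exhaustive_select(phi, m))
    return sorted(greedy_select(phi, m))


# Hypothetical nuclear family: father, mother, child 1, child 2 (2 x kinship).
NUCLEAR_FAMILY = [
    [1.0, 0.0, 0.5, 0.5],
    [0.0, 1.0, 0.5, 0.5],
    [0.5, 0.5, 1.0, 0.5],
    [0.5, 0.5, 0.5, 1.0],
]

if __name__ == "__main__":
    print("exhaustive:", select(NUCLEAR_FAMILY, 2))
    print("greedy:", sorted(greedy_select(NUCLEAR_FAMILY, 2)))
```

On this hypothetical family, exhaustive search over the six pairs returns the two parents (total conditional variance 1.0), whereas greedy selection first picks a child and ends at 4/3, one reason an enumeration stage is worthwhile when it is affordable.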

Sample selection from user-defined pedigree structure

Sample selection requires the following options:

--sxa runs the sample selection procedure.
--kinship makes WISARD use kinship coefficients as the measure of sample relatedness.
--fam specifies the pedigree structure from which samples are chosen.
--nsamp sets the number of samples to choose.

Note that if the requested number of samples exceeds the number of samples in the pedigree, WISARD exits without doing anything.

Select ten samples from the given pedigree structure 'sample.fam':
C:\Users\WISARD> WISARD --sxa --fam sample.fam --nsamp 10
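When kinship is used, relatedness is derived from the pedigree passed with `--fam`. The sketch below shows the classical recursive kinship computation: parents are processed before their offspring, an individual's self-kinship is half of one plus the kinship between its parents, and its kinship with any earlier individual is the average of that individual's kinship with its parents. It assumes the six-column PLINK-style .fam layout (FID IID PAT MAT SEX PHENO, `0` for an unknown parent) and returns 2 x kinship to match $\Phi$. It illustrates the standard rule, not WISARD's own code; parents missing from the file are simply treated as unknown, and the family data are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Key = Tuple[str, str]  # (FID, IID)
Pedigree = Dict[Key, Tuple[Optional[Key], Optional[Key]]]


def read_fam(lines: Iterable[str]) -> Tuple[List[Key], Pedigree]:
    """Parse PLINK-style .fam rows: FID IID PAT MAT SEX PHENO ('0' = unknown parent)."""
    order: List[Key] = []
    ped: Pedigree = {}
    for line in lines:
        f = line.split()
        if len(f) < 4:
            continue  # skip blank or malformed rows
        fid, iid, pat, mat = f[:4]
        ped[(fid, iid)] = (None if pat == "0" else (fid, pat),
                           None if mat == "0" else (fid, mat))
        order.append((fid, iid))
    return order, ped


def parents_first(order: List[Key], ped: Pedigree) -> List[Key]:
    """Reorder samples so that every listed parent precedes its children."""
    seen, out = set(), []

    def visit(k: Key) -> None:
        if k in seen:
            return
        seen.add(k)
        for p in ped[k]:
            if p is not None and p in ped:  # parents absent from the file are ignored
                visit(p)
        out.append(k)

    for k in order:
        visit(k)
    return out


def relatedness(order: List[Key], ped: Pedigree) -> Tuple[List[Key], List[List[float]]]:
    """Kinship by the classical recursive rule, returned as 2 x kinship."""
    seq = parents_first(order, ped)
    idx = {k: n for n, k in enumerate(seq)}
    n = len(seq)
    kin = [[0.0] * n for _ in range(n)]
    for j, k in enumerate(seq):
        ps = [idx[p] for p in ped[k] if p is not None and p in idx]
        # self-kinship: 0.5 * (1 + kinship between the two parents); 0.5 otherwise
        kin[j][j] = 0.5 * (1.0 + kin[ps[0]][ps[1]]) if len(ps) == 2 else 0.5
        # kinship with every earlier sample i: half the sum over j's known parents
        # (an unknown parent contributes 0)
        for i in range(j):
            kin[i][j] = kin[j][i] = 0.5 * sum(kin[i][p] for p in ps)
    return seq, [[2.0 * v for v in row] for row in kin]


# Hypothetical four-member family in .fam layout.
FAM_LINES = [
    "F1 dad 0 0 1 -9",
    "F1 mom 0 0 2 -9",
    "F1 kid1 dad mom 1 2",
    "F1 kid2 dad mom 2 1",
]

if __name__ == "__main__":
    seq, rel = relatedness(*read_fam(FAM_LINES))
    for key, row in zip(seq, rel):
        print(key[1], row)
```

For this family the result is 1 on the diagonal, 0.5 between each parent and child and between the siblings, and 0 between the parents.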

Additional options

The following options can be used with the sample selection procedure:

--cor replaces the kinship coefficients with a user-defined sample relatedness matrix.
--remsamp/--selsamp can be used to filter out specific samples.
--usemf includes missing founders in the analysis.

Output files

The sample selection procedure generates the following files by default:

*sampvar.res*: the conditional variance computed for each sample (TSV)
Column	Format	Modifier	Description
FID	string	NONE	Family ID of the sample
IID	string	NONE	Individual ID of the sample
CONDVAR	real	NONE	Conditional variance given the sample

*top.sampvar.res*: samples ranked by conditional variance (TSV)
Column	Format	Modifier	Description
RANK	Positive integer	NONE	Rank of the sample, from least to greatest conditional variance
FID	string	NONE	Family ID of the sample
IID	string	NONE	Individual ID of the sample
CONDVAR	real	NONE	Conditional variance given the sample
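To post-process *sampvar.res*, a sketch like the following ranks samples by ascending CONDVAR and formats them in the RANK/FID/IID/CONDVAR column order of *top.sampvar.res*. It assumes the file is tab-separated with a header line naming the columns; neither the header nor the exact ranking rule WISARD uses is confirmed here, and the function names and demo values are made up for illustration.

```python
import csv
from typing import Iterable, List, Tuple

Row = Tuple[int, str, str, float]  # (RANK, FID, IID, CONDVAR)


def rank_by_condvar(lines: Iterable[str], top: int = 10) -> List[Row]:
    """Rank rows of a sampvar.res-style TSV by ascending CONDVAR.

    Assumes a header line naming FID, IID and CONDVAR (an assumption)."""
    reader = csv.DictReader(lines, delimiter="\t")
    rows = sorted(((r["FID"], r["IID"], float(r["CONDVAR"])) for r in reader),
                  key=lambda r: r[2])  # stable sort keeps file order for ties
    return [(rank, fid, iid, cv)
            for rank, (fid, iid, cv) in enumerate(rows[:top], start=1)]


def write_ranked(rows: List[Row]) -> str:
    """Format ranked rows as TSV with the columns RANK FID IID CONDVAR."""
    out = ["RANK\tFID\tIID\tCONDVAR"]
    out += [f"{rank}\t{fid}\t{iid}\t{cv}" for rank, fid, iid, cv in rows]
    return "\n".join(out)


if __name__ == "__main__":
    # Made-up values; for a real run pass open("sampvar.res", newline="") instead.
    demo = ["FID\tIID\tCONDVAR", "F1\tdad\t2.5", "F1\tkid1\t2.25",
            "F1\tmom\t2.5", "F1\tkid2\t2.3"]
    print(write_ranked(rank_by_condvar(demo, top=2)))
```

A file object opened in text mode can be passed directly, because `csv.DictReader` accepts any iterable of lines.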

Column descriptions for the output file top.sampvar2.res are currently not available.

Last modified : 2014-02-25 21:48:12