WISARD official site

Family-based association tests

This section describes the following statistics and topics:

Available statistics
Transmission Disequilibrium Test (TDT)
Sibship TDT (SDT)
MQLS
FQLS
- Using proband information
- Example codes
GEMMA
Generalized score test
MFQLS

Available statistics

The most efficient statistic depends on the ascertainment condition, the type of phenotype (dichotomous/quantitative), the presence of covariates, and the presence or absence of population stratification. In this section, we illustrate statistics for family-based samples.

Summary for available statistics:

Test name	Speed	Phenotype	Covariates	Description
Transmission Disequilibrium Test (TDT)	(fast)	Dichotomous	Cannot adjust	Always robust against population stratification, but often statistically inefficient because founders' genotypes are used only to determine transmissions.
Sibship TDT (SDT)	(fast)	Dichotomous	Cannot adjust	The original TDT is not applicable if parental genotypes are unknown. SDT overcomes this by using variant data from unaffected sibs (Spielman and Ewens, Am J Hum Genet 1998).
MQLS	(slow)	Dichotomous	Cannot adjust	An extended Cochran-Armitage test for family-based samples (Thornton and McPeek, Am J Hum Genet 2007). It is for dichotomous phenotypes and is usually efficient for ascertained families. Covariate effects cannot be adjusted directly, but a modification enables covariate adjustment.
Family QLS (FQLS)	(moderate)	Dichotomous/continuous	Cannot adjust	An extended MQLS that is more efficient than MQLS when each family is ascertained through probands. It can be applied to both dichotomous and continuous phenotypes. Covariate effects cannot be adjusted directly, but a modification enables covariate adjustment.
EMMAX/GEMMA	(fast)	Continuous	Adjust	Wald tests for a linear mixed model in which the variance-covariance matrix is parameterized by the kinship coefficient matrix. They are for quantitative phenotypes and are usually efficient for randomly selected samples.
Generalized score test	(fast)	Continuous	Adjust	A generalized score test for the same linear mixed model as EMMAX/GEMMA. It can be applied to quantitative phenotypes and is usually efficient for randomly selected samples.
MFQLS	(fast)	Continuous & multivariate	Adjust	An extended MQLS for joint analysis of multiple phenotypes and multiple variants.

Transmission Disequilibrium Test (TDT)

TDT (Spielman et al., Am J Hum Genet 1993) is an association test for family-based samples. It tests whether heterozygous parents transmit a given allele to their affected offspring more often than the 50% expected under no association.

TDT is always robust against population stratification. For large-scale genetic data, several methods such as genomic control and EIGENSTRAT are also robust against population stratification. However, if the number of variants is not sufficiently large, those methods are no longer robust, whereas TDT still is. TDT uses founders' genotypes only to determine transmissions, so it is often statistically inefficient. For these reasons, TDT is often used for candidate gene analysis.
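As an illustration of the test described above, the following minimal Python sketch computes the classical McNemar-type TDT statistic for a biallelic variant, (b - c)^2 / (b + c), where b and c count how often heterozygous parents did or did not transmit the tested allele to an affected child. The function name and the example counts are hypothetical, and the sketch is not WISARD's implementation.

```python
import math


def tdt(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Return the TDT chi-square statistic (1 df) and its p-value.

    Both counts are taken over heterozygous parents only: how often the
    tested allele was, or was not, passed on to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    stat = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
stat, p = tdt(60, 40)
print(f"statistic={stat:.3f} p-value={p:.4f}")  # statistic=4.000 p-value=0.0455
```

Only the two transmission counts enter the statistic, which is why the test stays valid under population stratification but discards the remaining information in the parental genotypes.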

WISARD can perform TDT by using the --tdt option.

Example code

Analysis with TDT
C:\Users\WISARD> wisard --bed test_miss0.bed --tdt --out res_tdt

Output file

Column	Format	Modifier	Description
*tdt.res* is...			A result of Transmission Disequilibrium Test (TSV)
CHR	integer	NONE	Chromosome of the variant
VARIANT	string	NONE	Identifier of the variant
POS	integer	NONE	Position of the variant
ALT	string	NONE	Alternative allele of the variant
ANNOT	string	NONE	Annotation of the variant
PHENO	string	--sampvar,--pname	Tested phenotype
STAT	real	NONE	Statistic from TDT
P_TDT	real	NONE	p-value from TDT
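Because the result is a tab-separated file, it can be post-processed with a few lines of Python. The sketch below uses the column names from the table above; the toy rows, variant names and p-values are made up for illustration, and the exact name of the output file and the encoding of missing values are assumptions that should be checked against a real run.

```python
import csv
import io

# Toy stand-in for a TDT result file; all values are invented for illustration.
TOY_RESULT = (
    "CHR\tVARIANT\tPOS\tALT\tANNOT\tPHENO\tSTAT\tP_TDT\n"
    "1\trs0001\t10583\tA\tNA\tT2D\t4.000\t0.0455\n"
    "1\trs0002\t20911\tG\tNA\tT2D\t0.250\t0.6171\n"
    "2\trs0003\t30442\tT\tNA\tT2D\t12.100\t0.0005\n"
)


def significant_variants(handle, alpha):
    """Return (variant, p-value) pairs with P_TDT below alpha, smallest first."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        try:
            p_value = float(row["P_TDT"])
        except ValueError:
            continue  # skip rows whose p-value is not numeric (assumed missing)
        if p_value < alpha:
            hits.append((row["VARIANT"], p_value))
    return sorted(hits, key=lambda hit: hit[1])


hits = significant_variants(io.StringIO(TOY_RESULT), alpha=0.05)
print(hits)  # [('rs0003', 0.0005), ('rs0001', 0.0455)]
```

To read a real result, the `io.StringIO` object would be replaced by an opened file handle.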

Sibship TDT (SDT)

TDT can be performed only when the genotypes of both the parents and the child are available. However, parental genotypes are sometimes unavailable. SDT overcomes this limitation by using variant data from unaffected sibs (Spielman and Ewens, Am J Hum Genet 1998).

Its general statistical properties are similar to those of TDT, and WISARD can perform SDT by using the --sdt option.

Example code

Analysis with SDT
C:\Users\WISARD> wisard --bed test_miss0.bed --sdt --out res_sdt

Output file

Column	Format	Modifier	Description
*sdt.res* is...			A result of Sibship Disequilibrium Test (TSV)
CHR	integer	NONE	Chromosome of the variant
VARIANT	string	NONE	Identifier of the variant
POS	integer	NONE	Position of the variant
ALT	string	NONE	Alternative allele of the variant
ANNOT	string	NONE	Annotation of the variant
PHENO	string	--sampvar,--pname	Tested phenotype
STAT	real	NONE	Statistic from SDT
P_TDT	real	NONE	p-value from SDT

MQLS

MQLS is an extended Cochran-Armitage test proposed for family-based samples (Thornton and McPeek, Am J Hum Genet 2007). It is a score test based on quasi-likelihood. MQLS is the same in the presence and in the absence of population stratification except for the choice of relationship matrix: in the presence of population stratification the genetic relationship matrix should be incorporated, and in its absence the kinship coefficient matrix should be used.

For MQLS, affected and unaffected individuals are coded as 1 and 0, respectively, and individuals with a missing phenotype are coded as the prevalence. This scheme reflects that an individual with a missing phenotype may be affected with probability equal to the prevalence. Individuals with missing genotypes are either excluded from the analysis or have their missing genotypes replaced with 2$\times$MAF.
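The coding scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, missing values are represented by `None`, genotypes are assumed to be coded as counts (0/1/2) of the minor allele, and the MAF is estimated from all observed genotypes rather than from founders only.

```python
def code_phenotypes(phenotypes, prevalence):
    """Code affected as 1, unaffected as 0, and missing (None) as the prevalence."""
    return [prevalence if value is None else float(value) for value in phenotypes]


def impute_genotypes(genotypes):
    """Replace missing genotypes (None) with 2 * MAF.

    Genotypes are assumed to be minor-allele counts, so the allele
    frequency estimated from the observed genotypes is the MAF.
    """
    observed = [g for g in genotypes if g is not None]
    if not observed:
        raise ValueError("no observed genotypes for this variant")
    maf = sum(observed) / (2 * len(observed))
    return [2 * maf if g is None else float(g) for g in genotypes]


coded = code_phenotypes([1, 0, None, 1], prevalence=0.05)
imputed = impute_genotypes([0, 1, None, 0, 1, 0])
print(coded)    # [1.0, 0.0, 0.05, 1.0]
print(imputed)  # [0.0, 1.0, 0.4, 0.0, 1.0, 0.0]
```

In the second example two minor alleles are observed among five genotyped individuals, so the MAF is 0.2 and the missing genotype is replaced by 0.4.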

Because MQLS is an extension of the Cochran-Armitage test to family-based designs, it is efficient for case-control designs. However, if covariates need to be adjusted or samples are randomly selected, some modification is necessary. In such cases, the residuals from the linear mixed model can be used as the response even though the phenotypes are dichotomous (Won and Lange, Stat in Med 2013).

Example codes

Running MQLS

Run MQLS by setting prevalence to 5%
C:\Users\WISARD> wisard --ped test_miss0.ped --mqls --kinship --prevalence 0.05

Related options: --mqls, --prevalence

Randomly selected samples and covariate effect adjustment

In the presence of population stratification, statistical analyses with MQLS are exactly the same for population-based and family-based designs; see the association analysis under the presence of population stratification for the example code.

FQLS

The family-based quasi-likelihood score (FQLS) test is an extended MQLS and is more efficient than MQLS if each family is ascertained through probands. In that case, the ascertainment bias of each family member depends on the relationship to the probands, and it is therefore heterogeneous. This heterogeneity is substantial for large families, and FQLS adjusts for it with a liability model. If heritability is large, the power improvement is substantial.

FQLS can be applied to both dichotomous and continuous phenotypes, and modification is necessary if there are covariate effects to be adjusted or phenotypes are quantitative.

FQLS can be performed in WISARD by assigning the --fqls option.
By default, --fqls requires the additional options below, and missing genotypes are imputed as 2*maf, where maf is the minor allele frequency of the given variant among founders.

By default, FQLS computes the offset based on the following factors: pedigree structure, proband status, heritability and prevalence. Among these factors, prevalence should be omitted when the phenotype is not dichotomous.

Using proband information

WISARD provides two ways of estimating the offset: assume that each family member is a potential proband, or use exact information on who the probands are. If proband status is available for each individual, it can be utilized in the FQLS analysis. To do so, a sample variable file (given with --sampvar) is required. As introduced in the sample variable section, a number of column names are 'reserved' for special uses, and proband status is one of them. By default, if the provided sample variable file contains a column named 'PROBAND', WISARD automatically detects it and uses it as the proband information for each sample. To assign proband status correctly, the following conditions must be met.

The column name for proband status should be 'PROBAND', without quotes. The name is case-sensitive. If the column name is different, --probandcol should be given to tell WISARD which column to use.
Proband status should be given as an integer, either 0 (not proband) or 1 (proband). Otherwise it will be misinterpreted.
At least one proband should exist in each family. Otherwise an error will be raised.
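The three conditions above can be checked before running WISARD. The following Python sketch is a hypothetical pre-flight check, not part of WISARD: it assumes the sample variable file has already been read into a list of dictionaries of strings (for example with `csv.DictReader`) and that family and individual identifiers are stored in columns named `FID` and `IID`, which is an assumption about the file layout.

```python
from collections import defaultdict


def check_proband(rows, column="PROBAND"):
    """Return a list of problems; an empty list means all conditions hold."""
    problems = []
    probands_per_family = defaultdict(int)
    for row in rows:
        if column not in row:
            # Column lookup is case-sensitive, as required for the reserved name.
            return [f"column {column!r} not found"]
        value = row[column]
        probands_per_family[row["FID"]] += 0  # register the family
        if value not in ("0", "1"):
            problems.append(
                f"{row['FID']}/{row['IID']}: proband status {value!r} is not 0 or 1"
            )
            continue
        probands_per_family[row["FID"]] += int(value)
    for fid, count in probands_per_family.items():
        if count == 0:
            problems.append(f"family {fid}: no proband")
    return problems


rows = [
    {"FID": "F1", "IID": "1", "PROBAND": "1"},
    {"FID": "F1", "IID": "2", "PROBAND": "0"},
    {"FID": "F2", "IID": "1", "PROBAND": "0"},
    {"FID": "F2", "IID": "2", "PROBAND": "2"},
]
for problem in check_proband(rows):
    print(problem)
```

For a file that uses a different column name, the same name that would be passed to --probandcol can be supplied through the `column` argument.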

Example codes

Setting prevalence of dichotomous disease and heritability

Calculate FQLS with prevalence 5%, heritability 80%
C:\Users\WISARD> wisard --ped test_miss0.ped --fqls --kinship --prevalence 0.05 --heri 0.8

For family-based association analysis, the kinship coefficient matrix should be used as the relationship matrix by using the --kinship option. If population stratification exists, the genetic relationship matrix must be incorporated.
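To make the notion of a kinship coefficient matrix concrete, the sketch below computes pedigree-based kinship coefficients with the standard recursive rule. It is a minimal illustration and not WISARD's implementation; the function name and the example family are hypothetical, parents must be listed before their children, founders are assumed unrelated and non-inbred, and each individual must have either both parents or neither recorded.

```python
def kinship_matrix(pedigree):
    """Compute kinship coefficients for a pedigree.

    pedigree: list of (individual, father, mother) tuples with parents
    listed before children; founders have father = mother = None.
    Returns a dict mapping (id_a, id_b) to the kinship coefficient.
    """
    phi = {}
    seen = []
    for ind, father, mother in pedigree:
        founder = father is None and mother is None
        for other in seen:
            if founder:
                value = 0.0  # founders are assumed unrelated to everyone earlier
            else:
                # Average of the parents' kinship with the other individual.
                value = 0.5 * (phi[father, other] + phi[mother, other])
            phi[ind, other] = phi[other, ind] = value
        # Self-kinship is 0.5 unless the parents are related (inbreeding).
        phi[ind, ind] = 0.5 if founder else 0.5 * (1.0 + phi[father, mother])
        seen.append(ind)
    return phi


nuclear_family = [
    ("father", None, None),
    ("mother", None, None),
    ("sib1", "father", "mother"),
    ("sib2", "father", "mother"),
]
phi = kinship_matrix(nuclear_family)
print(phi["sib1", "father"], phi["sib1", "sib2"], phi["father", "mother"])  # 0.25 0.25 0.0
```

Unlike a genetic relationship matrix estimated from genotypes, these values depend only on the recorded pedigree.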

Setting heritability only for continuous phenotype

Calculate FQLS for continuous phenotype with heritability 80%
C:\Users\WISARD> wisard --ped test_miss0.ped --fqls --kinship --heri 0.8 --sampvar test_miss0_phen.txt --pname height

Assign proband information with dichotomous phenotype

Perform FQLS analysis with proband information
C:\Users\WISARD> wisard --ped testdat.ped --fqls --kinship --prevalence 0.05 --heri 0.8 --sampvar proband.txt

Perform FQLS analysis with proband information and assign a non-default column name for proband
C:\Users\WISARD> wisard --ped testdat.ped --fqls --kinship --prevalence 0.05 --heri 0.8 --sampvar proband.txt --probandcol T2D_PROBAND

Missing genotype handling

Calculate FQLS with prevalence 5%, heritability 80%, and set the p-value threshold to 10^-7
C:\Users\WISARD> wisard --ped testdat.ped --fqls --retestthr 1e-7 --kinship --prevalence 0.05 --heri 0.8

Related options: --availonly, --retestthr

NOTE!

--retestthr and --availonly cannot be used simultaneously!

GEMMA

For family-based designs, GEMMA can be used in both the presence and the absence of population stratification. In the presence of population stratification, the statistics and codes are exactly the same as those for GEMMA with population-based samples (see the association analysis under the presence of population stratification for example code). For family-based designs, the only difference in statistics and WISARD code between the presence and the absence of population stratification is the choice of relationship matrix: the kinship coefficient matrix should be used instead of the IBS matrix in the absence of population stratification.

EMMAX and GEMMA are known to be the most efficient approaches for quantitative phenotypes (Kang et al., Nat Genet 2010), and if the polygenic effect is substantially large (e.g. height), the improvement from them can be substantial. If it is not clear whether the polygenic effect is large, the heritability of the quantitative phenotype can be used as an alternative guide. Even though parameter estimation in a linear mixed model is usually computationally intensive, the computationally efficient algorithms proposed by both approaches enable genome-wide association analysis in a short time. GEMMA provides computationally much more efficient strategies, and by default, WISARD calculates GEMMA.
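For reference, the linear mixed model mentioned above is commonly written as follows. The notation is an assumption introduced here for illustration and is not taken from the WISARD documentation: $y$ is the phenotype vector, $X$ the covariate matrix, $g$ the genotype vector of the tested variant, and $K$ the relationship matrix (the kinship coefficient matrix or the genetic relationship matrix, depending on the presence of population stratification).

```latex
\begin{aligned}
y &= X\beta + g\gamma + u + \varepsilon, \\
u &\sim N(0, \sigma_g^2 K), \qquad \varepsilon \sim N(0, \sigma_e^2 I), \\
\operatorname{Var}(y) &= \sigma_g^2 K + \sigma_e^2 I, \\
H_0 &: \gamma = 0, \qquad
T_{\text{Wald}} = \frac{\hat{\gamma}^2}{\widehat{\operatorname{Var}}(\hat{\gamma})}
\;\sim\; \chi^2_1 \ \text{asymptotically under } H_0 .
\end{aligned}
```

When $K$ is scaled to have a unit diagonal, the ratio $\sigma_g^2 / (\sigma_g^2 + \sigma_e^2)$ corresponds to the heritability, which is why a large polygenic effect makes modelling $u$ worthwhile.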

Example codes

GEMMA

Perform GEMMA
C:\Users\WISARD> wisard --bed test_miss0.bed --gemma --out res_gemma

GEMMA with covariate

Perform GEMMA with covariate adjustment
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --cname age,height --gemma --out res_gemma_cov

Generalized score test

For family-based designs, the generalized score test can be used in both the presence and the absence of population stratification. In the presence of population stratification, the statistics and codes are exactly the same as those for population-based samples (see the association analysis under the presence of population stratification for example code). For family-based designs, the only difference in statistics and WISARD code between the presence and the absence of population stratification is the choice of relationship matrix: the kinship coefficient matrix should be used instead of the IBS matrix in the absence of population stratification.

WISARD supports the generalized score test for the linear mixed model. The linear mixed models for EMMAX/GEMMA and for the generalized score test are the same. EMMAX/GEMMA are Wald tests, and Wald tests are known to be statistically more efficient than score tests. However, EMMAX/GEMMA are more sensitive to departures from normality than the generalized score test, so if non-normality is expected, the generalized score test may be a reasonable choice.

Example codes

Generalized score test

Perform generalized score test for family-based dataset
C:\Users\WISARD> wisard --bed test_miss0.bed --scoretest --kinship

NOTE!

If the dataset is not family-based, generalized score test will not be performed!

MFQLS

For family-based designs, MFQLS can be used in both the presence and the absence of population stratification. In the presence of population stratification, the statistics and codes are exactly the same as those for population-based samples (see the association analysis under the presence of population stratification for example code). For family-based designs, the only difference in statistics and WISARD code between the presence and the absence of population stratification is the choice of relationship matrix: the kinship coefficient matrix should be used instead of the IBS matrix in the absence of population stratification.

MFQLS is an extended MQLS for multiple phenotypes and variants. MQLS can be applied to datasets with multiple phenotypes or multiple variants, such as a gene set. Hence, WISARD provides functionality for applying MQLS to such analyses.

NOTE!

When WISARD is executed with --fqls and --mqls concurrently, multiple phenotypes/variants cannot be used!

Example codes

MFQLS

Perform MFQLS
C:\Users\WISARD> wisard --bed test_miss0.bed --mfqls --out res_mfqls

Adjusting covariate effects in MFQLS

Perform MFQLS with covariate adjustment
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --cname age,height --mfqls --out res_mfqls_cov

MFQLS with multiple phenotypes

Perform MFQLS with multiple phenotypes
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname sbp,dbp --mfqls --out res_mfqls_multi

Last modified : 2017-09-11 15:57:23