WISARD official site

Covariates

Related options : --sampvar, --cname, --fname, --baseline, --nosampvarhdr, --twincol, --probandcol, --pc2cov, --variant2cov, --makecov, --filcov, --inccov

This section describes:

Load covariates
Modify covariates
Assign covariates
- Assign multiple covariates
- Assign covariates from multiple files
Covariate types
- Factor-type covariate encoding
- Factor covariates and related options
More ways to make covariates
- Include specific variants as covariates
- Include Principal Component(PC)s to the analyses
- Include Gene-Environmental interaction(GxE) to the analyses
Export retrieved covariate(s) of actually loaded samples

Load covariates

WISARD can incorporate covariates into the analyses in two ways:

Import covariates from an external file (using --sampvar, --cname and --fname)
Generate covariates from the input dataset (using --pca and --pc2cov, or --variant2cov)

For details on retrieving covariates from a file, see the Phenotype section.

Modify covariates

Covariates can be modified with expressions. For example, the command below keeps age unchanged and squares height, producing the covariates 'height^2' and 'age'.

Modified sample variables
C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --cname "height^2,age" --makecov

NOTE!

Factor-type covariates cannot be modified!
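To make the effect of an expression such as "height^2,age" concrete, here is a minimal Python sketch. It is not WISARD code: the function name apply_cname is hypothetical, and the sketch assumes only two term forms (a plain column name, or a column name followed by ^ and an integer power). WISARD's actual expression grammar may accept more.

```python
def apply_cname(rows, cname):
    """Build covariates from a comma-separated list of terms.

    rows  : list of dicts mapping column name -> value as text
    cname : e.g. "height^2,age" (only NAME and NAME^K are handled here)
    """
    result = []
    for row in rows:
        new_row = {}
        for term in cname.split(","):
            if "^" in term:
                # NAME^K: raise the numeric column to an integer power
                name, power = term.split("^")
                new_row[term] = float(row[name]) ** int(power)
            else:
                # Plain NAME: copy the numeric value unchanged
                new_row[term] = float(row[term])
        result.append(new_row)
    return result


# Two rows taken from the height and age columns of Example 1
rows = [{"height": "170", "age": "38"}, {"height": "164", "age": "36"}]
covariates = apply_cname(rows, "height^2,age")
print(covariates[0])
```

The output keeps the term itself ('height^2') as the name of the new covariate, matching the naming described above.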

Assign covariates

Example 1 : A sample variable file named test_miss0_phen.txt

FID   IID     height weight age region ...
FAM_1 SAMP1_1 170    29     38  EUROPE ...
FAM_1 SAMP2_1 164    39     36  EUROPE ...
FAM_1 SAMP3_1 176    55     42  EUROPE ...
...
FAM_2 SAMP1_2 194    54     40  ASIA   ...
FAM_2 SAMP2_2 169    42     49  ASIA   ...
FAM_2 SAMP3_2 163    62     30  ASIA   ...
FAM_2 SAMP4_2 176    51     40  ASIA   ...
...

Covariates are assigned by using the --sampvar and --cname options together. The example below uses the sample variable file shown above.

Covariate assignment
C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --cname age

The above command retrieves covariates from test_miss0_phen.txt and takes the age column as the covariate.

Assign multiple covariates

There are two ways to assign multiple covariates. The first is to list multiple column names separated by commas (,). For example, the command below retrieves covariates from test_miss0_phen.txt and takes height, weight and age as covariates.

Assignment of multiple covariates
C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --cname height,weight,age

NOTE!

Do not put whitespace around the separator, because the command prompt treats any whitespace as the start of a separate parameter!

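The second way is to give a range of columns with the dash (-) directive, in the form start-end. With the sample variable file above, height-age selects the columns from height through age, that is, height, weight and age.

Assigning multiple covariates by dash (-) directive
C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --cname height-age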

NOTE!

The hyphen (-) character can be used in a column name, but such names can be confused with the dash (-) directive and produce errors!
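The following Python sketch illustrates how a --cname parameter mixing comma-separated names and start-end ranges could be expanded against a header line. It is an illustration only: the function expand_cname is hypothetical, and the choices that a range is inclusive, follows file order, and that an exact column-name match takes priority over a range are assumptions of the sketch, not documented WISARD behavior.

```python
def expand_cname(header, cname):
    """Expand a --cname style parameter into a list of column names.

    header : column names of the sample variable file, in file order
    cname  : comma-separated names and/or start-end ranges
    """
    selected = []
    for token in cname.split(","):
        if token in header:
            # Exact match first (assumption): a column literally named
            # 'a-b' would shadow the range from 'a' to 'b'.
            selected.append(token)
        elif "-" in token:
            start, end = token.split("-", 1)
            i, j = header.index(start), header.index(end)
            if i > j:
                raise ValueError(f"range '{token}' runs backwards")
            selected.extend(header[i:j + 1])  # inclusive range
        else:
            raise KeyError(f"unknown column '{token}'")
    return selected


header = ["FID", "IID", "height", "weight", "age", "region"]
print(expand_cname(header, "height-age"))
print(expand_cname(header, "age,height-weight"))
```

The exact-match branch shows why hyphenated column names are risky: the same text can be read either as one column or as a range.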

Assign covariates from multiple files

WISARD supports assigning covariates from multiple files, as in the example below.

Assignment of multiple covariate files
C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt,test_miss0_phen2.txt --cname age,bmi,sbp

The above command loads two sample variable files, test_miss0_phen.txt and test_miss0_phen2.txt, then takes age and sbp from test_miss0_phen.txt and bmi from test_miss0_phen2.txt. Note the following precautions when loading multiple sample variable files.

Every column name MUST BE UNIQUE across all sample variable files, even for unused columns.
For a ranged covariate assignment, the column names marking the 'start' and 'end' of the range must exist in the same file.
All constraints on a sample variable file apply to each of the assigned sample variable files.
With the --nosampvarhdr option, columns are named sequentially starting from 'V1', and the numbering continues across files. For example, if two sample variable files 'p1.val' and 'p2.val' have 7 and 8 columns, respectively, the 7 columns in 'p1.val' are named V1 to V7 and the 8 columns in 'p2.val' are named V8 to V15.
Any covariate-related option can be used regardless of which file defines the column.

NOTE!

Column names MUST be unique across sample variable files!
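The sequential naming under --nosampvarhdr can be sketched in a few lines of Python. The function name_columns is hypothetical and only reproduces the numbering rule stated above (V1 onward, continuing across files in the order given); it does not read any files.

```python
def name_columns(files):
    """Assign V1, V2, ... across files, continuing the count between files.

    files : list of (filename, number_of_columns) in the order passed
            to --sampvar
    """
    names = {}
    next_index = 1
    for filename, n_columns in files:
        names[filename] = [f"V{k}" for k in range(next_index, next_index + n_columns)]
        next_index += n_columns
    return names


names = name_columns([("p1.val", 7), ("p2.val", 8)])
print(names["p1.val"])  # V1 .. V7
print(names["p2.val"])  # V8 .. V15
```

Because the count never restarts, the generated names are unique across files, which satisfies the uniqueness requirement automatically.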

Covariate types

WISARD supports two kinds of covariates: numeric and factor. While processing the covariate file, WISARD automatically determines the type of each assigned covariate and reports the type it determined for each one.

NOTE!

Read the log file carefully, because WISARD can determine the wrong type if a numeric value is malformed or contains stray extra characters!

Factor-type covariate encoding

There are two ways to retrieve a covariate as a factor: --cname and --fname. With --cname, the covariate type is determined automatically from the column's contents. If a covariate consists only of non-numeric values, it is regarded as a factor covariate. However, this does not work for a categorical covariate that is coded with numbers, because it will be detected as numeric. In that case, use --fname to assign the covariate as a factor.
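The difference between automatic detection and forcing a factor can be sketched as follows. This is not WISARD's implementation: the function infer_type and its force_factor flag are hypothetical stand-ins for --cname and --fname, and the rule used for mixed columns (any value that fails to parse as a number makes the whole column a factor) is an assumption. Missing-value codes are not handled.

```python
def is_number(text):
    """Return True if the text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def infer_type(values, force_factor=False):
    """Classify a covariate column as 'numeric' or 'factor'.

    force_factor=True mimics requesting the column through --fname.
    """
    if force_factor:
        return "factor"
    return "numeric" if all(is_number(v) for v in values) else "factor"


print(infer_type(["38", "36", "42"]))                  # numeric
print(infer_type(["EUROPE", "ASIA"]))                  # factor
print(infer_type(["1", "2", "1"]))                     # numeric, although categorical
print(infer_type(["1", "2", "1"], force_factor=True))  # factor
print(infer_type(["170", "16O"]))                      # factor: letter O typed for zero
```

The last call shows the kind of mistake the log file helps to catch: under this rule, a single mistyped value silently turns a numeric column into a factor.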

Factor covariates and related options

For factor-type covariates, WISARD applies contrast encoding, which requires a baseline. By default, the baseline is the level that appears first for the given factor. To select a specific baseline, use the --baseline option, subject to the following constraints.

1. The parameter of the --baseline option is one or more entries, separated by commas (,) with no whitespace.
2. Each entry has the form [NAME]=[VALUE], where [NAME] is the name of a column specified by --cname and [VALUE] is an existing value in the [NAME] column.

NOTE!

If all samples having the specified [VALUE] are excluded by the assigned filters, WISARD will produce an error, since no sample remains that can serve as the baseline for the assigned covariate.

3. Every [NAME] assigned in --baseline must also be assigned in --cname; otherwise WISARD will halt.
4. Any covariate assigned in --cname but not in --baseline uses the default baseline.

The covariates as encoded by WISARD can be exported in the same format with the --makecov option, as in the example below.

Factor covariate assignment and check their encoding
C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --makecov --cname region

With the above command, the covariate region assigned by the --cname option is translated to a factor-type covariate, since it does not contain any numeric values. In addition, since the first value of region to appear is EUROPE, it becomes the baseline of region. The baseline can be changed to ASIA by adding the option --baseline region=ASIA to the above command.

Factor covariate assignment and check their encoding
C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --makecov --cname region --baseline region=ASIA

Example 2 : A short example of the encoding of a factor-type covariate (region, with ASIA as the baseline)

FID     IID     region=AMERICA region=EUROPE
FAM_1   SAMP1_1 0              1
FAM_1   SAMP2_1 0              1
FAM_1   SAMP3_1 0              1
...
FAM_2   SAMP1_2 0              0
FAM_2   SAMP2_2 0              0
FAM_2   SAMP3_2 0              0
...
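The encoding in Example 2 can be reproduced with a short Python sketch. The function encode_factor is hypothetical; it assumes 0/1 indicator columns named [NAME]=[LEVEL] for every level except the baseline, with the columns in alphabetical order (an assumption chosen to match Example 2). The input values are a made-up toy column, not the full test file.

```python
def encode_factor(name, values, baseline=None):
    """Encode a factor column as 0/1 indicator columns against a baseline.

    If no baseline is given, the level that appears first is used.
    """
    levels = []
    for value in values:
        if value not in levels:
            levels.append(value)  # keep order of first appearance
    if baseline is None:
        baseline = levels[0]
    elif baseline not in levels:
        # Mirrors the error case where no sample carries the baseline value
        raise ValueError(f"baseline '{baseline}' not found in '{name}'")
    kept = sorted(level for level in levels if level != baseline)
    header = [f"{name}={level}" for level in kept]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return header, rows


region = ["EUROPE", "EUROPE", "ASIA", "ASIA", "AMERICA"]  # toy data

header, rows = encode_factor("region", region, baseline="ASIA")
print(header)  # indicator columns for AMERICA and EUROPE
print(rows)    # ASIA samples are all zeros

default_header, _ = encode_factor("region", region)
print(default_header)  # EUROPE is the baseline when none is given
```

Samples at the baseline level get zeros in every column, which is why the FAM_2 (ASIA) rows in Example 2 are all 0.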

More ways to make covariates

WISARD provides several ways to generate covariates automatically. This section describes how to use them.

Include specific variants as covariates

In some cases, a subset of variants can be included as covariates, as a form of conditional analysis. The command below shows an example.

Perform regression analysis with SNP30 as a covariate
C:\Users\WISARD> wisard --ped test_miss0.ped --variant2cov SNP30 --regression

The above command incorporates the variant 'SNP30' as a covariate and performs regression analysis. If the variants are listed in a file, the file can be passed to WISARD instead:

Perform regression analysis with variants in 'test_variant_list.txt' as covariates
C:\Users\WISARD> wisard --ped test_miss0.ped --variant2cov test_variant_list.txt --regression

In this case, test_variant_list.txt should contain one variant name per line.

Include Principal Component(PC)s to the analyses

WISARD performs Principal Component Analysis (PCA) with the --pca option, which by itself only runs the analysis and writes its output. The computed PCs can be incorporated directly into subsequent analyses with --pc2cov. For example, the command below adds 3 PCs as covariates to the regression analysis that follows PCA.

Include PC1 to PC3 to regression analysis
C:\Users\WISARD> wisard --ped test_miss0.ped --pca --npc 3 --pc2cov --regression

Include Gene-Environmental interaction(GxE) to the analyses

Variant-level analyses that accept covariates can also include Gene-Environment interaction (GxE) terms as covariates. The --gxe option enables this, and all variant-level tests with covariates are then performed with GxE terms. By default, this option adds GxE interactions for all assigned covariates; to restrict the interactions to specific covariates, use the --gxecovs option.
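As a rough illustration of what a GxE term is, the Python sketch below forms each interaction as the product of a variant's genotype coding and a covariate. This is a common textbook construction, not a description of WISARD's internals: the function gxe_terms, the 'Gx' naming, and the 0/1/2 genotype coding are all assumptions of the sketch. The gxe_covs argument plays the role that --gxecovs plays above, restricting which covariates get an interaction term.

```python
def gxe_terms(genotypes, covariates, gxe_covs=None):
    """Build product interaction terms between one variant and covariates.

    genotypes  : per-sample genotype coding (assumed 0/1/2 here)
    covariates : dict mapping covariate name -> per-sample values
    gxe_covs   : names to build interactions for; None means all
    """
    names = list(covariates) if gxe_covs is None else list(gxe_covs)
    terms = {}
    for name in names:
        terms[f"Gx{name}"] = [g * c for g, c in zip(genotypes, covariates[name])]
    return terms


genotypes = [0, 1, 2]  # toy genotype coding for three samples
covariates = {"age": [38, 36, 42], "height": [170, 164, 176]}

all_terms = gxe_terms(genotypes, covariates)
age_only = gxe_terms(genotypes, covariates, gxe_covs=["age"])
print(sorted(all_terms))
print(age_only)
```

With many covariates, interactions for all of them add one extra term per covariate to every variant-level test, which is the practical reason to restrict them.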

Export retrieved covariate(s) of actually loaded samples

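The --makecov option exports the covariates of the samples that are actually loaded, that is, after any assigned filters are applied. In the example below, the --cname parameter is quoted because the Windows command prompt treats an unquoted ^ as an escape character.

Export covariates
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --cname "age,age^2,height" --filgind [0,0.95] --makecov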

Last modified : 2017-09-13 13:13:05