WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data
 

Covariates

  • Related options : --sampvar, --cname, --fname, --baseline, --nosampvarhdr, --twincol, --probandcol, --pc2cov, --variant2cov, --makecov, --filcov, --inccov

This section describes:

  • Load covariates
  • Modify covariates
  • Assign covariates
    • Assign multiple covariates
    • Assign covariates from multiple files
  • Covariate types
    • Factor-type covariate encoding
    • Factor covariates and related options
  • More ways to make covariates
    • Include specific variants as covariates
    • Include Principal Component(PC)s to the analyses
    • Include Gene-Environmental interaction(GxE) to the analyses
  • Export retrieved covariate(s) of actually loaded samples

        Load covariates

        WISARD provides two ways to incorporate covariates into the analyses:

        1. Import covariates from a separate file (using --sampvar, --cname and --fname)
        2. Generate covariates from the input dataset (using --pca and --pc2cov, or --variant2cov)

        For details on loading covariates from a file, see the Phenotype section.

        Modify covariates

        Covariates can be modified with an expression. For example, the command below keeps age as it is and squares height, producing the two covariates 'height^2' and 'age'.

        Modified sample variables
        C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --cname "height^2,age" --makecov
        NOTE!
        Factor-type covariates cannot be modified!
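
        To make the effect of the expression concrete, the Python sketch below computes the two requested covariates for the first two samples of test_miss0_phen.txt (Example 1). It illustrates only the arithmetic. The function name `apply_expression` and the handling of the `name^k` form are assumptions made for illustration, not WISARD's implementation.

```python
def apply_expression(expr, row):
    """Evaluate a column expression such as 'age' or 'height^2' for one sample.

    Only the 'name^k' power form shown on this page is handled (assumption).
    """
    if "^" in expr:
        name, power = expr.split("^", 1)
        return float(row[name]) ** int(power)
    return float(row[expr])


# Hypothetical in-memory copy of the first two samples of Example 1.
rows = [
    {"IID": "SAMP1_1", "height": "170", "age": "38"},
    {"IID": "SAMP2_1", "height": "164", "age": "36"},
]

cname = "height^2,age"
for row in rows:
    values = [apply_expression(expr, row) for expr in cname.split(",")]
    print(row["IID"], values)
```

        For SAMP1_1 the sketch yields 170 squared (28900) for 'height^2' and leaves age at 38.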

        Assign covariates

        Example 1 : A sample variable file named test_miss0_phen.txt
        FID IID height weight age region ...
        FAM_1 SAMP1_1 170 29 38 EUROPE ...
        FAM_1 SAMP2_1 164 39 36 EUROPE ...
        FAM_1 SAMP3_1 176 55 42 EUROPE ...
        ...
        FAM_2 SAMP1_2 194 54 40 ASIA ...
        FAM_2 SAMP2_2 169 42 49 ASIA ...
        FAM_2 SAMP3_2 163 62 30 ASIA ...
        FAM_2 SAMP4_2 176 51 40 ASIA ...
        ...

        Covariates are assigned by using the --sampvar and --cname options together. The example below uses the sample variable file shown above.

        Covariate assignment
        C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --cname age

        The above command retrieves covariates from test_miss0_phen.txt and takes the age column as a covariate.

        Assign multiple covariates

        There are two ways to assign multiple covariates. The first is to list multiple column names separated by the comma (,) character. For example, the command below retrieves covariates from test_miss0_phen.txt and takes height, weight and age as covariates.

        Assignment of multiple covariates
        C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --cname height,weight,age
        NOTE!
        Do not put whitespace around the separator, because the command prompt treats any whitespace as the start of a separate parameter!

        The second is to give a range of columns with the dash (-) directive, naming the 'start' and 'end' columns of the range.

        Assigning multiple covariates by dash(-) directive
        C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --cname height-age
        NOTE!
        The hyphen (-) character can be used in a column name, but this sort of naming can produce errors!
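
        The relationship between the comma form and the dash form can be sketched in Python as follows. This is a sketch under stated assumptions, not WISARD's code: it assumes that a 'start-end' entry selects every column from 'start' to 'end' inclusive, in file order, and that an exact column name takes precedence over a range. The function name `expand_cname` is hypothetical.

```python
def expand_cname(cname, header):
    """Expand a --cname style value into a list of column names.

    Assumptions (not confirmed by the WISARD documentation): a 'start-end'
    entry selects all columns from start to end inclusive, in file order,
    and an entry that matches a column name exactly is used as is.
    """
    selected = []
    for entry in cname.split(","):
        if entry in header:
            selected.append(entry)
        elif "-" in entry:
            start, end = entry.split("-", 1)
            i, j = header.index(start), header.index(end)
            if i > j:
                raise ValueError(f"range {entry!r} runs backwards")
            selected.extend(header[i:j + 1])
        else:
            raise KeyError(f"unknown column {entry!r}")
    return selected


# Header of the sample variable file in Example 1.
header = ["FID", "IID", "height", "weight", "age", "region"]
print(expand_cname("height-age", header))
print(expand_cname("height,weight,age", header))
```

        Under these assumptions both calls select height, weight and age. The sketch also shows why a column whose own name contains a hyphen is risky: an entry such as 'a-b' could mean either that column or the range from 'a' to 'b'.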

        Assign covariates from multiple files

        WISARD supports assigning covariates from multiple files. See the example below.

        Assignment of multiple covariate files
        C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt,test_miss0_phen2.txt --cname age,bmi,sbp
        The above command loads two sample variable files, test_miss0_phen.txt and test_miss0_phen2.txt, then loads age and sbp from test_miss0_phen.txt and bmi from test_miss0_phen2.txt. Below are some precautions for loading multiple sample variable files.

        1. Every column name MUST BE UNIQUE across all sample variable files, even for unused columns.
        2. For a ranged covariate assignment, the variable names indicating 'start' and 'end' must exist in the same file.
        3. All constraints on a sample variable file apply to each of the assigned sample variable files.
        4. When the --nosampvarhdr option is assigned, columns are named sequentially from 'V1'. For example, if there are two sample variable files 'p1.val' and 'p2.val' with 7 and 8 columns, respectively, the 7 columns in 'p1.val' are named V1 to V7, and the 8 columns in 'p2.val' are named V8 to V15.
        5. Any covariate-related option can be used regardless of which file defines the covariate.
        NOTE!
        Column names MUST be unique across sample variable files!
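
        Precautions 1 and 4 can be illustrated with a short Python sketch. The sequential V-numbering follows the example in precaution 4. The duplicate check is an illustration only: it assumes that the identifier columns FID and IID are exempt from the uniqueness rule, which this page does not state, and the second file's column list is hypothetical.

```python
def name_headerless_columns(column_counts):
    """Name columns V1, V2, ... across files, continuing the numbering.

    column_counts is a list of (file name, number of columns) pairs in the
    order the files are given to --sampvar.
    """
    names = {}
    next_index = 1
    for filename, count in column_counts:
        names[filename] = [f"V{k}" for k in range(next_index, next_index + count)]
        next_index += count
    return names


def duplicated_columns(headers, id_columns=("FID", "IID")):
    """Return column names that appear in more than one file.

    Skipping id_columns is an assumption made for this sketch.
    """
    owner = {}
    duplicates = set()
    for filename, columns in headers.items():
        for column in columns:
            if column in id_columns:
                continue
            if column in owner and owner[column] != filename:
                duplicates.add(column)
            owner.setdefault(column, filename)
    return sorted(duplicates)


names = name_headerless_columns([("p1.val", 7), ("p2.val", 8)])
print(names["p1.val"][0], names["p1.val"][-1])
print(names["p2.val"][0], names["p2.val"][-1])

headers = {
    "test_miss0_phen.txt": ["FID", "IID", "height", "weight", "age", "region"],
    "test_miss0_phen2.txt": ["FID", "IID", "bmi", "age"],  # hypothetical columns
}
print(duplicated_columns(headers))
```

        In the hypothetical second file, 'age' clashes with the first file, so it is reported as a duplicate even if only 'bmi' is requested.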

        Covariate types

        WISARD supports two kinds of covariates: numeric type and factor type. While processing the covariate file, WISARD automatically determines the type of each assigned covariate and reports the type it determined.

        NOTE!
        Read the log file carefully, because WISARD can determine the wrong type if a numeric value is written erroneously or contains extra characters!

        Factor-type covariate encoding

        There are two ways to retrieve a covariate as a factor: --cname and --fname. With --cname, the covariate type is determined automatically from its contents. If a covariate consists only of non-numerical literals, it is regarded as a factor covariate. However, this does not work for a categorical covariate that is coded with numbers. In that case, use --fname to assign the covariate as a factor.
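
        The distinction can be sketched in Python. The detection rule used here (a column is numeric only if every value parses as a number) is an assumption for illustration. WISARD's actual rule, including its treatment of missing values, may differ. The `force_factor` argument is a hypothetical stand-in for listing the column in --fname.

```python
def is_number(text):
    """Return True if text parses as a floating-point number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def covariate_type(values, force_factor=False):
    """Classify a column as 'numeric' or 'factor' (illustrative rule only)."""
    if force_factor:
        return "factor"
    return "numeric" if all(is_number(v) for v in values) else "factor"


print(covariate_type(["38", "36", "42"]))                    # age values
print(covariate_type(["EUROPE", "ASIA"]))                    # region values
print(covariate_type(["1", "2", "3"], force_factor=True))    # numeric codes as factor
print(covariate_type(["170", "16O"]))                        # letter O typed for zero
```

        The last call shows the kind of mistake that the log file helps to catch: under this rule, a single mistyped value turns an intended numeric column into a factor.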

        Factor covariates and related options

        For a factor-type covariate, WISARD applies contrast encoding, which requires a baseline. By default, the baseline is the level that appears first for the given factor. If a specific baseline is required, it can be set for the factor via the --baseline option. The following constraints apply to the --baseline option.

        1. The parameter of the --baseline option should be a single entry or multiple entries separated by commas (,) with no whitespace.
        2. Each entry in the parameter should have the form [NAME]=[VALUE], where [NAME] is the name of a column specified by --cname and [VALUE] is an existing value in the [NAME] column.
           NOTE!
           If all samples having the specific [VALUE] are excluded by the assigned filters, WISARD will produce an error since no sample can serve as a baseline for the assigned covariate.
        3. All [NAME]s assigned in --baseline must also be assigned to --cname, otherwise WISARD will halt.
        4. The baseline of any covariate assigned in --cname but not in --baseline is determined by the default rule.
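
        The constraints on the --baseline parameter can be summarised as a small validation routine. The Python sketch below is an illustration only: the function name `parse_baseline` and the error messages are hypothetical, and the column 'sex' is invented for the example.

```python
def parse_baseline(spec, cnames):
    """Parse a --baseline style value such as 'region=ASIA,sex=F'.

    Each entry must have the form NAME=VALUE, and every NAME must also be
    listed in cnames (the columns given to --cname).
    """
    baselines = {}
    for entry in spec.split(","):
        name, sep, value = entry.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"entry {entry!r} is not of the form NAME=VALUE")
        if name not in cnames:
            raise ValueError(f"{name!r} is not assigned in --cname")
        baselines[name] = value
    return baselines


# 'sex' is a hypothetical column used only for this illustration.
print(parse_baseline("region=ASIA,sex=F", ["region", "sex", "age"]))
```

        Covariates that are listed in --cname but absent from the parsed result, such as 'age' here, keep their default handling.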

        The covariates encoded by WISARD can be exported in the same format via the --makecov option, as in the example below.

        Factor covariate assignment and check their encoding
        C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --makecov --cname region

        With the above command, the covariate region assigned by the --cname option is translated into a factor-type covariate since it does not contain any numeric value. In addition, since the value of region that appears first in the sample variable file (Example 1) is EUROPE, EUROPE becomes the baseline of region. The baseline can be changed to ASIA by adding the option --baseline region=ASIA to the above command. The encoding shown in Example 2 matches this case: the ASIA samples are 0 in every column.

        Factor covariate assignment and check their encoding
        C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --makecov --cname region --baseline region=ASIA
        Example 2 : A short example of encoding of factor-type covariates
        FID IID region=AMERICA region=EUROPE
        FAM_1 SAMP1_1 0 1
        FAM_1 SAMP2_1 0 1
        FAM_1 SAMP3_1 0 1
        ...
        FAM_2 SAMP1_2 0 0
        FAM_2 SAMP2_2 0 0
        FAM_2 SAMP3_2 0 0
        ...
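
        The encoding in Example 2 can be reproduced with a few lines of Python. This is a minimal sketch of baseline (dummy) coding under stated assumptions, not WISARD's implementation: the baseline level gets no column of its own, the default baseline is the first level seen, and the remaining columns are ordered alphabetically (a guess based on the column order in Example 2). The AMERICA sample is hypothetical, since Example 1 does not list one.

```python
def encode_factor(name, values, baseline=None):
    """Dummy-code a factor column against a baseline level."""
    levels = list(dict.fromkeys(values))  # unique levels, in order of appearance
    if baseline is None:
        baseline = levels[0]
    elif baseline not in levels:
        # Mirrors the NOTE above: no remaining sample carries the baseline value.
        raise ValueError(f"baseline {baseline!r} not present in {name}")
    columns = sorted(level for level in levels if level != baseline)
    header = [f"{name}={level}" for level in columns]
    rows = [[1 if value == level else 0 for level in columns] for value in values]
    return header, rows


# Two EUROPE and two ASIA samples as in Example 1, plus one hypothetical AMERICA sample.
region = ["EUROPE", "EUROPE", "ASIA", "ASIA", "AMERICA"]
header, rows = encode_factor("region", region, baseline="ASIA")
print(header)
for value, row in zip(region, rows):
    print(value, row)
```

        With ASIA as the baseline, EUROPE samples are encoded as 0 1 and ASIA samples as 0 0, as in Example 2. Calling `encode_factor("region", region)` without a baseline would instead drop the EUROPE column, because EUROPE appears first.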

        More ways to make covariates

        WISARD provides several ways to make covariates automatically. This section describes ways to utilize those functions.

        Include specific variants as covariates

        In particular cases, a subset of variants can be incorporated as covariates, as a form of conditional analysis. The command below shows an example.

        Perform regression analysis with SNP30 as a covariate
        C:\Users\WISARD> wisard --ped test_miss0.ped --variant2cov SNP30 --regression

        The above command incorporates the variant 'SNP30' as a covariate and performs regression analysis. If the variants are listed in a file, the file can be passed to WISARD instead:

        Perform regression analysis with variants in 'test_variant_list.txt' as covariates
        C:\Users\WISARD> wisard --ped test_miss0.ped --variant2cov test_variant_list.txt --regression

        In this case, test_variant_list.txt should contain a single variant name per line.

        Include Principal Component(PC)s to the analyses

        WISARD performs Principal Component Analysis (PCA) with the --pca option, which by itself only computes the PCs and writes them to an output. The computed PCs can also be incorporated directly into subsequent analyses via --pc2cov. For example, the command below adds 3 PCs as covariates to the regression analysis that follows PCA.

        Include PC1 to PC3 to regression analysis
        C:\Users\WISARD> wisard --ped test_miss0.ped --pca --npc 3 --pc2cov --regression

        Include Gene-Environmental interaction(GxE) to the analyses

        For variant-level analyses that accept covariates, Gene-Environmental interaction (GxE) terms can be added as covariates. The --gxe option enables this functionality, and all variant-level tests with covariates are then performed with GxE terms. By default this option incorporates GxE interactions for all assigned covariates. The --gxecovs option can be used to choose the covariates for GxE interactions.
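
        What adding GxE terms means for the model can be sketched in Python. The sketch assumes that each interaction term is the product of the genotype value and the covariate value, and that the genotype is coded additively as 0, 1 or 2. Both are common conventions and are assumptions here, since this page does not state how WISARD parameterises the terms. The function name `add_gxe_terms` and the 'GxNAME' labels are hypothetical.

```python
def add_gxe_terms(genotype, covariates, gxe_covs=None):
    """Build the predictors for one sample: genotype, covariates and GxE products.

    gxe_covs restricts which covariates get an interaction term, in the
    spirit of --gxecovs; None means all covariates.
    """
    chosen = list(covariates) if gxe_covs is None else [
        name for name in covariates if name in gxe_covs
    ]
    row = {"G": genotype}
    row.update(covariates)
    for name in chosen:
        row[f"Gx{name}"] = genotype * covariates[name]
    return row


covariates = {"age": 38.0, "height": 170.0}
print(add_gxe_terms(2, covariates))                    # interaction with every covariate
print(add_gxe_terms(2, covariates, gxe_covs=["age"]))  # interaction with age only
```

        Restricting the interaction terms keeps the number of predictors small, which matters when many covariates are assigned.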

        Export retrieved covariate(s) of actually loaded samples

        Export covariates
        C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --cname age,age^2,height --filgind [0,0.95] --makecov


        Last modified : 2017-09-13 13:13:05