WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data
 

Sample variable file

  • Related options : --sampvar, --cname, --fname, --baseline, --nosampvarhdr, --twincol, --probandcol, --filcov, --inccov, --makecov, --makepheno, --filpheno, --incpheno

In WISARD, the variables recorded for each sample are called sample variables and are maintained as a table. WISARD uses the following categories of sample variables:

  • Phenotype(s)
  • Quantitative covariates
  • Qualitative covariates
  • Sample-wise information (For filtering & annotation)

The --sampvar option loads sample variables into WISARD, but has no effect by itself.

Assign sample variable
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt

The above command does nothing on its own; --sampvar must be combined with other options to have any effect. The following options are related to --sampvar.

  • --pname assigns sample variable(s) as phenotype(s) by column name.
  • --cname assigns sample variable(s) as covariate(s) by column name. The covariate type is determined automatically by whether the values are numbers or strings.
  • --fname assigns sample variable(s) as qualitative covariate(s) regardless of type.
  • --baseline sets the baseline value(s) for the assigned qualitative covariate(s).
  • --makecov exports the retrieved covariate(s) to a file.
  • --filpheno and --incpheno exclude and include, respectively, samples satisfying specific phenotype condition(s).
  • --filcov and --inccov exclude and include, respectively, samples satisfying specific covariate condition(s).
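The automatic typing rule described for --cname can be illustrated with a short Python sketch. This is not WISARD's code: the function name `infer_covariate_type` and the rule that a single non-numeric value makes the whole column qualitative are assumptions made for illustration only.

```python
def infer_covariate_type(values):
    """Return 'quantitative' if every value parses as a number, else 'qualitative'.

    Assumption (not taken from WISARD): one non-numeric value is enough
    to treat the whole column as qualitative.
    """
    for value in values:
        try:
            float(value)
        except ValueError:
            return "qualitative"
    return "quantitative"


numeric_column = ["47", "50", "23.5"]        # made-up numeric values
string_column = ["Busan", "Seoul", "Daegu"]  # made-up string values

print(infer_covariate_type(numeric_column))
print(infer_covariate_type(string_column))
```

Under this reading, --fname corresponds to skipping the check and treating the column as qualitative no matter what it contains.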

Every column in a sample variable file is uniquely identified by its name or by its position in the file. Suppose a sample variable file 'var.txt' looks like the one below:

More details with working example

Example 1 : A sample variable file named var.txt
FID IID AGE ONSET_AGE BMI REGION HDL PROBAND
f1 i11 47 28 19 Busan 82 0
f1 i12 50 32 33 Busan 66 0
f1 i13 23 21 28 Busan 42 1
f1 i14 22 22 38 Busan 53 0
f2 i21 55 48 22 Seoul 28 0
f2 i22 58 49 35 Seoul 95 0
f2 i23 35 38 27 Seoul 55 1
f3 i31 58 27 23 Daegu 20 0

To use the AGE column in WISARD, pass either its name (AGE) or its position (3) as the argument of the relevant option(s). In var.txt, AGE is the third column of the file, after FID and IID.

NOTE!
Avoid using a digit (0-9) as the first character of a column name!

As the above example shows, a sample variable file consists of multiple rows with the same number of columns, and the first row gives the column names. Three restrictions apply for WISARD to retrieve a sample variable file correctly.

  1. If the sample variable file has a header, the names of the first and second columns in the first line must be FID and IID, respectively. If there is no header, the --nosampvarhdr option must be specified.
  2. Column names and values must not contain any whitespace (space, tab, linefeed, return). It is also highly recommended to build column names only from English ASCII letters, ASCII digits, and underscores (_). Non-ASCII and special characters might work, but they might also fail on some systems or under specific conditions.
  3. --cname accepts any column name that appears in the first line of the given sample variable file, except PROBAND, TWIN and POP_GROUP, which are reserved. These reserved names can be changed via the --fst/--popuniq, --probandcol and --twincol options.
NOTE!
The third restriction can be avoided via --fst/--popuniq, --twincol and --probandcol, which replace the reserved column names with assigned names!
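The three restrictions can be checked before running WISARD with a small pre-flight script. The Python sketch below is an illustration only, not part of WISARD: the function names `check_sampvar_lines` and `check_cname` are hypothetical, and the file is assumed to be whitespace-delimited. Because of that assumption, a space inside a value shows up as a row with too many columns.

```python
RESERVED = {"PROBAND", "TWIN", "POP_GROUP"}  # reserved names listed above


def check_sampvar_lines(lines, has_header=True):
    """Return a list of problems found in a whitespace-delimited file."""
    problems = []
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return ["file has no rows"]
    # Restriction 1: the header must begin with FID and IID.
    if has_header and rows[0][:2] != ["FID", "IID"]:
        problems.append("header must begin with FID and IID")
    # Restriction 2: embedded whitespace changes the column count of a row.
    width = len(rows[0])
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            problems.append(
                f"row {number}: expected {width} columns, found {len(row)}"
            )
    return problems


def check_cname(cnames, header):
    """Restriction 3: names must exist in the header and not be reserved."""
    problems = []
    for name in cnames:
        if name in RESERVED:
            problems.append(f"{name} is reserved")
        elif name not in header:
            problems.append(f"{name} is not a column in the file")
    return problems


sample = [
    "FID IID AGE REGION PROBAND",
    "f1 i11 47 Busan 0",
    "f2 i21 55 New York 0",  # the space inside the value shifts the columns
]
print(check_sampvar_lines(sample))
print(check_cname(["AGE", "PROBAND", "HEIGHT"], sample[0].split()))
```

The sketch treats the reserved names as fixed; it does not model renaming them through --probandcol, --twincol, --fst or --popuniq.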

Modify sample variable

Covariates can be modified via expressions. For example, the command below keeps age as it is, squares height, and creates the covariates 'age' and 'height^2'.

Modified sample variables
C:\Users\WISARD> wisard --ped test_miss0.ped --sampvar test_miss0_phen.txt --cname "height^2,age" --makecov
NOTE!
Factor-type covariates cannot be modified!
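To make the effect of an argument such as "height^2,age" concrete, the following Python sketch applies the same transformation to a toy table. It is a rough illustration, not WISARD's expression parser: only the power form shown above is handled, the helper name `apply_cname_expressions` is hypothetical, and the height and age values are made up.

```python
def apply_cname_expressions(spec, table):
    """Build covariates from a comma-separated spec such as "height^2,age".

    table maps a column name to a list of numeric values.
    Only plain names and the "name^power" form are handled here.
    """
    result = {}
    for term in spec.split(","):
        term = term.strip()
        if "^" in term:
            name, power = term.split("^", 1)
            result[term] = [value ** float(power) for value in table[name]]
        else:
            result[term] = list(table[term])
    return result


table = {"height": [160.0, 175.0], "age": [47.0, 23.0]}  # made-up values
covariates = apply_cname_expressions("height^2,age", table)
print(covariates)
```

The new covariate is stored under the expression text itself ('height^2'), mirroring the covariate names given in the description above.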

File with no header

WISARD also accepts a sample variable file with no header via the --nosampvarhdr option. In that case, the columns of the sample variable file are referred to as V1 to Vn, where n is the number of columns in the file.

Perform analysis with column V1 as the phenotype and columns V2 to V4 as covariates
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen_noheader.txt --nosampvarhdr --pname V1 --cname V2-V4
NOTE!
When performing an analysis with multiple sample variable files, either all of the files must have a header or none of them may!
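The V1 to Vn naming scheme and the V2-V4 range used in the command above can be sketched in Python as follows. Both helpers are hypothetical and rest on assumptions the text does not confirm: that a range includes both of its endpoints, and that numbering simply starts at 1 for whichever columns WISARD counts (whether the two ID columns are counted is not stated here).

```python
import re


def default_names(n_columns):
    """Generate the names V1..Vn for a headerless file."""
    return [f"V{i}" for i in range(1, n_columns + 1)]


def expand_range(spec):
    """Expand 'V2-V4' to ['V2', 'V3', 'V4']; other specs are returned as-is.

    Assumption: the range is inclusive at both ends.
    """
    match = re.fullmatch(r"V(\d+)-V(\d+)", spec)
    if not match:
        return [spec]
    first, last = int(match.group(1)), int(match.group(2))
    return [f"V{i}" for i in range(first, last + 1)]


print(default_names(5))
print(expand_range("V2-V4"))
```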

Column name recommendations

WISARD has some reserved column names, which cannot be used directly as ordinary column names.

  • PROBAND is reserved for the proband information, but can be changed with the --probandcol option.
  • TWIN is reserved for the notation of twin relationships, but can be changed with the --twincol option.
  • POP_GROUP is reserved for the population group assignment, which is used in --fst and --popuniq. To change the population group column name, use --fst or --popuniq with a parameter.

Assign multiple files

Multiple sample variable files are allowed in WISARD, as in --sampvar var1.txt,var2.txt. By default, every sample variable file must satisfy the conditions below.

  • The first row must be a header that gives a representative name for each column. The --nosampvarhdr option allows files with no header; the columns are then named V1, V2, ... in file sequence.
  • The first and second columns must hold the Family ID and Individual ID, respectively, and must be named FID and IID.
  • Each column name must be unique. The same column name must not appear in more than one sample variable file either.
  • The samples in the analyzed dataset must be a subset of the samples in the sample variable file; their order is irrelevant. If any sample has no sample variables, WISARD will halt.
  • The first character of a column name should be an English ASCII letter. Otherwise WISARD might behave unexpectedly.
  • The column names below are reserved and must not be used as ordinary column names, although some can be replaced where an option exists to specify another name.
    1. FID, cannot be replaced.
    2. IID, cannot be replaced.
    3. TWIN holds a unique number for each set of twins; samples with the same twin ID should belong to the same nuclear family. It can be replaced with the --twincol option.
    4. PROBAND indicates whether the sample is a proband (1) or not (0). It can be replaced with the --probandcol option.
    5. POP_GROUP gives the population group to which the sample belongs. It can be replaced with --fst or --popuniq with a parameter.
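The uniqueness and subset conditions above can be pictured as a merge keyed on (FID, IID). The Python sketch below is a simplified illustration, not WISARD's loader: `merge_sampvar_tables` and the dictionary layout are hypothetical, and it assumes that every dataset sample must appear in every file.

```python
def merge_sampvar_tables(tables, dataset_samples):
    """Merge sample variable tables keyed on (FID, IID).

    Each table is a dict with "columns" (names other than FID and IID)
    and "rows" (a dict mapping (FID, IID) to a list of values).
    """
    merged_columns = []
    for table in tables:
        for name in table["columns"]:
            # A column name may appear only once across all files.
            if name in merged_columns:
                raise ValueError(f"duplicate column name: {name}")
            merged_columns.append(name)

    merged = {}
    for sample in dataset_samples:
        values = []
        for table in tables:
            # Assumption: a dataset sample missing from any file is an error.
            if sample not in table["rows"]:
                raise ValueError(f"sample {sample} has no sample variables")
            values.extend(table["rows"][sample])
        merged[sample] = values
    return merged_columns, merged


var1 = {"columns": ["AGE"], "rows": {("f1", "i11"): ["47"], ("f1", "i12"): ["50"]}}
var2 = {"columns": ["HDL"], "rows": {("f1", "i12"): ["66"], ("f1", "i11"): ["82"]}}

columns, merged = merge_sampvar_tables([var1, var2], [("f1", "i11"), ("f1", "i12")])
print(columns)
print(merged)
```

Row order differs between the two toy tables on purpose, since the order of samples within a file is irrelevant.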

Phenotypes

WISARD can

  1. Assign a phenotype when there is no default phenotype
  2. Use an alternative phenotype instead of the default phenotype
  3. Ignore all phenotypes and generate a random phenotype

To perform (1) and (2), the sample variables must be loaded and the one to use as the phenotype must be selected. See the examples below.

(1) Perform an analysis by assigning a phenotype when there is no default phenotype
C:\Users\WISARD> wisard --vcf test_miss0.vcf --fam test_miss0.fam --sampvar test_miss0_phen.txt --pname t2d
(2) Perform an analysis by replacing the default phenotype
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height

The altered phenotype is written as the default phenotype when the dataset is exported. However, when multiple phenotypes are assigned, only the first one is written.

Perform an analysis with the phenotypes T2D and HEIGHT, and generate a BED file with T2D as the default phenotype
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname t2d,height --makebed --out res

The above example produces 'res.bed', 'res.bim' and 'res.fam' with the phenotype T2D. Alternatively, phenotypes can be randomly generated for purposes such as simulation or validation. WISARD generates both dichotomous (--randbinpheno) and continuous (--randpheno, which assumes a standard normal distribution) phenotypes.

(3) Perform an analysis with a randomly generated binary phenotype and genotypes from test_miss0.vcf
C:\Users\WISARD> wisard --vcf test_miss0.vcf --randbinpheno
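For readers who want to picture what the two kinds of random phenotype look like, here is a small Python sketch. It does not reproduce WISARD's generator: the function names are hypothetical, and the 0/1 coding and the case probability of 0.5 for the binary phenotype are assumptions. Only the standard normal distribution for the continuous phenotype comes from the description above.

```python
import random


def random_binary_phenotype(n_samples, case_probability=0.5, seed=None):
    """Draw a 0/1 phenotype per sample (probability and coding are assumed)."""
    rng = random.Random(seed)
    return [1 if rng.random() < case_probability else 0 for _ in range(n_samples)]


def random_continuous_phenotype(n_samples, seed=None):
    """Draw one standard normal value (mean 0, standard deviation 1) per sample."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


binary = random_binary_phenotype(8, seed=1)
continuous = random_continuous_phenotype(8, seed=1)
print(binary)
print(continuous)
```

Fixing the seed makes a simulated phenotype reproducible across runs of the sketch.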

Export retrieved phenotype(s) of actually loaded samples

Example
C:\Users\WISARD> wisard --bed test_miss2.bed --sampvar test_miss0_phen.txt --pname t2d,height --filgind [0,0.8] --makepheno


Last modified : 2017-09-13 13:10:37