WISARD official site

Case tutorial

Manipulation

This section describes

Summary for manipulation
Transform phenotype
- Inverse normalization
- Standardization
Sorting dataset with sample/variant
Reordering dataset with specific order
Generate missing on the dataset
- Random genotype missing
- Nullify selected samples
- Nullify random samples
Updating sample/variant information
Change phenotype/genotype coding
Altering phenotype/genotype

Summary for manipulation

WISARD provides various functions to handle, manipulate, and update a given dataset. The following functions are currently supported.

Transform phenotype
Sorting dataset with sample/variant
Reordering dataset with specific sequence
Updating sample/variant information
Dataset filtering (See here)
Dataset shrinkage
Convert dataset (See here)
Generate simulated dataset
Provide genotype information
Genotype clumping
Change phenotype/genotype coding
Altering phenotype/genotype
Exploring dataset
Invalid dataset coding correction

Transform phenotype

Input phenotype(s) can be transformed in several ways.

Inverse normalization

With --invnorm, the given phenotype(s) are inverse-normal transformed.

Perform regression analysis after inverse normalization
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height --invnorm --regression --out res_HEIGHT_invnorm
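
For readers unfamiliar with the idea, the following is a minimal Python sketch of a rank-based inverse normal transformation. It is an illustration only and not WISARD's implementation: the function name inverse_normal_transform, the rank offset of 0.5, and the average-rank treatment of ties are all assumptions, and missing values are not handled.

```python
from statistics import NormalDist

def inverse_normal_transform(values, offset=0.5):
    """Map values to standard-normal quantiles of their ranks (ties share an average rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # (rank - offset) / n always lies strictly between 0 and 1 for offset = 0.5.
    return [std_normal.inv_cdf((r - offset) / n) for r in ranks]

heights = [172.0, 158.5, 180.2, 165.0, 169.3]
transformed = inverse_normal_transform(heights)
# 169.3 is the median of the five values, so it maps to 0.0.
```

The transformed values keep the rank order of the input but follow an approximately normal distribution, which is why the transformation is often applied before regression.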

Standardization

With --phenostdize, the given phenotype(s) are standardized using their mean and variance.

Perform regression analysis after phenotype standardization
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height --phenostdize --regression --out res_HEIGHT_stdize
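
Standardization with the mean and variance can be sketched in a few lines of Python. This is an illustration under assumptions, not WISARD's code: the function name standardize is hypothetical, and whether WISARD uses the sample variance (n - 1 denominator, as here) or the population variance is not stated on this page.

```python
from statistics import mean, stdev

def standardize(values):
    """Return z-scores: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]

heights = [150.0, 160.0, 170.0, 180.0, 190.0]
z_scores = standardize(heights)
# The middle value equals the mean (170.0), so its z-score is 0.0.
```

A phenotype with no variation has a standard deviation of zero and cannot be standardized this way.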

Sorting dataset with sample/variant

Samples or variants in the given data can be reordered automatically with the options below, or arranged in a user-specified order. For automatic reordering, each option takes an argument of either asc (ascending order) or desc (descending order).

Generate ascending-sorted dataset by IID
C:\Users\WISARD> wisard --bed test_miss0_mixed.bed --sortiid asc --makebed --out sample_iid_sorted

Generate descending-sorted dataset by FID & IID
C:\Users\WISARD> wisard --bed test_miss0_mixed.bed --sortsample desc --makebed --out sample_sample_sorted

Generate descending-sorted dataset by variant name
C:\Users\WISARD> wisard --bed test_miss0_mixed.bed --sortvariant desc --makebed --out sample_variant_sorted

Generate ascending-sorted dataset by chromosome & position
C:\Users\WISARD> wisard --bed test_miss0_mixed.bed --sortpos asc --makebed --out sample_pos_sorted

Because these ordering schemes compare names as plain character strings rather than as a human would read them, they can sometimes produce unexpected results, as shown below.

Example 1 : FAM file of unsorted input

F2	S4	0	0	1	2
F1	S1	0	0	1	2
F10	S4	0	0	1	2
F1	S2	0	0	2	1
F1	S3	S1	S2	1	1
F10	S5	0	0	2	1
F10	S6	S4	S5	1	1

If a user sorts the above dataset with --sortsample asc, expecting the families in the order F1, F2, F10, it will instead be ordered as below.

Example 2 : FAM file, sorted in the unintended order F1, F10 and F2

F1	S1	0	0	1	2
F1	S2	0	0	2	1
F1	S3	S1	S2	1	1
F10	S4	0	0	1	2
F10	S5	0	0	2	1
F10	S6	S4	S5	1	1
F2	S4	0	0	1	2

This happens because the sorting algorithm compares strings character by character from the first position, so F10 sorts before F2. Adding --natural produces the intended output.

Example 3 : FAM file, sorted in the intended order F1, F2 and F10

F1	S1	0	0	1	2
F1	S2	0	0	2	1
F1	S3	S1	S2	1	1
F2	S4	0	0	1	2
F10	S4	0	0	1	2
F10	S5	0	0	2	1
F10	S6	S4	S5	1	1
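
The difference between the two orderings can be reproduced with a short Python sketch. It illustrates the general idea of natural sorting and is not WISARD's implementation; the helper name natural_key is hypothetical. Digit runs are compared as integers, and everything else is compared as text.

```python
import re

def natural_key(text):
    """Split text into digit and non-digit runs; digit runs compare as integers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", text)]

# (FID, IID) pairs from Example 1
samples = [("F2", "S4"), ("F1", "S1"), ("F10", "S4"), ("F1", "S2"),
           ("F1", "S3"), ("F10", "S5"), ("F10", "S6")]

# Plain string comparison: "F10" sorts before "F2" because "1" < "2".
plain = sorted(samples)

# Natural comparison: 2 < 10, so F2 sorts before F10.
natural = sorted(samples, key=lambda s: (natural_key(s[0]), natural_key(s[1])))

print([fid for fid, _ in plain])    # ['F1', 'F1', 'F1', 'F10', 'F10', 'F10', 'F2']
print([fid for fid, _ in natural])  # ['F1', 'F1', 'F1', 'F2', 'F10', 'F10', 'F10']
```

The first list matches the family order of Example 2 and the second matches Example 3.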

Reordering dataset with specific order

To reorder samples or variants in a user-specified order, use the --sampleorder and --variantorder options. Each option requires the path to a file that contains, on each line, an FID & IID pair (--sampleorder) or a variant name (--variantorder). These files must list all samples or variants in the dataset; otherwise an error is raised.
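
The completeness requirement can be illustrated with a small Python sketch. It is a model of the behavior described above, not WISARD's code: the function name reorder_rows is hypothetical, and rows are represented as FAM-style lists whose first two fields are FID and IID.

```python
def reorder_rows(rows, order):
    """Arrange FAM-style rows in the sequence given by a list of (FID, IID) pairs."""
    by_key = {(row[0], row[1]): row for row in rows}
    # The order list must name every sample exactly once.
    if len(order) != len(by_key) or set(order) != set(by_key):
        raise ValueError("order list must contain all samples, each exactly once")
    return [by_key[key] for key in order]

rows = [
    ["F2", "S4", "0", "0", "1", "2"],
    ["F1", "S1", "0", "0", "1", "2"],
    ["F10", "S4", "0", "0", "1", "2"],
]
order = [("F10", "S4"), ("F2", "S4"), ("F1", "S1")]
reordered = reorder_rows(rows, order)
# reordered now starts with the F10 row, followed by F2 and F1.
```

Passing an order list that omits a sample raises an error in this sketch, mirroring the rule that the file must list every sample.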

Generate missing on the dataset

Random genotype missing

A specified number or proportion of genotypes can be marked as missing with the --nageno option. If the argument of --nageno is between 0 and 1, it is treated as a proportion. Any other integer argument is treated as the number of genotypes to set missing. If the argument exceeds the number of existing genotypes, it is clamped to that number.

Set 20% of genotypes to missing and export the result in BED format
C:\Users\WISARD> wisard --bed test_miss0.bed --nageno 0.2 --makebed --out test_random_miss0.2
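
How such an argument could be interpreted is sketched below in Python. This is an illustration of the rule described in this section, not WISARD's implementation. The function names, the rounding of a proportion to a whole count, and the handling of the boundary values 0 and 1 are assumptions.

```python
import random

def resolve_missing_count(arg, total):
    """Turn a proportion (between 0 and 1) or a count into a number of genotypes."""
    if 0 < arg < 1:
        count = round(arg * total)  # assumption: round to the nearest whole genotype
    else:
        count = int(arg)
    return min(count, total)  # clamp to the number of existing genotypes

def mask_random_genotypes(genotypes, arg, seed=None):
    """Return a copy of a sample-by-variant matrix with random cells set to None."""
    rng = random.Random(seed)
    n_variants = len(genotypes[0]) if genotypes else 0
    total = len(genotypes) * n_variants
    masked = [row[:] for row in genotypes]
    # Draw distinct cell indices so exactly `count` genotypes become missing.
    for cell in rng.sample(range(total), resolve_missing_count(arg, total)):
        masked[cell // n_variants][cell % n_variants] = None
    return masked

matrix = [[0] * 10 for _ in range(10)]  # 10 samples x 10 variants
masked = mask_random_genotypes(matrix, 0.2, seed=1)
n_missing = sum(cell is None for row in masked for cell in row)
# n_missing is 20, that is, 20% of the 100 genotypes.
```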

Nullify selected samples

Another way to generate missing genotypes is --nasamp, an option that nullifies the listed samples. Here, nullifying means that all genotypes of a sample are set to NA.

Nullify genotypes of sample SAMP5_8 and generate a BED file
C:\Users\WISARD> wisard --bed test_miss0.bed --nasamp SAMP5_8 --makebed --out res_SAMP5_8_null

Nullify genotypes of samples listed in test_sample_list.txt and generate a BED file
C:\Users\WISARD> wisard --bed test_miss0.bed --nasamp test_sample_list.txt --makebed --out res_LISTnull

Nullify random samples

Random samples can also be nullified with the --randnasamp option. As with --nageno, the meaning of its parameter depends on its value.

When the value is a real number between 0 and 1, it is the proportion of the entire sample size.
When the value is a positive integer lower than the number of samples, it is the exact number of samples.

Randomly nullify 10% of samples and generate a BED file
C:\Users\WISARD> wisard --bed test_miss0.bed --randnasamp 0.1 --makebed --out res_null10p

Randomly nullify thirty samples and generate a BED file
C:\Users\WISARD> wisard --bed test_miss0.bed --randnasamp 30 --makebed --out res_null30
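
The effect of nullifying random samples can be modeled with the Python sketch below. It only illustrates the two parameter meanings listed in this section and is not WISARD's code: the function names are hypothetical, the interval is treated as open (0 and 1 excluded), and rounding a proportion to a whole number of samples is an assumption.

```python
import random

def pick_samples_to_nullify(sample_ids, arg, seed=None):
    """Choose random samples from a proportion (float) or an exact count (int)."""
    n = len(sample_ids)
    if isinstance(arg, float) and 0 < arg < 1:
        count = round(arg * n)  # assumption: round to a whole number of samples
    elif isinstance(arg, int) and 0 < arg < n:
        count = arg
    else:
        raise ValueError("expected a proportion in (0, 1) or a count below the sample size")
    return set(random.Random(seed).sample(sample_ids, count))

def nullify(genotypes, chosen):
    """Set every genotype of the chosen samples to None (NA); leave others as they are."""
    return {sid: ([None] * len(row) if sid in chosen else row[:])
            for sid, row in genotypes.items()}

genotypes = {f"SAMP{i}": [0, 1, 2] for i in range(100)}
chosen = pick_samples_to_nullify(list(genotypes), 0.1, seed=7)
result = nullify(genotypes, chosen)
# 10 of the 100 samples now have all genotypes set to None.
```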

Updating sample/variant information

Updating entire variant information

NOTE!

When using --updvariant, variant name matching is not performed; the update is based only on the order of the variants!

Updating individual fields with variant name matching

NOTE!

--updpos/--updgdist/--updname/--updchr can be used at the same time!

Updating allele information

With --updallele, the alleles of variant(s) in the dataset can be updated. This requires a file that lists the variants to be updated, their original alleles, and the new alleles, as below.

Example 4 : An example of input file for --updallele, namely upd.txt

rs192845	C	G	G	T
rs8123875	A	G	T	G
rs13853	C	T	T	C
rs9438342	G	T	A	C

To ensure the consistency and integrity of the update, --updallele checks the conditions below. Variants that do not appear in the --updallele file keep their original alleles.

Whether the file contains exactly the number of columns shown above
Whether the original alleles in the dataset are identical to those given in the --updallele file

Load the dataset, update alleles according to upd.txt, and export the result
C:\Users\WISARD> wisard --bed sample.bed --updallele upd.txt --makebed --out sample_conv
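
The two checks can be modeled with the following Python sketch. It is an illustration, not WISARD's implementation. The function name apply_allele_updates is hypothetical. The column reading (variant name, two original alleles, two new alleles) is inferred from Example 4, and both the order-sensitive allele comparison and the skipping of names absent from the dataset are assumptions.

```python
def apply_allele_updates(variants, update_text):
    """Apply 'name old1 old2 new1 new2' lines to a {name: (allele1, allele2)} mapping."""
    updated = dict(variants)
    for lineno, line in enumerate(update_text.splitlines(), start=1):
        if not line.strip():
            continue
        fields = line.split()
        # Check 1: every line must have exactly five columns.
        if len(fields) != 5:
            raise ValueError(f"line {lineno}: expected 5 columns, got {len(fields)}")
        name, old1, old2, new1, new2 = fields
        if name not in variants:
            continue  # assumption: names absent from the dataset are skipped
        # Check 2: the original alleles must match the dataset.
        if variants[name] != (old1, old2):
            raise ValueError(f"line {lineno}: original alleles of {name} do not match")
        updated[name] = (new1, new2)
    return updated

dataset = {"rs192845": ("C", "G"), "rs13853": ("C", "T"), "rs777": ("A", "G")}
upd = "rs192845\tC\tG\tG\tT\nrs13853\tC\tT\tT\tC\n"
new_alleles = apply_allele_updates(dataset, upd)
# rs777 is not listed in the update text, so it keeps ("A", "G").
```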

Change phenotype/genotype coding

Run regression with coding 1=case, 0=control
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname medi01 --1case --regression --out res_regr_1case

Run regression with coding 1=case, -1=control
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname medi1m1 --cact 1,-1 --regression --out res_regr_ca1_ct-1

Convert PED file into binary PED file, with an assumption of 1/2/3/4 -> A/C/G/T, respectively
C:\Users\WISARD> wisard --ped test_miss0_1234.ped --acgt 1234 --makebed --out res_acgt_converted
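
The 1/2/3/4 -> A/C/G/T assumption amounts to a character-for-character translation, sketched here in Python. The helper make_allele_decoder is hypothetical and only models the mapping; how WISARD parses the --acgt argument internally is not described on this page.

```python
def make_allele_decoder(codes):
    """Build a translation table from a 4-character code string to A, C, G, T."""
    if len(codes) != 4 or len(set(codes)) != 4:
        raise ValueError("expected four distinct code characters")
    return str.maketrans(codes, "ACGT")

table = make_allele_decoder("1234")
ped_alleles = ["1", "2", "0", "4", "3", "3"]
decoded = [allele.translate(table) for allele in ped_alleles]
print(decoded)  # ['A', 'C', '0', 'T', 'G', 'G']
```

Characters outside the four codes, such as 0 in this example, pass through unchanged.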

Code missing phenotypes as NA when generating a BED file (default: -9)
C:\Users\WISARD> wisard --bed test_miss0.bed --makebed --outmispheno NA --out res_misP_NA

Code missing genotypes as N when generating a PED file (default: 0)
C:\Users\WISARD> wisard --bed test_miss2.bed --makeped --outmisgeno N --out res_misG_N

Altering phenotype/genotype

Generate new BED file with altered phenotype SBP
C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname sbp --makebed --out res_newpheno

Last modified : 2017-09-13 13:09:00