WISARD [wɪzərd]
Workbench for Integrated Superfast Association study with Related Data
 

Manipulation

This section describes:

  • Summary for manipulation
  • Transform phenotype
    • Inverse normalization
    • Standardization
  • Sorting dataset with sample/variant
  • Reordering dataset with specific order
  • Generate missing on the dataset
    • Random genotype missing
    • Nullify selected samples
    • Nullify random samples
  • Updating sample/variant information
  • Change phenotype/genotype coding
  • Altering phenotype/genotype

              Summary for manipulation

              WISARD provides various functions for handling, manipulating and updating a given dataset. The following functions are currently supported:

              • Transform phenotype
              • Sorting dataset with sample/variant
              • Reordering dataset with specific sequence
              • Updating sample/variant information
              • Dataset filtering (See here)
              • Dataset shrinkage
              • Convert dataset (See here)
              • Generate simulated dataset
              • Provide genotype information
              • Genotype clumping
              • Change phenotype/genotype coding
              • Altering phenotype/genotype
              • Exploring dataset
              • Invalid dataset coding correction

              Transform phenotype

              Input phenotype(s) can be transformed in several ways.

              Inverse normalization

              With --invnorm, the given phenotype(s) are inverse-normal transformed.

              Perform regression analysis after inverse normalization
              C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height --invnorm --regression --out res_HEIGHT_invnorm
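
The exact formula behind --invnorm is not stated on this page. A common choice is the rank-based inverse normal transform, sketched below in Python under that assumption: observed values are ranked (ties get their average rank), and rank r out of n observed values is mapped to the standard normal quantile of (r - c)/(n - 2c + 1). The default c = 0.5 and the Blom alternative c = 3/8 are conventions, not confirmed WISARD settings; the function names and the use of None for missing phenotypes are hypothetical.

```python
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def inverse_normalize(values, c=0.5):
    """Rank-based inverse normal transform (assumed method); None = missing, passed through.

    c=0.5 maps rank r to Phi^-1((r - 0.5) / n); c=3/8 gives Blom's (r - 3/8) / (n + 1/4).
    """
    observed = [v for v in values if v is not None]
    n = len(observed)
    ranks = iter(average_ranks(observed))  # ranks in the same order as `observed`
    inv_cdf = NormalDist().inv_cdf
    out = []
    for v in values:
        if v is None:
            out.append(None)
        else:
            r = next(ranks)
            out.append(inv_cdf((r - c) / (n - 2 * c + 1)))
    return out
```

Because only ranks are used, the transformed values are the same for any monotone rescaling of the input, which is why this transform is often applied to skewed traits before regression.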

              Standardization

              With --phenostdize, the given phenotype(s) are standardized to zero mean and unit variance: the mean is subtracted from each value and the result is divided by the standard deviation.

              Perform regression analysis after phenotype standardization
              C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname height --phenostdize --regression --out res_HEIGHT_stdize
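
A minimal Python sketch of the standardization step, assuming missing phenotypes are represented as None and that the sample standard deviation (n - 1 denominator) is used; whether WISARD divides by n or n - 1 is not stated here, and the function name is hypothetical.

```python
from statistics import mean, stdev


def standardize(values):
    """Scale observed values to mean 0 and (sample) standard deviation 1.

    Missing values (None) are left as None. Using stdev() (n - 1 denominator)
    is an assumption, not confirmed WISARD behavior.
    """
    observed = [v for v in values if v is not None]
    mu = mean(observed)
    sd = stdev(observed)
    return [None if v is None else (v - mu) / sd for v in values]
```

For example, standardize([1.0, 2.0, 3.0, None]) gives [-1.0, 0.0, 1.0, None].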

              Sorting dataset with sample/variant

              Samples or variants in the given data can be reordered automatically with the options below, or arranged in an arbitrary user-defined order. Each automatic sorting option takes one argument: asc (ascending order) or desc (descending order).

              Generate ascending-sorted dataset by IID
              C:\Users\WISARD> wisard --bed test_miss0_mixed.bed --sortiid asc --makebed --out sample_iid_sorted

              Generate descending-sorted dataset by FID & IID
              C:\Users\WISARD> wisard --bed test_miss0_mixed.bed --sortsample desc --makebed --out sample_sample_sorted

              Generate descending-sorted dataset by variant name
              C:\Users\WISARD> wisard --bed test_miss0_mixed.bed --sortvariant desc --makebed --out sample_variant_sorted

              Generate ascending-sorted dataset by chromosome & position
              C:\Users\WISARD> wisard --bed test_miss0_mixed.bed --sortpos asc --makebed --out sample_pos_sorted
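
Sorting "by chromosome & position" amounts to sorting on a composite key: chromosome first, then position within a chromosome. The Python sketch below illustrates that idea with asc/desc handling; it assumes numeric chromosome codes, and the Variant type and sort_by_position function are hypothetical, not WISARD internals.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    chrom: int  # assumed numeric chromosome code (e.g. 1-22)
    pos: int


def sort_by_position(variants, order="asc"):
    """Sort variants by (chromosome, position); order is "asc" or "desc"."""
    if order not in ("asc", "desc"):
        raise ValueError("order must be 'asc' or 'desc'")
    # Tuples compare element by element, so chrom decides first and pos breaks ties.
    return sorted(variants, key=lambda v: (v.chrom, v.pos), reverse=(order == "desc"))


variants = [Variant("rs3", 2, 100), Variant("rs1", 1, 500), Variant("rs2", 1, 200)]
ascending = [v.name for v in sort_by_position(variants, "asc")]    # rs2, rs1, rs3
descending = [v.name for v in sort_by_position(variants, "desc")]  # rs3, rs1, rs2
```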

              Because these options compare identifiers as plain text, the result is sometimes not what a human would expect, as shown below.

              Example 1 : FAM file of unsorted input
              F2 S4 0 0 1 2
              F1 S1 0 0 1 2
              F10 S4 0 0 1 2
              F1 S2 0 0 2 1
              F1 S3 S1 S2 1 1
              F10 S5 0 0 2 1
              F10 S6 S4 S5 1 1

              If the user sorts the above dataset with --sortsample asc, expecting the order F1, F2, F10, the result is instead ordered as below.

              Example 2 : FAM file, incorrectly sorted as F1, F10, F2
              F1 S1 0 0 1 2
              F1 S2 0 0 2 1
              F1 S3 S1 S2 1 1
              F10 S4 0 0 1 2
              F10 S5 0 0 2 1
              F10 S6 S4 S5 1 1
              F2 S4 0 0 1 2

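              This happens because the default sort compares strings character by character from the left: "F10" and "F2" first differ at the second character, and '1' sorts before '2', so F10 comes before F2. Adding --natural produces the intended order.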

              Example 3 : FAM file, correctly sorted as F1, F2, F10
              F1 S1 0 0 1 2
              F1 S2 0 0 2 1
              F1 S3 S1 S2 1 1
              F2 S4 0 0 1 2
              F10 S4 0 0 1 2
              F10 S5 0 0 2 1
              F10 S6 S4 S5 1 1
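
The difference between Example 2 and Example 3 can be reproduced with a common "natural sort" technique: split each ID into digit and non-digit runs and compare the digit runs as integers. The Python sketch below uses the FID/IID pairs from Example 1; it illustrates the general technique and is not necessarily how --natural is implemented.

```python
import re


def natural_key(text):
    """Split text into digit / non-digit runs; digit runs compare as integers.

    Each run becomes (is_number, value), so an int is never compared with a str.
    """
    return [(True, int(tok)) if tok.isdigit() else (False, tok)
            for tok in re.findall(r"\d+|\D+", text)]


# (FID, IID) pairs taken from Example 1.
fam = [("F2", "S4"), ("F1", "S1"), ("F10", "S4"), ("F1", "S2"),
       ("F1", "S3"), ("F10", "S5"), ("F10", "S6")]

# Plain string comparison: "F10" < "F2" because '1' < '2' (Example 2).
lexicographic = sorted(fam)
# Natural comparison: "F2" < "F10" because 2 < 10 (Example 3).
natural = sorted(fam, key=lambda s: (natural_key(s[0]), natural_key(s[1])))
```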

              Reordering dataset with specific order

              To reorder samples or variants in a user-specified order, use the --sampleorder or --variantorder option. Each option takes the path of a file that lists one FID & IID pair (--sampleorder) or one variant name (--variantorder) per line. The file must contain all samples or variants in the dataset; otherwise an error is raised.
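
The Python sketch below shows what such a reordering with a completeness check might look like. It follows the documented rule that every sample or variant must be listed; rejecting duplicate IDs or IDs that are absent from the dataset is an added assumption, and parse_order_lines and reorder are hypothetical helpers.

```python
def parse_order_lines(lines, columns):
    """Parse an order list: columns=2 for "FID IID" lines, columns=1 for variant names."""
    ids = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        ids.append(tuple(fields[:2]) if columns == 2 else fields[0])
    return ids


def reorder(records, wanted_ids, key):
    """Return `records` arranged in the order of `wanted_ids`.

    Raises ValueError unless every record ID appears exactly once in wanted_ids.
    Rejecting IDs that are not in the dataset is an assumption of this sketch.
    """
    by_id = {key(r): r for r in records}
    if len(by_id) != len(records):
        raise ValueError("duplicate IDs in dataset")
    if len(set(wanted_ids)) != len(wanted_ids):
        raise ValueError("duplicate IDs in order list")
    missing = set(by_id) - set(wanted_ids)
    unknown = set(wanted_ids) - set(by_id)
    if missing or unknown:
        raise ValueError(f"order list mismatch: missing={sorted(missing)}, unknown={sorted(unknown)}")
    return [by_id[i] for i in wanted_ids]
```

For FAM-like rows, the key would be the first two columns, e.g. reorder(fam_rows, parse_order_lines(lines, 2), key=lambda r: (r[0], r[1])).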

              Generate missing on the dataset

              Random genotype missing

              A specified number or proportion of genotypes can be set to missing with the --nageno option. If the argument of --nageno is between 0 and 1, it is treated as a proportion; an integer argument is treated as the number of genotypes to set missing. If the argument exceeds the number of existing genotypes, it is clamped to that number.

              Generate 20% of missing and export the result as BED format
              C:\Users\WISARD> wisard --bed test_miss0.bed --nageno 0.2 --makebed --out test_random_miss0.2
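
A minimal Python sketch of the --nageno semantics described above (proportion vs. count, with clamping), applied to a genotype matrix held as a list of rows. Several details are assumptions, not documented WISARD behavior: an argument of exactly 1 is read as a count, proportions are rounded to the nearest integer, only calls that are not already missing are candidates, and None marks a missing call. The function names are hypothetical.

```python
import random


def resolve_count(arg, total):
    """Turn a --nageno-style argument into a number of genotypes (assumed semantics).

    0 < arg < 1 is a proportion of `total` (rounded); any other value is a count.
    The result is clamped to the range [0, total].
    """
    n = round(arg * total) if 0 < arg < 1 else int(arg)
    return max(0, min(n, total))


def mask_random_genotypes(genotypes, arg, missing=None, seed=None):
    """Return a copy of `genotypes` with randomly chosen non-missing calls set to `missing`."""
    rng = random.Random(seed)
    out = [list(row) for row in genotypes]
    # Candidate cells: calls that are not already missing (an assumption).
    cells = [(i, j) for i, row in enumerate(out)
             for j, g in enumerate(row) if g != missing]
    for i, j in rng.sample(cells, resolve_count(arg, len(cells))):
        out[i][j] = missing
    return out
```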

              Nullify selected samples

              Another way to generate missing genotypes is --nasamp, an option that nullifies the listed samples. Here, nullifying a sample means setting all of its genotypes to NA. The argument can be a single sample ID or a file listing samples, as in the examples below.

              Nullifying genotypes of sample SAMP5_8 and generate BED
              C:\Users\WISARD> wisard --bed test_miss0.bed --nasamp SAMP5_8 --makebed --out res_SAMP5_8_null

              Nullifying genotypes of samples listed in test_sample_list.txt and generate BED
              C:\Users\WISARD> wisard --bed test_miss0.bed --nasamp test_sample_list.txt --makebed --out res_LISTnull

              Nullify random samples

              Random samples can also be nullified with the --randnasamp option. As with --nageno, its argument has two different meanings depending on its value.

              • When the value is a real number between 0 and 1, it means the proportion of the entire sample size.
              • When the value is a positive integer lower than the number of samples, it means an exact number of samples.

              Randomly nullifying 10% of samples and generate BED
              C:\Users\WISARD> wisard --bed test_miss0.bed --randnasamp 0.1 --makebed --out res_null10p

              Randomly nullifying thirty samples and generate BED
              C:\Users\WISARD> wisard --bed test_miss0.bed --randnasamp 30 --makebed --out res_null30
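
The Python sketch below illustrates sample nullification: choosing samples at random with the proportion-or-count rule above, then setting every genotype of each chosen (or explicitly listed) sample to missing. Rounding the proportion, clamping an oversized count, and using None for a missing call are assumptions; pick_random_samples and nullify_samples are hypothetical names.

```python
import random


def pick_random_samples(sample_ids, arg, seed=None):
    """Choose sample IDs for --randnasamp-style nullification (assumed semantics).

    0 < arg < 1 is read as a proportion of all samples (rounded); any other
    value is read as a number of samples. The count is clamped to the sample size.
    """
    total = len(sample_ids)
    n = round(arg * total) if 0 < arg < 1 else int(arg)
    n = max(0, min(n, total))
    return random.Random(seed).sample(list(sample_ids), n)


def nullify_samples(genotypes, sample_ids, targets, missing=None):
    """Return a copy of `genotypes` (one row per sample) in which every call
    of each sample in `targets` is set to `missing`."""
    targets = set(targets)
    return [[missing] * len(row) if sid in targets else list(row)
            for sid, row in zip(sample_ids, genotypes)]
```

The same nullify_samples call covers the --nasamp case, where the targets come from a given ID or list instead of a random draw.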

              Updating sample/variant information

              Updating entire variant information

              NOTE! When using --updvariant, variant names are not matched; variants are updated solely by their order in the dataset.

              Updating individual fields with variant name matching

              NOTE! --updpos, --updgdist, --updname and --updchr can be used at the same time.

              Updating allele information

              With --updallele, the alleles of variant(s) in the dataset can be updated. This requires a file that lists, for each variant to be updated, the variant name, its original alleles and the alleles to update to, as below.

              Example 4 : An example of input file for --updallele, namely upd.txt
              rs192845 C G G T
              rs8123875 A G T G
              rs13853 C T T C
              rs9438342 G T A C

              To ensure consistency and integrity, --updallele checks the conditions below. Variants that are not listed in the --updallele file keep their original alleles.

              • Whether each line of the file contains exactly the number of columns shown above
              • Whether the alleles in the dataset are identical to the original alleles given in the --updallele file

              Retrieve dataset and update alleles according to upd.txt, and export it
              C:\Users\WISARD> wisard --bed sample.bed --updallele upd.txt --makebed --out sample_conv
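
A Python sketch of the two checks listed above, applied to a file laid out like Example 4 (name, two original alleles, two new alleles). Requiring the original alleles to match in the same order, and holding the dataset as a dict from variant name to an allele pair, are assumptions of this sketch; parse_updallele and update_alleles are hypothetical helpers.

```python
def parse_updallele(lines):
    """Parse "name orig1 orig2 new1 new2" lines into {name: ((orig1, orig2), (new1, new2))}."""
    table = {}
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 5:  # check 1: exact number of columns
            raise ValueError(f"line {lineno}: expected 5 columns, got {len(fields)}")
        name, o1, o2, n1, n2 = fields
        table[name] = ((o1, o2), (n1, n2))
    return table


def update_alleles(variants, table):
    """Return an updated copy of `variants` (dict: name -> (allele1, allele2))."""
    updated = {}
    for name, alleles in variants.items():
        if name not in table:
            updated[name] = alleles  # not listed in the file: keep original alleles
            continue
        orig, new = table[name]
        if alleles != orig:  # check 2: dataset alleles must equal the file's originals
            raise ValueError(f"{name}: dataset alleles {alleles} != file alleles {orig}")
        updated[name] = new
    return updated
```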

              Change phenotype/genotype coding

              Run regression with phenotype coded as 1=case, 0=control
              C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname medi01 --1case --regression --out res_regr_1case

              Run regression with phenotype coded as 1=case, -1=control
              C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname medi1m1 --cact 1,-1 --regression --out res_regr_ca1_ct-1

              Convert a PED file into a binary PED file, treating allele codes 1/2/3/4 as A/C/G/T, respectively
              C:\Users\WISARD> wisard --ped test_miss0_1234.ped --acgt 1234 --makebed --out res_acgt_converted

              Code missing phenotypes as NA when generating a BED file (default: -9)
              C:\Users\WISARD> wisard --bed test_miss0.bed --makebed --outmispheno NA --out res_misP_NA

              Code missing genotypes as N when generating a PED file (default: 0)
              C:\Users\WISARD> wisard --bed test_miss2.bed --makeped --outmisgeno N --out res_misG_N
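
To illustrate what the 1/2/3/4 -> A/C/G/T recoding does to a PED record, the Python sketch below rewrites the allele columns (column 7 onward, after the six standard FID, IID, father, mother, sex and phenotype columns) and leaves the missing allele code 0 untouched. It is a hypothetical helper, not WISARD code; an unexpected allele code raises KeyError.

```python
# Assumed mapping for --acgt 1234; "0" (missing allele) is kept as-is.
ACGT_1234 = {"1": "A", "2": "C", "3": "G", "4": "T"}


def recode_ped_line(line, missing="0"):
    """Recode the allele columns of one PED line from 1/2/3/4 to A/C/G/T."""
    fields = line.split()
    header, alleles = fields[:6], fields[6:]  # first six columns are sample info
    recoded = [a if a == missing else ACGT_1234[a] for a in alleles]
    return " ".join(header + recoded)
```

For example, recode_ped_line("F1 S1 0 0 1 2 1 3 0 0 4 2") returns "F1 S1 0 0 1 2 A G 0 0 T C"; the zeros in the father/mother columns are untouched because only columns 7 and later are recoded.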

              Altering phenotype/genotype

              Generate a new BED file with the phenotype altered to SBP
              C:\Users\WISARD> wisard --bed test_miss0.bed --sampvar test_miss0_phen.txt --pname sbp --makebed --out res_newpheno


              Last modified : 2017-09-13 13:09:00