WISARD can convert a loaded dataset into many other formats.
The file formats supported by WISARD are described below.
In below example, represents their parents are missing, and means the sample is founder.
PLINK PED format
The --makeped option generates .ped (genotype, pedigree, sex, and single phenotype) and .map (marker info. without allele info.).
Binary PED format
The --makebed option generates .bed (binary-coded genotype), .fam (pedigree, sex, and phenotype), and .bim (marker info. with allele info.).
NOTE!
By default, the BED file is generated in SNP-major format. It can be transposed to individual-major format with the --sampmajor option.
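The difference between the two layouts can be checked from the file header. The sketch below assumes that WISARD writes a standard PLINK-compatible .bed file, in which the first two bytes are a fixed magic number and the third byte marks the storage order. The function names are hypothetical and only serve the illustration.

```python
# Sketch: identify the storage order of a PLINK-style .bed file from its header.
# Assumption: the file follows the standard PLINK binary PED layout.
BED_MAGIC = b"\x6c\x1b"  # first two bytes of a PLINK binary PED file


def bed_storage_order(header: bytes) -> str:
    """Return 'snp-major' or 'individual-major' from the first three bytes."""
    if len(header) < 3 or header[:2] != BED_MAGIC:
        raise ValueError("not a PLINK-style .bed header")
    if header[2] == 0x01:
        return "snp-major"
    if header[2] == 0x00:
        return "individual-major"
    raise ValueError(f"unknown mode byte: {header[2]:#04x}")


def block_size(n_samples: int, n_markers: int, order: str) -> int:
    """Bytes per block: four genotypes are packed per byte, padded to a whole byte.

    In SNP-major order a block holds all samples of one marker;
    in individual-major order it holds all markers of one sample.
    """
    per_block = n_samples if order == "snp-major" else n_markers
    return (per_block + 3) // 4
```

With 10 samples and 3 markers, a SNP-major block takes 3 bytes per marker, while an individual-major block takes 1 byte per sample.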
Transposed PED format
The --maketped option generates .tped (marker info. and genotype) and .tfam (pedigree, sex, and single phenotype).
Long file format
The --makelgen option generates .lgen (FID, IID, and genotype) and .map (marker info. without allele info.).
Number-coded format
Generates .raw (minor allele, number-coded genotype, pedigree, sex, and single phenotype). Available codings are additive (--makeraw), dominant (--makedom), and recessive (--makerec).
NOTE!
By default, this format includes a header. It can be omitted with the --outnoheader option.
NOTE!
By default, this format exports six mandatory columns (FID, IID, paternal IID, maternal IID, sex, and default phenotype). The phenotype can be exported separately with the --outphenoonly option.
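The three codings can be illustrated with a short sketch. It assumes the conventional definitions: additive counts the copies of the minor allele (0, 1, or 2), dominant is 1 when at least one copy is present, and recessive is 1 only when both alleles are the minor allele. Whether WISARD uses exactly these values is an assumption here, as are the function name and the use of "0" as the missing allele code.

```python
from typing import Optional


def code_genotype(a1: str, a2: str, minor: str, model: str = "additive") -> Optional[int]:
    """Code one biallelic genotype by its minor allele count (illustrative only)."""
    # Assumption: '0' denotes a missing allele, as in PED-style files.
    if "0" in (a1, a2):
        return None
    count = int(a1 == minor) + int(a2 == minor)
    if model == "additive":
        return count            # 0, 1, or 2 copies of the minor allele
    if model == "dominant":
        return int(count >= 1)  # 1 if at least one minor allele
    if model == "recessive":
        return int(count == 2)  # 1 only if homozygous for the minor allele
    raise ValueError(f"unknown model: {model}")
```

For a heterozygous genotype A/G with minor allele A, this sketch gives 1 under the additive and dominant codings and 0 under the recessive coding.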
Variant Calling Format (VCF)
!!! Experimental function !!!
The --makevcf option generates .vcf (genotype and others) and .fam (pedigree, sex, and phenotype). For details, see this page.
GEN file format
The --makegen option generates .gen (allele info. and probability-coded genotype) and .sample (FID, IID, and multiple phenotypes and covariates).
Binary GEN file format
The --makebgen option generates a binary GEN format dataset, made of .bgen (equivalent to .gen but binary-coded) and .sample (FID, IID, and multiple phenotypes and covariates).
NOTE!
By default, the genotype probability data of a Binary GEN file is not compressed, but stored in a computationally efficient form. To reduce the size of the dataset, the --zipbgen option can be used to compress it.
In addition to the widely used file formats, the following subsets can also be generated:
Phenotype file: A first-line header with the phenotype names, followed by one record of phenotype values per sample in the dataset (FID, IID, and phenotypes).
Covariates file: Same as the phenotype file, but with covariate values.
Genotype file: A plain n-by-p matrix file, where n is the number of samples and p is the number of markers.
Sometimes, especially for large-scale NGS datasets, the entire dataset is too large to handle at once.
In such situations, splitting the dataset by a specific criterion allows more flexible handling.
WISARD provides splitting functionality via the --split option.
The above command produces chromosome-wise Binary PED files from test_miss0.bed.
The above command produces family-wise Binary PED files from test_miss0.bed.
WISARD supports loading multiple files via the --merge option.
Merging multiple files in WISARD involves two components:
(1) the base dataset, assigned by the ordinary input-related options, and
(2) the datasets to be merged into the base dataset.
The --merge option specifies the second component
and tells WISARD that there are one or more datasets to be merged.
To use this function, an argument must be given to the --merge option.
It can be either (1) a sequence of paths separated by commas with no whitespace, or (2) the path of a file that lists multiple datasets to be merged.
In case (1), only one dataset can be assigned, while case (2) can define multiple datasets.
Every path given to the --merge option must satisfy the conditions below.
Every file path must be an absolute path or a valid relative path.
Each dataset must be represented by a set of paths in the given order:
( Binary PED file format : bed, bim and fam )
( PED file format : ped and map )
( Long file format : lgen and map )
( Transposed PED file format : tped and tfam )
( VCF format : vcf and fam )
The extension of the first file must match the requirement of its format.
If the following paths of a dataset have exactly the same name except for the extension, they can be omitted.
In case (2), the file must consist of lines, and each line must specify a single dataset.
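The path conditions above can be sketched as a small helper that expands one dataset specification into its full list of files. This is only one reading of the rules: it assumes that omitted paths are always the trailing ones and that they share the first file's name. The function name and the table of extensions are hypothetical and not part of WISARD.

```python
import os
from typing import List

# Extension sequences as listed above, keyed by the first file's extension.
SEQUENCES = {
    "bed": ["bed", "bim", "fam"],
    "ped": ["ped", "map"],
    "lgen": ["lgen", "map"],
    "tped": ["tped", "tfam"],
    "vcf": ["vcf", "fam"],
}


def expand_dataset(spec: str) -> List[str]:
    """Expand a comma-separated dataset specification into all of its file paths."""
    paths = spec.strip().split(",")
    stem, ext = os.path.splitext(paths[0])
    ext = ext.lstrip(".")
    if ext not in SEQUENCES:
        raise ValueError(f"unsupported first-file extension: {ext!r}")
    expected = SEQUENCES[ext]
    if len(paths) > len(expected):
        raise ValueError("too many paths for this format")
    # Assumption: omitted trailing paths share the first file's name.
    full = list(paths)
    for missing_ext in expected[len(paths):]:
        full.append(f"{stem}.{missing_ext}")
    return full
```

Under this reading, the specification data/test.bed stands for data/test.bed, data/test.bim, and data/test.fam, while a.ped,b.map names both files explicitly.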
NOTE!
Like other options, --merge does not export the merged dataset by itself unless an export-related option is also assigned.
To allow more flexible merging, the --mergemode option can be used.
Currently, the merging modes below are available. Note that the merging strategy applies only to overlapping genotypes.
Consensus mode (default): If the genotype in the base dataset is missing, it is replaced. If the non-missing or replaced genotype is not concordant with a subsequent genotype, it is marked as NA and never replaced.
Replace missing only: If the genotype in the base dataset is missing, it is replaced. Otherwise, nothing is done.
Replace all non-missing: If the genotype in the dataset to be merged is non-missing, it replaces the genotype in the base dataset. Otherwise, nothing is done.
Do not replace at all: All data is preserved, regardless of any conflicting, missing, or other overlap.
Replace without condition: Any overlapping genotype is replaced.
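The modes listed above can be paraphrased for a single overlapping genotype as in the sketch below. It is an interpretation of the descriptions, not WISARD's implementation. The mode names are hypothetical labels rather than actual --mergemode values, genotypes are assumed to be comparable strings with None as missing, and a missing incoming genotype is assumed to neither fill nor conflict in consensus mode.

```python
from typing import Optional, Sequence

Genotype = Optional[str]  # None stands for a missing genotype


def merge_pair(base: Genotype, incoming: Genotype, mode: str) -> Genotype:
    """Merge one overlapping genotype from a dataset into the base dataset."""
    if mode == "replace_missing":     # fill only where the base is missing
        return incoming if base is None else base
    if mode == "replace_nonmissing":  # incoming wins whenever it is non-missing
        return incoming if incoming is not None else base
    if mode == "keep_base":           # do not replace at all
        return base
    if mode == "replace_all":         # replace without condition
        return incoming
    raise ValueError(f"unknown mode: {mode}")


def merge_consensus(base: Genotype, incoming: Sequence[Genotype]) -> Genotype:
    """Consensus over the base genotype and the genotypes merged after it."""
    current = base
    for genotype in incoming:
        if genotype is None:
            continue              # assumption: missing values are ignored
        if current is None:
            current = genotype    # a missing base genotype is filled in
        elif current != genotype:
            return None           # discordant: marked NA and never replaced
    return current
```

For example, a base genotype of AA merged with AG and then AA ends up as NA under consensus, because the first discordance is final.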
The example below illustrates the differences among the merge modes.
For efficiency, no report on conflicts or replacements is produced during merging unless the --mergereport option is assigned.