WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

File formats

This section describes:

  • Accepted data formats
  • PLINK PED file format
  • RAW file format
  • Long file format
  • TPED file format
  • Binary PED file format
  • VCF file format
  • Binary VCF file format

WISARD handles the genotype data formats listed above. Depending on the format, additional options are available to accommodate variations in file layout. For a detailed description of these options, refer to the sections below.

Accepted data formats

WISARD accepts the following genetic, dosage, and expression dataset formats.

Genetic dataset formats
• PLINK PED format (.ped and .map)
• Binary PED format (.bed, .bim and .fam)
• Number-coded genotype format (.raw)
• Long genotype format (.lgen)
• Transposed PED format (.tped and .tfam)
• Variant Call Format (.vcf)
• Binarized VCF format (.bcf)
• Other general (CSV/TSV) genotype format (.tsv, .csv or .txt)

Dosage dataset formats
• BEAGLE dosage format
• MaCH dosage format
• GEN dosage format
• Binary GEN dosage format
• Other general (CSV/TSV) dosage format (.tsv, .csv or .txt)

Expression dataset formats
• Gene Expression Omnibus (GEO) experiment format
• Other general (CSV/TSV) expression format (.tsv, .csv or .txt)

PLINK PED file format

The PED file format is one of the most widely used formats for encoding large-scale genomic variant data. WISARD fully supports this format.

• Required files: .ped (genotype and pedigree structure) and .map (variant information)
• Related options:
  • --ped
  • --map (if the name of the MAP file differs from that of the corresponding PED file)
  • --nomap (if there is no corresponding MAP file)
  • --indel (if at least one variant is an indel)
  • --acgt (to convert a specific allele coding into A/C/G/T coding)
  • --1234 (if alleles are coded as 1, 2, 3, 4, corresponding to A, C, G, T)
  • --sepallele (if a special delimiter separates the two alleles of a genotype)
  • --consecallele (if the two alleles of a genotype are written with no delimiter)
  • --nskip (if the dataset file begins with a header)
• Related options (FAM): --ignoreparent, --ignorefid, --nofid, --noparent, --nosex, --nopheno, --dupnaming
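
To make the PED layout and the 1/2/3/4 allele coding concrete, here is a minimal Python sketch. It is not WISARD code: it assumes the usual PLINK column order (family ID, individual ID, father, mother, sex, phenotype, then two alleles per variant) separated by whitespace, and it assumes that 0 marks a missing allele. The names parse_ped_line and NUM_TO_BASE are hypothetical.

```python
# Illustrative only: assumes the standard PLINK PED column order and
# whitespace-separated alleles. Not taken from WISARD itself.
NUM_TO_BASE = {"1": "A", "2": "C", "3": "G", "4": "T", "0": "0"}  # "0" = missing (assumption)


def parse_ped_line(line, numeric_alleles=False):
    """Split one PED line into pedigree fields and (allele1, allele2) pairs."""
    fields = line.split()
    if len(fields) < 6 or (len(fields) - 6) % 2 != 0:
        raise ValueError("expected 6 pedigree columns plus an even number of alleles")
    fid, iid, father, mother, sex, phenotype = fields[:6]
    alleles = fields[6:]
    if numeric_alleles:
        # Mirrors the idea behind a 1,2,3,4 -> A,C,G,T recoding.
        alleles = [NUM_TO_BASE[a] for a in alleles]
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return {
        "fid": fid,
        "iid": iid,
        "father": father,
        "mother": mother,
        "sex": sex,
        "phenotype": phenotype,
        "genotypes": genotypes,
    }


record = parse_ped_line("FAM1 IND1 0 0 1 2 1 3 4 4", numeric_alleles=True)
print(record["iid"], record["genotypes"])
```

A file with a special delimiter between the two alleles, or with none at all, would need a different split step than the plain whitespace split used here.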

RAW file format

Unlike the PED and binary PED file formats, the RAW file format needs only a single file. It stores family and genotype information together with variant names in that one file, but it cannot carry any additional variant information.
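
As an illustration of a single-file, number-coded layout, the following Python sketch reads a small RAW-style table. It assumes the layout that PLINK writes for additive recoding: a header row with six pedigree columns followed by variant names, then one row per individual with allele counts 0, 1, 2, or NA. Whether WISARD accepts exactly this layout is an assumption here, and parse_raw is a hypothetical helper.

```python
# Illustrative only: assumes a PLINK-style .raw table
# (header row, six pedigree columns, then allele counts or "NA").
def parse_raw(text):
    """Return (variant_names, [(individual_id, {variant: count_or_None})])."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    variants = header[6:]  # variant names live in the header itself
    samples = []
    for ln in lines[1:]:
        fields = ln.split()
        counts = [None if v == "NA" else int(v) for v in fields[6:]]
        samples.append((fields[1], dict(zip(variants, counts))))
    return variants, samples


example = """FID IID PAT MAT SEX PHENOTYPE rs1_A rs2_G
F1 I1 0 0 1 2 0 2
F1 I2 0 0 2 1 1 NA
"""
variants, samples = parse_raw(example)
print(variants)
print(samples)
```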

Long file format

The long file format (LGEN) is another plain-text genotype format. WISARD fully supports this format.

• Required files: .lgen (genotype), .fam (pedigree structure) and .map (variant information)
• Related options:
  • --lgen
  • --map (if the name of the MAP file differs from that of the corresponding LGEN file)
  • --fam (if the name of the FAM file differs from that of the corresponding LGEN file)
  • --indel (if at least one variant is an indel)
  • --acgt (to convert a specific allele coding into A/C/G/T coding)
  • --1234 (if alleles are coded as 1, 2, 3, 4, corresponding to A, C, G, T)
  • --sepallele (if a special delimiter separates the two alleles of a genotype)
  • --consecallele (if the two alleles of a genotype are written with no delimiter)
  • --nskip (if the dataset file begins with a header)
• Related options (FAM): --ignoreparent, --ignorefid, --nofid, --noparent, --nosex, --nopheno, --dupnaming
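
The "long" layout stores one genotype per line rather than one individual per line. The Python sketch below groups such lines by individual. It assumes the PLINK LGEN column order (family ID, individual ID, variant ID, allele 1, allele 2) with whitespace separators; read_lgen is a hypothetical helper and not part of WISARD.

```python
# Illustrative only: assumes five whitespace-separated columns per line
# (family ID, individual ID, variant ID, allele 1, allele 2).
from collections import defaultdict


def read_lgen(lines):
    """Group long-format genotype lines into {(fid, iid): {variant: (a1, a2)}}."""
    by_sample = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        fid, iid, variant, a1, a2 = line.split()
        by_sample[(fid, iid)][variant] = (a1, a2)
    return dict(by_sample)


lgen_lines = [
    "F1 I1 rs1 A A",
    "F1 I1 rs2 G T",
    "F1 I2 rs1 A C",
]
by_sample = read_lgen(lgen_lines)
print(by_sample)
```

Because each line names its own variant, a variant that is absent for an individual is simply not listed, which is why the pedigree and variant lists come from the separate FAM and MAP files.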

TPED file format

The TPED (transposed PED) file format is essentially the transpose of the PED file format. Because some programs export their output in TPED format, WISARD supports it.

• Required files: .tped (genotype and variant information) and .fam (pedigree structure)
• Related options:
  • --tped
  • --fam (if the name of the TFAM file differs from that of the corresponding TPED file)
  • --indel (if at least one variant is an indel)
  • --acgt (to convert a specific allele coding into A/C/G/T coding)
  • --1234 (if alleles are coded as 1, 2, 3, 4, corresponding to A, C, G, T)
  • --sepallele (if a special delimiter separates the two alleles of a genotype)
  • --consecallele (if the two alleles of a genotype are written with no delimiter)
  • --nskip (if the dataset file begins with a header)
• Related options (FAM): --ignoreparent, --ignorefid, --nofid, --noparent, --nosex, --nopheno, --dupnaming
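
The transpose relationship between PED and TPED can be sketched in a few lines of Python. The sketch assumes the PLINK conventions: each MAP row holds chromosome, variant ID, genetic distance and base-pair position, each PED row holds six pedigree columns followed by two alleles per variant, and each TPED row holds the four MAP columns followed by two alleles per individual. The function ped_to_tped is hypothetical.

```python
# Illustrative only: builds TPED-style rows (one per variant) from
# MAP rows and already-split PED rows (one per individual).
def ped_to_tped(map_rows, ped_rows):
    tped = []
    for v, (chrom, variant_id, cm, bp) in enumerate(map_rows):
        row = [chrom, variant_id, cm, bp]
        for fields in ped_rows:
            # Alleles of variant v sit at columns 6+2v and 7+2v of a PED row.
            row.extend(fields[6 + 2 * v: 8 + 2 * v])
        tped.append(row)
    return tped


map_rows = [("1", "rs1", "0", "1000"), ("1", "rs2", "0", "2000")]
ped_rows = [
    "F1 I1 0 0 1 2 A A G T".split(),
    "F1 I2 0 0 2 1 A C T T".split(),
]
tped = ped_to_tped(map_rows, ped_rows)
for row in tped:
    print(" ".join(row))
```

The six pedigree columns dropped from each PED row are what the accompanying pedigree file carries.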

Binary PED file format

The binary PED (BED) file format is a binarized form of the PED file format. It is also widely used because of its space efficiency: a BED file is about one-eighth the size of the equivalent PED file. WISARD fully supports this format, in both SNP-first and individual-first layouts.

• Required files: .bed (binary-coded genotype), .fam (pedigree structure) and .bim (variant information)
• Related options:
  • --bed
  • --fam (if the name of the FAM file differs from that of the corresponding BED file)
  • --bim (if the name of the BIM file differs from that of the corresponding BED file)
  • --nomap (if there is no corresponding MAP file)
  • --indel (if at least one variant is an indel)
• Related options (FAM): --ignoreparent, --ignorefid, --nofid, --noparent, --nosex, --nopheno, --dupnaming
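
The compactness of BED comes from packing four genotypes into each byte. The Python sketch below decodes a tiny BED byte string under the PLINK binary layout: two magic bytes (0x6c 0x1b), a mode byte (1 for SNP-first, 0 for individual-first), then 2-bit genotype codes read from the low-order bits of each byte. It illustrates the file layout only and says nothing about how WISARD reads BED files internally; decode_bed and the label strings are hypothetical.

```python
# Illustrative only: decodes PLINK-style BED bytes held in memory.
import math

MAGIC = b"\x6c\x1b"
CODES = {0b00: "hom_a1", 0b01: "missing", 0b10: "het", 0b11: "hom_a2"}


def decode_bed(data, n_samples, n_variants):
    """Return (snp_first, blocks); each block is a list of genotype labels."""
    if len(data) < 3 or data[:2] != MAGIC:
        raise ValueError("not a BED byte string (bad magic number)")
    snp_first = data[2] == 1
    # SNP-first: one block per variant, holding every individual.
    # Individual-first: one block per individual, holding every variant.
    n_outer, n_inner = (n_variants, n_samples) if snp_first else (n_samples, n_variants)
    block_size = math.ceil(n_inner / 4)  # each block is padded to whole bytes
    body = data[3:]
    if len(body) != block_size * n_outer:
        raise ValueError("unexpected body length for the given dimensions")
    blocks = []
    for i in range(n_outer):
        chunk = body[i * block_size:(i + 1) * block_size]
        calls = []
        for j in range(n_inner):
            code = (chunk[j // 4] >> (2 * (j % 4))) & 0b11
            calls.append(CODES[code])
        blocks.append(calls)
    return snp_first, blocks


# Three individuals, two variants, SNP-first layout.
bed_bytes = b"\x6c\x1b\x01\x38\x21"
snp_first, blocks = decode_bed(bed_bytes, n_samples=3, n_variants=2)
print(snp_first, blocks)
```

In this example each variant occupies a single byte for three individuals, whereas the text form spends several characters on every genotype.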

VCF file format

The Variant Call Format (VCF) was first introduced by the 1000 Genomes Project.
Although this file format requires a large amount of disk space, it is frequently used because of its extensibility. WISARD also supports this format, but still has input/output limitations.

• Required files: .vcf (VCF file) and, optionally, .fam (pedigree structure)
• Related options:
  • --vcf
  • --fam (if there is an additional FAM file for the corresponding VCF file)
  • --vcfqc (to retrieve only variants whose QC field value is PASS)
  • --acgt (to convert a specific allele coding into A/C/G/T coding)
  • --1234 (if alleles are coded as 1, 2, 3, 4, corresponding to A, C, G, T)
  • --filqual (to retrieve only variants whose QUAL field value is NOT within the given range)
  • --incqual (to retrieve only variants whose QUAL field value IS within the given range)
  • --filgeno (to retrieve only genotypes NOT satisfying a specific condition)
  • --incgeno (to retrieve only genotypes satisfying a specific condition)
• Related options (FAM): --ignoreparent, --ignorefid, --nofid, --noparent, --nosex, --nopheno, --dupnaming
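
The kind of record-level filtering these options describe, together with the bi-allelic restriction noted for this format, can be sketched in Python as follows. This is a rough illustration and not WISARD's implementation: it assumes tab-separated VCF data lines with the standard leading columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER), treats the quality range as inclusive, and uses the hypothetical names keep_record, require_pass and qual_range.

```python
# Illustrative only: decides whether one VCF line would pass a simple
# "PASS + bi-allelic + QUAL within range" filter. Range semantics are assumed.
def keep_record(line, require_pass=True, qual_range=None):
    if line.startswith("#"):
        return False  # header and meta lines carry no genotypes
    cols = line.rstrip("\n").split("\t")
    alt, qual, flt = cols[4], cols[5], cols[6]
    # Bi-allelic: exactly one alternate allele (no comma, not missing).
    if alt == "." or "," in alt:
        return False
    if require_pass and flt != "PASS":
        return False
    if qual_range is not None:
        if qual == ".":
            return False  # missing QUAL cannot satisfy a range
        low, high = qual_range
        if not (low <= float(qual) <= high):
            return False
    return True


vcf_lines = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\trs1\tA\tG\t50\tPASS\t.",
    "1\t200\trs2\tA\tG,T\t50\tPASS\t.",
    "1\t300\trs3\tA\tG\t10\tq10\t.",
]
kept = [ln.split("\t")[2] for ln in vcf_lines if keep_record(ln, qual_range=(30, 100))]
print(kept)
```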

NOTE!
WISARD accepts only bi-allelic variants; any other variant is skipped and is not incorporated into the analysis.
!!! Experimental function !!!

Binary VCF file format



Last modified : 2017-05-26 09:58:38