WISARD[wɪzərd]
Workbench for Integrated Superfast Association study with Related Data

Data storage

  • Related options : --data, --makevcf, --ref, --markercheck, --markerupdate

This section describes:

  • What is the data storage?
  • Set an arbitrary path for the data storage
  • Annotation check and update
  • Generating VCF file with reference sequence

NOTE!
This functionality is supported since version 1.1.0.8!

What is the data storage?

For some analyses, WISARD requires a database called the 'data storage'. The data storage can be located anywhere that is accessible from the system. Its default path is a directory named 'wisard_data', located in the same directory as the executable file.

If an analysis requires external data (a reference sequence file, SNV annotation file, gene definition file, etc.), WISARD first searches for it in the data storage on the local server. If it does not exist there, WISARD attempts to download the required file(s) into the data storage.

NOTE!
If the data storage does not contain the required files, an internet connection and write permission for the data storage are required!
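
The lookup order described above (search the data storage first, download only when the file is missing) can be illustrated with a minimal Python sketch. This is not WISARD's actual implementation: the function name fetch_from_storage, the stand-in downloader, and the file name example_annotation.bed are all hypothetical and chosen only for illustration.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def fetch_from_storage(storage: Path, name: str,
                       download: Callable[[str, Path], None]) -> Path:
    """Return the path of `name` in `storage`, downloading it only if absent."""
    target = storage / name
    if target.exists():
        return target  # already in the data storage, no network access needed
    storage.mkdir(parents=True, exist_ok=True)
    if not os.access(storage, os.W_OK):
        raise PermissionError(f"cannot write to data storage: {storage}")
    download(name, target)  # hypothetical downloader supplied by the caller
    return target


# Demonstration with a stand-in downloader that only writes a placeholder file.
requested = []


def fake_download(name: str, target: Path) -> None:
    requested.append(name)
    target.write_text("placeholder\n")


with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp) / "wisard_data"
    first = fetch_from_storage(storage, "example_annotation.bed", fake_download)
    second = fetch_from_storage(storage, "example_annotation.bed", fake_download)
    # The second request is served from the storage, so only one download occurs.
    print(first == second, requested)
```

The write-permission check mirrors the NOTE above: a download can only succeed when the data storage directory is writable.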

Set an arbitrary path for the data storage

The default location can be inconvenient when WISARD is installed in a shared path that is not writable by normal users. In this situation, the path can be changed with the --data option, as in the example below.

Altering WISARD storage
C:\Users\WISARD> wisard --data ~/my_wisard_data --bed my_dataset --makevcf

In this example, WISARD refers to the data storage in order to make the VCF file, because a VCF file encodes genotypes in terms of the reference allele and the alternative allele. By assigning --data ~/my_wisard_data, WISARD searches for the reference sequence data in the path ~/my_wisard_data instead of the default path.

Annotation check and update

WISARD can consult reference information on Single Nucleotide Variants (SNVs) and use it to check or update the annotation of the input dataset.

Because this feature relies on the data storage interface of WISARD, details on the data storage can be found on this page.

By default, WISARD uses the NCBI dbSNP database in BED format to annotate SNV information. Because this information is available for every species that WISARD supports, SNVs can be annotated regardless of species.

Note that the annotation result can differ over time, because the reference annotation data is continuously updated. The basic option for this function is --markercheck, which proceeds as follows.

1. Determine which chromosomes are present in the input dataset. Markers with no chromosome are excluded at this step.
2. For each chromosome that has at least one variant, load its annotation information from the data storage of WISARD.
3. Classify each variant in the input dataset into one of four categories. (1) The variant has no corresponding rsID. (2) The variant already has an rsID, but it is incorrect. (3) The variant already has an rsID, but there is no corresponding rsID. (4) The variant does not have an rsID, but there is a corresponding one.
4. While the --markercheck option only generates a report for cases (2) ~ (4), the --markerupdate option additionally annotates cases (2) ~ (4) with the correct rsID or with a variant ID of the form var_[chr]_[pos].
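
The categorization in steps 3 and 4 can be sketched in Python as follows. This is one possible reading of the categories, not WISARD's code, and it rests on several assumptions: the reference annotation is modelled as a mapping from (chromosome, position) to rsID, an ID counts as an rsID when it starts with "rs", a variant whose rsID agrees with the reference is given category 0 (a case not listed above), and category (3) is the one that receives the var_[chr]_[pos] name. The names categorize and updated_id are hypothetical.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[str, int]  # (chromosome, base-pair position)


def categorize(variant_id: Optional[str], position: Position,
               reference: Dict[Position, str]) -> int:
    """Return 1-4 as in step 3, or 0 when the rsID already agrees (assumption)."""
    has_rsid = variant_id is not None and variant_id.startswith("rs")
    ref_rsid = reference.get(position)
    if has_rsid:
        if ref_rsid is None:
            return 3  # has an rsID, but the reference has none at this position
        return 0 if variant_id == ref_rsid else 2  # 2: rsID is incorrect
    # No rsID in the input: 1 if the reference has none either, 4 otherwise.
    return 1 if ref_rsid is None else 4


def updated_id(variant_id: Optional[str], position: Position,
               reference: Dict[Position, str]) -> Optional[str]:
    """Mimic the update step: fix cases (2) and (4), rename case (3)."""
    category = categorize(variant_id, position, reference)
    if category in (2, 4):
        return reference[position]  # take the rsID from the reference
    if category == 3:
        chrom, pos = position
        return f"var_{chrom}_{pos}"  # fallback variant ID (assumed for case 3)
    return variant_id  # categories 0 and 1 are left unchanged


# Toy reference with two annotated positions (made-up rsIDs).
reference = {("1", 100): "rs1", ("1", 200): "rs2"}
for vid, pos in [("rs1", ("1", 100)), ("rs9", ("1", 200)),
                 ("rs5", ("2", 50)), (None, ("1", 100)), (None, ("3", 1))]:
    print(vid, pos, categorize(vid, pos, reference), updated_id(vid, pos, reference))
```

In this sketch, a report-only run corresponds to calling categorize alone, while an update run corresponds to calling updated_id as well.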

Generating VCF file with reference sequence

As described on this page, WISARD can convert an input dataset into a VCF file. However, unlike other genetic data formats, VCF carries explicit information about the reference allele and the alternative allele. Because of this, a reference sequence is required to generate a VCF file.

Unless --ref is assigned to provide a reference sequence directly, WISARD attempts to acquire this information from the data storage, according to the species being analyzed.
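
To see why a reference sequence is needed, consider a biallelic marker whose two alleles are known but not which of them is the reference. The Python sketch below is an illustration only, not WISARD's conversion routine: the function name to_vcf_fields is hypothetical, and the marker is assumed to be a biallelic SNV whose reference base has already been looked up. It relies on the VCF convention that the GT field writes the reference allele as 0 and the alternative allele as 1.

```python
from typing import Tuple


def to_vcf_fields(alleles: Tuple[str, str], genotype: Tuple[str, str],
                  ref_base: str) -> Tuple[str, str, str]:
    """Return (REF, ALT, GT) for one biallelic marker and one sample."""
    first, second = alleles
    if ref_base == first:
        ref, alt = first, second
    elif ref_base == second:
        ref, alt = second, first
    else:
        raise ValueError("neither allele matches the reference base")
    for allele in genotype:
        if allele not in (ref, alt):
            raise ValueError(f"unexpected allele in genotype: {allele}")
    # VCF GT field: 0 stands for the REF allele, 1 for the ALT allele.
    gt = "/".join("0" if allele == ref else "1" for allele in genotype)
    return ref, alt, gt


# Marker with alleles A and G; the reference sequence has G at this position.
print(to_vcf_fields(("A", "G"), ("A", "G"), "G"))  # ('G', 'A', '1/0')
print(to_vcf_fields(("A", "G"), ("G", "G"), "G"))  # ('G', 'A', '0/0')
```

Without the reference base, the same genotype could be written as either 1/0 or 0/1 relative to a different REF choice, which is why the reference sequence must come from --ref or from the data storage.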



Last modified : 2014-08-29 11:16:09