We provide a sample dataset to try the PHARAOH program.
The sample dataset can be downloaded from the links below.
pharaoh_sample (zip file)
pharaoh_sample (tar.gz file)
The sample dataset consists of five files: dataset.ped, dataset.map, gene.set, pathway.set and pheno.txt.
dataset.ped | Contains (1) information about the samples and (2) the genotypes of the samples. |
dataset.map | Contains information about the genetic variants. |
gene.set | Contains the mapping between the genetic variants and the genes. |
pathway.set | Contains the mapping between the genes and the pathways. |
pheno.txt | Contains the phenotypes and the covariates of the samples. |
Let’s do the simplest possible run of the PHARAOH program.
Preparation
After downloading the sample dataset, uncompress the downloaded file into the directory where the PHARAOH program was extracted.
Note: On Microsoft Windows, we strongly recommend using a GUI-based archive program, such as BandiZip.
On Linux distributions, the following commands extract the sample dataset into the directory where the archive file is located.
# For Linux distributions (zip)
pharaoh@JOB1:~/foo$ unzip pharaoh_sample.zip
# For Linux distributions (tar.gz)
pharaoh@JOB1:~/foo$ tar xvfz pharaoh_sample.tar.gz
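Before starting a run, it can help to confirm that all five sample files ended up next to the PHARAOH program. The following is a minimal Python sketch of such a check, not part of PHARAOH itself. The function name `missing_sample_files` is hypothetical; the file names are the ones listed in the sample dataset description.

```python
from pathlib import Path

# File names taken from the sample dataset description.
SAMPLE_FILES = ["dataset.ped", "dataset.map", "gene.set", "pathway.set", "pheno.txt"]


def missing_sample_files(directory="."):
    """Return the sample files that are not present in `directory` (hypothetical helper)."""
    base = Path(directory)
    return [name for name in SAMPLE_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # Check the current directory, i.e. where the archive was extracted.
    missing = missing_sample_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All sample files are present.")
```

Running the script from the extraction directory should report either the missing file names or that everything is in place.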
Full-automatic running
In full-automatic mode, PHARAOH requires only two parameters:
- (--cv) Number of cross-validations to perform to determine an optimal lambda (λ) value.
- (--nperm) Number of permutations used to compute the p-values. The recommended value is 1000.
In this mode, the PHARAOH program first determines an optimal value of λ via cross-validation and then runs the PHARAOH analysis.
# For Linux distributions
pharaoh@JOB1:~/foo$ ./pharaoh --pharaoh --cv 2 --nperm 1000 --ped dataset.ped --set gene.set --geneset pathway.set --out test_res
# For Microsoft Windows
C:\foo> pharaoh --pharaoh --cv 2 --nperm 1000 --ped dataset.ped --set gene.set --geneset pathway.set --out test_res
The --pharaoh option tells the PHARAOH program to perform the PHARAOH analysis.
--cv and --nperm are the parameters required to perform the PHARAOH analysis.
--ped, --set and --geneset indicate the dataset, the variant-gene mapping and the gene-pathway mapping, respectively.
--out determines the prefix of the result files.
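When the run is launched from a script, for instance to try several --cv values, the command line can be assembled programmatically. The sketch below is one possible way to do this in Python. It uses only the options shown in the command above; the function names `build_pharaoh_command` and `run_pharaoh` are hypothetical, and the default executable path `./pharaoh` assumes the Linux layout (on Microsoft Windows, pass `executable="pharaoh"`).

```python
import subprocess


def build_pharaoh_command(cv, nperm, ped, gene_set, pathway_set, out_prefix,
                          executable="./pharaoh"):
    """Assemble the argument list for a full-automatic PHARAOH run (hypothetical helper)."""
    return [
        executable, "--pharaoh",
        "--cv", str(cv),            # number of cross-validations
        "--nperm", str(nperm),      # number of permutations
        "--ped", ped,               # dataset
        "--set", gene_set,          # variant-gene mapping
        "--geneset", pathway_set,   # gene-pathway mapping
        "--out", out_prefix,        # prefix of the result files
    ]


def run_pharaoh(*args, **kwargs):
    """Launch PHARAOH and raise an error if it exits with a non-zero status."""
    subprocess.run(build_pharaoh_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    # Only print the command here; call run_pharaoh(...) to actually start the analysis.
    cmd = build_pharaoh_command(2, 1000, "dataset.ped", "gene.set", "pathway.set", "test_res")
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting issues when file names contain spaces.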
Results
The PHARAOH program generates the following files after a normal run.
[prefix].pharaoh.pathway.res | The main result file. |
[prefix].pharaoh.gene.res | The gene-level result file. |
The details of the output files are as follows.
Note that the columns shown in red should be the focus of the interpretation.
[prefix].pharaoh.pathway.res
Column name | Description |
PATHWAY | Name of pathway |
NPERM | Number of performed permutations |
NGENE | Number of genes included in the pathway |
NVARIANT | Total number of collapsed rare variants in the pathway |
P_PHARAOH | Permutation p-value of the pathway |
[prefix].pharaoh.gene.res
Column name | Description |
PATHWAY | Name of the pathway that includes the gene |
GENE | Name of gene |
NPERM | Number of performed permutations |
NVARIANT | Total number of collapsed rare variants in the gene |
P_PHARAOH_GENE_LOCAL | p-value of the gene computed within the pathway |
P_PHARAOH_GENE_MARGINAL | p-value of the gene computed marginally |
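To inspect the two result files together, a short script can rank the pathways by P_PHARAOH and list the genes of each pathway by P_PHARAOH_GENE_LOCAL. The following Python sketch assumes that both files are whitespace-delimited text with a header line carrying the column names described above; this layout is an assumption and should be checked against the actual output. All function names are hypothetical.

```python
import math


def read_result_table(path):
    """Read a result file into a list of dicts keyed by column name.

    Assumption: whitespace-delimited text whose first non-empty line is the header.
    """
    with open(path) as handle:
        lines = [line.split() for line in handle if line.strip()]
    if not lines:
        return []
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def sort_by_pvalue(rows, column):
    """Return rows sorted by ascending p-value, skipping missing or non-numeric entries."""
    scored = []
    for row in rows:
        try:
            value = float(row[column])
        except (KeyError, ValueError):
            continue  # column absent or a placeholder such as "NA"
        if math.isnan(value):
            continue
        scored.append((value, row))
    scored.sort(key=lambda pair: pair[0])
    return [row for _, row in scored]


def genes_in_pathway(gene_rows, pathway):
    """Genes of one pathway, ordered by their within-pathway p-value."""
    selected = [row for row in gene_rows if row.get("PATHWAY") == pathway]
    return sort_by_pvalue(selected, "P_PHARAOH_GENE_LOCAL")


def print_summary(prefix, top=5):
    """Print the `top` pathways with the smallest p-values and their genes."""
    pathways = sort_by_pvalue(read_result_table(prefix + ".pharaoh.pathway.res"), "P_PHARAOH")
    genes = read_result_table(prefix + ".pharaoh.gene.res")
    for row in pathways[:top]:
        print(row["PATHWAY"], row["P_PHARAOH"])
        for gene in genes_in_pathway(genes, row["PATHWAY"]):
            print("   ", gene["GENE"], gene["P_PHARAOH_GENE_LOCAL"])
```

For the run shown earlier, calling `print_summary("test_res")` from the directory that holds the result files would print the five pathways with the smallest p-values, each followed by its genes.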