Step 2. Running PHARAOH

We provide a sample dataset for trying out the PHARAOH program.
It can be downloaded from the links below.

pharaoh_sample (zip file)
pharaoh_sample (tar.gz file)

The sample dataset consists of five files: dataset.ped, dataset.map, gene.set, pathway.set and pheno.txt.

dataset.ped    Contains (1) sample information and (2) the genotypes of the samples.
dataset.map    Contains information on the genetic variants.
gene.set       Contains the mapping between genetic variants and genes.
pathway.set    Contains the mapping between genes and pathways.
pheno.txt      Contains the phenotypes and covariates of the samples.

Let’s start with the simplest run of the PHARAOH program.

Preparation

After downloading the sample dataset, uncompress it into the directory where the PHARAOH program was extracted.

Note: On Microsoft Windows, we strongly recommend using GUI-based archive software such as BandiZip.

On Linux distributions, the following commands extract the sample dataset into the directory that contains the downloaded archive.

# For Linux distributions (zip)
pharaoh@JOB1:~/foo$ unzip pharaoh_sample.zip

# For Linux distributions (tar.gz)
pharaoh@JOB1:~/foo$ tar xvfz pharaoh_sample.tar.gz
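
After extraction, it can be useful to confirm that all five sample files are in place before running the program. The following is a minimal sketch, assuming the files were extracted into the current directory; check_sample_files is a hypothetical helper name and is not part of PHARAOH.

```shell
# Hypothetical helper (not part of PHARAOH): report which of the five
# sample files are missing from the current directory.
check_sample_files() {
    missing=0
    for f in dataset.ped dataset.map gene.set pathway.set pheno.txt; do
        if [ ! -f "$f" ]; then
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    echo "$missing file(s) missing"
    # Succeed only when every file was found.
    [ "$missing" -eq 0 ]
}
```

Calling check_sample_files from the extraction directory lists any file that is absent and returns a non-zero status in that case, so it can also be used as a guard in a script.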

Fully automatic run

In fully automatic mode, PHARAOH requires only two parameters:

  • (--cv) Number of cross-validations performed to determine the optimal lambda (λ) value.
  • (--nperm) Number of permutations used to compute the p-values. The recommended value is 1000.

In this mode, the PHARAOH program first determines the optimal value of λ via cross-validation and then runs the PHARAOH analysis.

# For Linux distributions
pharaoh@JOB1:~/foo$ ./pharaoh --pharaoh --cv 2 --nperm 1000 --ped dataset.ped --set gene.set --geneset pathway.set --out test_res

# For Microsoft Windows
C:\foo> pharaoh --pharaoh --cv 2 --nperm 1000 --ped dataset.ped --set gene.set --geneset pathway.set --out test_res

--pharaoh tells the PHARAOH program to perform the PHARAOH analysis.

--cv and --nperm are the parameters required to perform the PHARAOH analysis.

--ped, --set and --geneset specify the dataset, the variant-gene mapping and the gene-pathway mapping, respectively.

--out sets the prefix of the result files.
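
The options described above can be collected in a small wrapper so that the cross-validation and permutation counts are easy to change between runs. The sketch below only assembles and prints the command line (a dry run); pharaoh_cmd is a hypothetical name, and only the options shown in this section are used.

```shell
# Hypothetical wrapper: assemble the PHARAOH command line from arguments.
# Only the options shown in this section are used.
pharaoh_cmd() {
    # $1: number of cross-validations, $2: number of permutations, $3: output prefix
    echo "./pharaoh --pharaoh --cv $1 --nperm $2" \
         "--ped dataset.ped --set gene.set --geneset pathway.set --out $3"
}

# Dry run: print the command instead of executing it.
pharaoh_cmd 2 1000 test_res
```

Printing the command first makes it easy to review the parameters before starting a long permutation run; the printed line can then be pasted at the prompt.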

Results

After a successful run, the PHARAOH program generates the following files.

[prefix].pharaoh.pathway.res    The main result file.
[prefix].pharaoh.gene.res       The gene-level result file.

The details of the output files are as follows.
Note that the columns shown in red are the ones to focus on when interpreting the results.

[prefix].pharaoh.pathway.res
Column name    Description
PATHWAY        Name of the pathway
NPERM          Number of performed permutations
NGENE          Number of genes included in the pathway
NVARIANT       Total number of collapsed rare variants in the pathway
P_PHARAOH      Permutation p-value of the pathway

[prefix].pharaoh.gene.res
Column name                Description
PATHWAY                    Name of the pathway that includes the gene
GENE                       Name of the gene
NPERM                      Number of performed permutations
NVARIANT                   Total number of collapsed rare variants in the gene
P_PHARAOH_GENE_LOCAL       p-value of the gene computed within the pathway
P_PHARAOH_GENE_MARGINAL    p-value of the gene computed marginally
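
As a starting point for interpretation, the pathway-level file can be filtered on P_PHARAOH. The sketch below assumes that the result file is whitespace-delimited and that its first row holds the column names listed above; neither is confirmed in this section, so inspect the actual file first. filter_pathways is a hypothetical helper, and the 0.05 cutoff in the usage comment is only an illustrative choice.

```shell
# Hypothetical helper (not part of PHARAOH): print the name and p-value of
# every pathway whose P_PHARAOH is below a threshold.
# Assumes a whitespace-delimited file with the column names in the first row.
filter_pathways() {
    # $1: pathway result file, $2: p-value threshold
    awk -v thr="$2" '
        NR == 1 {
            # Locate columns by name so the column order does not matter.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("PATHWAY" in col) || !("P_PHARAOH" in col)) {
                print "PATHWAY or P_PHARAOH column not found" > "/dev/stderr"
                exit 1
            }
            next
        }
        ($(col["P_PHARAOH"]) + 0) < (thr + 0) {
            print $(col["PATHWAY"]), $(col["P_PHARAOH"])
        }
    ' "$1"
}

# Example: filter_pathways test_res.pharaoh.pathway.res 0.05
```

Looking columns up by header name rather than by position keeps the sketch usable if the real file contains additional columns. Because the p-values are permutation-based, their resolution is limited by the number of permutations given with --nperm.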