Usage (C/C++ version)
HisCoM-Kernel is written in R; a C/C++ implementation is also available and runs through the WISARD program. To run the C/C++ version of HisCoM-Kernel, first download WISARD from http://statgen.snu.ac.kr/wisard/?act=download.
As inputs, three data files are required: a phenotype data file, an omics data file, and a pathway-biomarker (e.g. gene, metabolite) matching file. Example versions of all three files are included in the ‘ExampleC_Input’ zip file.
| File | Description |
| --- | --- |
| Pathway-biomarker matching file (.set) | Contains (1) pathway name and (2) biomarker name. It should follow the same format as the ‘ExampleC_pathway_matching.set’ file. |
| Omics data file (.raw) | Contains gene expression or metabolite level values. The format is the same as the ‘ExampleC_data.raw’ file. |
| Phenotype data file (.txt) | Contains (1) sample ID and (2) phenotype. To add covariates to the model, this file should also contain the covariate information. The format is the same as the ‘ExampleC_ID_pheno.txt’ file. |
Pathway-biomarker matching file
Each line consists of two columns: the pathway name and the biomarker name, respectively. Here, biomarkers can be genes or metabolites. The matching file should not contain column names (no header row), as in the ‘ExampleC_pathway_matching.set’ file.
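The two-column layout described above can be checked before running WISARD. The following is a minimal sketch, not part of WISARD or HisCoM-Kernel; it assumes the columns are separated by whitespace and that pathway names contain no spaces. The function name `read_pathway_sets` and the sample pathway and biomarker names are hypothetical.

```python
from collections import defaultdict
import io


def read_pathway_sets(lines):
    """Group biomarkers by pathway from header-less, two-column lines."""
    pathways = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # assumption: whitespace-delimited columns
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 columns, got {len(fields)}"
            )
        pathway, biomarker = fields
        pathways[pathway].append(biomarker)
    return dict(pathways)


# Hypothetical content, not taken from the ExampleC files.
sample = io.StringIO("PathwayA GENE1\nPathwayA GENE2\nPathwayB GENE3\n")
sets = read_pathway_sets(sample)
print(sets)
```

To check a real file, pass an open file handle instead of the `io.StringIO` object. Pathways that end up with a single biomarker are worth noting, since they are handled differently in the trait file.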
Omics data file
The first six columns follow the format of PLINK’s covariate table: FID, IID, PAT, MAT, SEX, and PHENOTYPE indicate the family ID, individual ID, paternal individual ID, maternal individual ID, sex (1=male, 2=female), and binary outcome (control=1, case=2), respectively. From the 7th column onward, each column is headed by a biomarker (e.g. gene, metabolite) name, and each cell holds the gene expression or metabolite level of that biomarker for the individual in that row.
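As a quick sanity check of this layout, the header line of the omics data file can be inspected before the analysis is run. This is a minimal sketch under the assumption that the file is whitespace-delimited and that its first line holds the column names; `biomarker_columns` and the sample header are hypothetical and not part of WISARD.

```python
REQUIRED = ["FID", "IID", "PAT", "MAT", "SEX", "PHENOTYPE"]


def biomarker_columns(header_line):
    """Return the biomarker names (7th column onward) after checking the first six."""
    columns = header_line.split()  # assumption: whitespace-delimited header
    if columns[:6] != REQUIRED:
        raise ValueError(
            f"first six columns must be {REQUIRED}, got {columns[:6]}"
        )
    if len(columns) < 7:
        raise ValueError("no biomarker columns found after PHENOTYPE")
    return columns[6:]


# Hypothetical header, not taken from ExampleC_data.raw.
header = "FID IID PAT MAT SEX PHENOTYPE GENE1 GENE2 GENE3"
biomarkers = biomarker_columns(header)
print(biomarkers)
```

The returned names can then be compared against the biomarker names used in the pathway-biomarker matching file to spot names that appear in one file but not the other.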
Trait file
Each row contains the family ID, individual ID, phenotypic information, and covariates. The family ID and individual ID should match the order of the ID columns in the omics data file. For the phenotype (i.e. binary outcome), case = 2 and control = 1. Covariates should be included starting with the 3rd column. The example below shows age and AFP as covariates.
If you have any pathways that contain only a single biomarker, these pathways should be included after covariates, as in the example below.
wisard --expression ExampleC_data.raw --pharaoh --hiscomkernel --sampvar ExampleC_ID_pheno.txt --pname PHENOTYPE --geneset ExampleC_pathway_matching --nperm 1000 --prolambda 500 --verbose --out test
Following the tutorial in Step 3, the outputs are as follows:
If you encounter an error while running the program, please report it to us so that we can resolve the problem and improve the program.
Contact : suhyun8695@gmail.com