Biologically informed variational autoencoders allow predictive modeling of genetic and drug-induced perturbations

Presentation date: April 04, 2024

Presenter: Jun Sik Kim

PGEE-M

What is PGEE_M?

Penalized Generalized Estimating Equation of Multinomial Responses (PGEE_M) is a method for simultaneously identifying important variables and estimating their regression coefficients for high-dimensional longitudinal multinomial responses.
For both variable selection and estimation with high-dimensional longitudinal data, PGEE_M uses two non-convex penalties: the SCAD penalty and the MCP penalty.
To estimate the model parameters, PGEE_M adopts an iterative algorithm that combines the minorization-maximization (MM) algorithm for the non-convex penalty with the Fisher-scoring algorithm.
The detailed algorithm is described in the original article: “Penalized generalized estimating equations approach to longitudinal data with multinomial responses”.
The PGEE_M software can only produce results for the independence correlation structure.
To create the PGEE_M software, we used parts of the code from the “PGEE” package (https://github.com/cran/PGEE) and the “multgee” package (https://github.com/AnestisTouloumis/multgee).
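
For readers unfamiliar with the two penalties named above, the following is a minimal Python sketch of the standard SCAD and MCP penalty functions and their derivatives for a single coefficient. It is not taken from the PGEE_M source code: the function names are hypothetical, and the defaults a = 3.7 and gamma = 3.0 are conventional choices from the literature, assumed here rather than confirmed as the values PGEE_M uses.

```python
# Illustrative sketch only; not the PGEE_M implementation.
# lam is the tuning parameter; a (SCAD) and gamma (MCP) are shape
# parameters whose default values here are assumptions.

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty for one coefficient (requires a > 2)."""
    t = abs(beta)
    if t <= lam:
        return lam * t                      # lasso-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2          # constant for large |beta|


def scad_derivative(beta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |beta|."""
    t = abs(beta)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP penalty for one coefficient (requires gamma > 1)."""
    t = abs(beta)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2            # constant for large |beta|


def mcp_derivative(beta, lam, gamma=3.0):
    """Derivative of the MCP penalty with respect to |beta|."""
    t = abs(beta)
    return max(lam - t / gamma, 0.0)
```

Both derivatives equal lam at zero and shrink to zero for large coefficients, so large effects are left essentially unpenalized while small ones are pushed toward zero.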

Sample Dataset

The sample dataset contains 500 samples, with each subject evaluated at 4 time points, 100 covariates in total, and a response variable with 5 categories.

Contact

The PGEE_M program has been developed and is maintained by

  • Md. Kamruzzaman (kzaman1@isrt.ac.bd) at the Bioinformatics and Biostatistics Lab., Dept. of Statistics, Seoul National University.

Download

An example R script is linked here
An example dataset is linked here

AucPR

Introduction

This folder contains the R source files used for the numerical study in the manuscript submitted to Bioinformatics, 2014, titled “AucPR: An AUC-based approach using penalized regression for disease prediction with high-dimensional omics data”.
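
As background on the quantity named in the title, the empirical AUC is the fraction of (case, control) pairs in which the case receives the higher score. The short Python sketch below shows that general definition only. It is not code from AucPR.R, and the function name and the convention of counting ties as one half are assumptions made for illustration.

```python
# Illustrative sketch of the general empirical AUC definition;
# not the AucPR implementation.

def empirical_auc(scores_cases, scores_controls):
    """Fraction of (case, control) pairs ranked correctly; ties count 1/2."""
    total = 0.0
    for s1 in scores_cases:
        for s0 in scores_controls:
            if s1 > s0:
                total += 1.0
            elif s1 == s0:
                total += 0.5
    return total / (len(scores_cases) * len(scores_controls))
```

For example, `empirical_auc([0.8, 0.3], [0.5, 0.1])` counts three correctly ordered pairs out of four.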

Usage

The following source files are included:

AucPR.R : Lists all functions needed.

Simu_Setting.R : Generates the settings for the simulations.

Case_Study.R : Runs a simulation study and a real-data example.

mhsauc_tgdr.f90 : Fortran code implementing Ma & Huang’s method (MSauc); mhsauc_tgdr.dll is its DLL version, which is called from R.

For details, please see the R code.

Download

You can download a zipped file containing the source code and this README from this link: Codes_Penalized_AUC

Q-Fish

Q-FISH (Quantification method based on Finding the Identical Spectral set for a Homogenous peptide) estimates a peptide’s abundance from its tandem mass spectrometry (MS/MS) spectra through direct comparison of experimental spectra.

Quantification of protein expression by means of mass spectrometry (MS) has been introduced in various proteomics studies. In particular, two label-free quantification methods, namely spectral counting and spectral feature analysis, have been extensively demonstrated on a wide variety of proteomes. The cornerstone of both methods is peptide identification based on a protein database search and subsequent estimation of peptide retention time. However, they may suffer from restrictive database searching and inaccurate estimation of the liquid chromatography (LC) retention time. Furthermore, conventional peptide identification methods based on database searching or spectral library searching, such as SEQUEST or SpectraST, can provide neither the best match nor high-scored matches, which are reliable indicators of protein targets. Lastly, peptides cannot be identified unless they have previously been generated and stored in the database or spectral libraries.
To overcome these limitations, we propose a novel method called Q-FISH to estimate the peptide’s abundance from its tandem mass spectrometry (MS/MS) spectra through the direct comparison of experimental spectra. Because our proposed Q-FISH method compares all possible pairs of experimental spectra, it is possible to identify modified as well as unknown peptides.

R code for Q-FISH : Q-FISH.R

Reference : Lee S, Kwon MS, Lee HJ, Paik YK, Tang H, Lee JK and Park T, Enhanced peptide quantification using spectral count clustering and cluster abundance, BMC Bioinformatics (2011), PMCID PMC3234305.

Bis-class

Bis-Class is a tool for classifying methylation status from BS-seq data. It works best when both the overall methylation level and the coverage are low. It uses a Bayes classifier and local methylation information to improve sensitivity while keeping the error rate controlled.

Our code and example data are below.

Calling_code: Bis-Class and other functions for calling
Code_manual: Manual for Bis-Class
chrexam: Data of ranges of chromosomes
examdata: Data of C and T read counts and location information

Download program and dataset : Bis-class_updated

Reference : Huh I, Yang X, Park T and Yi SV, Bis-class: a new classification tool of methylation status using bayes classifier and local methylation information, BMC Genomics (2014), PMID 25037738.

Oct 7 2015: Program updated to ver 2