Genomics & Bioinformatics Projects

Genomic Variant Calling Pipeline

Engineered a Snakemake pipeline automating variant calling from raw sequencing data, achieving 92.46% alignment efficiency and optimized performance through parallelization.

Snakemake, Python, R, Bowtie2, SAMtools, BCFtools, Conda

Mappability Analysis Pipeline

Built a pipeline that evaluates how read length, alignment mode, and genome version affect sequencing read mappability, using statistical analysis and visualizations.

Snakemake, Bowtie2, SAMtools, BEDTools, ggplot2, R, Conda

ChIP-seq Epigenetic Analysis

Identified and visualized genomic peaks and chromatin states with MACS2 and deepTools, using IGV for exploratory analysis and Conda and GitHub for reproducibility.

MACS2, IGV, deepTools, R, Snakemake, Conda

AI-Driven Sepsis Prediction System

Developed a system that predicts sepsis onset three hours early using Random Forest and LightGBM models.

Python, Random Forest, LightGBM, 3 hrs Early, Double derivatives, Backward data filling, Optimizing AUC/ROC, Hyperparameter tuning, scikit-learn, Matplotlib, Jupyter/R, Pandas, NumPy
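Two of the techniques listed for this project, backward data filling and double derivatives, can be illustrated with a minimal sketch. The function names and the heart-rate values below are hypothetical and chosen only for illustration; the project itself may well have used Pandas for these steps.

```python
from typing import List, Optional


def backward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing value (None) with the next observed value."""
    filled = list(values)
    next_seen: Optional[float] = None
    # Walk from the end so every gap can take the value that follows it.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_seen  # stays None if nothing is observed later
        else:
            next_seen = filled[i]
    return filled


def second_difference(values: List[float]) -> List[float]:
    """Discrete second derivative x[t+1] - 2*x[t] + x[t-1] at interior points."""
    return [
        values[i + 1] - 2 * values[i] + values[i - 1]
        for i in range(1, len(values) - 1)
    ]


# Hypothetical hourly heart-rate readings with gaps (illustrative values only).
heart_rate = [None, 80.0, None, None, 86.0, 95.0]
filled = backward_fill(heart_rate)
acceleration = second_difference(filled)
```

The second difference acts as a simple "double derivative" feature: it is large when a vital sign starts changing faster, which a tree-based model can then use as an input.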

Integrated Analysis of Lung Squamous Cancer Patient Data

Analyzed clinical, RNA, and mutation data from lung squamous cancer patients, applying PCA, clustering, and differential expression analysis, among other techniques, to infer how several factors affect survival.

R, R Markdown, Principal Component Analysis, Clustering, Differential Expression Analysis, Vercel

AI-Driven Ischemic Heart Disease Prediction

Developed a TensorFlow-based neural network predicting ischemic heart disease risk from 320,000+ patient records, using SMOTE for imbalance correction and Keras Tuner for hyperparameter optimization, achieving improved sensitivity and robust performance.

TensorFlow, Keras, SMOTE, Keras Tuner, TensorBoard, Python
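The idea behind SMOTE, used here for imbalance correction, is to create synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified pure-Python illustration of that interpolation step; it is not the project's code and not the implementation in any particular library, and the function name `smote_like_samples` and its parameters are hypothetical.

```python
import random
from typing import List, Sequence


def smote_like_samples(
    minority: Sequence[Sequence[float]],
    n_new: int,
    k: int = 3,
    seed: int = 0,
) -> List[List[float]]:
    """Generate n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != base_idx]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and its neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (illustrative values only).
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_samples(minority_points, n_new=5, k=2, seed=1)
```

Because each synthetic point lies on a line segment between two real minority samples, the new points stay inside the region the minority class already occupies.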

Differential Accessibility Analysis

Performed statistical analysis and identified genomic regions significantly affected by pharmacological treatments with edgeR, using tidyverse and ggplot2 for visualization.

R, edgeR, ggplot2, GenomicRanges, GREAT, R Markdown, GitHub

Gene Expression Prediction from Sequences

Developed and optimized supervised machine learning models (TensorFlow/Keras) to predict gene expression levels from promoter sequences, with dimensionality reduction (PCA, t-SNE) for exploratory analysis.

Python, TensorFlow, Keras, scikit-learn, PCA, t-SNE, ggplot2
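Before a neural network can learn from promoter sequences, the bases have to be turned into numbers. The project description does not say which encoding was used; one-hot encoding is a common choice and is assumed here purely for illustration. The function name and the example sequence are hypothetical.

```python
from typing import List

BASES = "ACGT"


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a DNA sequence as a len(sequence) x 4 matrix in A, C, G, T order.

    Characters outside ACGT (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    encoded: List[List[int]] = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        encoded.append(row)
    return encoded


# Hypothetical promoter fragment (illustrative only).
matrix = one_hot_encode("TATAAN")
```

A matrix of this shape can be passed to a convolutional or dense model as a fixed-length input once all sequences are padded or trimmed to the same length.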

GWAS and Polygenic Risk Score Analysis

Conducted genome-wide association studies and computed polygenic risk scores (PRS), integrating PCA for ancestry analysis and ensuring reproducibility through GitHub and Conda environments.

PLINK, R, GWAS, PRS, PCA, GitHub, Conda
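At its core, a polygenic risk score is a weighted sum: each variant's GWAS effect size multiplied by the number of effect alleles an individual carries. The sketch below shows that sum under simple assumptions; the variant IDs, effect sizes, and dosages are hypothetical, and tools such as PLINK may average or otherwise normalise the score, so this does not reproduce the project's output.

```python
from typing import Dict


def polygenic_risk_score(
    effect_sizes: Dict[str, float],
    dosages: Dict[str, float],
) -> float:
    """Sum of effect size x effect-allele dosage over variants present in both inputs."""
    return sum(
        beta * dosages[variant]
        for variant, beta in effect_sizes.items()
        if variant in dosages  # skip variants not genotyped for this individual
    )


# Hypothetical GWAS effect sizes and one individual's allele dosages (0, 1, or 2).
effects = {"variant_1": 0.5, "variant_2": -0.25, "variant_3": 1.0}
individual = {"variant_1": 2, "variant_2": 1}
score = polygenic_risk_score(effects, individual)
```

Variants missing from an individual's genotypes are skipped here; a real analysis would need an explicit policy for missing data.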