Engineered a Snakemake pipeline that automates variant calling from raw sequencing data, achieving 92.46% alignment efficiency and improving performance through parallelized job execution.
Built a pipeline evaluating how read length, alignment mode, and genome version affect sequencing read mappability, supported by statistical analysis and visualizations.
Identified and visualized genomic peaks and chromatin states with MACS2 and deepTools, using IGV for exploratory analysis, and ensured reproducibility with Conda and GitHub.
Developed a sepsis-onset prediction system based on an AI model.
Analyzed clinical, RNA, and mutation data from lung squamous cell carcinoma patients, applying PCA, clustering, and differential expression analysis, among other techniques, to infer how several factors affect survival.
Developed a TensorFlow-based neural network predicting ischemic heart disease risk from 320,000+ patient records, using SMOTE to correct class imbalance and Keras Tuner for hyperparameter optimization, which improved sensitivity while maintaining robust overall performance.
Used edgeR to perform statistical analysis and identify genomic regions significantly affected by pharmacological treatments, with tidyverse and ggplot2 for data processing and visualization.
Developed and optimized supervised machine learning models (TensorFlow/Keras) to predict gene expression levels from promoter sequences, using dimensionality reduction (PCA, t-SNE) to explore cluster structure.
Conducted genome-wide association studies (GWAS) and computed polygenic risk scores (PRS), integrating PCA for ancestry analysis, and ensured reproducibility through GitHub and Conda environments.