Project Summary: Machine learning has been applied in many areas, from biomedical imaging to business analytics. It is expected to become a strong method for diagnostics, and even for selecting therapeutics, as we move toward personalized medicine. MegaR provides an unprecedented opportunity to develop machine learning models from publicly available metagenomic data and to classify data samples using the optimal model developed.
Skills: R, R-shiny, Machine Learning (Scikit-Learn), Deep Learning (Keras, TensorFlow)
Paper: [BMC Bioinfo 2021]
Poster: [ACM-BCB 2020]
Project Summary: T-cells are vital to the adaptive immune system, recognizing pathogens through the T-cell receptor (TCR).
In each T-cell, the TCR loci undergo genetic rearrangement, producing a sequence that can act as a unique biomarker cataloging immunological history.
iCAT is a user-friendly, graphical-interface software that analyses TCR-specific sequencing data from exposed (positive) or unexposed (negative) individuals to identify TCR sequences statistically associated with positive but not negative samples.
We are currently developing a novel method to analyze complex, large-scale human datasets using deep learning techniques.
Skills: R, R-shiny, Machine Learning (Scikit-Learn), Deep Learning (Keras)
Collaborator: Dr. Richard DiPaolo (Saint Louis Univ. School of Medicine)
Paper: [F1000 Research 2021],
[ACM-BCB 2020],
[Cell Reports 2018]
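The idea of finding TCR sequences associated with positive but not negative samples can be illustrated with a small sketch. This is not iCAT's actual code: it assumes a one-sided Fisher's exact test on presence/absence counts, and the function names, the `alpha` threshold, and the example sequences are all hypothetical.

```python
from math import comb


def fisher_one_sided(pos_with, pos_total, neg_with, neg_total):
    """One-sided Fisher's exact test p-value for enrichment in positive samples.

    pos_with / neg_with: number of positive / negative samples containing the sequence.
    """
    with_total = pos_with + neg_with
    n = pos_total + neg_total
    upper = min(with_total, pos_total)
    # Sum hypergeometric probabilities of seeing pos_with or more positive carriers.
    numerator = sum(
        comb(with_total, k) * comb(n - with_total, pos_total - k)
        for k in range(pos_with, upper + 1)
    )
    return numerator / comb(n, pos_total)


def associated_sequences(positive_samples, negative_samples, alpha=0.05):
    """Return {sequence: p_value} for sequences enriched in positive samples.

    Each sample is a set of TCR sequences (hypothetical input format).
    No multiple-testing correction is applied in this sketch.
    """
    candidates = set().union(*positive_samples)
    results = {}
    for seq in candidates:
        pos_with = sum(seq in sample for sample in positive_samples)
        neg_with = sum(seq in sample for sample in negative_samples)
        p = fisher_one_sided(
            pos_with, len(positive_samples), neg_with, len(negative_samples)
        )
        if p < alpha:
            results[seq] = p
    return results


# Hypothetical data: "CASSLGQ" occurs only in positive samples, "CASSPTG" in all.
positive = [{"CASSLGQ", "CASSPTG"} for _ in range(5)]
negative = [{"CASSPTG"} for _ in range(5)]
hits = associated_sequences(positive, negative)
```

A real analysis over many sequences would also need a multiple-testing correction, which this sketch leaves out.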
Project Summary: The advent of high-throughput DNA sequencing techniques (next-generation sequencing) has permitted very high-quality de novo genome assemblies, but resolving the resulting large genome graphs requires large amounts of computer memory.
To address this limitation, we present a novel algorithmic approach: Scalable Overlap-graph Reduction Algorithms (SORA).
SORA adapts string graph reduction algorithms for genome assembly to a distributed computing platform.
The experimental results show that SORA can process a graph with nearly one billion edges in a distributed cloud cluster, as well as smaller graphs on a local cluster, with a short turnaround time.
Skills: Apache Spark, GraphX, Amazon Cloud, Scala, Python, Shell
Paper:
[Human Genomics 2019],
[IEEE-BMBM 2018]
Poster:
[ACM-BCB 2017]
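One common string graph reduction step is transitive edge reduction, and a simplified single-machine sketch can show the idea. This is not SORA's implementation, which runs on Apache Spark and GraphX. The sketch assumes an unweighted directed overlap graph given as (source, target) read pairs, ignores overlap lengths, and uses a hypothetical function name.

```python
def reduce_transitive_edges(edges):
    """Remove every edge u->w for which a two-step path u->v->w also exists.

    edges: iterable of (source, target) pairs of a directed overlap graph.
    All checks use the original graph, so the result does not depend on edge order.
    """
    edges = set(edges)
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)

    transitive = set()
    for u, neighbours in successors.items():
        for v in neighbours:
            if v == u:
                continue  # a self-loop cannot serve as the intermediate hop
            for w in successors.get(v, ()):
                if w != v and w in neighbours:
                    transitive.add((u, w))
    return edges - transitive


# Hypothetical reads A..D, where A overlaps B, C and D, and B overlaps C and D.
overlaps = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("A", "D"), ("B", "D")]
reduced = reduce_transitive_edges(overlaps)
```

Here the reduced graph keeps only the chain A→B→C→D. A distributed version would partition the adjacency lists across workers, which is where the memory savings described above would come from.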
Project Summary: Gene Length-Dependent Expression Analysis Tool in Neuronal Cells
Skills: R, R-shiny, BioMart, GO analysis
Collaborator: Dr. Andrew Yoo (Washington Univ. School of Medicine)
Paper: [Bioinformatics 2018]
Project Summary: Strain-level genome identification algorithm for biosurveillance using high-performance computing
Skills: C++, Python, MPI, OpenMP, Supercomputer
Paper:
[Bioinformatics 2015]
Project Summary: An overlap-graph de novo metagenome assembler
Skills: C++, Python
Paper:
[Bioinformatics 2014]