Dr. Kwoh Chee Keong, PBM

PhD, DIC, MSc(ISE), BEng(EE), PGDIG

Senior Member, IEEE

Senior Member IES

Life Member ICAAS

Member AMBIS

School of Computer Science and Engineering

Block N4, Level 2, Section A, Room 29 Nanyang Avenue, Singapore 639798

T: +65 6790 6057 (SCSE-CoE)

F: +65 6792 6559

E: asckkwoh@ntu.edu.sg

W: https://personal.ntu.edu.sg/asckkwoh/

I think the nicest, most sincere compliments that I have received are those from my students and people I did not expect.

Notes from Students and Friends

HONORS AND AWARDS

National Day Awards

Public Service Medal (National Day Award)

National Day Awards 2008, The Public Service Medal (PBM), conferred by the President of Singapore, Mr S R Nathan: http://www.pmo.gov.sg/NationalHonoursandAwards/Pingat+Bakti+Masyarakat+(The+Public+Service+Medal).htm

Ministry of Education Long Service Medal (National Day Award) 2016

Other Awards (NTU)

Best Faculty Mentor Award from Temasek Foundation (TF) LEARN 2014

Best Faculty Mentor Award from Temasek Foundation (TF) LEARN 2013

OPPORTUNITY

I am looking for versatile, highly motivated Research Fellows/Postdocs and PhD candidates. The successful candidates will build on the ongoing research that I direct and will help define and explore this exciting area of research.

Applicants must have a strong background in Computer Science and/or closely related areas (e.g. Mathematics, Bioinformatics, Statistics and Physics) and excellent skills in both written and spoken English, as the working language of the Faculty is English.

For PhD applications, please visit the Graduate Studies by Research page at NTU before writing. Please note that the PhD programme is very intensive: applicants must have a strong interest, a strong analytical mind, and be technically sound in data mining, learning theory, algorithms and computer programming. You must be highly independent, show good initiative, and aspire to publish in top-tier journals.

If you are interested and suitable, enquiries about these vacancies can be sent to asckkwoh@ntu.edu.sg (the deadlines are flexible) together with your CV and your proposed research area with at least 3 references (either your own publications or papers that inspired you to do research).

RESEARCH EXPERTISE

My main interest lies in making sense of big heterogeneous data for real applications in engineering, life science, and medicine.

Data Science (Bioinformatics) Research

High-throughput biological measurements and experiments in life science and healthcare have resulted in an explosion of data from sequencing, microarrays, and ChIP-microarrays (ChIP-chip). This has led to the interdisciplinary science called Bioinformatics, which uses Data Science to solve biological and life science problems.

Machine Learning and Statistical Inference

Machine learning and statistical modelling techniques learn from data to enable decision making or simply classification, and they have applications in almost every area. There are many approaches and each has its own merits. The following have been heavily used in my group: support vector machines, decision tree learning, artificial neural networks, Bayesian networks, and genetic and memetic algorithms. Examples include applications to supertype-specific HLA Class I binding peptides and protein-ligand binding affinity.

Learning with Unlabeled Data

In the context of machine learning, PU learning is a collection of semi-supervised techniques for training binary classifiers on positive (P) and unlabeled (U) examples. To improve performance, it is important to partition U into four sets, namely, the reliable negative set RN, likely positive set LP, likely negative set LN, and weak negative set WN. Such an approach has been used to identify disease genes from the human genome, which is an important but challenging task in biomedical research.
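The partitioning step can be illustrated with a minimal sketch. It is not the method used in the group's software: the rule below (rank unlabeled points by distance to the centroid of the positives, then cut the ranking into quarters) and the ordering LP, WN, LN, RN are assumptions made purely for illustration, and the function names are hypothetical.

```python
from math import dist
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    return [mean(coords) for coords in zip(*points)]


def partition_unlabeled(positives, unlabeled):
    """Split unlabeled points into four sets by distance to the positive centroid.

    Assumed rule (illustrative only): points closest to the positives are
    treated as likely positive (LP), the farthest as reliable negative (RN),
    with weak negative (WN) and likely negative (LN) in between.
    """
    c = centroid(positives)
    ranked = sorted(unlabeled, key=lambda x: dist(x, c))
    q = len(ranked) // 4  # size of the first three groups; RN takes the rest
    return {
        "LP": ranked[:q],
        "WN": ranked[q:2 * q],
        "LN": ranked[2 * q:3 * q],
        "RN": ranked[3 * q:],
    }


# Toy data: four positives around (0.5, 0.5), eight unlabeled points at
# increasing distance along the x-axis.
P = [(0, 0), (0, 1), (1, 0), (1, 1)]
U = [(0.5 + k, 0.5) for k in range(1, 9)]
parts = partition_unlabeled(P, U)
```

In a full PU pipeline, each of the four sets would typically be given a different weight or label confidence when training the final classifier; that step is omitted here.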

Meta and Ensemble learning

Meta-learning is where automatic learning algorithms are applied to meta-data about machine learning experiments. The main goal is to use meta-data to understand how automatic learning can become flexible in solving different kinds of learning problems and enrich the knowledge discovered. Coupled with ensemble methods that integrate the results of multiple predictive methods into one system, this approach has been found to be instrumental in improving predictive performance. It has been widely applied to big data in areas such as bioinformatics and medical informatics. Examples include multiple kernel learning for heterogeneous data fusion, sparse learning in genome-wide association studies (GWAS), and drug-target interaction prediction.
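As a minimal sketch of how an ensemble integrates the results of multiple predictive methods, the example below combines per-model label predictions by simple majority vote. Majority voting is only one of many ensemble schemes and is chosen here for brevity; the function name and the toy predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label predictions from several models by majority vote.

    predictions: list of per-model label lists, all of the same length.
    Returns one combined label per sample. With an even number of models,
    ties go to the label seen first among the models.
    """
    combined = []
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Three hypothetical models predicting binary labels for four samples.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]
ensemble = majority_vote([model_a, model_b, model_c])
```

Weighted voting or stacking would replace the vote with a learned combination, which is closer to the meta-learning idea described above.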

Ontology for Knowledge Representation

An ontology is a formal representation of a set of concepts within a domain and the relationships between those concepts. There have been major initiatives in medicine and bioinformatics to standardize the representation of terminology and relationships across diseases, species and databases. The main outcomes are a controlled vocabulary of terms and a description of product characteristics, which improve annotation and the representation of meta-concepts and greatly enhance analysis. Specifically, ontologies have been extensively applied in complex mining and in inferring gene-disease-phenotype associations.

Computational Structural Biology

This is where Computational Science and Engineering finds its applications in understanding the molecular structure of biological macromolecules. Dealing with computational models and simulations, the field makes use of high-performance computing to solve complex and expensive problems in biology. The result includes the acceleration of docking and its application in the understanding of influenza and other viruses.

SOFTWARE

CovalentDock Cloud: a web server for automated covalent docking

Covalent binding is an important mechanism by which many drugs gain their function. We developed a computational algorithm to model this chemical event and extended it to a web server, the CovalentDock Cloud, to make it accessible directly online without any local installation and configuration. It provides a simple, user-friendly web interface to perform covalent docking experiments and analysis online.

Software for Accelerating AutoDock Vina

QuickVina: This project aims at accelerating AutoDock Vina, a program for protein-ligand docking. The main idea is to skip some of the local searches which are not promising in finding a better solution.

Quick Vina 2 is a fast and accurate molecular docking tool aimed at accurately accelerating AutoDock Vina. It was tested against the 195 protein-ligand complexes that compose the core set of the 2014 release of PDBbind. Using the default exhaustiveness level of 8, QVina 2 attained up to 20.49-fold acceleration over Vina.
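The idea of skipping unpromising local searches can be illustrated with a toy one-dimensional sketch. This is not QuickVina's actual test: the skipping criterion used here (a start point that lies within a fixed radius of an earlier start is assumed unlikely to lead anywhere new) is a stand-in chosen for illustration, and all names and parameters are hypothetical.

```python
def local_search(f_grad, x, step=0.1, iters=50):
    """Plain gradient descent from x; stands in for the expensive local optimisation."""
    for _ in range(iters):
        x -= step * f_grad(x)
    return x


def search_with_skipping(f, f_grad, starts, radius=0.1):
    """Run a local search from each start point unless it looks unpromising.

    Assumed criterion (illustrative only): a start closer than `radius` to a
    previously explored start is skipped. Returns the best point found and
    the number of local searches that were skipped.
    """
    explored = []
    skipped = 0
    best_x = None
    for s in starts:
        if any(abs(s - e) < radius for e in explored):
            skipped += 1  # cheap check avoids an expensive local search
            continue
        explored.append(s)
        x = local_search(f_grad, s)
        if best_x is None or f(x) < f(best_x):
            best_x = x
    return best_x, skipped


# Toy objective with its minimum at x = 2.
def objective(x):
    return (x - 2) ** 2


def objective_grad(x):
    return 2 * (x - 2)


best, n_skipped = search_with_skipping(
    objective, objective_grad, [0.0, 0.05, 5.0, 5.02, -3.0]
)
```

The speed-up comes from the fact that the check is far cheaper than the local search it replaces; the risk is that a skipped start would have led to a better solution, so the criterion has to be conservative.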

Software for Learning with Unlabeled Data

1. PUDI (2013) - a Positive-Unlabeled (PU) learning based method aiming to address the problem of disease gene identification

Software – for complexes

1. CACHET - Discovery of Protein Complexes with Core-Attachment Structures from TAP Data

2. COACH - COre-AttaCHment based Complex Mining

Software – for Computational Structural Biology

1. CovalentDock Cloud (2013) - This web server allows the researchers and scientists to perform protein-ligand covalent docking.

2. CovalentDock: Automated covalent docking with parameterized covalent linkage energy estimation and molecular geometry constraints

3. QuickVina: Accelerating AutoDock Vina Using Gradient-based Heuristics for Global Optimization

MY GRANTS

Current Active Grants (updated 2023)

· Host-pathogen protein-protein interaction approaches for predicting virulence

· The discovery of neutralizing antibodies for potential novel coronavirus through machine learning approaches

· Explainable AI for Multimodal Predictive Maintenance of Jet Engines with Smart HCI

· Hybrid Finite Element Method And Mixed-Level Coarse Graining Molecular Dynamics Simulation

· Computational Virulence Model With Functional Information For Influenza Viruses

· Structural analysis and characterization of protein complexes

· Towards direct and rapid mapping of RNA modifications with nanopore sequencing

· Untangling cancer re-wiring: Pan-Cancer mapping of transcription factor driven dysregulatory hotspots using AlphaFold2 and integrative machine learning

· Investigating the regulation of 3D genome organization using machine learning

· Predict the solubility of proteins using machine learning

· Hodge Laplacian based deep learning models for drug design

· Challenge-Learn: Developing and Assessing an Andragogical Programme and System based on Co-Skilling to Enhance Employability and Learning

· Artificial intelligence for the prediction of alternative splicing from epigenomics and transcriptomics data in cancer

· Computational Systems Biology of Synthetic Lethality Towards New Cancer Medicine

List of Research Grants

· Untangling cancer re-wiring: Pan-Cancer mapping of transcription factor driven dysregulatory hotspots using AlphaFold2 and integrative machine learning

· Predict the solubility of proteins using machine learning

· Hodge Laplacian based deep learning models for drug design

· Challenge-Learn: Developing and Assessing an Andragogical Programme and System based on Co-Skilling to Enhance Employability and Learning

· Host-pathogen protein-protein interaction approaches for predicting virulence

· The discovery of neutralizing antibodies for potential novel coronavirus through machine learning approaches

· Artificial intelligence for the prediction of alternative splicing from epigenomics and transcriptomics data in cancer

· Structural analysis and characterization of protein complexes

· AI Enhanced Creativity In Education

· Hybrid Finite Element Method And Mixed-Level Coarse Graining Molecular Dynamics Simulation

· Computational Virulence Model With Functional Information For Influenza Viruses

· Computational Systems Biology of Synthetic Lethality towards New Cancer Medicine

· CloudDock: Molecular Docking Platform on Cloud

· Methodological Investigation for Automatic Detection of Primary Angle Closure Condition (PAC) and PAC induced Glaucoma

· Bioinformatics Algorithms for Detecting Genetic and Epigenetic Determinants of Meiotic Recombination Hotspots from Genomic Data

· Core-Attachment based Mining Technique: to detect Protein Complexes and Protein-Small Molecule Interactions

· Core-Attachment based Mining for Protein Complexes & Small-molecule Interactions

· Improved Design via Evolutionary Algorithms

· The Protein Binding Hot Spots Are Water Free?

· Neural Systems Modeling with functional MRI

· Functional MR Time-Series Analysis

· Augmented reality for prosthesis cup placement

· Cardiovascular & respiratory systems' signal simulation, processing and analysis for ICU, OR and telemedicine applications.

· Computational Virulence Model with Functional Information for Influenza Viruses

· Protein binding hotspots are water-free?

· Analysis of Past DRG data for the study of LOS for better utilization of Hospital Resources

· Data Warehousing and Data Mining Analysis of Staphylococcus Aureus

· A novel approach for inter- to intra- network analysis of genetic diseases using high-throughput data

· Neural Systems modelling with functional MRI

· SCE incubator proposal for “Evolutionary and Complex Systems Lab”

· The Application of ultrasound-based augmented reality with the directional vacuum-assisted breast biopsy device in the treatment of breast cancer

· Distributed Diagnosis and Home Healthcare (D2H2)

· Development of a robotic semi-automated remote handling system for radioiodine dispensing

· Functional MR Time-Series Analysis

· Augmented Reality for Prosthesis Cup Placement

· Robotic Skull Based Surgery

· Cardiovascular and Respiratory Systems' Signal Simulation, Processing and Analysis for ICU, OR and Telemedicine Applications.

· Strategic research: Interventive augmented reality for medical applications.

· Surgeon Assistant Robot for a Selected urological disorder.

MY GRADUATE STUDENTS

· Tan Lai Heng

· Emadeldeen Ahmed Ibrahim Ahmed Eldele

· Zhang Yu

· Mohamed Ragab Mohamed Adam

· Lin Zhuoyi

· Hou Yubo

· Tjio Ci'en Gabriel

· Li Xinya

· Yin Rui

· Zhou Xinrui

· Ata Kircali Sezin

· Amr Ali Mokhtar Alhossary

· Aly Mohamed Alaaeldin Aly Ezzat

· Pradhan Mohan Rajan

· Pan Hong – DNA methylation biomarkers of personal disease risk (PhD, 2012)

· Luay Aswad - A molecular basis of the 5-gene breast tumour aggressiveness grading signature (AGS) and its network (PhD, 2012 -)

· Han Xu - Constructing the Semantic Web for Biomedical Literature (PhD, 2011 - )

· Ouyang Xuchang - Automated and Accelerated Covalent Docking and Covalent Virtual Screening (PhD, 2010–)

· Thidathip Wongsurawat - Computational Analysis and Prediction of Specific Genomic Regions Forming R-loop Structure and Chromosomal Variations Associated with Cancer (PhD, 2015)

· Zhang Zhou - Knowledge Discovery In Post Genome-Wide Association Study For Glaucoma (PhD, 2015)

· Su Tran To Chinh - Improving the Discrimination of Near-Native Complexes for Protein Rigid Docking by Implementing Interfacial Water into Protein Interfaces (PhD, 2015)

· Yang Peng - Computational Approaches for Disease Gene Identification (PhD, 2014)

· Wu Min - Mining Protein Complexes From Protein Interaction Data (PhD, 2012)

· Zhang Tianyou - Contact Network Based Framework For Infectious Disease Interventions (PhD, 2015)

· Stephanus Daniel Handoko - Constrained-Oriented Refinement-Efficacious Memetic Algorithms for Efficient Optimization of Computationally-Expensive Problems (PhD, 2014)

· Adrianto Wirawan - Whole-Genome Discovery Of Transcriptional Regulator Binding Sites (PhD, 2011)

· Zhang Guanglan - Computational Epitope-Driven Vaccine Design (PhD, 2008)

· Zheng Yun - Design Of Gene Expression Networks From Microarray Data (PhD, 2006)

· Zhao Ying - Efficient Model And Feature Selection For SVM In Biomedical Data Analysis (M Eng, -2004)

· Zhao Jianhui - Human Animation from Motion Recognition, Analysis and Optimisation (PhD, 2003)

· Chen Yintao - Image Processing For Ultrasound Guidance System In Breast Lump Operation (M Eng, 2002)

· Wang Yan - Image-Based Indexing And Retrieval Of Trademark Logos (M Eng, 2001)

· Veena Mohan Bhajammanavar - Image Processing Of The Digital Mammogram For Segmentation And Characterization Of Microcalcifications (M Eng, 2000)

· Misra Sabita - Time Series Analysis Of ECG For Detection Of Premature Ventricular Contraction (M Eng, 2000)

· Zou Qingsong - Object-Based Volume Visualisation For Medical Imaging (PhD, 2001)

TEACHING

Planned and lectured subjects in

CZ4032 Data Analytics and Mining (2014,15): Data Mining is an analytic process designed to explore big data in search of consistent patterns and/or systematic relationships between variables, and then to validate the findings by applying the algorithms to new data.
CE7411 Bioinformatics (2015): This course covers basic bioinformatics concepts, databases, tools and applications. Introduction: cell biology's central dogma; biological technologies for collecting and storing genomic sequence data; databases that store these data and strategies to extract information from them; pairwise sequence alignment for assessment of similarity to infer homology; fundamentals of scoring matrices to understand the scores assigned when performing alignment; the popular heuristic search tool Basic Local Alignment Search Tool (BLAST) and advanced database searching; multiple sequence alignment and phylogenetic trees to complete the coverage of genomic sequences. Functional genomics with an introduction to gene expression; processes for microarray data analysis; feature selection and classification for microarray data analysis. Protein families and proteomics; protein structure and structural genomics; and molecular evolution and phylogeny.
BI6123 Methods and Tools of Proteomics (2007): Proteomics studies and identifies protein structure, protein/protein and protein/DNA interactions, and the biology of organisms. The course further introduces newly developed technology for the quantitative analysis of protein expression and function on a genome-wide scale.
BI1602 and SC448 Introductory Bioinformatics (2005,06): Basic bioinformatics concepts. Databases, tools and applications.
BI1603 Computational Biology (2006): Introduces the application of techniques from computer science, applied mathematics, and statistics to problems inspired by biology. Major computational techniques used in biology include Bayes, HMM, MI etc.
BG3011 Biocomputing (2005, 06): Introduced the new biocomputing course for students in SCBE; the subject was first offered in July 2005. It covers concepts, bioinformatics databases, sequence alignment, phylogeny and protein structure prediction.
BI6104 Biostatistics: First offered in July 2003, this course equipped MSc students with knowledge of statistics, experimental design and statistical learning.
Curriculum for MSc in Bioinformatics: From August 2001 to June 2002, I worked with Vice-Dean (Academic) SCE, Head, Natural Science of NIE, Vice-Dean (Academic) of SBS and Professor from MPE and EEE to structure the new MSc in Bioinformatics.
SC104 Mathematics I: Fundamentals of mathematics for engineering, including statistics and calculus.
CE307 Computer Peripherals: In 1996-2000, redesigned the course to include state-of-the-art techniques such as PRML, USB and Bluetooth.
M495 & M6524 Medical Assist Surgery (2000-2002): Co-planned and lectured the final-year and MSc elective for Biomedical Engineering.
Digital Signal Processing (1992): Planned and lectured the final-year elective for computer engineering.

MY PHD THESIS

GRADUATE ADVISORS: Prof Duncan Fyfe Gillies - Professor of Biomedical Data Analysis, Department of Computing, Imperial College London

My PhD thesis: Probabilistic Reasoning From Correlated Objective Data, University of London, Imperial College

PUBLICATIONS

From Google Scholar

https://scholar.google.com.sg/citations?hl=en&user=jVn0wDMAAAAJ&view_op=list_works&sortby=pubdate