A common problem in bioinformatics is how to extract informative features from biomolecular sequences, such as DNA and protein sequences, so that the classification or regression models trained on them achieve high accuracy. Traditionally, features were hand-crafted based on expert knowledge. Advances in data mining and machine learning have since enabled systematic, automatic feature extraction. In this talk, I will give a brief overview of successful feature extraction methods in bioinformatics, including string kernels and deep learning. I will then introduce our work, which overcomes certain bottlenecks of these methods. Finally, I will present two applications: a classification task, polyadenylation motif prediction, and a regression task, transcription factor–DNA binding affinity prediction.
Dr. Xin Gao is an associate professor of computer science at King Abdullah University of Science and Technology (KAUST), Saudi Arabia. He is also a PI in the Computational Bioscience Research Center at KAUST, an adjunct faculty member at the David R. Cheriton School of Computer Science, University of Waterloo, Canada, and a Chair Professor at Hangzhou Dianzi University. Prior to joining KAUST, he was a Lane Fellow at Carnegie Mellon University, USA. He earned his bachelor's degree in Computer Science from Tsinghua University, China, in 2004, and his Ph.D. in Computer Science from the University of Waterloo, Canada, in 2009.
Dr. Gao’s research interests include building computational models, developing machine learning techniques, and designing efficient and effective algorithms, with a particular focus on applications to key open problems in structural biology, systems biology, and synthetic biology. He has co-authored more than 100 research articles in the fields of bioinformatics and machine learning.