Data Clustering Algorithms on the GPU: Challenges and Benefits

Wednesday, October 6, 2010 - 16:30
TH 331
Kai Kohlhoff, Ph.D. (Stanford)
Data clustering techniques are an essential component of most data analysis toolkits. With the availability of simplified APIs for using graphics processors as general-purpose computing platforms, speed-ups of one to two orders of magnitude can be achieved for certain types of algorithms. While an initial, correct implementation of a given algorithm in GPU-specific code may closely follow the equivalent CPU code, achieving significant speed-ups often requires a complete rethinking of the underlying design decisions. This in turn necessitates a thorough understanding of the challenges involved in programming such massively parallel architectures. Focusing on NVIDIA's graphics processors and the CUDA compute platform, the talk will include a brief introduction to GPU computing before providing an overview of the design considerations and challenges that have to be addressed when writing highly optimized code. The benefits will be illustrated using the specific example of a new open-source data clustering library that is being developed jointly by groups at Stanford University and SFSU. This discussion will center on examples such as hierarchical clustering, which is parallelizable only to a certain extent, and parallel K-means, an algorithm that can deliver speed-ups of up to two orders of magnitude for large data sets when compared to serial CPU code.

Dr. Kai Kohlhoff is a Simbios Distinguished Postdoctoral Fellow in Stanford's Bioengineering Department. He is working with Prof. Vijay Pande and Prof. Russ Altman on physics-based simulation of biological systems. Apart from running molecular dynamics simulations of proteins that act as receptors in the cell membrane, one of his key interests is the development of new algorithms that make use of the massively parallel architecture of GPUs. This interest was first sparked by an undergraduate research project during which he wrote assembly code for running particle simulations on the vector processors of a PlayStation 2. In collaboration with the Hsu group at SFSU, he is working on creating the open-source Campaign data clustering library.

After receiving first degrees in computer science, bioinformatics, and biology from Jacobs University Bremen in Germany, Dr. Kohlhoff completed a Master's in computational biology at the Department of Applied Mathematics and Theoretical Physics at the University of Cambridge, UK. While doing his Ph.D. at the same university, he developed a new method to incorporate experimental data from NMR spectroscopy into molecular dynamics simulations. His interests include computer graphics, software engineering, analytical geometry, genetics, and the science of aging.