GEM: A Pattern Recognition Approach to Functional Genomics
The functional state of an organism is determined largely by the pattern of expression of its genes. Analysis of gene expression data from gene chips has revolved primarily around clustering and classification with machine learning techniques that use the intensity of expression alone, while the time-varying pattern is mostly ignored. In this talk, I will describe a pattern recognition-based approach that captures similarity between genes by finding salient changes in their time-varying expression patterns. Such changes can give clues about important events, such as gene regulation by cell-cycle phases, or even signal the onset of a disease. The key idea I present is that dissimilarity between time series is revealed by the sharp twists and bends produced in a higher-dimensional curve formed from the constituent signals. Scale-space analysis is used to detect these sharp twists and turns, and their relative strength with respect to the component signals is estimated to form a shape similarity measure between time profiles. A mean-shape clustering algorithm is presented that clusters gene profiles using the scale-space distance as the similarity metric. Multi-dimensional curves formed from the time series within each cluster serve as cluster prototypes, or indexes, into the gene expression database and are used to retrieve genes functionally similar to a query gene profile.
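To make the key idea concrete, here is a minimal sketch in Python, not the GEM algorithm itself. It assumes two expression profiles sampled at the same time points, pairs them into a planar curve, smooths that curve with Gaussians of several widths as a crude stand-in for scale-space analysis, and reports the average peak curvature as a dissimilarity score. The function names, the choice of scales, and the use of peak curvature are all illustrative assumptions.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t**2 / (2.0 * sigma**2))
    return k / k.sum()

def smooth(signal, sigma):
    """Gaussian smoothing with edge padding; output length equals input length."""
    k = gaussian_kernel(sigma)
    radius = len(k) // 2
    padded = np.pad(signal, radius, mode="edge")
    return np.convolve(padded, k, mode="valid")

def curvature(x, y):
    """Unsigned curvature of the planar curve (x(t), y(t))."""
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    # Small constant guards against division by zero where the curve stalls.
    denom = (dx**2 + dy**2) ** 1.5 + 1e-12
    return np.abs(dx * ddy - dy * ddx) / denom

def turn_strength(a, b, sigmas=(1.0, 2.0, 4.0)):
    """Hypothetical dissimilarity: mean over scales of the peak curvature
    of the curve formed by pairing profile a with profile b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    peaks = [curvature(smooth(a, s), smooth(b, s)).max() for s in sigmas]
    return float(np.mean(peaks))

# Two illustrative profiles: a steady ramp and an abrupt switch.
ramp = np.linspace(0.0, 1.0, 50)
step = np.where(np.arange(50) < 25, 0.0, 1.0)

same = turn_strength(ramp, ramp)   # identical shapes trace a straight line
diff = turn_strength(ramp, step)   # the switch bends the curve sharply
```

When the two profiles have the same shape, the paired curve is a straight line and the score is zero; a profile that switches abruptly while the other rises steadily produces a sharp bend and a positive score. Peak curvature is sensitive to noise and to stretches where both profiles are nearly flat, so a practical measure would need more care than this sketch takes.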
Dr. Tanveer Syeda-Mahmood is a Research Staff Member at IBM Almaden Research Center, where she leads a project on Federated Mining and Information Integration for Life Sciences. Prior to working at IBM, Dr. Syeda-Mahmood led the image indexing research program at Xerox Research. She received her Ph.D. in Computer Science from the AI Lab at MIT. She is a well-known researcher in pattern recognition, multimedia, and signal processing. She has chaired several international workshops on Content-based Retrieval and has edited special issues of journals. Dr. Syeda-Mahmood has over 50 refereed publications and over 35 patents. Her current research interests are in the design and application of pattern recognition techniques to life science data sets, including gene expression data and medical images.