CE-14.15

Title: 

Taking WebFEATURE to the next level: Re-architecture & integration of multiple machine learning algorithms

Author(s): 

Ashwin Neurgaonkar

Oral Defence Date: 

08/08/2014

Location: 

TH 434

Committee: 

Profs. Dragutin Petkovic, James Wong, and Mike Wong

Abstract: 

Machine Learning (ML) algorithms have become an integral part of bioinformatics data analysis. Today there are many ML algorithms to choose from, and it is often necessary to use and experiment with a number of them at the same time. This requires not only theoretical and algorithmic effort but also a system architecture that allows those ML software implementations to be easily integrated into one system. The purpose of this study is to design, implement, and evaluate a new architecture for the joint SFSU and Stanford WebFEATURE bioinformatics application that can accommodate multiple ML software implementations. The work also involves modifying the WebFEATURE user interface subsystem to accommodate multiple ML approaches. To solve this problem, we started by integrating a Support Vector Machine scanning module into the existing application, and used this exercise to identify architectural and semantic issues in the underlying code. We then designed and implemented a new software architecture with a focus on modularity and reduced coupling, using a multiple dispatch mechanism. This new architecture was tested by integrating scanning modules for Support Vector Machines as well as Random Forest. Similar architectural and design changes were also made to the user interface layers to make it possible to display multiple sets of predictions. Our validation experiments using modern software engineering metrics showed that the new architecture is significantly more modular than the previous version. Static code analysis of the new scan system showed a significant improvement in code quality over the previous version and higher adherence to industry coding guidelines. Lastly, the ease of integration of new modules, measured as the percentage of integration code required per module, showed an 80% improvement over the previous version.
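To illustrate the general idea of a multiple dispatch mechanism for pluggable scanning modules, the following is a minimal sketch in Python. It is not the WebFEATURE implementation: every name in it (register, scan, SvmModel, RandomForestModel, FeatureVectors) is hypothetical, and the "scores" are placeholder arithmetic rather than real classifier output. It only shows how a handler can be selected from the runtime types of more than one argument, so that a new module is added by registering one function instead of editing the core scan code.

```python
from typing import Callable, Dict, Tuple

# Registry keyed by the runtime types of BOTH arguments (multiple dispatch).
_REGISTRY: Dict[Tuple[type, type], Callable] = {}


def register(model_type: type, data_type: type) -> Callable:
    """Decorator that registers a scan handler for a (model, data) type pair."""
    def decorator(func: Callable) -> Callable:
        _REGISTRY[(model_type, data_type)] = func
        return func
    return decorator


def scan(model, data):
    """Look up and call the handler matching the exact types of both arguments."""
    key = (type(model), type(data))
    handler = _REGISTRY.get(key)
    if handler is None:
        raise TypeError(
            f"no scanner registered for ({key[0].__name__}, {key[1].__name__})"
        )
    return handler(model, data)


# Hypothetical model and input types, used only for illustration.
class SvmModel:
    pass


class RandomForestModel:
    pass


class FeatureVectors:
    def __init__(self, rows):
        self.rows = rows


@register(SvmModel, FeatureVectors)
def _scan_svm(model, data):
    # Placeholder score: the sum of each row (not a real SVM decision value).
    return [("svm", sum(row)) for row in data.rows]


@register(RandomForestModel, FeatureVectors)
def _scan_random_forest(model, data):
    # Placeholder score: the max of each row (not a real forest vote).
    return [("random_forest", max(row)) for row in data.rows]
```

In this sketch, calling scan(SvmModel(), FeatureVectors([[1, 2], [3, 4]])) selects the SVM handler, while passing a RandomForestModel selects the other one. An unregistered combination raises TypeError instead of falling through silently.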

Keywords: 

bioinformatics, machine learning, software integration, WebFEATURE

Copyright: 

Ashwin Neurgaonkar