Cloud Computing for Bioinformatics: An EC2 Case Study
Dragutin Petkovic, Jozo Dujmovic, & Mike Wong
Genomic and proteomic sequencing data sets have grown tremendously because of advances in sequencing technology, and current trends point to continued growth. Analyzing these data therefore requires high-performance computing systems. Users may consider cluster computing as a potential solution, but building and operating even a relatively small cluster can be a formidable undertaking, requiring not just money but also physical space, cooling, power, and management resources. Recently, cloud computing has drawn attention for its cost-effectiveness and flexibility, since users can create a high-performance cluster of any number of virtual computing servers on demand. The goals of this study are: a) to investigate the applicability of cloud computing platforms such as Amazon Elastic Compute Cloud (EC2) to typical bioinformatics applications such as BLAST ( http://blast.ncbi.nlm.gov/Blast.cgi ), in terms of performance and ease of use, and b) to develop tutorials and a software template that help non-expert users get started with Amazon EC2. Our contributions include a performance analysis of BLAST on Amazon EC2, with experiments ranging from 10 to 60 virtual nodes. Our results show that the cloud is a viable alternative for bioinformatics applications like BLAST, since it offers almost perfect parallelization, but that it remains complex to use. To help users work with Amazon EC2, we developed a detailed tutorial document.
cloud computing, BLAST, ease of use