Provenance Management for Scientific Data


Pamela Das

Oral Defence Date: 



TH 331


Professors Marguerite Murphy & Dragutin Petkovic


Current tools for web server benchmarking perform only basic functions, such as running tests and saving the results to files. Autobench/httperf is a widely used web server benchmarking tool, but it makes no provision for recording the steps needed to configure an Autobench test, such as collecting user information, configuring kernel parameters, and setting the Autobench parameter values. In addition, test results are produced as simple tab-separated value (TSV) and JPEG files that are stored in the file system under arbitrary filenames. Retrieving test results and their associated configuration data becomes very challenging when thousands of tests have been conducted and the results are not stored in an organized manner. As a result, users can easily analyze their experimental data incorrectly, and they frequently cannot reuse the results of previously conducted experiments. This calls for an effective data provenance management system that keeps track of how the stored benchmark results were created, in addition to storing the test results themselves. The main goal of the project presented in this report is to design, implement, test, and deploy a system to manage data provenance for experimental data generated during Autobench/httperf benchmarking. This system is one part of the TCP Perf suite of tools being developed at SFSU to partially automate the process of comparing the performance of different web server system configurations. These tools will help the user not only to better understand the results of a particular benchmarking test, but also to draw solid conclusions based on all of the information collected.


Autobench, Scientific Data Management system, provenance, Netlab environment, TCP Perf Suite

