An Experimental Study of the Hybrid Hash Join Algorithm
Join is an important but often expensive operation in database systems. Hybrid hash join has proven to be efficient in systems with large main memory. My thesis describes the design and implementation of a series of experiments to understand the performance of the hybrid hash join algorithm in multiple-disk environments. We implement a generic hybrid hash join system and incorporate the disk simulator DiskSim-1.0 into it. We also propose a set of cost formulas for the multiple-disk environments we consider. The experiments verify that these cost formulas are accurate enough to give a good estimate of the cost of the join operation. We also find that large main memory and data striping with large memory pages improve join performance, while disk caching and prefetching do not help.
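To illustrate the algorithm studied here, the following is a minimal sketch of the textbook hybrid hash join, not the thesis's implementation. The first partition of the build relation is kept in an in-memory hash table and joined while the probe relation is being partitioned. The remaining partitions are set aside and joined pairwise afterward. Python lists stand in for disk files, and the function name `hybrid_hash_join`, the `num_partitions` parameter, and modulo hash partitioning are illustrative assumptions.

```python
from collections import defaultdict


def hybrid_hash_join(build, probe, key_build, key_probe, num_partitions):
    """Equi-join two lists of tuples with a simplified hybrid hash join.

    Assumption: lists stand in for disk-resident partition files, so no
    real I/O or memory accounting is modeled here.
    """
    def part(k):
        return hash(k) % num_partitions

    # Phase 1: partition the build relation; partition 0 stays in memory.
    in_memory = defaultdict(list)
    spilled_build = [[] for _ in range(num_partitions)]
    for r in build:
        k = key_build(r)
        if part(k) == 0:
            in_memory[k].append(r)
        else:
            spilled_build[part(k)].append(r)

    # Phase 2: partition the probe relation; partition 0 is joined at once.
    result = []
    spilled_probe = [[] for _ in range(num_partitions)]
    for s in probe:
        k = key_probe(s)
        if part(k) == 0:
            for r in in_memory.get(k, ()):
                result.append((r, s))
        else:
            spilled_probe[part(k)].append(s)

    # Phase 3: join each spilled partition pair using a fresh hash table.
    for p in range(1, num_partitions):
        table = defaultdict(list)
        for r in spilled_build[p]:
            table[key_build(r)].append(r)
        for s in spilled_probe[p]:
            for r in table.get(key_probe(s), ()):
                result.append((r, s))
    return result


R = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
S = [(2, "x"), (4, "y"), (4, "z"), (5, "w")]
print(sorted(hybrid_hash_join(R, S, lambda r: r[0], lambda s: s[0], 3)))
# [((2, 'b'), (2, 'x')), ((4, 'd'), (4, 'y')), ((4, 'd'), (4, 'z'))]
```

The more memory available, the larger the in-memory partition can be and the fewer tuples need to be written and reread, which is why the algorithm benefits from large main memory.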
1988-1992 Bachelor of Science in Electrical Engineering from Shanghai University of Technology
1997-1999 Master of Science in Computer Science from San Francisco State University