Join is an important but usually expensive operation in
database systems. Hybrid hash join has proven to be efficient
in systems with large main memory. This thesis describes the
design and implementation of a series of experiments to
understand the performance of the hybrid hash join algorithm
in multi-disk environments. We implement a generic
hybrid hash join system and incorporate the disk simulator
DiskSim-1.0 into the system. We also suggest a set of cost
formulas for the multi-disk environments that we consider.
From the experiments, we verify that the cost formulas are
accurate enough to give a good estimate of the cost of the
join operation. We also find that large main memory and
data striping with large memory pages can improve the join
performance, while disk caching and prefetching do not
help.