Anagha Kulkarni, Ph.D. Candidate (Carnegie Mellon University)
Commercial search providers deploy large data centers to process big document collections efficiently. Often the collection is divided into 'shards' that are distributed across a large number of computers and searched in parallel to provide rapid interactive search. Typically, all index shards are searched for each query. This approach, referred to as 'distributed exhaustive search', works well in resource-rich environments, but it is impractical for organizations with modest computing resources.
In this talk, I will present 'distributed selective search', an approach for low-resource environments that partitions the collection such that only a few shards need to be searched for each query. Results from empirical evaluations conducted using some of the largest available document collections demonstrate that selective search is just as accurate as exhaustive search and substantially more efficient.
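
The contrast between exhaustive and selective search can be illustrated with a toy sketch. This is not the method presented in the talk: the abstract does not say how the collection is partitioned or how shards are chosen for a query, so the keyword-based `toy_assign` rule, the term-count shard summaries, and the overlap scoring below are all assumptions made purely for illustration, as are the four sample documents.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (illustrative only)."""
    return text.lower().split()


def toy_assign(text):
    """Hypothetical partitioning rule: route a document to a shard by keyword."""
    return "animals" if {"snake", "reptile"} & set(tokenize(text)) else "computing"


def build_shards(docs, assign):
    """Group documents into shards using the supplied assignment function."""
    shards = {}
    for doc_id, text in docs.items():
        shards.setdefault(assign(text), {})[doc_id] = text
    return shards


def summarize(shards):
    """One term-frequency summary per shard, used only to pick shards."""
    return {
        sid: Counter(t for text in members.values() for t in tokenize(text))
        for sid, members in shards.items()
    }


def select_shards(query, summaries, k=1):
    """Rank shards by how often the query terms occur in them; keep the top k."""
    terms = tokenize(query)
    scores = {sid: sum(tf[t] for t in terms) for sid, tf in summaries.items()}
    return sorted(scores, key=lambda sid: (-scores[sid], sid))[:k]


def search(query, shards, shard_ids):
    """Score documents in the given shards by query-term overlap."""
    terms = set(tokenize(query))
    hits = []
    for sid in shard_ids:
        for doc_id, text in shards[sid].items():
            score = len(terms & set(tokenize(text)))
            if score:
                hits.append((score, doc_id))
    return [doc_id for _, doc_id in sorted(hits, key=lambda h: (-h[0], h[1]))]


# Toy collection (made up for this example).
docs = {
    "d1": "python code compiler",
    "d2": "python snake reptile",
    "d3": "compiler optimization code",
    "d4": "reptile habitat desert",
}
shards = build_shards(docs, toy_assign)
summaries = summarize(shards)

query = "compiler code"
selected = select_shards(query, summaries, k=1)   # selective: one shard only
selective = search(query, shards, selected)
exhaustive = search(query, shards, list(shards))  # exhaustive: every shard
```

In this toy case the selective run searches one of the two shards and returns the same ranking as the exhaustive run, because the partitioning happens to place all matching documents together. Whether that holds on a real collection depends on the partitioning and shard-selection methods, which the sketch does not attempt to reproduce.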