Large-scale Search in Low-Resource Environments
Commercial search providers deploy large data centers to process large document collections efficiently. Often the collection is divided into 'shards' that are distributed across many computers and searched in parallel to provide rapid interactive search. Typically, all index shards are searched for each query. This approach, referred to as 'distributed exhaustive search', works well in resource-rich environments, but it is impractical for organizations with modest computing resources.
In this talk, I will present 'distributed selective search', an approach for low-resource environments that partitions the collection such that only a few shards need to be searched for each query. Results from empirical evaluations conducted using some of the largest available document collections demonstrate that selective search is just as accurate as, and substantially more efficient than, exhaustive search.
Anagha Kulkarni is a PhD candidate in the Language Technologies Institute at Carnegie Mellon University. Her PhD thesis research focuses on efficient and effective large-scale search. She has research interests in information retrieval, natural language processing, and machine learning. She is a recipient of the Barbara Lazarus Women@IT