|
Abstract: |
Today's most popular computer applications are Internet services like Google, Facebook, and Amazon.
In addition to serving millions of hits per day on the front end, these services must analyze
hundreds of terabytes of data for applications like search, spam detection, and business intelligence
on the back end, using clusters of thousands of machines. I will talk about MapReduce, a simple but
surprisingly versatile programming model for clusters that was developed at Google and popularized
through the Hadoop open-source project. I will also tour several higher-level programming tools, such as
Yahoo's Pig and Microsoft's DryadLINQ, that are being built on top of MapReduce and related systems to
simplify large-scale parallel programming. Finally, I will show how cloud computing services have
made it possible for small companies and research groups to take advantage of these large-scale data
processing systems. |