High-Performance Data Intensive Distributed Computing
This talk will give an overview of two systems that the Data Intensive Distributed Computing Group at Lawrence Berkeley National Laboratory has developed over the past few years.
The first is the Distributed Parallel Storage System (DPSS), which provides cost-effective parallel access to many disks and is tuned for wide-area network access. We have demonstrated an application with read and write speeds of up to 72 MBytes/sec over an OC-12 WAN using a four-server DPSS. Using a DPSS, an application can access remote data even faster than it can access data on a local disk, eliminating the need to copy data sets from the remote site before processing or visualizing them. For more information, see:
The second is the NetLogger Toolkit, which includes tools for instrumenting applications, tools for monitoring hosts and networks, and tools for collecting and visualizing monitoring data. These tools enable end-to-end monitoring and detailed performance analysis of every component in the system. This type of analysis is critical for obtaining high performance in widely distributed systems. For more information, see:
Brian L. Tierney is a Staff Scientist and group leader of the Data Intensive Distributed Computing Project, which is part of the Future Technologies Group at Lawrence Berkeley National Laboratory. His research interests include data intensive distributed computing, high-speed I/O systems, and distributed system performance monitoring and analysis. Mr. Tierney has an M.S. in Computer Science from San Francisco State University, and a B.A. in physics from the University of Iowa.