Distributed Monitoring Techniques and Analysis; Eliminating the Wizard Gap

Wednesday, September 18, 2002 - 17:30
TH 331
Martin Stoufer, Lawrence Berkeley National Laboratory

With the emergence of distributed computing and the 'Grid', it is all but impossible for end users to determine why their distributed applications run slowly. Researchers with varying constraints simply do not have the resources to identify and resolve the problem(s). With a comprehensive monitoring and analysis framework in place, we can now see the entire top-to-bottom state of the system at any given time. This talk will cover LBNL's work in developing a comprehensive suite of tools and techniques used to solve distributed computing problems.


Martin Stoufer graduated from SFSU in Fall 1997 and has worked at LBNL for the past 5 years. Starting in the Operations group of the lab's supercomputing center, he eventually moved into the research end of large-scale computing. Currently in the Data Intensive Distributed Computing group of the Distributed Systems Division, he has worked on this project and other Web Service projects for the past two years. He has co-authored three published papers on the needs of distributed computing and on addressing its problems.

http://www-didc.lbl.gov/papers/HPDC02-HP-monitoring.pdf
http://www-didc.lbl.gov/papers/Monitoring-archive-SC02.pdf
http://www-didc.lbl.gov/papers/Enable.HPDC01.pdf