In this talk we review a general framework for visualizing multimedia structure using inter-sample similarity. The approach is unsupervised and readily adaptable to various modalities, feature representations, and similarity measures. The resulting visualizations suggest several approaches for automatically characterizing the temporal structure of media streams. We consider two examples. The first is a system for identifying repetitive structure in music and audio. We use it to detect chorus segments in popular music, which then serve as summaries. Because the approach provides a complete structural characterization of the audio stream, summary design can be adapted to different needs. The second example is a system for automatically clustering digital photo collections into time-based events. We consider visualizations of temporal and content-based inter-photo similarity at multiple scales to construct a hierarchical partition of the time interval during which the photos were taken. We then apply various clustering criteria to select a final partitioning of the photos. This system is integrated into an application that also allows users to browse the hierarchical segmentation and organize their photos semi-automatically.
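
As a rough illustration of the inter-sample similarity idea, the following minimal sketch builds a similarity matrix from a sequence of feature vectors and, for the photo case, a time-based similarity matrix at a chosen scale. It is not the system described in the talk: the function names, the cosine measure, and the exponential time kernel with its `scale` parameter are all assumptions made for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(features, measure=cosine_similarity):
    """Compare every sample with every other sample.

    Entry [i][j] holds the similarity of samples i and j. Any pairwise
    measure can be passed in; cosine similarity is only an assumed default.
    """
    n = len(features)
    return [[measure(features[i], features[j]) for j in range(n)] for i in range(n)]


def temporal_similarity_matrix(timestamps, scale):
    """Time-based similarity exp(-|t_i - t_j| / scale).

    The exponential kernel is an assumption for this sketch. A larger
    `scale` makes photos taken further apart still look similar, so
    varying it gives a coarser or finer view of the same collection.
    """
    return [[math.exp(-abs(ti - tj) / scale) for tj in timestamps] for ti in timestamps]


# Toy "A A B B" sequence: two repeated segments with distinct features.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
S = similarity_matrix(features)

# Hypothetical photo timestamps in seconds, viewed at two time scales.
timestamps = [0.0, 60.0, 3600.0]
T_fine = temporal_similarity_matrix(timestamps, scale=600.0)
T_coarse = temporal_similarity_matrix(timestamps, scale=6000.0)
```

Rendered as an image, `S` shows bright square blocks along the main diagonal for each coherent segment, and repeated material appears as bright off-diagonal regions. In the time-based matrices, the first two photos stay similar at both scales, while the third only begins to look related to them at the coarser scale.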