Using film grammar as the underpinning, we study the extraction of color-based structures in video, employing a wide range of clustering methods combined with both existing and new similarity measures. We study the visualization of these structures, which we call Scene-Cluster Temporal Charts, and show how it brings out the interweaving of different themes and settings in a film. We also extract color events that filmmakers use to draw or force a viewer's attention to a particular shot or scene. This is done by first extracting a set of colors rarely used in the film, and then building a probabilistic model for color event detection. Experimental results on ten movies demonstrate that our algorithms are effective in extracting both scene-cluster temporal charts and color events.
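As a rough illustration of the kind of pipeline the abstract describes, the sketch below clusters shots by color-histogram similarity and flags shots dominated by rarely used colors. The choice of histogram intersection, average-linkage agglomerative clustering, 64-bin histograms, and all thresholds are illustrative assumptions; they are not the specific similarity measures, clustering configurations, or probabilistic color-event model developed in the paper.

```python
# Illustrative sketch only: the clustering configuration, similarity measure,
# and rare-color threshold are assumptions, not the paper's actual algorithms.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def histogram_intersection(h1, h2):
    """Similarity between two normalized color histograms."""
    return np.minimum(h1, h2).sum()


def cluster_shots(shot_histograms, distance_threshold=0.5):
    """Group shots into color-based clusters (one plausible configuration:
    histogram intersection similarity + average-linkage clustering)."""
    n = len(shot_histograms)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = 1.0 - histogram_intersection(shot_histograms[i], shot_histograms[j])
            dist[i, j] = dist[j, i] = d
    Z = linkage(squareform(dist), method="average")
    return fcluster(Z, t=distance_threshold, criterion="distance")


def rare_color_events(shot_histograms, rare_fraction=0.05, event_mass=0.3):
    """Flag shots whose color mass concentrates on bins rarely used across
    the whole film: a simplified stand-in for a color-event detector."""
    hists = np.asarray(shot_histograms)
    film_usage = hists.mean(axis=0)                       # overall bin usage
    rare_bins = film_usage < np.quantile(film_usage, rare_fraction)
    return np.where(hists[:, rare_bins].sum(axis=1) > event_mass)[0]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake per-shot color histograms (e.g., 64-bin HSV); each row sums to 1.
    shots = rng.dirichlet(np.ones(64) * 0.3, size=40)
    labels = cluster_shots(shots)
    events = rare_color_events(shots)
    print("shot cluster labels:", labels)
    print("candidate color-event shots:", events)
```

A chart of cluster labels against shot order would correspond loosely to the scene-cluster temporal chart idea, with each cluster track showing when a color theme or setting recurs over the course of the film.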