In a contextual cuing paradigm, we examined how memory for the spatial structure of a natural scene guides visual search. Participants searched through arrays of objects that were embedded within depictions of real-world scenes. If a repeated search array was associated with a single scene during study, then array repetition produced significant contextual cuing. However, expression of that learning was dependent on instantiating the original scene in which the learning occurred: Contextual cuing was disrupted when the repeated array was transferred to a different scene. Such scene-specific learning was not absolute, however. Under conditions of high scene variability, repeated search arrays were learned independently of the scene background. These data suggest that when a consistent environmental structure is available, spatial representations supporting visual search are organized hierarchically, with memory for functional subregions of an environment nested within a representation of the larger scene.