Fundamental theories of human cognition have long posited that the short-term maintenance of actions is supported by one of the "core knowledge" systems of human visual cognition, yet the neural substrates of this maintenance remain poorly understood. In particular, it is unclear whether visual short-term memory (VSTM) of actions has distinct neural substrates or, as proposed by the spatio-object architecture of VSTM, shares them with VSTM of objects and spatial locations. In two experiments, we tested these two competing hypotheses by directly contrasting the neural substrates for VSTM of actions with those for objects and locations. Our results showed that the bilateral middle temporal cortex (MT) was specifically involved in VSTM of actions: its activation and its functional connectivity with the frontoparietal network (FPN) were modulated only by the memory load of actions, not by that of objects/agents or locations. Moreover, the brain region involved in the maintenance of spatial location information (i.e., the superior parietal lobule, SPL) was also recruited during the maintenance of actions, consistent with the spatiotemporal nature of actions. Meanwhile, the FPN was commonly involved in all types of VSTM and showed flexible functional connectivity with the domain-specific regions, depending on the current working memory task. Together, our results provide clear evidence for a distinct neural system for maintaining actions in VSTM, which supports the core knowledge system theory and the domain-specific and domain-general architectures of VSTM.