Mechanisms of human dynamic object recognition revealed by sequential deep neural networks
Fig 3
Augmenting BLnext with different forms of adaptation boosts sequential object recognition.
A We study the challenging task of sequential object recognition, in which a model receives only a video stream, akin to the input received by biological visual systems, without any indication of the timing or number of presented images (top panel). We compare this to the easier task of single-image recognition, in which recognition is tested on one image at a time and the model is artificially reset for every new image (lower panel). Viewed as a sequence, this type of recognition presupposes knowledge of the exact timing of every image change. Single-image models serve here as reference models that capture stimulus difficulty without the added difficulty of sequential processing. To understand the potential contributions of lateral recurrence and adaptation to sequential object recognition, we compared three candidate models: a lateral recurrent model without adaptation (BLnext, orange), evaluated across different numbers of recurrent steps (including a single model step with no recurrent processing); the same model with exponential adaptation (BLnext–exponential, light blue); and the same model with power-law adaptation (BLnext–power-law, dark blue).
B Suppression and activation evolve over time in a network unit with adaptation. As expected, activation without adaptation (r, orange) does not decrease over time, whereas with both exponential and power-law adaptation (light and dark blue, respectively), an ongoing incoming signal (shaded grey area) leads to a build-up of suppression (s_exp, s_pow) and thus to reduced activation (r_exp, r_pow). Without a signal, suppression decays; residual suppression can potentially result in an offset at the arrival of a new signal. For clarity, this example does not show any recurrent interactions.
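The single-unit behaviour described in panel B can be illustrated with a minimal sketch. The update rules below are assumptions chosen for illustration, not the equations used in the BLnext models: exponential adaptation is written as a leaky integrator of past activation, and power-law adaptation as a sum of past activations weighted by a power-law kernel. The function name `simulate_unit` and the parameters `alpha`, `beta`, `gamma` and `scale` are hypothetical.

```python
import numpy as np


def simulate_unit(drive, mode="none", alpha=0.9, beta=0.7, gamma=1.0, scale=0.2):
    """Simulate one unit without recurrence; return (activation r, suppression s).

    All update rules and parameter values are illustrative assumptions,
    not taken from the paper.
    """
    drive = np.asarray(drive, dtype=float)
    n = drive.shape[0]
    r = np.zeros(n)  # activation per time step
    s = np.zeros(n)  # suppression per time step
    for t in range(n):
        if t > 0 and mode == "exponential":
            # Leaky integration of past activation: suppression builds up
            # while the unit is active and decays geometrically otherwise.
            s[t] = alpha * s[t - 1] + (1.0 - alpha) * r[t - 1]
        elif t > 0 and mode == "power-law":
            # Past activations r[0..t-1] weighted by lag ** (-gamma), so
            # older activity fades slowly instead of exponentially.
            lags = np.arange(t, 0, -1, dtype=float)  # lag of r[0], ..., r[t-1]
            s[t] = scale * np.sum(lags ** (-gamma) * r[:t])
        # Rectified activation: incoming signal minus scaled suppression.
        r[t] = max(0.0, drive[t] - beta * s[t])
    return r, s


# Incoming signal that is on for steps 10..39 (the "shaded grey area").
drive = np.zeros(60)
drive[10:40] = 1.0

r_none, s_none = simulate_unit(drive, mode="none")
r_exp, s_exp = simulate_unit(drive, mode="exponential")
r_pow, s_pow = simulate_unit(drive, mode="power-law")
```

Under these assumptions, `r_none` stays constant while the signal is on, whereas `r_exp` and `r_pow` fall as `s_exp` and `s_pow` build up, and both suppression traces decay once the signal is removed.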
C Evaluating an example trial (the same as shown in Fig 2B–2D) for sequential recognition highlights that, in contrast to a recurrent model without adaptation, both forms of adaptation are associated with suppressed representations of the target right after the target image is presented.
D Lateral recurrent models, and in particular those with adaptation, improve in their sequential object recognition performance on the Ecoset test set (500 image sequences based on 6 random images) on longer sequences. Top-5 accuracy was chosen to account for lingering representations of previously processed images during sequential processing. As a reference, performance on the easier task of single-image recognition (using the same images) is shown (dashed purple frame). Single-image recognition performance reflects the difficulty of the randomly chosen images, without any interference due to sequential processing.
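To make the top-5 criterion in panel D concrete: a prediction counts as correct if the target class is among the five highest-scoring classes, so the class of a previously shown image may still outrank the target without the trial counting as an error. The sketch below is a generic NumPy implementation of top-k accuracy, not the evaluation code used for the figure; the function name `top_k_accuracy` and the toy scores are hypothetical.

```python
import numpy as np


def top_k_accuracy(scores, targets, k=5):
    """Fraction of samples whose target class is among the k highest scores.

    scores:  array of shape (n_samples, n_classes), with n_classes >= k
    targets: array of shape (n_samples,) holding integer class indices
    """
    scores = np.asarray(scores, dtype=float)
    targets = np.asarray(targets)
    # Indices of the k largest scores per sample (order within the k is arbitrary).
    top_k = np.argpartition(-scores, k - 1, axis=-1)[..., :k]
    # A sample is a hit if its target index appears among those k indices.
    hits = (top_k == targets[..., None]).any(axis=-1)
    return float(hits.mean())


# Toy example with 6 classes: in the first sample the target (class 1) has the
# top score; in the second the target (class 5) has the lowest of all 6 scores.
scores = np.array([
    [0.1, 0.9, 0.8, 0.7, 0.6, 0.5],
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.1],
])
targets = np.array([1, 5])
accuracy = top_k_accuracy(scores, targets, k=5)
```

In a sequential setting the same function could be applied to the model output at each evaluated time step, with `targets` set to the class of the image shown at that step; how time steps are selected is left open here.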