Humans are exquisitely sensitive to the particular spatial arrangement of the complex visual features that make up objects. A large body of evidence implicates ventral temporal cortex in object recognition, suggesting that it encodes both complex features and their spatial arrangement. In contrast, we demonstrate that ventral temporal cortex and ImageNet-trained deep convolutional neural networks have representational geometries that do not distinguish between natural images of objects or scenes and images engineered to contain similar complex visual features in a spatially scrambled arrangement. Human observers, nonetheless, can easily tell the natural images apart from the scrambled ones. Our results suggest the need to reconceptualize the role of ventral temporal cortex: rather than holding an explicit representation of objects, it represents a basis set of complex, texture-like visual features that are generally useful for a variety of visual behaviors.
Speaker's homepage: https://profiles.stanford.edu/justin-gardner?tab=bio
Organizer: Department of Psychology, National Taiwan University (NTU)
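Co-organizers: NTU Center for Artificial Intelligence and Advanced Robotics; NTU Imaging Center for Integrated Body, Mind and Culture Research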