Visual Perception meets Computational Neuroscience


A central part of the focus programme will be a Plenary Symposium entitled 'Computational Neuroscience meets Visual Perception' on the afternoon of Tuesday, 27th of August. In recent years, theoretical work, mathematical analysis and model simulations have become an increasingly important part of understanding human vision. The symposium will feature five keynote presentations on topics in visual perception where Computational Neuroscience plays a key role in evaluating, understanding, and unifying empirical observations. For this purpose, we have invited scientists who have made fundamental contributions to Computational Neuroscience and have either carried out experimental work themselves or collaborated closely with experimentalists in vision science. The purpose of the presentations is not only to tell the success stories of Computational Neuroscience in vision research, but above all to inspire and motivate young scientists to use quantitative methods and models to analyse and support their experimental findings. We hope that setting this focus at the ECVP meeting will bring the two communities closer together and create synergies for both theoreticians and experimentalists.

Confirmed speakers for this symposium:

 

Felix Wichmann
(University of Tübingen, Tübingen, Germany):
Models of Early Spatial Vision: Bayesian Statistics and Population Decoding

Li Zhaoping
(University College London, London, United Kingdom):
A theory of the primary visual cortex (V1): Predictions, experimental tests, and implications for future research

Wilson Geisler
(University of Texas, Austin, Texas, USA):
Task-Specific Optimal Encoding and Decoding

Josh Tenenbaum
(Massachusetts Institute of Technology, Cambridge, Massachusetts, USA):
Modeling common-sense scene understanding with probabilistic programs


Martin Giese
(University of Tübingen, Tübingen, Germany):
Neural theory for the visual recognition of goal-directed actions


 

Abstracts:


Models of Early Spatial Vision: Bayesian Statistics and Population Decoding

Felix Wichmann

University of Tübingen, Tübingen, Germany

In psychophysical models of human pattern detection it is assumed that the retinal image is analyzed through (nearly) independent and linear pathways (“channels”) tuned to different spatial frequencies and orientations, followed by a simple maximum-output decoding rule. This hypothesis originates from a series of very carefully conducted and frequently replicated psychophysical pattern detection, summation, adaptation, and uncertainty experiments, whose data are all consistent with the simple model described above. However, spatial-frequency-tuned neurons in primary visual cortex are neither linear nor independent, and ample evidence suggests that perceptual decisions are mediated by pooling the responses of multiple neurons. Here I will present recent work by Goris, Putzeys, Wagemans & Wichmann (Psychological Review, in press), proposing an alternative theory of detection in which perceptual decisions develop from maximum-likelihood decoding of a neurophysiologically inspired model of population activity in primary visual cortex. We demonstrate that this model predicts a broad range of classic detection results. Using a single set of parameters, our model can account for several summation, adaptation and uncertainty effects, thereby offering a new theoretical interpretation for the vast psychophysical literature on pattern detection. One key component of this model is a task-specific, normative decoding mechanism instead of the task-independent maximum-output rule---or, more generally, any Minkowski-norm---typically employed in early vision models. This opens the possibility that perceptual learning may at least sometimes be understood in terms of learning the weights of the decoder: Why and when can we successfully learn them, as in the examples presented by Goris et al. (in press)? Why do we fail to learn them in other cases, e.g. Putzeys, Bethge, Wichmann, Wagemans & Goris (PLoS Computational Biology, 2012)? Furthermore, the success of the Goris et al. (2013) model highlights the importance of moving away from ad-hoc models designed to account for the data of a single experiment, and towards more systematic and principled modeling efforts that account for many different datasets with a single model. Finally, I will briefly show how statistical modeling can complement the mechanistic modeling approach of Goris et al. (2013). Using a Bayesian graphical model of contrast discrimination, I show how Bayesian inference allows one to estimate the posterior distribution of the parameters of such a model. The posterior distribution provides diagnostics of the model that help in drawing meaningful conclusions from the model and its parameters.
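
For readers less familiar with this distinction, the following minimal Python sketch contrasts a fixed Minkowski-norm pooling rule with a task-specific maximum-likelihood readout of the same simulated channel population. It is a toy illustration only: the tuning curves, the Poisson noise and all parameter values are assumptions made for the example, not the model of Goris et al.

    import numpy as np

    rng = np.random.default_rng(0)

    # A small bank of spatial-frequency-tuned "channels" (illustrative values).
    prefs = np.linspace(0.0, 4.0, 32)            # preferred log spatial frequencies
    stim_sf, stim_contrast = 2.0, 0.15           # test grating
    baseline = 2.0                               # spontaneous response

    def mean_rates(contrast):
        """Mean channel responses: baseline plus contrast-scaled Gaussian tuning."""
        tuning = np.exp(-0.5 * ((prefs - stim_sf) / 0.6) ** 2)
        return baseline + 40.0 * contrast * tuning

    def simulate(contrast, n_trials):
        """Poisson-noisy population responses on n_trials independent trials."""
        return rng.poisson(mean_rates(contrast), size=(n_trials, prefs.size))

    def minkowski_dv(r, beta=4.0):
        """Fixed, task-independent pooling of channel outputs; beta -> infinity
        approaches the classical maximum-output rule."""
        return (r.astype(float) ** beta).sum(axis=1) ** (1.0 / beta)

    def likelihood_ratio_dv(r):
        """Task-specific maximum-likelihood readout: log-likelihood ratio of
        'signal present' vs. 'signal absent' under the assumed Poisson encoder."""
        lam1, lam0 = mean_rates(stim_contrast), mean_rates(0.0)
        return r @ np.log(lam1 / lam0) - (lam1 - lam0).sum()

    n = 20000
    r_signal, r_blank = simulate(stim_contrast, n), simulate(0.0, n)
    for name, dv in [("Minkowski pooling (beta=4)", minkowski_dv),
                     ("maximum-likelihood decoding", likelihood_ratio_dv)]:
        a, b = dv(r_signal), dv(r_blank)
        pc = np.mean(a > b) + 0.5 * np.mean(a == b)   # 2AFC proportion correct
        print(f"{name:28s} 2AFC proportion correct: {pc:.3f}")

Because the likelihood-ratio readout weights each channel by how informative it is for the task at hand, it performs at least as well as any fixed pooling rule under the assumed encoding model.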

 


A theory of the primary visual cortex (V1): Predictions, experimental tests, and implications for future research

Li Zhaoping

University College London, London, United Kingdom

Since Hubel and Wiesel's venerable studies, more is known about the physiology of V1 than about any other area of visual cortex. However, its function has been seen merely as extracting primitive image features to serve the more important functions of higher visual areas, such as object recognition. A decade ago, a different function of V1 was hypothesized: creating a bottom-up saliency map which exogenously guides an attentional processing spotlight to a tiny fraction of the visual input (Li, 2002, Trends in Cognitive Sciences, 6(1):9-16). This theory holds that the bottom-up saliency of any visual location in a given scene is signaled by the highest V1 neural response to this location, regardless of the feature preferences of the neurons concerned. Intra-cortical interactions between neighboring V1 neurons serve to transform visual inputs into neural responses that signal saliency. In particular, iso-feature suppression between neighboring V1 neurons tuned to similar visual features, such as orientation or color, reduces V1 responses to an iso-feature background, thereby highlighting the relatively unsuppressed response to a unique feature singleton. The superior colliculus, which receives inputs directly from V1, likely reads out the V1 saliency map to execute attentional selection.
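
The core computation of the theory, iso-feature suppression among similarly tuned neighbors followed by a maximum over all responses at each location, can be illustrated with a deliberately reduced Python sketch. The grid, the tuning curves and the divisive suppression rule below are illustrative assumptions, not the recurrent V1 circuit model of Li (2002):

    import numpy as np

    # A 7 x 7 field of bars: all vertical (90 deg) except one horizontal singleton.
    orientations = np.full((7, 7), 90.0)
    orientations[3, 5] = 0.0

    # Responses of two orientation-tuned populations at each location
    # (illustrative Gaussian tuning, 25-deg bandwidth).
    def tuned_response(preferred):
        d = np.abs(orientations - preferred)
        d = np.minimum(d, 180.0 - d)              # orientation is circular over 180 deg
        return np.exp(-0.5 * (d / 25.0) ** 2)

    responses = {pref: tuned_response(pref) for pref in (0.0, 90.0)}

    # Iso-feature suppression: each unit is divisively suppressed by the summed
    # activity of similarly tuned units at the eight neighbouring locations.
    def iso_feature_suppression(r):
        padded = np.pad(r, 1)
        neighbours = sum(np.roll(np.roll(padded, dy, 0), dx, 1)
                         for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                         if (dy, dx) != (0, 0))[1:-1, 1:-1]
        return r / (1.0 + 0.5 * neighbours)

    suppressed = {pref: iso_feature_suppression(r) for pref, r in responses.items()}

    # Saliency at each location is the highest response there, regardless of tuning.
    saliency = np.maximum(*suppressed.values())
    row, col = np.unravel_index(saliency.argmax(), saliency.shape)
    print("most salient location:", (int(row), int(col)))   # the singleton at (3, 5)

The vertical bars suppress one another's orientation-tuned responses, whereas the horizontal singleton largely escapes iso-orientation suppression, so the maximum response, and hence the saliency peak, occurs at its location.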

Several non-trivial predictions from this V1 theory have subsequently been confirmed. The most surprising one states that an ocular singleton --- an item uniquely presented to one eye among items presented to the other eye --- should capture attention (Zhaoping, 2008, Journal of Vision, 8(5):1). This attentional capture is stronger than that of a perceptually distinct orientation singleton. It is a hallmark of V1, since the eye of origin of visual input is barely encoded in cortical areas beyond V1, and indeed it is nearly impossible for observers to identify the eye of origin of an input.

Another distinctive prediction is quantitative, yet parameter-free (Zhaoping and Zhe, 2012, Journal of Vision, 12(9):1160). It concerns reaction times for finding a single bar with unique features (in color, orientation, and/or motion direction) in a field of other bars that are all the same. Reaction times are shorter when the unique target bar differs from the background bars by more features; the theory predicts exactly how much. Behavioural data (collected by Koene and Zhaoping, 2007, Journal of Vision, 7(7):6) confirm this prediction. The prediction depends on there being only a few neurons tuned to all three features, a restriction that is true of V1, but not of extra-striate areas. This suggests that the latter play little role in the exogenous saliency of at least feature singletons.
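
One standard way to formalize a parameter-free prediction of this kind is a race between feature-specific saliency signals: the reaction time to a multi-feature singleton is predicted by the minimum of reaction times drawn independently from the corresponding single-feature conditions. The Python sketch below illustrates the idea on simulated data; the reaction-time distributions and the independence assumption are made up for the example, whereas the published test is based on each observer's measured single-feature distributions and, as noted above, on which feature conjunctions are encoded by single V1 neurons.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100000

    # Illustrative single-feature reaction-time distributions (seconds): a fixed
    # non-decision time plus an exponential "time to reach the saliency threshold".
    def single_feature_rts(mean_decision_time):
        return 0.25 + rng.exponential(mean_decision_time, n)

    rt_color = single_feature_rts(0.30)        # colour singleton alone
    rt_orientation = single_feature_rts(0.40)  # orientation singleton alone

    # Race-model prediction for the colour+orientation double singleton: on each
    # trial, whichever feature-specific saliency signal reaches threshold first
    # determines the reaction time.
    rt_double_predicted = np.minimum(single_feature_rts(0.30), single_feature_rts(0.40))

    for name, rts in [("colour singleton", rt_color),
                      ("orientation singleton", rt_orientation),
                      ("double singleton (race prediction)", rt_double_predicted)]:
        print(f"{name:35s} mean RT: {1000 * rts.mean():.0f} ms")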

Exogenous selection is faster and often more potent than endogenous selection, and together they admit only a tiny fraction of sensory information through an attentional bottleneck. V1's role in exogenous selection suggests that extra-striate areas might be better understood in terms of the computations carried out in light of exogenous selection, including endogenous selection and post-selectional visual inference. Furthermore, visual bottom-up saliency signals found in frontal and parietal cortical areas should be inherited from V1.

 


Task-Specific Optimal Encoding and Decoding

Wilson Geisler, Johannes Burge, Anthony D’Antona and Jeffrey S. Perry

Center for Perceptual Systems, University of Texas, Austin, Texas, USA

The visual system of an organism is likely to be well-matched to the specific tasks that the organism performs. Thus, for any natural task of interest, it is often valuable to consider how to perform the task optimally, given the statistical properties of the natural signals and the relevant biological constraints. Such a “natural systems analysis” can provide a deep computational understanding of the natural task, as well as principled hypotheses for perceptual mechanisms that can be tested in behavioral and/or neurophysiological experiments. To illustrate this approach, I will briefly summarize the key concepts of Bayesian ideal observer theory for estimation tasks, and then show how those concepts can be applied to the tasks of binocular-disparity (depth) estimation and occluded-point estimation in natural scenes. In the case of disparity estimation, the analysis shows that many properties of neurons in early visual cortex, as well as properties of human disparity discrimination performance, follow directly from first principles; i.e., from optimally exploiting the statistical properties of the natural signals, given the biological constraints imposed by the optics and geometry of the eyes. In the case of occluded-point estimation, the analysis shows that almost all the relevant image information is contained in the immediate neighborhood of the occluded point, and that optimal performance requires encoding and decoding absolute intensities; the pattern of relative intensities (the contrast image) is not sufficient for optimal performance. Psychophysical measurements show that human estimation accuracy is sub-optimal, but that humans closely match an ideal observer that uses only the relative intensities. I conclude that analysis of optimal encoding and decoding in specific natural tasks is a powerful approach for investigating the mechanisms of visual perception in humans and other organisms.
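
The logic of the ideal-observer analysis for an estimation task can be written down in a few lines: combine a prior measured from natural signals with the likelihood of the noisy measurement, and read out the posterior with the loss function appropriate to the task. The Python sketch below does this for a generic scalar scene variable; the prior, the Gaussian measurement noise and the loss functions are illustrative assumptions, not the disparity or occluded-point models discussed in the talk.

    import numpy as np

    rng = np.random.default_rng(2)

    # Discretized values of a scalar scene variable (e.g. a depth or an intensity).
    x = np.linspace(-2.0, 2.0, 401)

    # Prior measured (here: assumed) from natural signals: small values are far
    # more common than large ones.
    prior = np.exp(-np.abs(x) / 0.3)
    prior /= prior.sum()

    # Encoding model: the observer receives a noisy measurement of the true value.
    sigma_noise = 0.5
    x_true = 0.8
    measurement = x_true + rng.normal(0.0, sigma_noise)

    # Likelihood of that measurement for every candidate value of x.
    likelihood = np.exp(-0.5 * ((measurement - x) / sigma_noise) ** 2)

    # Posterior: prior times likelihood, normalized.
    posterior = prior * likelihood
    posterior /= posterior.sum()

    # Optimal estimates under two standard loss functions.
    posterior_mean = (x * posterior).sum()     # minimizes expected squared error
    map_estimate = x[posterior.argmax()]       # minimizes 0/1 loss
    print(f"measurement {measurement:+.2f}  posterior mean {posterior_mean:+.2f}  MAP {map_estimate:+.2f}")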

 


Modeling common-sense scene understanding with probabilistic programs

Josh Tenenbaum

Massachusetts Institute of Technology, Cambridge, Massachusetts, USA

To see is, famously, to "know what is where by looking". Yet to see is also to know what will happen, what can be done, and what is being done -- to detect not only objects and their locations, but the physical dynamics governing how objects in the scene interact with each other and how agents can act on them, and the psychological dynamics governing how intentional agents in the scene interact with these objects and each other to achieve their goals. I will talk about recent efforts to capture these core aspects of human common-sense scene understanding in computational models that can be compared with the judgments of both adults and young children in precise quantitative experiments, and used for building more human-like machine vision systems. These models of intuitive physics and intuitive psychology take the form of "probabilistic programs": probabilistic generative models defined not over graphs, as in many current machine learning and vision models, but over programs whose execution traces describe the causal processes giving rise to the behavior of physical objects and intentional agents. Common-sense physical and psychological scene understanding can then be characterized as approximate Bayesian inference over these probabilistic programs.

Specifically, we embed several standard algorithms -- programs for fast approximate graphics rendering from 3D scene descriptions, fast approximate physical simulation of rigid body dynamics, and optimal control of rational agents (including state estimation and motion planning) -- inside a Monte Carlo inference framework, which is capable of inferring inputs to these programs from observed partial outputs.  We show that this approach is able to solve a wide range of problems including inferring scene structure from images, predicting physical dynamics and inferring latent physical attributes from static images or short movies, and reasoning about the goals and beliefs of agents from observations of short action traces.  We compare these solutions quantitatively with human judgments, and with the predictions of a range of alternative models.  How these models might be implemented in neural circuits remains an important and challenging open question.  Time permitting, I will speculate briefly on how it might be addressed.
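
The notion of a probabilistic program can be unpacked with a toy example: a stochastic simulation program serves as the generative model, and inference runs the program many times, weighting each execution trace by how well its output matches the observation. The Python sketch below does this for a one-dimensional bouncing-ball simulation; the simulator, the latent attribute and the inference scheme (simple likelihood weighting) are illustrative assumptions, not the graphics, physics or planning engines used in the work described above.

    import numpy as np

    rng = np.random.default_rng(3)

    def simulate(elasticity, n_steps=60, dt=0.05):
        """Generative program: drop a ball from 1 m and let it bounce; return its
        height after n_steps * dt seconds."""
        y, v = 1.0, 0.0
        for _ in range(n_steps):
            v -= 9.81 * dt
            y += v * dt
            if y < 0.0:                       # bounce, losing energy
                y, v = 0.0, -elasticity * v
        return y

    # Observation: the ball is seen at a height of 0.25 m after 3 seconds.
    observed_height, obs_noise = 0.25, 0.05

    # Likelihood weighting: run the program under the prior many times, weight
    # each execution trace by how well its output matches the observation, and
    # form the posterior over the latent physical attribute (the elasticity).
    n_samples = 5000
    elasticity = rng.uniform(0.1, 0.95, n_samples)            # prior over the latent
    simulated = np.array([simulate(e) for e in elasticity])   # outputs of the traces
    weights = np.exp(-0.5 * ((simulated - observed_height) / obs_noise) ** 2)
    weights /= weights.sum()

    print(f"posterior mean elasticity: {(weights * elasticity).sum():.2f}")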

This talk will cover joint work with Peter Battaglia, Jess Hamrick, Chris Baker, Tomer Ullman, Tobi Gerstenberg, Kevin Smith, Ed Vul, Eyal Dechter, Vikash Mansinghka, Tejas Kulkarni, and Tao Gao.

 


Neural theory for the visual recognition of goal-directed actions

Martin Giese (1,2), Falk Fleischer (1,2), Vittorio Caggiano (2,3), Jörn Pomper (2), and Peter Thier (2)

(1) Section for Computational Sensomotorics, University of Tübingen, Germany
(2) Dept. for Cognitive Neurology, HIH and CIN, University Clinic Tübingen, Germany
(3) McGovern Institute for Brain Research, M.I.T., Cambridge, MA, USA

The visual recognition of biological movements and actions is a centrally important visual function, involving complex computational processes that link neural representations for action perception and execution. This has made the topic highly attractive for researchers in cognitive neuroscience, and a broad spectrum of theories, some of them highly speculative, has been proposed about the computational processes that might underlie action vision in primate cortex. Additional work has associated the underlying principles with a wide range of other brain functions, such as social cognition, emotion, or the interpretation of causal events. In spite of this very active discussion about hypothetical computational and conceptual theories, our detailed knowledge about the underlying neural processes is quite limited, and a broad spectrum of critical experiments that would narrow down the relevant computational key steps remains to be done.

We will present a physiologically-inspired neural theory for the processing of goal-directed actions, which provides a unifying account for existing neurophysiological results on the visual recognition of hand actions in monkey cortex. At the same time, we will present new experimental results from the Tübingen group, which were partly motivated by the proposed neural theory. Some of these experiments confirm aspects of the theory, while others point to substantial limitations, helping to develop more comprehensive neural accounts of the computational processes that underlie visual action recognition in primate cortex.

Importantly, our model accounts for many basic properties of cortical action-selective neurons using simple, physiologically plausible mechanisms that are known from visual shape and motion processing, without requiring a central computational role for motor representations. We demonstrate that the same model also provides an account for experiments on the visual perception of causality, suggesting that simple forms of causality perception might be a side effect of computational processes that mainly subserve the recognition of goal-directed actions.
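
As a deliberately reduced illustration of the kind of mechanisms referred to here, the Python sketch below combines two ingredients commonly used in such hierarchical recognition models: radial-basis-function units tuned to learned snapshots of body configurations, and sequence selectivity that makes the detector respond only when the snapshots are activated in the learned temporal order. The feature vectors, the tuning width and the order check are assumptions made for the example, not the circuitry of the model presented in the talk.

    import numpy as np

    rng = np.random.default_rng(4)

    # Learned "snapshot" templates: feature vectors describing successive body
    # configurations of an action (random vectors stand in for the output of a
    # shape-processing hierarchy).
    n_snapshots, n_features = 5, 40
    templates = rng.normal(size=(n_snapshots, n_features))

    def snapshot_responses(frame_features, sigma=2.0):
        """Radial-basis-function units: each responds when the current frame
        resembles its learned snapshot."""
        d2 = ((templates - frame_features) ** 2).sum(axis=1)
        return np.exp(-d2 / (2.0 * sigma ** 2))

    def action_detector(frame_sequence):
        """Sequence selectivity: respond only if the snapshot units become active
        in the learned temporal order (a stand-in for asymmetric lateral
        connections between snapshot neurons)."""
        activity = np.array([snapshot_responses(f) for f in frame_sequence])
        peak_times = activity.argmax(axis=0)          # frame at which each unit peaks
        in_order = bool(np.all(np.diff(peak_times) > 0))
        return activity.max(axis=0).mean() if in_order else 0.0

    # A "movie" playing the learned action forwards drives the detector; the same
    # movie played backwards does not.
    movie = [templates[i] + 0.1 * rng.normal(size=n_features) for i in range(n_snapshots)]
    print("forward playback :", round(action_detector(movie), 2))
    print("backward playback:", round(action_detector(movie[::-1]), 2))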

Supported by the DFG, BMBF, and EU FP7 projects TANGO, AMARSI, and ABC.

 


Symposium Supported by: National Bernstein Network for Computational Neuroscience, Germany.