How do our brains reconstruct the visual world?
Given that we see the world through two small, flat retinae at the backs of our eyes, it seems remarkable that what each of us perceives is a seamless, three-dimensional visual world.
The retinae respond to various wavelengths of light from the world around us. But that’s just the first part of the process. Our brains have to do a lot of work with all that raw data that comes in – stitching it all together, choosing what to concentrate on and what to ignore. It’s the brain that constructs our visual world.
Neuroscientists and cognitive scientists have recently made considerable progress in investigating how this process works. My own research focuses on how humans construct the visual world by selecting what visual information to pay attention to and using visual memory to retain it over short periods of time. Far more than simple sensory input goes into building our perception of the visual world we live in.
Eyes as visual sensors
The retina is a sheet of cells at the back of each of our eyes. Some of these cells, called photoreceptors, are sensitive to light. There are two main types: rods are sensitive to light-dark differences and cones are sensitive to color.
Cones are most densely packed in a small area at the center of the retina called the fovea. It corresponds to the center of our vision, where resolution is at its highest. Detail progressively decreases farther from the center of our visual field – that is, in the periphery (hence “peripheral vision”).
As we look around our environment, we move our eyes. This lets us point the fovea at whatever interests us most in our surroundings. These rapid eye movements are called saccades, and we make about three of them every second.
Eyes + brain = vision
Given that the eyes are in constant motion, how does the picture of the world we have in our mind remain so apparently stable? Investigating this apparent discrepancy, neuroscientists have discovered that inputs from the eyes are suppressed during saccades, so we don’t register the fast motion and image blur that would otherwise occur. Furthermore, our brain corrects for movements of the eyes using information from the muscles that move them. Because the brain omits the information that comes in while the eyes are moving, our visual world is perceived mostly during fixations, the short periods of time (approximately 200-300 milliseconds long) when the eyes are stationary. While reading, for instance, our eyes are in motion only 10%-20% of the time.
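As a rough check on how these figures fit together, here is a minimal Python sketch. The fixation durations are the ones quoted above. The saccade durations of 30 to 50 milliseconds are an illustrative assumption, not a figure from this article, and the function names are made up for the example.

```python
def fraction_in_motion(fixation_ms: float, saccade_ms: float) -> float:
    """Share of one fixation-plus-saccade cycle that the eyes spend moving."""
    return saccade_ms / (fixation_ms + saccade_ms)


def saccades_per_second(fixation_ms: float, saccade_ms: float) -> float:
    """Number of fixation-plus-saccade cycles that fit into one second."""
    return 1000.0 / (fixation_ms + saccade_ms)


# Fixation durations come from the text; saccade durations are assumed.
for fixation_ms in (200, 300):
    for saccade_ms in (30, 50):
        share = fraction_in_motion(fixation_ms, saccade_ms)
        rate = saccades_per_second(fixation_ms, saccade_ms)
        print(f"fixation {fixation_ms} ms, saccade {saccade_ms} ms: "
              f"{share:.0%} in motion, {rate:.1f} saccades per second")
```

Under these assumptions the eyes are in motion for roughly 9% to 20% of the time and make about three to four saccades a second, which is broadly in line with the numbers given above.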
During each fixation, we must select the visual information most relevant to performing the task at hand. We have an ability to attend to or focus on one or several sources of information while ignoring all the rest, or at least reducing their significance. Researchers call this visual attention; they think it’s critical for helping us bind together or integrate elementary features (such as color and orientation) to form the perception of complete objects in the environment.
In other words, the theory holds that visual attention is our small window into the world. It’s from that limited focus – both in space and attention – that our brains integrate visual information into coherent objects. Consider a busy city street, which offers many potential sources of visual information to focus on. Using our visual attention, we select only a small subset of this information at any one time – for example, the yellow blob coming toward us that resolves into a taxi.
Visual memory helps build the scene
It’s selective processes such as visual attention that let the brain process important information and discard what’s not. What is or isn’t of interest will be determined by your individual goals. For example, one study showed that observers noticed a change to an object in a virtual reality setting only if that object was made task-relevant at the time of the change. For instance, if they were told to virtually sort bricks by size, they were more likely to notice changes in the bricks’ dimensions than if they were just lifting them in the order they appeared.
To figure out what is or isn’t important to the task at hand, an individual needs a way to retain some information across time. This is where visual memory comes in. It’s typically divided into short- and long-term flavors. The short-term version of visual memory is most important in moment-to-moment construction and stabilization of the visual world.
Scientists used to think visual short-term memory represented the visual world in detail, stitching together information from every stationary eye fixation to build up a detailed “picture in the head” of our surroundings.
However, more recent research has shown that observers typically don’t notice relatively large changes to the visual environment when these changes are accompanied by, say, an eye movement, or some other interruption. This phenomenon is called change blindness.
In a related phenomenon called inattentional blindness, people often miss obvious events in the environment when engrossed in an unrelated task. For instance, when required to count the number of passes of a basketball, people generally did not notice when a person in a gorilla suit crossed the court and walked right through the group of players.
Research into these phenomena and their associated mechanisms has shown that humans build up a more schematic version of the environment across eye fixations than was previously thought. This representation is typically known as scene gist. It contains conceptual information about the scene’s basic category – is it natural, human-made, a cityscape? – and its general layout, perhaps limited to a few objects and/or features. Scene gist is a far cry from the “picture in the head” scenario, but it is what guides us from one eye fixation to the next, during which more detailed information can be sampled.
Without the brain constantly working as a visual processor, the visual information we receive through our eyes would remain a chaotic, jumpy mess. Corrective neural mechanisms compensate for our eyes’ movements. Visual memory and attention work together to allow a fluid transition from one source of information to the next. In combination, these processes allow our brain to create our perception of a coherent, stable visual world.