Experts at the Massachusetts Institute of Technology (MIT) have cautioned against accepting recent suggestions that computers are making progress in learning to see like humans, a capability that matters for applications ranging from "intelligent" cars to visual prosthetics for the blind.
Any such results may be misleading, they say, because the tests being used are inadvertently stacked in favour of computers.
Recent computational models show 60 per cent success rates in classifying natural photographic image sets. These include the widely used Caltech101 database, which is intended to test computer vision algorithms against the variety of images seen in the real world.
However, MIT experts argue that these image sets have design flaws that enable computers to succeed where they would fail with more authentically varied images.
Photographers tend to centre objects in a frame and to prefer certain views and contexts, whereas the human visual system encounters objects in a much broader range of conditions, the experts add.
"The ease with which we recognize visual objects belies the computational difficulty of this feat," says James DiCarlo, a neuroscientist in the McGovern Institute for Brain Research at MIT.
"The core challenge is image variation. Any given object can cast innumerable images onto the retina depending on its position, distance, orientation, lighting and background," adds DiCarlo, who is also a senior author of the study posted online in PLoS Computational Biology.
The flaws were exposed when the researchers ran a simple "toy" computer model, inspired by the earliest steps in the brain's visual pathway, through current tests of computer object recognition.
Artificial neurons with properties resembling those in the brain's primary visual cortex analyse each point in the image, and capture low-level information about the position and orientation of line boundaries.
The model lacks the more sophisticated analysis that happens in later stages of visual processing to extract information about higher-level features of the visual scene such as shapes, surfaces or spaces between objects.
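As a rough illustration of the kind of low-level analysis described above, the sketch below applies a small bank of oriented filters at every point of an image. It assumes Gabor filters, a common textbook stand-in for neurons in the primary visual cortex; the article does not say which filters the study used, and the function names and parameter values here are hypothetical choices made for the example, not the researchers' code.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Return a size x size Gabor kernel tuned to orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the sinusoid varies along direction theta.
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * x_rot / wavelength)
    kernel = envelope * carrier
    # Subtract the mean so uniform image regions give no response.
    return kernel - kernel.mean()

def filter_valid(image, kernel):
    """Correlate image with kernel at every position where the kernel fits."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

def v1_like_features(image, n_orientations=4, size=9, wavelength=4.0, sigma=2.0):
    """Stack of response-magnitude maps, one per filter orientation.

    All default values are illustrative assumptions.
    """
    thetas = np.arange(n_orientations) * np.pi / n_orientations
    maps = [np.abs(filter_valid(image, gabor_kernel(size, wavelength, t, sigma)))
            for t in thetas]
    return np.stack(maps)

# Usage: a 32 x 32 image containing a single vertical edge.
image = np.zeros((32, 32))
image[:, 16:] = 1.0
features = v1_like_features(image)
```

Each map records where, and how strongly, edges of one orientation appear. Nothing in the sketch groups those edges into shapes or surfaces, which is the later-stage analysis the toy model lacks.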
The researchers expected this model to fail on the Caltech101 images and so establish a baseline. Instead it did surprisingly well, with performance similar to or better than that of five state-of-the-art object-recognition systems.
"We suspected that the supposedly natural images in current computer vision tests do not really engage the central problem of variability, and that our intuitions about what makes objects hard or easy to recognize are incorrect," said Nicolas Pinto, a graduate student of the Rowland Harvard Institute.
The authors then designed a more carefully controlled test. Using just two categories, planes and cars, they introduced variations in position, size and orientation that better reflected the range of variation in the real world.
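The idea of a controlled-variation test can be sketched in a few lines. The example below takes one object image and produces copies at random positions, sizes and orientations on a blank canvas. It is a simplified two-dimensional stand-in under stated assumptions: the function names, the ranges of variation and the nearest-neighbour resampling are all hypothetical, and the article does not describe how the study's own test images were produced.

```python
import numpy as np

def place_object(obj, canvas_size, center, scale, angle):
    """Draw obj on a blank square canvas at the given centre (row, col),
    scale factor and rotation angle (radians), using inverse mapping
    with nearest-neighbour sampling."""
    h, w = obj.shape
    rows, cols = np.mgrid[0:canvas_size, 0:canvas_size]
    # Canvas coordinates relative to the requested object centre.
    dr = rows - center[0]
    dc = cols - center[1]
    # Undo the rotation, then the scaling, to find the source pixel in obj.
    cos_a, sin_a = np.cos(angle), np.sin(angle)
    src_r = np.rint((cos_a * dr + sin_a * dc) / scale + (h - 1) / 2.0).astype(int)
    src_c = np.rint((-sin_a * dr + cos_a * dc) / scale + (w - 1) / 2.0).astype(int)
    inside = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    canvas = np.zeros((canvas_size, canvas_size), dtype=obj.dtype)
    canvas[inside] = obj[src_r[inside], src_c[inside]]
    return canvas

def sample_variations(obj, n, canvas_size=64, max_shift=16,
                      scale_range=(0.5, 1.5), seed=0):
    """Return n copies of obj with random position, size and orientation.

    The ranges are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    mid = (canvas_size - 1) / 2.0
    images = []
    for _ in range(n):
        center = mid + rng.uniform(-max_shift, max_shift, size=2)
        scale = rng.uniform(*scale_range)
        angle = rng.uniform(0.0, 2.0 * np.pi)
        images.append(place_object(obj, canvas_size, center, scale, angle))
    return np.stack(images)

# Usage: five varied views of a plain 10 x 10 square "object".
variations = sample_variations(np.ones((10, 10)), n=5)
```

Because every copy shows the same object, any drop in a classifier's accuracy on such a set reflects the variation itself and not a change in what is pictured.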
"With only two types of objects to distinguish, this test should have been easier for the 'toy' computer model, but it proved harder," said David Cox, another graduate student of the Rowland Harvard Institute.
The team concluded: "Our model did well on the Caltech101 image set not because it is a good model but because the 'natural' images fail to adequately capture real-world variability."
The researchers stressed the need to revamp the current standards and image sets that the computer-vision community uses to compare models and measure progress.
Before computers can approach the performance of the human brain, they say, scientists must better understand why the task of object recognition is so difficult and why the brain's abilities are so impressive.