It is still difficult for computers to simulate the human brain when it comes to listening.
Scientists have developed a mathematical model that can significantly improve the automatic recognition and processing of spoken language.
Researchers at the Leipzig Max Planck Institute for Human Cognitive and Brain Sciences and the Wellcome Trust Center for Neuroimaging in London say that such algorithms, which imitate brain mechanisms, could help machines to perceive the world around them.
Computers still struggle with listening because the programs used to date rely on processes that are particularly sensitive to perturbations. When computers process language, they primarily try to identify words from characteristic features in the frequencies of the voice.
"It is likely that the brain uses a different process," said Stefan Kiebel from the Leipzig Max Planck Institute for Human Cognitive and Brain Sciences.
He presumed that the analysis of temporal sequences plays an important role in this.
"Many perceptual stimuli in our environment could be described as temporal sequences," he said.
For example, music and spoken language are composed of sequences of different lengths, which are hierarchically ordered.
He hypothesized that the brain classifies the various signals from the smallest, fast-changing components (e.g., single sound units like 'e' or 'u') up to big, slow-changing elements (e.g., the topic).
Information at the various temporal levels is probably much more important for the processing of perceptual stimuli than previously thought.
"The brain constantly searches for temporal structure in the environment in order to deduce what will happen next," explained the scientist.
As a result, the brain can, for example, often predict the next sound units based on the slow-changing information. Thus, if the topic of conversation is the hot summer, 'su...' will more likely be the beginning of the word 'sun' than the word 'supper'.
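The 'sun' versus 'supper' example can be illustrated with a few lines of Python. This is only a toy sketch, not the researchers' model: the topics, the vocabulary, the probabilities and the function name `predict_word` are all invented for illustration. It shows how slow-changing information (the topic) can rank the possible completions of a fast-changing fragment (the first sounds of a word).

```python
# Hypothetical table of P(word | topic). The numbers are invented for
# illustration and do not come from the study.
WORD_GIVEN_TOPIC = {
    "hot summer": {"sun": 0.8, "supper": 0.2},
    "evening meal": {"sun": 0.1, "supper": 0.9},
}


def predict_word(topic, prefix):
    """Rank words that start with `prefix` by their probability under `topic`.

    Returns a list of (word, probability) pairs, most likely first, with
    probabilities renormalised over the matching candidates.
    """
    candidates = {
        word: p
        for word, p in WORD_GIVEN_TOPIC[topic].items()
        if word.startswith(prefix)
    }
    total = sum(candidates.values())
    if total == 0:
        return []  # no known word starts with this prefix
    ranked = [(word, p / total) for word, p in candidates.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(predict_word("hot summer", "su"))
    print(predict_word("evening meal", "su"))
```

With these made-up numbers, the same fragment 'su' is completed as 'sun' when the topic is the hot summer and as 'supper' when the topic is the evening meal.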
To test the hypothesis, the researchers constructed a mathematical model designed to imitate, in a highly simplified manner, the neuronal processes that occur during the comprehension of speech.
Neuronal processes were described by algorithms that processed speech at several temporal levels.
The model succeeded in processing speech: it recognized individual speech sounds and syllables.
In contrast to other artificial speech recognition devices, it was able to process sped-up speech sequences.
In addition, it shared the brain's ability to 'predict' the next speech sound. When a prediction turned out to be wrong because the researchers had assembled an unfamiliar syllable from familiar sounds, the model was able to detect the error.
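The idea of predicting the next sound and noticing a violated prediction can be sketched in Python. This is a deliberately crude illustration and not the model from the study: it uses a plain lookup over a hypothetical set of familiar syllables, and the names `KNOWN_SYLLABLES`, `predicted_next` and `first_prediction_error` are assumptions made for this example.

```python
# Hypothetical set of familiar syllables, each a sequence of sound units.
KNOWN_SYLLABLES = {("b", "a"), ("d", "i"), ("g", "u")}


def predicted_next(prefix):
    """Return the sounds that may follow `prefix` in any familiar syllable."""
    prefix = tuple(prefix)
    n = len(prefix)
    return {
        syllable[n]
        for syllable in KNOWN_SYLLABLES
        if len(syllable) > n and syllable[:n] == prefix
    }


def first_prediction_error(sounds):
    """Return the index of the first sound that was not predicted, or None."""
    for i, sound in enumerate(sounds):
        if sound not in predicted_next(sounds[:i]):
            return i
    return None


if __name__ == "__main__":
    print(first_prediction_error(["b", "a"]))  # familiar syllable
    print(first_prediction_error(["b", "i"]))  # familiar sounds, unfamiliar syllable
```

Here 'b' followed by 'i' uses only familiar sounds, yet the second sound contradicts what the familiar syllables predict after 'b', so the mismatch is flagged at that position.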
This indicates that the researchers' model could represent the processes in the brain. It also provides new approaches for practical applications in the field of artificial speech recognition.
The study has been published in the latest issue of PLoS Computational Biology.