The Brain Processes Speech in Parallel With Other Sounds
Introduction
Hearing is so effortless for most of us that it’s often difficult to comprehend how much information the brain’s auditory system needs to process and disentangle. It has to take incoming sounds and transform them into the acoustic objects that we perceive: a friend’s voice, a dog barking, the pitter-patter of rain. It has to extricate relevant sounds from background noise. It has to determine that a word spoken by two different people has the same linguistic meaning, while also distinguishing between those voices and assessing them for pitch, tone and other qualities.
According to traditional models of neural processing, when we hear sounds, our auditory system extracts simple features from them that then get combined into increasingly complex and abstract representations. This process allows the brain to turn the sound of someone speaking, for instance, into phonemes, then syllables, and eventually words.
But in a paper published in Cell in August, a team of researchers challenged that model, reporting instead that the auditory system often processes sound and speech simultaneously and in parallel. The findings suggest that how the brain makes sense of speech diverges dramatically from scientists’ expectations, with the signals from the ear branching into distinct brain pathways at a surprisingly early stage in processing — sometimes even bypassing a brain region thought to be a crucial stepping-stone in building representations of complex sounds.
The work offers hints of a new explanation for how the brain can unbraid overlapping streams of auditory stimuli so quickly and effectively. Yet in doing so, the discovery doesn’t just call into question more established theories about speech processing; it also challenges ideas about how the entire auditory system works. Much of the prevailing wisdom about our perception of sounds is based on analogies to what we know about computations performed in the visual system. But growing evidence, including the recent study on speech, hints that auditory processing works very differently — so much so that scientists are starting to rethink what the various parts of the auditory system are doing and what that means for how we decipher rich soundscapes.
“This study is a monumental undertaking,” said Dana Boebinger, a cognitive neuroscientist at Harvard University who was not involved in the work. Although she is not ready to abandon more conventional theories about how the brain processes complex auditory information, she finds the results “provocative” because they hint that “maybe we don’t actually have a very good idea of what’s going on.”
Turning a Hierarchy on Its Head
The earliest steps in our perception of sound are very well understood. When we hear someone speak, the cochlea in our ear separates the complex sound into different component frequencies and sends that representation through several stages of processing to the auditory cortex. At first, information is extracted from those signals about a sound’s location in space, its pitch and how much it is changing. What happens next is trickier to nail down: Higher cortical regions are thought to tease out features specifically relevant to speech — from phonemes to prosody — in a hierarchical sequence. The features of other complex types of sounds, such as music, would be handled similarly.
This arrangement echoes models of how the visual system works: It interprets patterns of light falling on cells in the retina first as lines and edges, and then as more complex features and patterns, ultimately building up a representation of a face or an object.
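To make that serial picture concrete, here is a minimal toy sketch in Python. The stage functions are hypothetical placeholders for brain regions, not anything from the study; the point is only that each stage consumes the previous stage's output, so speech-specific features cannot be built until the primary auditory cortex has done its work.

```python
# A minimal toy sketch (not code from the study) of the traditional serial
# hierarchy, with hypothetical functions standing in for brain regions.

def cochlea(sound):
    # Split the incoming waveform into its component frequencies.
    return {"frequencies": sound}

def primary_auditory_cortex(freq_representation):
    # Extract low-level features: pitch, location in space, rate of change.
    return {"low_level": freq_representation}

def superior_temporal_gyrus(low_level_features):
    # Assemble speech-specific features (phonemes, prosody) from lower-level ones.
    return {"phonemes": low_level_features}

def hierarchical_pipeline(sound):
    # Strictly serial: every stage waits for the output of the one before it.
    return superior_temporal_gyrus(primary_auditory_cortex(cochlea(sound)))
```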
But dissecting the details of the flow of auditory information has been difficult. Studies of speech can’t get far by using animals because speech is a uniquely human trait. And in humans, most research has to use indirect methods to measure brain activity. Getting direct recordings is much trickier because it’s invasive: Scientists need to piggyback on medical procedures, collecting data from electrodes implanted in the brains of patients getting surgery for epilepsy. But many auditory regions of interest are nestled deep within the brain between the frontal and temporal lobes — an area where surgeons don’t usually seek recordings.
Still, many of those direct and indirect studies found evidence for the traditional hierarchical model of auditory and speech processing: One of the early stops in the process, the primary auditory cortex, seems to be tuned to encode simple features of sounds, such as frequency. As the signals progress away from the primary auditory cortex, other brain regions seem to respond more to increasingly complex sound features instead, including features unique to speech, like phonemes. So far, so good.
But scientists deduced this hierarchical framework “based on experiments that weren’t necessarily looking to see how these regions were connected” or the sequences in which they became active, said Liberty Hamilton, a neuroscientist at the University of Texas, Austin.
And so, in 2014, she set out to build a more comprehensive map of speech sound representations throughout the auditory cortex, to learn what kind of information gets distilled from a sound in different brain areas, and how that information gets integrated from one region to the next.
She had a rare opportunity to explore that question, first as a postdoctoral researcher in the lab of Edward Chang, a neurosurgeon at the University of California, San Francisco, and then in her own lab in Austin. Chang, Hamilton and their colleagues were able to bring together several patients whose treatment had required electrode grids to be placed in various auditory locations.
Because opportunities to monitor those areas are so hard to come by, their recordings were “super precious data, and exciting,” Boebinger said. The researchers had hoped to be able to fill in details about how the brain transforms the low-level sound representations in the primary auditory cortex into more complex representations of speech sounds in a region further up in the hierarchy, the superior temporal gyrus.
Instead, what they found “sort of turned the idea on its head,” Hamilton said.
Pathways Diverging Early
The first hint that things weren’t proceeding as anticipated arrived quickly. The Chang group analyzed the responses of diverse auditory regions to features of pure tones and spoken words and sentences. They were able to confirm previous findings and fill in details of the map that had been missing.
But they also observed something strange. If information flowed hierarchically from “lower” to “higher” areas as they thought, then the primary auditory cortex should respond to an input before the superior temporal gyrus did. Yet some areas of the superior temporal gyrus seemed to respond to the onset of speech just as quickly as the primary auditory cortex responded to simple sound characteristics, like frequency.
The observation invited a tantalizing hypothesis: that the two brain regions were processing different aspects of the same input in parallel, and that "this parallel pathway for speech perception can bypass the primary auditory cortex — which is where we thought all of the information was supposed to go," Hamilton said. That would mean some representations of speech sounds didn't need to be built out of lower-level features extracted in the primary auditory cortex. "In a hierarchical model, you expect that the primary auditory cortex is the first way station that you have to go through before getting to the speech areas of the cortex," Hamilton said. But her results suggested that this isn't necessarily true.
Chang, Hamilton and their colleagues decided to test that idea further. When they stimulated patients’ primary auditory cortex to disrupt its function, the patients still had no problem perceiving speech. Instead, they reported auditory hallucinations: sounds on top of the words or sentences they were hearing, ranging from buzzing and tapping to running water and shifting gravel.
When the researchers stimulated a subregion of the superior temporal gyrus, they saw the opposite: Patients could not understand speech but could still apparently hear sounds normally. “I could hear you speaking but can’t make out the words,” one subject reported.
Once again, “it was like there are just two separate processes,” Hamilton said — independent pathways for the processing of sounds and supposedly higher-level features associated with speech.
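A toy rearrangement of the earlier sketch shows what this parallel reading would look like. Again, the function names are hypothetical placeholders and purely illustrative: the early frequency representation fans out to both pathways at once, so disrupting one branch leaves the other branch's output intact.

```python
# A toy sketch of the parallel reading (illustrative only; function names are
# hypothetical placeholders for brain regions, echoing the earlier sketch).

def cochlea(sound):
    return {"frequencies": sound}   # split the waveform into component frequencies

def primary_auditory_cortex(freqs):
    return {"low_level": freqs}     # pitch, location, loudness changes

def superior_temporal_gyrus(freqs):
    return {"speech": freqs}        # speech onsets, phonemes, prosody

def parallel_pipeline(sound):
    freqs = cochlea(sound)

    # Both pathways receive the early representation directly and run side by
    # side; the speech pathway does not route through the primary auditory cortex.
    sound_features = primary_auditory_cortex(freqs)
    speech_features = superior_temporal_gyrus(freqs)

    # Knocking out one branch, as in the stimulation experiments described
    # above, leaves the other branch's output intact.
    return {"sound": sound_features, "speech": speech_features}
```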
Finding parallel processing in the auditory cortex isn’t entirely a surprise. “Hierarchies are nice and clean when you’re talking about perceptual systems, because you know that at some level, you’re going from a noisy signal to something higher order and more abstract,” said Sophie Scott, a neuroscientist at University College London who did not participate in the study. “But no one ever told nature that that had to be the easiest or cleanest way of doing it.”
It only makes sense that at some point, separate brain circuits have to handle different types of auditory information simultaneously. In fact, researchers have already reported parallel functions at later stages in auditory processing: Complex musical and speech elements are processed separately, with their representations forming at least partly in parallel.
But those splits in speech and sound processing happen only after signals have passed through the primary auditory cortex. Hamilton and Chang’s work uncovered such a branch point very early in the process — so early that it might mean that information gets integrated to represent speech sounds at the subcortical level, rather than just in the cortex. And if subcortical processing plays such a large role in speech, researchers might also have overlooked other important ways in which the brain makes sense of complex sounds.
“We’ve learned again and again over the years that a lot of the things that we thought are cortical actually have, at least to some extent, also been subcortical,” said Israel Nelken, a neurobiologist and director of the Edmond and Lily Safra Center for Brain Sciences at the Hebrew University of Jerusalem.
In fact, the new results suggest that "lower" levels of the cortex might be hiding greater complexity, too. Scott, for instance, found it intriguing that stimulating the primary auditory cortex led to such a rich set of auditory hallucinations in the Chang group's patients. According to her, such hallucinations would typically be associated with higher-order cortical regions.
So the primary auditory cortex might be doing more than it’s typically given credit for. Other recent work has pointed to the same conclusion: In contrast to the primary visual cortex, the primary auditory cortex receives signals that have already undergone much more processing, and it represents information in a much more context-sensitive way. It’s “functionally much more downstream than primary visual cortex is,” said David Poeppel, a neuroscientist at New York University.
‘More Like a Lightning Storm’
Even so, “I don’t think we want to throw out the hierarchical baby with the bathwater entirely,” Poeppel said. There are still hierarchies in this system, and they are important for constructing increasingly abstract mental representations.
But departing from that hierarchy to process speech and other sounds in parallel very early on might offer a lot of advantages. For one, it could help to optimize the speed of the auditory system, which demands microsecond-level precision because of the transient nature of sounds. “So having this kind of parallel organization might allow you to get information about speech or other complex sounds analyzed more quickly,” Boebinger said. Moreover, auditory signals are inherently messy: Individuals drop phonemes or skip words inconsistently, and they may speak differently in different social contexts. A parallel processing system might be better at dealing with such chaotic inputs.
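A back-of-the-envelope sketch shows why such a split could save time. The delays below are made up for illustration, not measurements from the study: a strictly serial hierarchy pays the sum of its stages' delays, while parallel branches pay only for the slowest branch.

```python
# A toy latency comparison (delays are hypothetical, purely for illustration).

STAGE_DELAYS_MS = {"primary_auditory_cortex": 30, "superior_temporal_gyrus": 40}

def serial_latency(delays):
    # Each stage waits for the previous one, so delays add up.
    return sum(delays.values())

def parallel_latency(delays):
    # Independent branches run side by side, so only the slowest one matters.
    return max(delays.values())

print(serial_latency(STAGE_DELAYS_MS))    # 70 ms: speech features wait on the lower stage
print(parallel_latency(STAGE_DELAYS_MS))  # 40 ms: the speech pathway starts immediately
```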
It might also help the auditory system to segregate complex, overlapping sounds more efficiently and allow the brain to rapidly switch attention between those acoustic streams. “There have to be multiple streams of different sorts of information being processed, all at the same time, in a very plastic way, because the auditory environment can change at the drop of a hat,” Scott said. Given the importance of speech sounds to humans, it makes sense that our brain would process them quickly and in a way that keeps them distinct from background or environmental sounds.
And if speech gets split off from other sounds so early in processing, then perhaps other types of sounds get their own pathways too. To find out, Hamilton and others are hoping to do experiments with a broader array of auditory inputs — environmental sounds, music, sentences spoken amid background noise rather than in silence — to examine when and where different kinds of parallel processing might occur.
“We’re just starting to be able to dissect the components of that processing,” said Robert Shannon, a neuroscientist at the University of Southern California. Perhaps representations will turn out to form not just in ascending hierarchies or in neat parallel pathways, but with so much parallelism and complexity that it’s “more like a lightning storm,” he added.
And that “is a very different picture of how sensory systems work,” Nelken said.