
How Transformers Seem to Mimic Parts of the Brain

Neural networks originally designed for language processing turn out to be great models of how our brains understand places.
A transformer model superimposed on a human brain. Kristina Armitage/Quanta Magazine

Introduction

Understanding how the brain organizes and accesses spatial information — where we are, what’s around the corner, how to get there — remains an exquisite challenge. The process involves recalling an entire network of memories and stored spatial data from tens of billions of neurons, each connected to thousands of others. Neuroscientists have identified key elements such as grid cells, neurons that map locations. But going deeper will prove tricky: It’s not as though researchers can remove and study slices of human gray matter to watch how location-based memories of images, sounds and smells flow through and connect to each other.

Artificial intelligence offers another way in. For years, neuroscientists have harnessed many types of neural networks — the engines that power most deep learning applications — to model the firing of neurons in the brain. In recent work, researchers have shown that the hippocampus, a structure of the brain critical to memory, is basically a special kind of neural net, known as a transformer, in disguise. Their new model tracks spatial information in a way that parallels the inner workings of the brain. They’ve seen remarkable success.

“The fact that we know these models of the brain are equivalent to the transformer means that our models perform much better and are easier to train,” said James Whittington, a cognitive neuroscientist who splits his time between Stanford University and the lab of Tim Behrens at the University of Oxford.

Studies by Whittington and others hint that transformers can greatly improve the ability of neural network models to mimic the sorts of computations carried out by grid cells and other parts of the brain. Such models could push our understanding of how artificial neural networks work and, even more likely, how computations are carried out in the brain, Whittington said.

“We’re not trying to re-create the brain,” said David Ha, a computer scientist at Google Brain who also works on transformer models. “But can we create a mechanism that can do what the brain does?”

Transformers first appeared five years ago as a new way for AI to process language. They are the secret sauce in those headline-grabbing sentence-completing programs like BERT and GPT-3, which can generate convincing song lyrics, compose Shakespearean sonnets and impersonate customer service representatives.

Transformers work using a mechanism called self-attention, in which every input — a word, a pixel, a number in a sequence — is always connected to every other input. (Other neural networks connect inputs only to certain other inputs.) But while transformers were designed for language tasks, they’ve since excelled at other tasks such as classifying images — and now, modeling the brain.
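To make the idea concrete, here is a minimal sketch of single-head self-attention in Python. It is illustrative only, not the code behind any of the models discussed, and it omits the learned query, key and value projections a real transformer would use.

```python
import numpy as np

def self_attention(X):
    """Minimal single-head self-attention over inputs X of shape (n, d).

    Every input (a word, a pixel, a number in a sequence) attends to every
    other input. The learned query/key/value projections of a real
    transformer are omitted, so each raw input vector plays all three roles.
    """
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                  # similarity of every input with every other
    scores -= scores.max(axis=1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax: each row of weights sums to 1
    return weights @ X                             # each output mixes information from all inputs

X = np.random.randn(3, 4)        # three toy inputs, each a 4-dimensional vector
print(self_attention(X).shape)   # (3, 4): every output depends on every input
```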

In 2020, a group led by Sepp Hochreiter, a computer scientist at Johannes Kepler University Linz in Austria, used a transformer to retool a powerful, long-standing model of memory retrieval called a Hopfield network. First introduced 40 years ago by the Princeton physicist John Hopfield, these networks follow a general rule: Neurons that are active at the same time build strong connections with each other.
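That rule, often summarized as “neurons that fire together wire together,” can be written as a simple outer-product update. The sketch below stores two binary patterns in a classical Hopfield network and recovers one of them from a corrupted cue; it is a toy illustration, not code from any of the studies mentioned.

```python
import numpy as np

# Two binary memory patterns (+1/-1) over eight model neurons.
patterns = np.array([[ 1,  1,  1,  1, -1, -1, -1, -1],
                     [ 1, -1,  1, -1,  1, -1,  1, -1]])

# Hebbian rule: neurons that are active together strengthen their connection.
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)  # no self-connections

# Recall: start from a corrupted cue (last neuron flipped) and let the network settle.
state = np.array([1, 1, 1, 1, -1, -1, -1, 1])
for _ in range(5):
    state = np.sign(W @ state)  # each neuron follows the weighted vote of the others

print(state)  # recovers the first stored pattern
```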

Hochreiter and his collaborators, noting that researchers have been looking for better models of memory retrieval, saw a connection between how a new class of Hopfield networks retrieve memories and how transformers perform attention. These new Hopfield networks, developed by Hopfield and Dmitry Krotov at the MIT-IBM Watson AI Lab, can store and retrieve more memories than standard Hopfield networks because of more effective connections. Hochreiter’s team upgraded these networks by adding a rule that acts like the attention mechanism in transformers.
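The parallel is easiest to see in code. In continuous “modern” Hopfield networks, a single retrieval step takes the same form as an attention step: a softmax over the similarities between a cue and the stored patterns, followed by a weighted sum of those patterns. The sketch below is a simplified illustration of that update, with an arbitrary sharpness parameter beta; it is not an implementation of Hochreiter’s model.

```python
import numpy as np

def modern_hopfield_retrieve(memories, query, beta=4.0):
    """One retrieval step of a continuous 'modern' Hopfield network.

    memories: array of shape (num_patterns, d) holding the stored patterns.
    query:    cue vector of shape (d,).
    beta:     sharpness of the retrieval (an arbitrary illustrative value).

    The update mirrors transformer attention: a softmax over query-memory
    similarities, then a weighted sum of the stored patterns.
    """
    scores = beta * (memories @ query)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # attention weights over the stored memories
    return weights @ memories           # blended, ideally retrieved, pattern

rng = np.random.default_rng(0)
memories = rng.standard_normal((3, 64))               # three stored patterns
query = memories[0] + 0.3 * rng.standard_normal(64)   # noisy cue for the first one
retrieved = modern_hopfield_retrieve(memories, query)
print(np.argmin(np.linalg.norm(memories - retrieved, axis=1)))  # 0: closest to the first pattern
```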

Then, earlier this year, Whittington and Behrens helped further tweak the approach, modifying the transformer so that instead of treating memories as a linear sequence — like a string of words in a sentence — it encoded them as coordinates in higher-dimensional spaces. That “twist,” as the researchers called it, further improved the model’s performance on neuroscience tasks. They also showed that the model was mathematically equivalent to models of the grid cell firing patterns that neuroscientists see in fMRI scans.

“Grid cells have this kind of exciting, beautiful, regular structure, with striking patterns that are unlikely to pop up at random,” said Caswell Barry, a neuroscientist at University College London. The new work showed how transformers replicate exactly those patterns observed in the hippocampus. “They recognized that a transformer can figure out where it is based on previous states and how it’s moved, and in a way that’s keyed into traditional models of grid cells.”

Tim Behrens (left) and James Whittington helped show that structures in our brains function in ways mathematically similar to transformers.

Louise Bold (left); Tongyun Shang

Other recent work suggests that transformers could advance our understanding of other brain functions as well. Last year, Martin Schrimpf, a computational neuroscientist at the Massachusetts Institute of Technology, analyzed 43 different neural net models to see how well they predicted measurements of human neural activity as reported by fMRI and electrocorticography. Transformers, he found, are the current state of the art, predicting almost all the variation found in the imaging.

And Ha, along with fellow computer scientist Yujin Tang, recently designed a model that could intentionally send large amounts of data through a transformer in a random, unordered way, mimicking how the human body transmits sensory observations to the brain. Their transformer, like our brains, could successfully handle a disordered flow of information.
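One way to see how a transformer can cope with unordered inputs: without positional encodings, an attention step treats its inputs as a set, so shuffling the observations leaves the output unchanged. The sketch below demonstrates that invariance with random data; it is not the architecture Ha and Tang built.

```python
import numpy as np

def attention_pool(query, inputs):
    """Attention over a set of observations. With no positional encoding,
    the result is the same no matter what order the inputs arrive in."""
    scores = inputs @ query / np.sqrt(len(query))
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ inputs

rng = np.random.default_rng(1)
inputs = rng.standard_normal((10, 8))   # ten sensory observations, 8 features each
query = rng.standard_normal(8)          # stand-in for a learned query vector

out_ordered = attention_pool(query, inputs)
out_shuffled = attention_pool(query, inputs[rng.permutation(10)])
print(np.allclose(out_ordered, out_shuffled))  # True: shuffling the inputs changes nothing
```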

“Neural nets are hard-wired to accept a particular input,” said Tang. But in real life, data sets often change quickly, and most AI doesn’t have any way to adjust. “We wanted to experiment with an architecture that could adapt very quickly.”

Despite these signs of progress, Behrens sees transformers as just a step toward an accurate model of the brain — not the end of the quest. “I’ve got to be a skeptic neuroscientist here,” he said. “I don’t think transformers will end up being how we think about language in the brain, for example, even though they have the best current model of sentences.”

“Is this the most efficient basis to make predictions about where I am and what I will see next? If I’m honest, it’s too soon to tell,” said Barry.

Schrimpf, too, noted that even the best-performing transformers are limited, working well for words and short phrases, for example, but not for larger-scale language tasks like telling stories.

“My sense is that this architecture, this transformer, puts you in the right space to understand the structure of the brain, and can be improved with training,” said Schrimpf. “This is a good direction, but the field is super complex.”

Correction: September 16, 2022
This article has been edited to more accurately reflect the connection between transformers and Hopfield networks.
