The Computer Scientist Training AI to Think With Analogies
Introduction
The Pulitzer Prize-winning book Gödel, Escher, Bach inspired legions of computer scientists in 1979, but few were as inspired as Melanie Mitchell. After reading the 777-page tome, Mitchell, then a high school math teacher in New York, decided she “needed to be” in artificial intelligence. She soon tracked down the book’s author, AI researcher Douglas Hofstadter, and talked him into giving her an internship. She had only taken a handful of computer science courses at the time, but he seemed impressed with her chutzpah and unconcerned about her academic credentials.
Mitchell prepared a “last-minute” graduate school application and joined Hofstadter’s new lab at the University of Michigan in Ann Arbor. The two spent the next six years collaborating closely on Copycat, a computer program which, in the words of its co-creators, was designed to “discover insightful analogies, and to do so in a psychologically realistic way.”
The analogies Copycat came up with were between simple patterns of letters, akin to the analogies on standardized tests. One example: “If the string ‘abc’ changes to the string ‘abd,’ what does the string ‘pqrs’ change to?” Hofstadter and Mitchell believed that understanding the cognitive process of analogy — how human beings make abstract connections between similar ideas, perceptions and experiences — would be crucial to unlocking humanlike artificial intelligence.
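To make the letter-string domain concrete, here is a minimal Python sketch of one such puzzle. It is not Copycat itself, which builds its analogies through a much richer, stochastic interplay of concepts; this toy version simply hard-codes a single rule, “replace the last letter with its alphabetic successor,” checks that the rule explains the example pair, and then applies it to the new string.

import string

def successor(ch):
    # Next letter of the alphabet; this toy version ignores wrap-around at "z".
    return string.ascii_lowercase[string.ascii_lowercase.index(ch) + 1]

def solve(source, changed, target):
    # If the change from source to changed is "last letter becomes its successor,"
    # apply the same abstract rule to the target string.
    if changed == source[:-1] + successor(source[-1]):
        return target[:-1] + successor(target[-1])
    raise ValueError("rule not recognized by this toy solver")

print(solve("abc", "abd", "pqrs"))  # prints "pqrt"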
Mitchell maintains that analogy can go much deeper than exam-style pattern matching. “It’s understanding the essence of a situation by mapping it to another situation that is already understood,” she said. “If you tell me a story and I say, ‘Oh, the same thing happened to me,’ literally the same thing did not happen to me that happened to you, but I can make a mapping that makes it seem very analogous. It’s something that we humans do all the time without even realizing we’re doing it. We’re swimming in this sea of analogies constantly.”
As the Davis professor of complexity at the Santa Fe Institute, Mitchell has broadened her research beyond machine learning. She’s currently leading SFI’s Foundations of Intelligence in Natural and Artificial Systems project, which will convene a series of interdisciplinary workshops over the next year examining how biological evolution, collective behavior (like that of social insects such as ants) and a physical body all contribute to intelligence. But the role of analogy looms larger than ever in her work, especially in AI — a field whose major advances over the past decade have been largely driven by deep neural networks, a technology loosely modeled on the layered organization of neurons in mammalian brains.
“Today’s state-of-the-art neural networks are very good at certain tasks,” she said, “but they’re very bad at taking what they’ve learned in one kind of situation and transferring it to another” — the essence of analogy.
Quanta spoke with Mitchell about how AI can make analogies, what the field has learned about them so far, and where it needs to go next. The interview has been condensed and edited for clarity.
Why is analogy-making so important to AI?
It’s a fundamental mechanism of thought that will help AI get to where we want it to be. Some people say that being able to predict the future is what’s key for AI, or being able to have common sense, or the ability to retrieve memories that are useful in a current situation. But in each of these things, analogy is very central.
For example, we want self-driving cars, but one of the problems is that if they face some situation that’s just slightly different from what they’ve been trained on, they don’t know what to do. How do we humans know what to do in situations we haven’t encountered before? Well, we use analogies to previous experience. And that’s something that we’re going to need these AI systems in the real world to be able to do, too.
But you’ve also written that analogy is “an understudied area in AI.” If it’s so fundamental, why is that the case?
One reason people haven’t studied it as much is because they haven’t recognized its essential importance to cognition. Focusing on logic and programming in the rules for behavior — that’s the way early AI worked. More recently people have focused on learning from lots and lots of examples, and then assuming that you’ll be able to generalize by induction to things you haven’t seen before using just the statistics of what you’ve already learned. They hoped the abilities to generalize and abstract would kind of come out of the statistics, but that hasn’t worked as well as people had hoped.
You can show a deep neural network millions of pictures of bridges, for example, and it can probably recognize a new picture of a bridge over a river or something. But it can never abstract the notion of “bridge” to, say, our concept of bridging the gender gap. These networks, it turns out, don’t learn how to abstract. There’s something missing. And people are only now sort of grappling with that.
And they’ll never learn to abstract?
There are new approaches, like meta-learning, where the machines “learn to learn” better. Or self-supervised learning, where systems like GPT-3 learn to fill in a sentence that has a word missing, which lets them generate language very, very convincingly. Some people would argue that systems like that will eventually, with enough data, learn to do this abstraction task. But I don’t think so.
You’ve described this limitation as “the barrier of meaning” — AI systems can emulate understanding under certain conditions, but become brittle and unreliable outside of them. Why do you think analogy is our way out of this problem?
My feeling is that solving the brittleness problem will require meaning. That’s what ultimately causes the brittleness problem: These systems don’t understand, in any humanlike sense, the data that they’re dealing with.
This word “understand” is one of those suitcase words that no one agrees on what it really means — almost like a placeholder for mental phenomena that we can’t explain yet. But I think this mechanism of abstraction and analogy is key to what we humans call understanding. It is a mechanism by which understanding occurs. We’re able to take something we already know in some way and map it to something new.
So analogy is a way that organisms stay cognitively flexible, instead of behaving like robots?
I think to some extent, yes. Analogy isn’t just something we humans do. Some animals are kind of robotic, but other species are able to take prior experiences and map them onto new experiences. Maybe it’s one way to put a spectrum of intelligence onto different kinds of living systems: To what extent can you make more abstract analogies?
One of the theories of why humans have this particular kind of intelligence is that it’s because we’re so social. One of the most important things for you to do is to model what other people are thinking, understand their goals and predict what they’re going to do. And that’s something you do by analogy to yourself. You can put yourself in the other person’s position and kind of map your own mind onto theirs. This “theory of mind” is something that people in AI talk about all the time. It’s essentially a way of making an analogy.
Your Copycat system was an early attempt at doing this with a computer. Were there others?
“Structure mapping” work in AI focused on logic-based representations of situations and making mappings between them. Ken Forbus and others used the famous analogy [made by Ernest Rutherford in 1911] of the solar system to the atom. They would have a set of sentences [in a formal notation called predicate logic] describing these two situations, and they mapped them not based on the content of the sentences, but based on their structure. This notion is very powerful, and I think it’s right. When humans are trying to make sense of similarities, we’re more focused on relationships than specific objects.
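As a heavily simplified sketch of that idea (an illustration in Python, not Forbus’s actual Structure-Mapping Engine, with plain tuples standing in for predicate logic), a mapping can be built purely from shared relation names and argument positions, with the objects treated as opaque symbols:

# Two situations as lists of (relation, argument1, argument2) tuples.
solar_system = [
    ("revolves_around", "earth", "sun"),
    ("more_massive_than", "sun", "earth"),
]
atom = [
    ("revolves_around", "electron", "nucleus"),
    ("more_massive_than", "nucleus", "electron"),
]

def structure_map(base, target):
    # Align objects that fill the same argument slot of identically named
    # relations; nothing here knows what "revolves_around" actually means.
    mapping = {}
    for rel_b in base:
        for rel_t in target:
            if rel_b[0] == rel_t[0]:
                for obj_b, obj_t in zip(rel_b[1:], rel_t[1:]):
                    mapping.setdefault(obj_b, obj_t)
    return mapping

print(structure_map(solar_system, atom))  # {'earth': 'electron', 'sun': 'nucleus'}

The mapping falls out of structure alone, which is both the strength Mitchell describes and, as she notes next, the limitation: the relation names are just symbols to the program.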
Why didn’t these approaches take off?
The whole issue of learning was largely left out of these systems. Structure mapping would take these words that were very, very laden with human meaning — like “the Earth revolves around the sun” and “the electron revolves around the nucleus” — and map them onto each other, but there was no internal model of what “revolves around” meant. It was just a symbol. Copycat worked well with letter strings, but what we lacked was an answer to the question: How do we scale this up and generalize it to domains that we actually care about?
Deep learning famously scales quite well. Has it been any more effective at producing meaningful analogies?
There’s a view that deep neural networks kind of do this magic in between their input and output layers. If they can be better than humans at recognizing different kinds of dog breeds — which they are — they should be able to do these really simple analogy problems. So people would create one big data set to train and test their neural network on and publish a paper saying, “Our method gets 80% right on this test.” And somebody else would say, “Wait, your data set has some weird statistical properties that allow the machine to learn how to solve them without being able to generalize. Here’s a new data set that your machine does horribly on, but ours does great.” And this goes on and on and on.
The problem is that you’ve already lost the battle if you’re having to train it on thousands and thousands of examples. That’s not what abstraction is all about. It’s all about what people in machine learning call “few-shot learning,” which means you learn on a very small number of examples. That’s what abstraction is really for.
So what is still missing? Why can’t we just stick these approaches together like so many Lego blocks?
We don’t have the instruction book that tells you how to do that! But I do think we have to Lego them all together. That’s at the frontier of this research: What’s the key insight from all of these things, and how can they complement each other?
A lot of people are quite interested in the Abstraction and Reasoning Corpus [ARC], which is a very challenging few-shot learning task built around “core knowledge” that humans are essentially born with. We know that the world should be parsed into objects, and we know something about the geometry of space, like something being over or under something [else]. In ARC, there is one grid of colors that changes into another grid of colors in a way that humans would be able to describe in terms of this core knowledge — like, “All the squares of one color go to the right, all the squares of the other color go to the left.” It gives you an example like this and then asks you to do the same thing to another grid of colors.
I think of it very much as an analogy challenge. You’re trying to find some kind of abstract description of what the change was from one image to a new image, and you cannot learn any weird statistical correlations because all you have is two examples. How to get machines to learn and reason with this core knowledge that a baby has — this is something that none of the systems I’ve mentioned so far can do. This is why none of them can deal with this ARC data set. It’s a little bit of a holy grail.
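To make the format concrete, here is a made-up task in the spirit of ARC (not an example from the actual corpus), written as a short Python sketch. Grids are lists of color codes, and the hidden rule is written out by hand; the hard part Mitchell is pointing to, inferring that rule from the demonstration pair alone, is deliberately left out.

def apply_rule(grid):
    # Hidden rule for this toy task: in each row, cells of color 2 slide to the
    # left edge and cells of color 1 slide to the right edge (0 is background).
    out = []
    for row in grid:
        ones = [c for c in row if c == 1]
        twos = [c for c in row if c == 2]
        background = [0] * (len(row) - len(ones) - len(twos))
        out.append(twos + background + ones)
    return out

# The single demonstration pair a solver would see ...
demo_input = [[0, 1, 0, 2],
              [2, 0, 1, 0]]
demo_output = apply_rule(demo_input)  # [[2, 0, 0, 1], [2, 0, 0, 1]]

# ... and the test grid to which the same abstract rule must be applied.
test_input = [[1, 0, 2, 0]]
print(apply_rule(test_input))  # prints [[2, 0, 0, 1]]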
If babies are born with this “core knowledge,” does that mean that for an AI to make these kinds of analogies, it also needs a body like we have?
That’s the million-dollar question. That’s a very controversial issue that the AI community has no consensus on. My intuition is that yes, we will not be able to get to humanlike analogy [in AI] without some kind of embodiment. Having a body might be essential because some of these visual problems require you to think of them in three dimensions. And that, for me, has to do with having lived in the world and moved my head around, and understood how things are related spatially. I don’t know if a machine has to go through that stage. I think it probably will.