Q&A

Does AI Know What an Apple Is? She Aims to Find Out.

The computer scientist Ellie Pavlick is translating philosophical concepts such as “meaning” into concrete, testable ideas.

Ellie Pavlick stands in Brown University’s Computer History Museum. Her work on how large language models understand concepts often merges philosophy with science.

Adam Wasilewski for Quanta Magazine

Introduction

Start talking to Ellie Pavlick about her work — looking for evidence of understanding within large language models (LLMs) — and she might sound as if she’s poking fun at it. The phrase “hand-wavy” is a favorite, and if she mentions “meaning” or “reasoning,” it’ll often come with conspicuous air quotes. This is just Pavlick’s way of keeping herself honest. As a computer scientist studying language models at Brown University and Google DeepMind, she knows that embracing natural language’s inherent mushiness is the only way to take it seriously. “This is a scientific discipline — and it’s a little squishy,” she said.

Precision and nuance have coexisted in Pavlick’s world since adolescence, when she enjoyed math and science “but always identified as more of a creative type.” As an undergraduate, she earned degrees in economics and saxophone performance before pursuing a doctorate in computer science, a field where she still feels like an outsider. “There are a lot of people who [think] intelligent systems will look a lot like computer code: neat and conveniently like a lot of systems [we’re] good at understanding,” she said. “I just believe the answers are complicated. If I have a solution that’s simple, I’m pretty sure it’s wrong. And I don’t want to be wrong.”

A chance encounter with a computer scientist who happened to work in natural language processing led Pavlick to embark on her doctoral work studying how computers could encode semantics, or meaning in language. “I think it scratched a certain itch,” she said. “It dips into philosophy, and that fits with a lot of the things I’m currently working on.” Now, one of Pavlick’s primary areas of research focuses on “grounding” — the question of whether the meaning of words depends on things that exist independently of language itself, such as sensory perceptions, social interactions, or even other thoughts. Language models are trained entirely on text, so they provide a fruitful platform for exploring how grounding matters to meaning. But the question itself has preoccupied linguists and other thinkers for decades.

“These are not only ‘technical’ problems,” Pavlick said. “Language is so huge that, to me, it feels like it encompasses everything.”

Quanta spoke with Pavlick about making science out of philosophy, what “meaning” means, and the importance of unsexy results. The interview has been condensed and edited for clarity.


Video: Brown University computer scientist Ellie Pavlick is translating philosophical concepts such as “understanding” and “meaning” into concrete ideas that are testable on LLMs.

Emily Buder/Quanta Magazine

What does “understanding” or “meaning” mean, empirically? What, specifically, do you look for?

When I was starting my research program at Brown, we decided that meaning involves concepts in some way. I realize this is a theoretical commitment that not everyone makes, but it seems intuitive. If you use the word “apple” to mean apple, you need the concept of an apple. That has to be a thing, whether or not you use the word to refer to it. That’s what it means to “have meaning”: there needs to be the concept, something you’re verbalizing.

I want to find concepts in the model. I want something that I can grab within the neural network, evidence that there is a thing that represents “apple” internally, that allows it to be consistently referred to by the same word. Because there does seem to be this internal structure that’s not random and arbitrary. You can find these little nuggets of well-defined function that reliably do something.

I’ve been focusing on characterizing this internal structure. What form does it have? It can be some subset of the weights within the neural network, or some kind of linear algebraic operation over those weights, some kind of geometric abstraction. But it has to play a causal role [in the model’s behavior]: It’s connected to these inputs but not those, and these outputs and not those.

That feels like something you could start to call “meaning.” It’s about figuring out how to find this structure and establish relationships, so that once we get it all in place, then we can apply it to questions like “Does it know what ‘apple’ means?”

Have you found any examples of this structure?

Yes, one result involves how a language model retrieves a piece of information. If you ask the model, “What is the capital of France?” it needs to say “Paris,” and “What is the capital of Poland?” should return “Warsaw.” It very readily could just memorize all these answers, and they could be scattered all around [within the model] — there’s no real reason it needs to have a connection between those things.

Instead, we found a small place in the model where it basically boils that connection down into one little vector. If you add it to “What is the capital of France?” it will retrieve “Paris”; and that same vector, if you ask “What is the capital of Poland?” will retrieve “Warsaw.” It’s like this systematic “retrieve-capital-city” vector.

That’s a really exciting finding because it seems like [the model is] boiling down these little concepts and then applying general algorithms over them. And even though we’re looking at these really [simple] questions, it’s about finding evidence of these raw ingredients that the model is using. In this case, it would be easier to get away with memorizing — in many ways, that’s what these networks are designed to do. Instead, it breaks [information] down into pieces and “reasons” about it. And we hope that as we come up with better experimental designs, we might find something similar for more complicated kinds of concepts.
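To make the “retrieve-capital-city” idea concrete, here is a minimal sketch in Python of the kind of test Pavlick describes. The vectors and country-capital pairs below are synthetic stand-ins rather than data from her experiments; the point is only to show how a single shared task vector, estimated from a few known pairs, can be checked against a held-out pair.

import numpy as np

rng = np.random.default_rng(0)
dim = 32

# Toy setup: one hidden "retrieve-capital-city" direction relates every
# synthetic country vector to its capital vector, plus a little noise.
task_direction = rng.normal(size=dim)

def make_pair():
    country = rng.normal(size=dim)
    capital = country + task_direction + 0.05 * rng.normal(size=dim)
    return country, capital

# Estimate the task vector from a few known country-capital pairs...
known_pairs = [make_pair() for _ in range(5)]
estimated_task = np.mean([cap - ctry for ctry, cap in known_pairs], axis=0)

# ...then test whether the SAME vector also retrieves a held-out capital.
poland, warsaw = make_pair()
prediction = poland + estimated_task

dist_to_warsaw = np.linalg.norm(prediction - warsaw)
dist_to_others = min(np.linalg.norm(prediction - cap) for _, cap in known_pairs)
print(dist_to_warsaw < dist_to_others)  # True: one shared vector does the retrieval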


Before her doctorate in computer science, Pavlick double majored in economics and saxophone performance.

Adam Wasilewski for Quanta Magazine

How does grounding relate to these representations?

The way humans learn language is grounded in a ton of nonlinguistic input: your bodily sensations, your emotions, whether you’re hungry, whatever. That’s considered to be really important to meaning.

But there are other notions of grounding which have more to do with internal representations. There are words that aren’t obviously connected to the physical world, yet they still have meaning. A word like “democracy” is a favorite example. It’s a thing in your head: I can think about democracy without talking about it. So the grounding could be from language to that thing, that internal representation.

But you argue that even things that are more external, like color, might still be anchored to internal “conceptual” representations, without relying on perceptions. How would that work?

Well, a language model doesn’t have eyes, right? It doesn’t “know” anything about colors. So maybe [it captures] something more general, like understanding the relationships between them. I know that when I combine blue and red, I get purple; those kinds of relations could define this internal [grounding] structure.

We can give examples of color to an LLM using RGB codes [strings of numbers that represent colors]. If you say “OK, here’s red,” and give it the RGB code for red, and “Here’s blue,” with the RGB code for blue, and then say “Tell me what purple is,” it should generate the RGB code for purple. This mapping should be a good indication that the internal structure the model has is sound — it’s missing the percepts [for color], but the conceptual structure’s there.

What’s tricky is that [the model] could just memorize RGB codes, which are all over its training data. So we “rotated” all the colors [away from their real RGB values]: We’d tell the LLM that the word “yellow” was associated with the RGB code for green, and so on. The model performed well: When you asked for green, it would give you the rotated version of the RGB code. That suggests that there is some kind of consistency to its internal representations for color. It’s applying knowledge of their relations, not just memorizing.

That’s the whole point of grounding. Mapping a name onto a color is arbitrary. It’s more about the relationships between them. So that was exciting.
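The rotation test can be pictured as a small evaluation harness, sketched below in Python. The hue rotation, the three example colors, and the placeholder model call are illustrative assumptions, not details taken from Pavlick’s study; the sketch only shows why a model that answers correctly under rotation must be using relations between colors rather than memorized RGB codes.

import colorsys

def rotate_rgb(rgb, degrees=120):
    # Shift a color's hue by a fixed angle; relations between colors survive
    # the rotation even though every individual code is now "wrong."
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    h = (h + degrees / 360) % 1.0
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))

true_rgb = {"red": (255, 0, 0), "blue": (0, 0, 255), "purple": (128, 0, 128)}
rotated = {name: rotate_rgb(code) for name, code in true_rgb.items()}

# The prompt pairs each color name with its ROTATED code, so memorized
# real-world RGB values cannot help; only relational knowledge can.
prompt = "\n".join(f"{name}: {code}" for name, code in rotated.items() if name != "purple")
prompt += "\npurple:"

# answer = some_language_model(prompt)   # hypothetical model call
# Success means the answer lands near rotated["purple"], not near (128, 0, 128).
print(prompt)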


Pavlick’s work often deals with the question of “grounding,” how a word’s meaning can depend on concepts from outside language.

Adam Wasilewski for Quanta Magazine

How can these philosophical-sounding questions be scientific?

I recently learned of a thought experiment: What if the ocean swept up onto the sand and [when it] pulled back, the patterns generated a poem? Does the poem have meaning? That seems super abstract, and you can have this long philosophical debate.

The nice thing about language models is we don’t need a thought experiment. It’s not like, “In theory, would such and such a thing be intelligent?” It’s just: Is this thing intelligent? It becomes scientific and empirical.

Sometimes people are dismissive; there’s the “stochastic parrots” approach. I think it [comes from] a fear that people are going to oversubscribe intelligence to these things — which we do see. And to correct for that, people are like, “No, it’s all a sham. This is smoke and mirrors.”

It’s a bit of a disservice. We’ve hit on something quite exciting and quite new, and it’s worth understanding it deeply. That’s a huge opportunity that shouldn’t get skirted over because we’re worried about over-interpreting the models.

Of course, you’ve also produced research debunking exactly that kind of over-interpretation.

That work, where people were finding all the “shallow heuristics” that models were exploiting [to mimic understanding] — those were very foundational to my coming-of-age as a scientist. But it’s complicated. It’s like, don’t declare victory too soon. There’s a bit of skepticism or paranoia [in me] that an evaluation was done right, even one that I know I designed very carefully!

So that’s part of it: not over-claiming. Another part is that, if you deal with these [language model] systems, you know that they’re not human-level — the way that they’re solving things is not as intelligent as it seems.


Pavlick meets with students at Brown University.

Adam Wasilewski for Quanta Magazine

When so many of the basic methods and terms are up for debate in this field, how do you even measure success?

What I think we’re looking for, as scientists, is a precise, human-understandable description of what we care about — intelligence, in this case. And then we attach words to help us get there. We need some kind of working vocabulary.

But that’s hard, because then you can get into this battle of semantics. When people say “Does it have meaning: yes or no?” I don’t know. We’re routing the conversation to the wrong thing.

What I’m trying to offer is a precise account of the behaviors we care about explaining. And it is kind of moot at that point whether you want to call it “meaning” or “representation” or any of these loaded words. The point is, there’s a theory or a proposed model on the table — let’s evaluate that.


“I want to find concepts in the model,” Pavlick said. “I want something that I can grab within the neural network, evidence that there is a thing that represents ‘apple’ internally.”

Adam Wasilewski for Quanta Magazine

So how can research on language models move toward that more direct approach?

The kinds of deep questions I would really like to be able to answer — What are the building blocks of intelligence? What does human intelligence look like? What does model intelligence look like? — are really important. But I think the stuff that needs to happen for the next 10 years is not very sexy.

If we want to deal with these [internal] representations, we need methods for finding them — methods that are scientifically sound. If it’s done in the right way, this low-level, super in-the-weeds methodological stuff won’t deliver headlines. But that’s the really important stuff that will allow us to answer these deep questions correctly.

Meanwhile, the models are going to keep changing. So there’s going to be a lot of stuff that people will keep publishing as though it’s “the breakthrough,” but it’s probably not. In my mind, it feels too soon to get big breakthroughs.

People are studying these really simple tasks, like asking [a language model to complete] “John gave a drink to _______,” and trying to see whether it says “John” or “Mary.” That doesn’t have the feeling of a result that explains intelligence. But I do actually believe that the tools we’re using to describe this boring-ass problem are essential for answering the deep questions about intelligence.
