How the DNA Computer Program Makes You and Me
Introduction
One of the miracles of nature is embryogenesis: the transformation of a single fertilized egg cell into an embryo that will eventually become a fully formed baby animal. Various analogies have been applied to this process, from the primitive concept of a blueprint to Richard Dawkins’ cake recipe that calls for genetic ingredients. To my mind, the best analogy comes from Gary Marcus’ 2004 book The Birth of the Mind: How a Tiny Number of Genes Creates the Complexities of Human Thought. According to Marcus, embryogenesis most resembles a genetic computer program that produces a three-dimensional living organism. Marcus states that every gene is like a single line of code. All the genes together form the master DNA program, which is copied and run simultaneously in trillions of cells to achieve this miracle of physicochemical engineering.
The sequencing of the human genome in the early years of this century was a great achievement, but it gives us limited insight. It is like finding a binary listing of 0s and 1s comprising the “machine language” of a complicated, sophisticated software program, not the “source code” that fosters understanding. In a sense, 0s and 1s are all there is to any computer program, but this binary string offers no obvious indication as to which part is instruction and which is data, or how one step leads to the next. To make sense of this kind of code and understand how it performs its complex task, software engineers have to translate such a binary string into higher-level source code in a difficult process called decompilation.
The original hypothesis of gene action was expressed by the phrase “one gene-one enzyme,” which has since been replaced by “one gene-one polypeptide,” and even this is now recognized as much too simplistic. At a conceptual level, genes, along with their promoters, regulators and inhibitors, perform all the control operations that software programs do. Genes routinely execute conditional “if-then” logic (all genes are activated only if specific conditions are satisfied), “do loops” (certain genes create a specific number of body segments and parts; for example, damage to a gene whimsically called Sonic Hedgehog can produce extra fingers), “timing routines” (genes code for clock proteins), “subroutine calls” (a gene called Pax6 in fruit flies can initiate a gene cascade that recruits over 2,000 specialized genes to build different parts of the eye) and so on. Just as a single line of software code may initiate sweeping changes or be merely incremental, a single top-level gene may initiate the building of an entire body or organ (analogous to a top-level line that invokes a subroutine “MakeEye”) or may merely add a small building block molecule in a tissue (like the line n = n + 1). That is why single genetic mutations can have major repercussions such as organ system deformities, or minor ones such as creating a different blood group. The full decompilation of the genetic code is a task that will probably engage geneticists for many, many years to come.
Of course, unlike the software programs we are used to, the genetic program runs in three-dimensional space, in a water-based medium. It is subject to the laws of physics and chemistry, and therefore constrained by such things as the properties of available molecules and the presence of concentration gradients. Yet our genetic program does amazing things, such as generating its own building materials programmatically. Moreover, the genetic program copies itself into trillions of cells and runs inside each of them, with the ability to modify, ignore or emphasize different parts of itself to suit each cell. It does so using parallel coordination techniques that have only recently been explored in software, such as “swarm programming.” Swarm programming is a paradigm in which a group of cells or robots that contain identical copies of a master program nevertheless behave differently based on their location within the group and on how their neighbors are distributed.
Our puzzles explore a couple of issues in the programming of three-dimensional structures that the genetic program has to address. One such issue, the patterning of animal skins into stripes and spots, was the subject of a mathematical model created by the famous British mathematician Alan Turing. As Jennifer Ouellette reported in Quanta in 2013, Turing “proposed that patterns such as spots form as a result of the interactions between two chemicals that spread throughout a system much like gas atoms in a box do, with one crucial difference. Instead of diffusing evenly like a gas, the chemicals, which Turing called ‘morphogens,’ diffuse at different rates.” The problem that needs to be solved is how to make some cells behave differently from others based on their location in a sheet of cells, even though they all have the exact same code. Turing theorized that this problem could be solved if cells contain substances that are activated or inhibited when they detect a particular concentration of the morphogen: It’s like an “if-then” statement in the cell’s master program. As Ouellette reported, such morphogens have been found for zebrafish stripes, and they behave pretty much as Turing predicted. Regarding the zebra’s stripes, Turing is reported to have said to the equally famous British molecular biologist Francis Crick: “Well, the stripes are easy. But what about the horse part?” We are now finding that the entire horse is programmed by similar, albeit far more complex, instructions!
Our first problem explores the kinds of issues that DNA programming must solve to create 3-D structures. Note that this example is not based on an actual biological case; it is meant to illustrate the general principle of how an embryo can use chemical gradients in conjunction with constraints on the properties of available molecules.
Problem 1
In this problem you have to figure out the details of a hypothetical scenario in which a growing embryo uses morphogens to initiate the formation of two bony rods in the middle of its body. Imagine a rectangular sheet of identical round cells lined up end to end, consisting of 101 vertical columns (numbered 0 to 100) and 200 horizontal rows. The cells along the left edge (in column 0) can sense that they are on the edge, and can activate genes to release three different morphogens, A, B and C, in different concentrations. Each morphogen achieves its highest concentration at the left edge of the sheet, but each diffuses at a different rate, so that the concentrations of A, B and C at the right edge are respectively 0.1, 0.2 and 0.4 times their left-edge concentrations, with a uniform gradient in between. Each cell in the sheet is programmed to make three pairs of molecules, one pair for each morphogen. Each pair consists of a “bone initiator” molecule and a “bone suppressor” molecule. These molecules get switched on or off based on the concentration of their particular morphogen, as shown in the table below. Thus, morphogen A’s bone initiator becomes active when the concentration of A is at or below 0.64 units/mL, whereas A’s bone suppressor becomes active when A’s concentration falls below 0.46 units/mL. The bone initiators and suppressors related to the other morphogens function similarly, but at different concentrations of their morphogens, as shown below. Each bone suppressor, when active, completely blocks the action of its corresponding bone initiator. Bone is laid down in a given cell when at least two bone initiators are active there without being blocked by their corresponding suppressors.
Morphogen | Bone initiator requires | Bone suppressor requires |
A | <= 0.64 units/mL | < 0.46 units/mL |
B | <= 6.8 units/mL | < 6.1 units/mL |
C | <= 2.8 units/mL | < 2.6 units/mL |
What concentrations of the three morphogens at the left edge would make columns 40 to 45 and columns 55 to 60 of the cells lay down bone in response to two active bone initiators, while no other cells lay down bone? Of the several concentrations that can work, which ones might be least prone to development errors in response to small random fluctuations in morphogen concentrations?
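For readers who prefer to experiment, the rules of Problem 1 can be sketched as a small checker. This is only one possible reading of the setup, not an official solution: it assumes columns are numbered 0 to 100, that the "uniform gradient" is linear in the column number, and that every row behaves identically. The function names and the trial values are made up for illustration. Exact fractions are used instead of floating-point numbers so that comparisons that land exactly on a threshold are not disturbed by rounding.

```python
from fractions import Fraction

# Thresholds from the table, in units/mL:
# (initiator active at or below, suppressor active strictly below).
THRESHOLDS = {
    "A": (Fraction("0.64"), Fraction("0.46")),
    "B": (Fraction("6.8"), Fraction("6.1")),
    "C": (Fraction("2.8"), Fraction("2.6")),
}
# Right-edge concentration as a fraction of the left-edge concentration.
RIGHT_EDGE_RATIO = {"A": Fraction("0.1"), "B": Fraction("0.2"), "C": Fraction("0.4")}
LAST_COLUMN = 100  # assumption: columns are numbered 0..100
TARGET = list(range(40, 46)) + list(range(55, 61))  # columns 40-45 and 55-60


def concentration(left_edge, ratio, column):
    """Linear gradient: left_edge at column 0, left_edge * ratio at column 100."""
    return left_edge * (1 - (1 - ratio) * column / LAST_COLUMN)


def initiator_unblocked(name, left_edge, column):
    """True if this morphogen's initiator is active and its suppressor is not."""
    initiator_max, suppressor_below = THRESHOLDS[name]
    level = concentration(left_edge, RIGHT_EDGE_RATIO[name], column)
    initiator_on = level <= initiator_max
    suppressor_on = level < suppressor_below
    return initiator_on and not suppressor_on


def bone_columns(left_edges):
    """Columns in which at least two unblocked initiators are active."""
    return [
        column
        for column in range(LAST_COLUMN + 1)
        if sum(initiator_unblocked(n, left_edges[n], column) for n in THRESHOLDS) >= 2
    ]


# Hypothetical trial values, chosen only to exercise the checker.
trial = {"A": Fraction("1.2"), "B": Fraction("9"), "C": Fraction("4")}
result = bone_columns(trial)
print(result, result == TARGET)
```

To test your own candidate answer, replace the trial values with your left-edge concentrations and see whether the result equals TARGET. Nudging each value up and down slightly is one way to probe how sensitive a candidate is to small fluctuations.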
Our second problem has to do with the incredible shrinkage of the human Y chromosome. Chromosomes are rod-like structures, present as 23 pairs inside the nucleus of every human cell. Genes are strung linearly on these chromosomes. As is well known, human males have one X and one Y chromosome and human females have two X chromosomes. The Y chromosome, which is responsible for maleness in humans, is much smaller than the X chromosome, as it has been losing genes over evolutionary time (note that the genes are still around: They have just migrated to other chromosomes). As Emily Singer reported in Quanta in 2015, the Australian biologist Jennifer Marshall Graves has predicted that the human Y chromosome could disappear in future evolution. This prediction is based on the following fact reported by Singer: “In the last 190 million years, the number of genes on the Y has plummeted from more than 1,000 to roughly 50, a loss of more than 95 percent. The X chromosome, in contrast, still has roughly 1,000 genes.”
The Y chromosome is losing genes because we have only one copy of it. There are, however, two copies of the X chromosome in females, so any defective gene on one of them can, to use our coding analogy, be “backed up” (compensated for or repaired) by the good copy on the other. So there is a survival advantage for genes to be on a chromosome that has two copies.
But there is another point to consider: Why do genes need to be on particular chromosomes in the first place? Our coding analogy suggests a possible reason. When software engineers work on complex programming projects, they find it advantageous to organize their code into modules, such that all the routines responsible for a particular part of the application are in the same module. They can then be easily accessed for specific tasks related to the common function. Presumably, the same advantage is true for the cell. By having all the genes responsible for a particular function such as maleness on the same chromosome, all such genes can be more easily controlled by gene promoters and inhibitors responsible for male characteristics. The genes required for one function are also unlikely to be separated when pairs of chromosomes are separated during the formation of germ cells. This arrangement presumably has a survival advantage as well.
Problem 2
There are two schools of thought about whether or not the Y chromosome will disappear in the distant future. This problem examines the merits of the two arguments.
First, what will be the fate of the Y chromosome if you linearly extrapolate the loss of genes (1,000 to 50 in 190 million years)?
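The linear extrapolation is simple arithmetic, sketched here using the round figures quoted from Singer (1,000 genes down to 50 over 190 million years). The function name is made up for illustration, and the source says "more than 1,000" and "roughly 50," so the result is only as precise as those inputs.

```python
def linear_extrapolation(start_genes, current_genes, elapsed_myr):
    """Return (genes lost per million years, million years until none remain),
    assuming the same number of genes is lost in every time interval."""
    loss_rate = (start_genes - current_genes) / elapsed_myr
    time_left = current_genes / loss_rate
    return loss_rate, time_left


rate, time_left = linear_extrapolation(1000, 50, 190)
print(f"{rate} genes lost per million years; {time_left} million years remaining")
```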
On the other hand, the Y chromosome has lost few or no genes in the last 25 million years (different estimates say 0 to 3), leading some scientists to argue that the deterioration of the Y has stopped. Perhaps there is a sizable advantage to keeping all the male-producing genes together in a neat “code module”! Assuming a constant probability for the disappearance of Y-chromosome genes over time, what is the probability that this pause or marked slowdown, a loss of just 0 to 3 genes over the last 25 million years, is due to chance?
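One way to set up the chance calculation is sketched below. It is an interpretation, not the columnist's method, and it rests on several assumptions: "constant probability" is read as each gene having the same fixed chance of being lost per unit time (exponential decay), genes are lost independently of one another, and the Y held 53 genes at the start of the 25-million-year window (roughly 50 today plus at most 3 lost). The function names are made up for illustration. Other readings, such as a constant number of genes lost per million years, would give a different number.

```python
import math


def per_gene_loss_probability(start_genes, current_genes, total_myr, window_myr):
    """Chance that a single gene is lost during the window, assuming exponential
    decay calibrated to the overall decline from start_genes to current_genes."""
    survival = (current_genes / start_genes) ** (window_myr / total_myr)
    return 1 - survival


def prob_at_most(max_losses, n_genes, p_loss):
    """Binomial probability of losing max_losses or fewer of n_genes."""
    return sum(
        math.comb(n_genes, k) * p_loss**k * (1 - p_loss) ** (n_genes - k)
        for k in range(max_losses + 1)
    )


p_loss = per_gene_loss_probability(1000, 50, 190, 25)
genes_at_window_start = 53  # assumption: roughly 50 today plus up to 3 lost
chance = prob_at_most(3, genes_at_window_start, p_loss)
print(f"per-gene loss probability over 25 Myr: {p_loss:.3f}")
print(f"probability of 3 or fewer losses by chance: {chance:.2e}")
```

Changing max_losses to 0 gives the corresponding figure for the estimate that no genes at all were lost.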
I would love to see comments from readers about the genome computer code analogy and the disappearing Y chromosome. In particular, if the Y chromosome is no longer losing genes, what might have brought this to an end?
Happy puzzling — and speculating.
Editor’s note: The reader who submits the most interesting, creative or insightful solution (as judged by the columnist) in the comments section will receive a Quanta Magazine T-shirt. And if you’d like to suggest a favorite puzzle for a future Insights column, submit it as a comment below, clearly marked “NEW PUZZLE SUGGESTION.” (It will not appear online, so solutions to the puzzle above should be submitted separately.)
Note that we may hold comments for the first day or two to allow for independent contributions by readers.
Correction: Due to a typographical error, this article was revised on April 6, 2018, to reflect that there are 23 human chromosome pairs, not 26.
Update: The solution has been published here.