Solution: ‘The DNA Computer Program’
Introduction
Do genes behave like lines of computer code? Our April puzzle discussed ways in which genes hold true to this analogy: They have control structures commonplace in computer programs, such as “if-then” logic, “do loops,” timing routines and cascading “subroutine calls.” We also listed some ways that DNA programs differ from ordinary computer programs: Genes program three-dimensional structures from scratch in a water-based medium, using massive parallelism and swarm programming while making use of, and being constrained by, the laws of physics and chemistry.
There’s another important way that genes behave differently from computer code and mathematical models generated by humans. In a nutshell, biology is extremely complicated and messy. Unlike a well-documented and logically organized computer program that a professional software engineer would write, evolutionary algorithms were not written with human understanding in mind. DNA logic, which can seem like the original “spaghetti code,” is exceedingly difficult to follow in its details. This difference between human-generated and evolution-generated code also holds for mathematical models applied to biology. Unlike the smooth, elegant models that work in arguably simpler sciences like physics, sudden discrete changes and chaotic nonlinear functions in biological situations are very difficult to predict mathematically. While we can understand and appreciate general principles in these problems, we can’t escape the complications that lie just below the surface.
Problem 1
In this problem you have to figure out the details of a hypothetical scenario in which a growing embryo can initiate the formation of two bony rods in the middle of its body using morphogens. Imagine a rectangular sheet consisting of 101 vertical columns (numbered 0 to 100) and 200 horizontal rows of identical round cells lined up end to end. The cells along the left edge (in column 0) can sense that they are on the edge, and can activate genes to release three different morphogens, A through C, in different concentrations. Each morphogen achieves its highest concentration at the left edge of the sheet, but each diffuses at a different rate, so that the concentrations of A, B and C at the right edge are respectively 0.1, 0.2 and 0.4 times their left-edge concentrations, with a uniform gradient in between. Each cell in the sheet is programmed to make three pairs of molecules, one pair for each morphogen. Each pair consists of a “bone initiator” molecule and a “bone suppressor” molecule. These molecules get switched on or off based on the concentration of their particular morphogen as shown in the table below. Thus, morphogen A’s bone initiator becomes active when the concentration of A is at or below 0.64 units/mL, whereas A’s bone suppressor becomes active when A’s concentration falls below 0.46 units/mL. The bone initiators and suppressors related to the other morphogens function similarly, but at different concentrations of their morphogens as shown below. Each bone suppressor, when active, completely blocks the action of its corresponding bone initiator. Bone is laid down when at least two bone initiators are active in a given cell, without being blocked by their corresponding suppressors.
Morphogen | Bone initiator requires | Bone suppressor requires |
A | <= 0.64 units/mL | < 0.46 units/mL |
B | <= 6.8 units/mL | < 6.1 units/mL |
C | <= 2.8 units/mL | < 2.6 units/mL |
What concentrations of the three morphogens at the left edge would make columns 40 to 45 and columns 55 to 60 of the cells lay down bone in response to two active bone initiators, while no other cells lay down bone? Of the several concentrations that can work, which ones might be least prone to development errors in response to small random fluctuations in morphogen concentrations?
To allow only the specified cells to have two active unsuppressed bone initiators, we need the following things to be true.
i. One of the morphogens must reach its bone initiator threshold at the 40th column and must reach its bone suppressor threshold at the 61st column.
ii. The other two morphogens must behave as follows. One of them must activate its initiator at or before the 40th column and activate its suppressor exactly at the 46th column; the other must activate its initiator exactly at the 55th column without activating its suppressor until after the 60th column.
Condition (i) is quite stringent, and only morphogen A can meet it. The difference between its bone initiator and suppressor thresholds, 0.18 units/mL over 20 columns, corresponds to a drop of 0.009 units/mL per column, or 0.9 units/mL across the 100 columns from the left edge to the right. With a left-edge concentration of 1 unit/mL, that is exactly what is required for the right-edge concentration to become 0.1 times the left-edge concentration as specified. This gives us a consistency check that morphogen A passes, but morphogens B and C do not.
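Condition (i) can also be checked mechanically. The Python sketch below is an illustration, not part of the original solution: the names `remaining` and `condition_i` are hypothetical. It assumes a strictly linear gradient across columns 0 to 100, and it reads condition (i) as "initiator first active at column 40, suppressor first active at column 61." Exact fractions are used because morphogen A sits precisely on its thresholds, where floating-point rounding could flip a comparison.

```python
from fractions import Fraction

# Table values: (right-edge/left-edge ratio, initiator threshold, suppressor threshold)
MORPHOGENS = {
    "A": (Fraction("0.1"), Fraction("0.64"), Fraction("0.46")),
    "B": (Fraction("0.2"), Fraction("6.8"), Fraction("6.1")),
    "C": (Fraction("0.4"), Fraction("2.8"), Fraction("2.6")),
}

def remaining(ratio, col, last_col=100):
    """Concentration at column `col` as a fraction of the left-edge value."""
    return 1 - (1 - ratio) * Fraction(col, last_col)

def condition_i(ratio, init, supp):
    """Check whether any left-edge concentration L satisfies condition (i).

    Returns (feasible, at_least, at_most), where at_least and at_most are
    the closed bounds on L from columns 60 and 40 respectively.
    """
    at_most = init / remaining(ratio, 40)   # L <= at_most: initiator on at column 40
    above = init / remaining(ratio, 39)     # L > above: initiator still off at column 39
    at_least = supp / remaining(ratio, 60)  # L >= at_least: suppressor still off at column 60
    below = supp / remaining(ratio, 61)     # L < below: suppressor on at column 61
    # Every lower bound must be compatible with every upper bound.
    feasible = (at_least <= at_most and above < at_most
                and at_least < below and above < below)
    return feasible, at_least, at_most

for name, (ratio, init, supp) in MORPHOGENS.items():
    feasible, low, high = condition_i(ratio, init, supp)
    print(name, feasible, float(low), float(high))
```

Under these assumptions only A is feasible, and its two closed bounds coincide at exactly 1 unit/mL. For B and C, the lower bound imposed at column 60 exceeds the upper bound imposed at column 40, so no left-edge concentration works.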
Condition (ii) is more flexible, and both morphogens B and C can meet it. Thus, we can use morphogen B for the left bone and morphogen C for the right as Alexandre did, or the other way around as Ed did. You can solve for the left-edge concentrations using the equation given by Alexandre, or use a spreadsheet as Ed did. Each of these two choices works for a range of left-edge concentrations of the two morphogens. For example, in the first case, the lowest left-edge concentration of B that works is 9.532. The concentration dips below 6.8 (the initiator threshold) well before the 40th column, stays just slightly above 6.1 (the suppressor threshold) at column 45, and dips comfortably below it at column 46. On the other hand, the highest possible left-edge concentration that works is 9.651. The concentration gradient behaves similarly, but this time just barely dips below the suppressor threshold at column 46.
The left-edge concentration ranges that work for the two cases are as follows:
Case 1 (B for the left bone, C for the right): A = 1 unit/mL, B = 9.532 to 9.651 units/mL, C = 4.142 to 4.179 units/mL
Case 2 (C for the left bone, B for the right): A = 1 unit/mL, B = 11.972 to 12.142 units/mL, C = 3.562 to 3.591 units/mL
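These ranges can be verified by simulating one row of the sheet. The following Python sketch is illustrative only: `bone_columns` and the other names are hypothetical, the gradient is assumed to be exactly linear across columns 0 to 100, and concentrations are passed as strings so that they can be converted to exact fractions.

```python
from fractions import Fraction

LAST_COL = 100  # columns are numbered 0..100

# Table values: (right-edge/left-edge ratio, initiator threshold, suppressor threshold)
MORPHOGENS = {
    "A": (Fraction("0.1"), Fraction("0.64"), Fraction("0.46")),
    "B": (Fraction("0.2"), Fraction("6.8"), Fraction("6.1")),
    "C": (Fraction("0.4"), Fraction("2.8"), Fraction("2.6")),
}

def concentration(left, ratio, col):
    """Linear gradient from `left` at column 0 to `ratio * left` at the last column."""
    return left * (1 - (1 - ratio) * Fraction(col, LAST_COL))

def bone_columns(left_edge):
    """Return the columns in which at least two unsuppressed initiators are active."""
    bone = []
    for col in range(LAST_COL + 1):
        active = 0
        for name, (ratio, init_max, supp_max) in MORPHOGENS.items():
            c = concentration(Fraction(left_edge[name]), ratio, col)
            initiator_on = c <= init_max   # "at or below" the initiator threshold
            suppressor_on = c < supp_max   # strictly below the suppressor threshold
            if initiator_on and not suppressor_on:
                active += 1
        if active >= 2:
            bone.append(col)
    return bone

TARGET = list(range(40, 46)) + list(range(55, 61))

print(bone_columns({"A": "1", "B": "9.59", "C": "4.16"}) == TARGET)
print(bone_columns({"A": "1", "B": "12.06", "C": "3.58"}) == TARGET)
```

Moving B just outside the first range, for instance to 9.531 or 9.652 units/mL, shifts the suppressor transition by one column, so the left bone comes out one column too thin or too thick.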
The mid-range concentrations, 9.59 and 4.16 for B and C in the first case, and 12.06 and 3.58 in the second, are the ones that would be least prone to developmental errors caused by random fluctuations in morphogen concentrations, because they provide a small “cushion” (on the order of hundredths of a unit) on both sides at the transition columns, 45 to 46 and 54 to 55. At the extreme ends of the above ranges, random fluctuations could fail to initiate bone formation at the right column, or could spill over to adjacent columns, making the bones thicker or thinner than they need to be. In this connection, the figures I provided for morphogen A to make things simple are much too tight to be biologically plausible. More realistic concentrations in the first line of the table should have been <= 0.645 and < 0.455.
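To make the cushion concrete, here is a small Python sketch for one transition only: morphogen B's suppressor threshold (6.1 units/mL) between columns 45 and 46 in the first case. The function names are hypothetical and the gradient is again assumed to be exactly linear.

```python
from fractions import Fraction

def b_concentration(left, col, ratio=Fraction("0.2"), last_col=100):
    """Morphogen B's concentration at column `col` under a linear gradient."""
    return Fraction(left) * (1 - (1 - ratio) * Fraction(col, last_col))

def suppressor_cushions(left, threshold=Fraction("6.1")):
    """Margins around B's suppressor threshold at the 45-to-46 transition."""
    above = b_concentration(left, 45) - threshold  # suppressor must stay off at column 45
    below = threshold - b_concentration(left, 46)  # suppressor must be on at column 46
    return above, below

# Lower end, middle and upper end of the working range for B in the first case.
for left in ("9.532", "9.59", "9.651"):
    above, below = suppressor_cushions(left)
    print(left, float(above), float(below))
```

At the middle of the range both margins are a little under 0.04 units/mL, whereas at either end of the range one of the two margins shrinks to less than 0.001 units/mL, so even a tiny fluctuation could move the transition by a column.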
Note that the above example uses morphogen concentrations to produce exact patterns despite small random fluctuations, whereas the Turing mechanism that seems to be operative in producing zebrafish stripes magnifies such fluctuations to produce unique, random patterns. I suspect that the former mechanism operates when an exact, symmetric pattern is desired, such as the complicated eyespots on butterfly wings, whereas the latter mechanism is used for unique, nonsymmetrical patterns such as human fingerprints.
Our second problem was based on a 2015 Quanta article about the biologist Jennifer Marshall Graves’ prediction that the human Y chromosome could disappear in future evolution. This prediction is based on the following information in the article: “In the last 190 million years, the number of genes on the Y has plummeted from more than 1,000 to roughly 50, a loss of more than 95 percent. The X chromosome, in contrast, still has roughly 1,000 genes.”
Problem 2
There are two schools of thought about whether or not the Y chromosome will disappear in the distant future. This problem examines the merits of the two arguments.
First, what will be the fate of the Y chromosome if you linearly extrapolate the loss of genes (1,000 to 50 in 190 million years)?
On the other hand, the Y chromosome has lost few or no genes in the last 25 million years (different estimates say 0 to 3), leading some scientists to argue that the deterioration of the Y has stopped. Perhaps there is a sizable advantage to keeping all the male-producing genes together in a neat “code module”! Assuming a constant probability for the disappearance of Y-chromosome genes over time, what is the probability that the loss of just 0 to 3 genes over the last 25 million years, a pause or marked slowdown, is due to chance?
As Alexandre commented, the Y chromosome’s rate of gene loss is 950/190 = 5 genes every million years. If we do a linear extrapolation at this rate, the remaining 50 genes should disappear in just 10 million years. The fact that only 0 to 3 genes have been lost in the last 25 million years means that the loss of genes on the Y chromosome has indeed slowed dramatically. At the linear rate you would expect 1 gene to be lost every 0.2 million years, for an expected loss of 125 genes in 25 million years. To estimate how unlikely it is to have lost only 0 to 3 genes, we can use the Poisson distribution formula Alexandre provided, setting the expected value (λ) to 125 genes to obtain a probability of 1.7 × 10^-49. Alternatively, you can use an online Poisson distribution calculator with the upper limit of genes lost (x) = 3 and expected number lost (λ) = 125 to get this answer. This probability is extremely tiny; indeed, it is 0 for all practical purposes. The linear model of Y-chromosome gene loss is obviously wrong.
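The arithmetic above can be reproduced in a few lines of Python using only the standard library. This is a sketch under the same assumptions as the text (a constant loss rate and Poisson-distributed losses); the variable and function names are illustrative.

```python
import math

# Linear model: 1,000 genes down to 50 over 190 million years.
rate = (1000 - 50) / 190           # genes lost per million years
years_to_zero = 50 / rate          # million years until the remaining 50 are gone
expected_loss = rate * 25          # expected losses over the last 25 million years

def poisson_cdf(k_max, lam):
    """P(X <= k_max) for a Poisson distribution with mean `lam`."""
    return math.exp(-lam) * sum(lam**k / math.factorial(k) for k in range(k_max + 1))

p = poisson_cdf(3, expected_loss)  # probability of losing at most 3 genes

print(rate, years_to_zero, expected_loss)
print(f"{p:.1e}")
```

The sum has only four terms (k = 0 to 3), so the result is dominated by the factor exp(-125), which is what makes the probability so small.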
Should we try a different mathematical model, such as the exponential one that Alexandre tried (and which gave a similar result)?
Not really — as I said, biology is far too complicated and messy. The idealized mathematical distributions we are familiar with simply do not model the real situation here. What actually happens to cause gene loss is based on rare, discrete, low probability events that have to be fixed in the entire population. On very rare occasions, chunks of DNA of different sizes break off and get stuck on other chromosomes (a “translocation”). This has to happen just right, so that the change does not disrupt the cell’s very complex machinery (a “balanced” translocation). While this is extremely unlikely, it can certainly happen — even for regions as large as entire chromosomes. There was a report in fact about a seemingly normal man in China with only 22 pairs of chromosomes instead of the usual 23. One of his chromosomes (not the Y) got stuck to another. But even such a rare event is not enough by itself — the change has to be advantageous enough to spread to the rest of the population, which is a very unpredictable process. Again, these things can happen — in fact, it has happened at least once within the last 7 million years in human evolution for a different chromosome. We know this because our closest primate relatives have 24 pairs of chromosomes (one pair more than we do). A very complex mathematical model could, in theory, be created to describe this translocation process and its probability of success and fixation, but in practice it wouldn’t work. It would need much greater quantitative understanding of extremely low probability processes.
So much for mathematical modeling. Perhaps more relevant, the primate Y chromosome has evolved sophisticated mechanisms using palindromic regions to repair defective regions and therefore keep backup copies of key genes — the lack of which is one of the reasons the Y is supposed to have lost genes in the first place. This may be another reason why gene loss has slowed or stopped, in addition to the fact that genes for maleness all need to be together, and translocations of small numbers of genes away from the Y can no longer be successful.
So it seems the Y chromosome may be safe. Even if it is not, and all its genes get translocated en masse to another chromosome in a single unlikely event, its genes will continue to make males. If they do not, this change will not survive. There are indeed many other ways to create males in nature even in the absence of a Y chromosome, as Emily Singer has reported in Quanta. The Y chromosome method happens to be what our primate lineage ended up with.
As we’ve seen, although the programming analogy has merit, the biological details are far more complicated than we would predict from a logical programming perspective (such as the one Alan described). Nature’s programs are not designed to be easy for humans to understand. Processes like evolutionary algorithms and artificial life have taught us that programs designed by evolution are extremely complicated and sometimes seem to defy logic, but they work. The decompilation of the human genome that I alluded to in the original puzzle column will not be easy.
Thank you to all who contributed. The Quanta T-shirt for this puzzle goes to Alexandre. Congratulations!