
Seeing Codons

It is a challenge to see something abstract. How does one ‘see’ information? This is the fundamental task in shifting our paradigm of genetic information, so I have spent a good deal of effort looking for alternative ways that codons can be ‘seen’. This is part art and part science, and I have come up with dozens of different strategies for visualizing the information contained in a codon. This lengthy page will be a general tour and review of the thought processes and the visual representations they have spawned.

Download a pdf file of the material (more or less) on this page seeing5.pdf (1.6 meg)

Efforts to visualize the data of the genetic code have convinced me of the following:

1. Historically, we have not had a functional way to ‘see’ genetic information.

2. There must be an objective context within which we can begin mapping the components of a code, and thereby create the essential context of component relationships.

3. Symmetry plays a major role in seeing genetic information correctly.

4. Cyclic permutations of nucleotide triplets are the basic informative structure within the molecular system of protein synthesis. The Rafiki map is useful because it unifies all visualization techniques through a mathematically unbiased contextual network of cyclic permutations.

To begin a process of visualizing information, consider a simple street map of your hometown. It is laid out on flat paper, but our razor-sharp human intelligence quickly understands that the shapes on the paper translate into a three-dimensional town of much greater size. ‘Mapping’ is a process of correlating two versions of informative reality. It is a form of code.

It is only slightly tougher to imagine that the orthogonal, two-dimensional information on a map is merely a subset of a vastly larger set of information on the surface of a sphere. Our street map is but a tiny window into a surface that wraps around a gigantic sphere – the earth. The fact that this information is scaled down, cut into a square and flattened onto a plane is of no concern whatsoever. The useful information the map contains is hardly affected by such transformations. However, the same cannot be said of the genetic code and protein synthesis. Transformations here are all-important, if only to recognize that they have occurred in our textbooks and our thinking.

Like symbols for roads, buildings and lakes on the map of your hometown, we will need symbols to construct a map of the genetic code. We will start with the following three typographic symbols:

A = Nucleic Acid – set member
B = Assignment - set
C = Codon – set permutation

Codons are permutations of three nucleic acids. They are specific configurations of a set, which in this case has three members (triplets, or triads). Permutations are actual arrangements of symbols. For instance, the sequence of symbols XYZ is a permutation of the symbols X, Y and Z. Other permutations of the same symbols are YZX and ZYX. The total number of permutations is found by taking the number of possible symbols, in this case 4 nucleic acids, and raising it to the power of the number of symbols in the permutation, in this case 3. Therefore, there are sixty-four codons (4^3 = 4 X 4 X 4 = 64).
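The count can be reproduced in a few lines of Python. This is only an illustrative sketch: the ordering of the letters in `BASES` is an arbitrary choice of mine, and because a codon may repeat a base (AAA, for instance), the code enumerates ordered triplets with repetition rather than permutations in the strict textbook sense.

```python
from itertools import product

BASES = "UCAG"  # the four RNA nucleotides; this ordering is arbitrary

# Every ordered triplet, repeats allowed: 4 choices at each of 3 positions.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))   # total number of codons
print(codons[:5])    # the first few, in the order product() generates them
```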

An assignment is a set of symbols that will be tied together somehow in an informative mapping. Nature made assignments of sets of molecules with other sets of molecules in making the genetic code. The ‘meaning’ of a code is derived from its assignments. The classic paradigm of the genetic code is called linear and one-dimensional because it is seen as an unambiguous one-to-one mapping of components with no further need for context.

In the linear model, there are an equal number of codons and assignments, so there seem to be no ambiguous assignments. All nucleotides and the codons they form are assigned once and only once. There are, however, only twenty amino acids in the standard set, so looking at the process from the perspective of amino acids, there appear to be only 20 required assignments, forty-four fewer than codons available.

Most curious, n’est-ce pas?

This means that there is redundancy or, more vulgarly, degeneracy. The term degeneracy sprang from mathematics and engineering. It is a technical term to describe a formula that has more than one valid solution. More than one codon is usually assigned to each amino acid, so most assignments appear degenerate. The only two non-degenerate assignments of amino acids are Methionine and Tryptophan. The rest are total degenerates.
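For readers who want to check the degeneracy for themselves, here is a minimal sketch that counts codons per amino acid in the standard genetic code. The 64-letter string is the usual one-letter encoding of the standard table in UCAG order (first base varying slowest), with `*` marking the three stop codons; the variable names are my own.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, codons in UCAG order.
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

code = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons assigned to each amino acid (stop codons excluded).
degeneracy = Counter(aa for aa in code.values() if aa != "*")

# Amino acids served by exactly one codon.
single = sorted(aa for aa, n in degeneracy.items() if n == 1)

print(len(degeneracy), "amino acids;", "non-degenerate:", single)
```

In this encoding M is Methionine and W is Tryptophan.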

From this standpoint the linear model appears quite wasteful. It is, as I have been trying to tell you, a degenerate model. The genetic code has the capacity to carry more information in the form of more amino acids. Why does it not aggressively take advantage of this opportunity somewhere, in some organism?

There has been a lot of speculation about the “meaning” or the use for this redundancy, but we are far from a consensus agreement about it, and using 64 codons to only assign 20 amino acids is an undeniably wasteful information system. Is there something we are missing?

Regardless, if we begin to construct a map of the genetic code based on the tenets of the linear model, we know the exact structure and ‘shape’ of the map with which we must start – a line. This is the proper context for ‘seeing’ the genetic code from the perspective of the classic paradigm. Using the symbols A, B and C as described above creates the following brutish physical appearance of the linear model.

The breadth of symbols discourages their presentation in an actual linear format - otherwise they would be illegibly small. One must use the scissors of imagination to carve this page into a series of strips that are attached end-to-end, creating a single genetic code tape. Only then can we appreciate the literal shape of the linear code.

A two-dimensional grid is a more familiar presentation of assignments. I, for one, have never actually seen the linear model presented as a line. This is probably because it is not particularly useful, except to demonstrate how useless it is. However, for the sake of rigor I will do just such a thing here. (Below is the tape of the above segments placed end-to-end.)



See how useless it is?

Of course the use of this format is explicit from the model. A linear code contains only one-dimension of information in assignments, and finding additional meaning in those assignments is therefore verboten; otherwise they are no longer one-dimensional. We are explicitly told to view the assignments as “one-dimensional, arbitrary and meaningless”. Any pattern that might appear in the tape must be rejected as a chimera. For if it is linear and arbitrary then it can be broken apart, reassembled and presented in any alternate fashion. There is absolutely no criterion on which to judge the correctness of one linear presentation of assignments over another; therefore, none can be “best”. The above tape could just as easily be rearranged and start with the following sequence, so long as all the numbers add up at the end.

The number of potential arrangements is extraordinarily large, and the basic problem is that we need to find a way to order, or weight the information so that we can find a pragmatic structure into which we can place and view it. This structure, whatever it is, will provide the context for ‘seeing the information’ as we study the code and its functions.

Science has chosen a grid, and it is most curious that the conventional two-dimensional codon table has somehow drawn a pass on the constraint of linearity and arbitrariness. It is so common to see the genetic code presented in a grid that the rigorous academic requirement for a linear disclaimer has somehow evolved out of the system. I never see the assignments accompanied by the warning: This table is manmade - do not recognize patterns.

Pattern recognition in the genetic code data grid should be dogmatically forbidden fruit. After all, any pattern in the assignment of amino acids to codons must be rejected as meaningless and arbitrary. Certainly no good scientist would take a pattern recognized in this correlation table and use it as the basis of further work. That is tacit approval of a non-linear model. After all, a grid is in fact a non-linear model. A pattern in the grid would be a sign that there is more than one dimension of meaning in the assignments of codons to amino acids. This would be a second degree of freedom in the mechanism that assigned amino acids within the code. It would be a sign of at least a second dimension of information, and perhaps more.

Assignments either have one dimension of meaning or they don’t. Which is it? If there is truly only one dimension of information in the code then patterns in the data are worthless. We know that there are patterns in the code, and they are not worthless, so how many levels of assignment, degrees of freedom or dimensions of meaning are in the code?

I am going to start with an excellent textbook presentation of the genetic correlation table and make it more excellent. I know that the following table is an excellent textbook presentation because I copped it from an excellent textbook. I will start here and apply my skills acquired in a wasted youth as a graphic artist. The premise of the exercise is to imagine how we ‘see’ the genetic code as it is presently taught, and then explore various ways to see it differently.

This two-dimensional grid clarifies yet confuses. I will tease out some of the assumptions and information that are contained in it, starting with the mathematical tape that is the linear model. The tape shows a need for 192 nucleic acids (64 codons X 3 nucleic acids per codon). A quick count of the above nucleic acids shows me only 24. Where are the other 168 nucleic acids?

I am not a total idiot - I am just being coy. I know that they are stacked on top of each other. It would be a grievous no-no to suggest that we could have somehow jettisoned 168 required nucleic acids from the sacred code. This would imply double duty for at least some nucleic acids, but these ‘identical’ molecules would be doing their duties in an imbalanced way, so then we must determine which nucleotides in a codon would be assigned to which duties.

These apparent missing nucleotides are merely illusions created by the conveniences of our written notation - a sacrifice to typographic efficiency. In fact, the table is nothing but a convenient presentation of data. It was more or less accidentally presented this way, and since we find it somehow useful, it has become frozen in our system of ‘seeing the code’.

Our use of this table is a rare case of an actual frozen accident in the genetic code! But it was our conscious decision to make it such, not a natural process that requires it.

Another feature of the above table is the obvious grouping of assignments. It seems that there is some mystical force in our arbitrary universe that is compelling assignments to gravitate to regions of an arbitrary (and forbidden) space. Assignments of amino acids appear to be clumping within sections of the table. A man of lesser discipline would be tempted to succumb to the temptation of inferring meaning from these groupings, which would amount to a second dimension of meaning in a one-dimensional process.

Fortunately, I am a man of lesser discipline, and I’m not only willing to assign meaning, I’m willing to make more groupings. (See how much fun science can be if you’re willing to ignore dogma.)

Let’s start by rearranging the above grid into another grid. Like potato chips, you can never have just one. The grid shows obvious relationships between codons and amino acids, but what are the relationships between amino acids, nucleic acids and their codons? In fact, it would be nice to know the inter and intra-relationships between all components of the system.

Starting with amino acids, the most likely relationship between them will somehow involve water, because water is so much a part of living systems. How well a molecule interacts with water is called its water affinity, and it can be measured in a number of ways. I created my own stylized grid, and since color is king, let’s add some color. In my world of visualizing abstract notions, color is a better choice for symbols than are typographic glyphs, like A, B and C. Now we can really start to see some patterns.



Just as the street map is a tiny window into a sphere, this table is a tiny window into another informative surface. From this table we can begin to develop the necessary symbols to illuminate that surface. In other words, we can begin doing some real work with patterns now by saying it with color, saying it strictly with color.

Here is the tape, or table in living color. And while we’re at it, we might as well take advantage of all the physical dimensions that God has given us. Two dimensions are nice, but three are better.

Before we lose our alphabet entirely, and with it English, becoming mired in the insanity of Rafiki-speak, let’s bring back an old friend, the textbook table to combine what it shows us with what we have just created. This is exciting, isn’t it?

We are finally in a position to go looking for those 168 missing nucleic acids.

There they are, all 192 of them, but why are they all treated differently? Why is the third position chopped up into 4 stacks?

DUH! The page only has two dimensions, Homer, and we need three… But we now have three!

Look at all the pretty colors - they must mean something. How could a one-dimensional, arbitrary and meaningless process produce such a beautiful three-dimensional pattern? Remember, those colors have meaning to those amino acids - they represent relative water affinity. It’s a shame that we’re technically not allowed to use it for anything. Things just get curiouser and curiouser all the time.

What is a pattern anyway? In this case a pattern is an arrangement of symbols and colors. Through combinations amongst and between each group of symbols a total pattern emerges. This pattern turns on the concept of neighbor, but what is a neighbor? For our study of the codon table, a neighbor is technically a next-to, as in this is next-to that. In the above pattern there are hundreds of different types of actual and potential next-to’s. There are colors next-to each other in a hierarchy of color. There are shapes next-to each other. There are categories of shapes next-to each other and next-to other categories. There are rows, columns and layers of next-to’s. But the one-dimensional dogma destroys the concept of next-to, doesn’t it? I mean anything next-to anything else is just random. There is no real next-to in the genetic code, is there? All of the next-to’s above were created by mathematical weighting, or observer bias. There should be no meaning to the pretty patterns that we have just created, unless there is meaning in the observer bias upon which we’ve stumbled.

Demons exist.

Only malicious demons would create this beautiful collection of accidental, meaningless next-to’s to tempt us into forbidden territory. We won’t go… we…can’t… screw it.

Let’s go - we can make better next-to’s than this.

The problem with the above next-to’s is precisely that they are biased by us, the observer, and these biases are clearly unequal. It is politically incorrect to allow unequal anything, let alone unequal next-to’s. Notice that corner amino acid assignments in the pattern have only three next-to’s, edges and sides have four, and middles have an embarrassingly capitalistic six next-to’s. I will fight for the corners, edges and sides, and tax the middles. Sorry middles, get over it.

This brings up a very interesting problem: what determines the next-to’s? Take the case of amino acids, what determines which amino acid falls on a corner, edge or middle, and therefore determines quantity and type of an amino acid’s next-to? Unfortunately - or fortunately - it is another set of next-to’s. Specifically, it is the nucleotide next-to’s. The specter of recursion looms large, but fear not, fearless traveler.

This grid - the textbook table - explicitly demands that nucleotides must be next to other nucleotides. This is the symbolic shorthand that allows us to generate permutations. By taking one symbol from the left we are given a choice of four symbols from the top, which in turn opens up another choice of four from the sixteen symbols on the right. There is a channeling process at work here, but it is never addressed by the dogma. It is actually a thinly veiled systematic, mathematical weighting of codons.

By picking any nucleotide we are limiting our choices of nucleotide next-to’s. The nucleotides in the grid are differentiated and stratified according to position in the codon. For this textbook presentation, we only need four literal symbols for each of the first two positions, but we need sixteen for the third position. Ultimately, this channeling process will allow us to arrange the entire genetic code along the linear genetic map. In other words, weighting data in this way means that there is a first, second, third… and last codon in the table. This table actually weights codons from first to last.

Trivial?

Not from the standpoint of the presentation and the pattern it demonstrates. If patterns are the goal, it is far from trivial, because we have no way to know if a pattern is natural or man-made. Changing the weighting will create an entirely different pattern. Furthermore, it appears that the 2nd position dominates the color assignments. What’s up there? I suppose we might tiptoe past this graveyard with the caveat that the channeling is an illusion, but let’s have a closer look at the weighting process before we decide.

The table is merely a specific instance of the tape in the linear model of the genetic code. Somehow the codons are assigned values around which the table is arranged. It turns out that the source of the values in this table is a simple formula that requires two biases. The first bias is a weighting of the nucleotides, which we will call the nucleotide values. The second bias is a weighting of the positions of the nucleotides in the codon, which we will call the position value. The assigned value of the codon then becomes the sum of the nucleotide values times their position values, which can be written as follows:

codon value = (N1 X P1) + (N2 X P2) + (N3 X P3)

where N1, N2 and N3 are the nucleotide values of the three bases in the codon and P1, P2 and P3 are the position values of the first, second and third positions.

We can easily see that this formula will produce the textbook grid from the genetic tape. If we recreate the grid and display all of the values we will finally see the bias at work.
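The weighting can be made concrete with a short sketch. The specific numbers are assumptions chosen for illustration, not values taken from any textbook: nucleotide values U=0, C=1, A=2, G=3 and position values 16, 4 and 1. With these choices every codon receives a unique value from 0 to 63, which is what lets the tape be laid out from a first codon to a last one.

```python
from itertools import product

# Hypothetical weights, chosen only to illustrate the formula.
NUCLEOTIDE_VALUE = {"U": 0, "C": 1, "A": 2, "G": 3}
POSITION_VALUE = (16, 4, 1)  # first, second, third position

def codon_value(codon, position_value=POSITION_VALUE):
    """Sum of nucleotide value times position value over the three bases."""
    return sum(NUCLEOTIDE_VALUE[base] * weight
               for base, weight in zip(codon, position_value))

codons = ["".join(t) for t in product("UCAG", repeat=3)]

# The tape: codons sorted from lowest to highest assigned value.
tape = sorted(codons, key=codon_value)

# Swapping the position values reorders the whole tape.
reweighted = sorted(codons, key=lambda c: codon_value(c, (1, 4, 16)))

print(tape[:4])
print(reweighted[:4])
```

Nothing in the formula says which set of weights is the right one; a different choice simply yields a different ordering of the same 64 codons.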

But since there is no meaning to the assignments, we are free to reassign these weights however we like. It is easy to change the position values and produce an entirely different grid.

There are still some patterns here, but they have taken a considerable hit, and this is just one of twenty-four permutations that can use these arbitrary values. Furthermore, there is a very bold unspoken assumption in these tables, namely that position is more important than identity when considering the value of a nucleotide in a codon. But since this table business is meaningless, there is absolutely nothing stopping us from coming up with any values that strike our fancy. Our one-dimensional system allows for no second dimension on which to evaluate any weighting scheme. In fact, if we take the linear crowd at their word, the most appropriate arrangement would be one that is completely arbitrary, such as this:

Now the patterns have disappeared completely. Of course this is what we expect from an arbitrary process - arbitrary results. However, if we don’t find arbitrary results, how can we conclude it was an arbitrary process? Any grid that we contrive will have contact with this non-linear heresy, so we cannot prohibit something – multi-dimensional pattern recognition - ignore the prohibition, and then fail to examine the unspoken assumptions inherent in the process. Clearly the nucleotide next-to’ness is influencing the table and somehow forming pretty patterns in the assignments. Furthermore, good scientists - ironically some of the same ones that forbade pattern recognition by insisting on a linear code - have used these patterns to form conventional theory, specifically the theory of wobble. The wobble hypothesis is nothing but a pattern recognition theory and therefore contradicts the premise of a one-dimensional model. The implication is that “wobble” somehow acted as a “force” in shaping the genetic code. This is necessarily a second dimension in an otherwise one-dimensional process. Each codon in the wobble model has amino acids and wobble partners assigned to it. It is the wobble partner assignment that theoretically shows up in the two-dimensional grid of data in the form of amino acid assignment clumping.

Wrong.

Our task at hand then is to remove weighting from the presentation of the data, so any pattern popping out must be natural instead of manmade. How can we un-weight a table? We can’t completely, but we can start with some rules.

Rule: All nucleic acids will be treated equally.

This must be so if weighting is to be removed, and there are important consequences of this rule. There was never a contrary rule to my knowledge, but somehow that’s how it worked itself out in the grid. I’m not sure anyone ever cared enough about unequal treatment of nucleic acids to even notice. Nonetheless, if all nucleotides are equal then we don’t need whole stacks of 16 “equal” nucleotides. All we need is one from each stack, and we can discard the rest. That leaves us with twelve.

Each of the 192 nucleic acids on the tape is a symbol that can have one of four values. Now there are only twelve symbols, and they must account for at least 20 assignments. If we spend two symbols per assignment we end up with only 16 codons (4^2), which is clearly not enough. If we decide to spend 3 symbols per assignment we generate a wasteful 64 codons. Since we had unlimited nucleic acids in the past we took that bargain. The times they are a changin’. What if the bargain was without waste instead of wasteful - how would it work?

Start by assuming that nucleotide assignment is precious. We now only have 12 nucleotides to spend, and we must achieve at least 20 assignments. This means that each nucleotide must be spent 5 times ((20 X 3) / 12). This is an awful lot to ask of a nucleotide.

A1 = (B1, B2, B3, B4, B5)
A2 = (B1, B2, B6, B7, B8)
A3 = (B2, B3, B8, B9, B10)
A4 = (B3, B4, B10, B11, B12)
A5 = (B4, B5, B12, B13, B14)
A6 = (B1, B5, B6, B14, B15)
A7 = (B9, B10, B11, B16, B17)
A8 = (B7, B8, B9, B17, B18)
A9 = (B6, B7, B15, B18, B19)
A10 = (B13, B14, B15, B19, B20)
A11 = (B11, B12, B13, B16, B20)
A12 = (B16, B17, B18, B19, B20)
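The list above can be checked mechanically. The sketch below copies the twelve rows as integer indices (A1 becomes 1, B1 becomes 1, and so on; the dictionary names are mine) and inverts them, to confirm that every nucleotide symbol is spent exactly five times and that every assignment ends up with exactly three nucleotide symbols.

```python
# Rows copied from the list above: nucleotide symbol -> assignments it joins.
A_TO_B = {
    1: (1, 2, 3, 4, 5),
    2: (1, 2, 6, 7, 8),
    3: (2, 3, 8, 9, 10),
    4: (3, 4, 10, 11, 12),
    5: (4, 5, 12, 13, 14),
    6: (1, 5, 6, 14, 15),
    7: (9, 10, 11, 16, 17),
    8: (7, 8, 9, 17, 18),
    9: (6, 7, 15, 18, 19),
    10: (13, 14, 15, 19, 20),
    11: (11, 12, 13, 16, 20),
    12: (16, 17, 18, 19, 20),
}

# Invert the mapping: assignment -> nucleotide symbols that take part in it.
b_to_a = {}
for a, assignments in A_TO_B.items():
    for b in assignments:
        b_to_a.setdefault(b, []).append(a)

print(len(b_to_a), "assignments")
print("B1 uses", b_to_a[1])
```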

The inversion of thinking here is that each nucleic acid participates in multiple assignments. The conventional thinking is that each amino acid should participate in multiple codon assignments, but that approach requires that selection of nucleotides and codons from all possible configurations is somehow pre-ordained. The mystical assignment process must have involved both nucleotides and amino acids simultaneously converging on codons, because codons could not have otherwise existed in any meaningful way before this assignment lottery occurred. It must have been a universal high-wire act of balancing molecular forces – all molecular forces. The classical perspective seems to assume some ‘safe haven’ for the numbers in the system before the assignments were made. What is the origin and nature of these relationships?

When nucleotide identities and positions within a codon are considered together, the Rafiki model is covered by 12 symbols. Of course, each assignment must also be associated with three nucleotides, and a careful analysis of the above set of relationships shows that this is true. This means that if a nucleotide, adenine for instance, can be plugged into A1 it can be plugged into any or all of the other 11 symbols as well. Nucleic acid triplets have 6 permutations as follows:

Permutation #1, P1 = 1, 2, 3
Permutation #2, P2 = 2, 3, 1
Permutation #3, P3 = 3, 1, 2
Permutation #4, P4 = 1, 3, 2
Permutation #5, P5 = 3, 2, 1
Permutation #6, P6 = 2, 1, 3

This is a cyclic permutation set of three members. It implies that in our new model we must accept that there are 6 permutations of all possible nucleic acid triplets, including seemingly trivial cases such as (Adenine, Adenine, Adenine). Each assignment represents a collection of all permutations of the three nucleic acids that are related to it. We have no way of knowing which triplets to discard in cases of redundancy within the cyclic permutation. The potential codon count seemingly goes to 120.

B1 = P(A1, A6, A2)
B2 = P(A1, A2, A3)
B3 = P(A1, A3, A4)
B4 = P(A1, A4, A5)
B5 = P(A1, A5, A6)
B6 = P(A2, A6, A9)
B7 = P(A2, A9, A8)
B8 = P(A2, A8, A3)
B9 = P(A3, A8, A7)
B10 = P(A3, A7, A4)
B11 = P(A4, A7, A11)
B12 = P(A4, A11, A5)
B13 = P(A5, A11, A10)
B14 = P(A5, A10, A6)
B15 = P(A6, A10, A9)
B16 = P(A7, A12, A11)
B17 = P(A7, A8, A12)
B18 = P(A8, A9, A12)
B19 = P(A9, A10, A12)
B20 = P(A10, A11, A12)

The danger here is in failing to recognize the meaning of any assignment within this system. We started with only 20 required assignments, because that is what the empiric evidence suggested that we do. But our assignment process immediately yielded multiple potential meanings to each triplet depending on its context within the model.

Notice that the set member represented by A1 participates in five of the assignments, and for each of these A1 is the initial base in the assignment permutation exactly twice. It also is the second and third base exactly twice. In this way every member plays a balanced role in the system.

A1 = (B1, B2, B3, B4, B5)
B1 = (A1, A6, A2), (A6, A2, A1), (A2, A1, A6), (A1, A2, A6), (A2, A6, A1), (A6, A1, A2)
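The balance claimed for B1 can be verified with a small sketch. It builds the six arrangements as three rotations of the triplet plus three rotations of its reverse (the helper `rotations` is a name I made up for illustration) and counts how often each member occupies each position.

```python
from collections import Counter
from itertools import permutations

triplet = ("A1", "A6", "A2")  # the members of assignment B1

def rotations(t):
    """Return the three cyclic rotations of a 3-tuple."""
    return [t[i:] + t[:i] for i in range(3)]

forward = rotations(triplet)         # the cycle read in one direction
backward = rotations(triplet[::-1])  # the same cycle read in reverse
all_six = forward + backward

# How many times does each member sit in position 0, 1 or 2?
position_counts = Counter(
    (member, pos) for perm in all_six for pos, member in enumerate(perm)
)

for perm in all_six:
    print(perm)
```

Every (member, position) pair occurs exactly twice, once in the forward rotations and once in the reversed ones.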

This holds true for all of the 12 nucleotides and their related assignments, so each base is a primary initiator of five codons and a secondary initiator of five codons. Therefore, there are sixty primary initiators and sixty secondary initiators. We will assign each permutation a label so that we can demonstrate each symbol’s role as initiator, for example

C1 = (A1, A6, A2) and C61 = (A1, A2, A6):

Primary initiators
A1 = (C1, C2, C3, C4, C5)
A2 = (C6, C7, C8, C9, C10)
A3 = (C11, C12, C13, C14, C15)
A4 = (C16, C17, C18, C19, C20)
A5 = (C21, C22, C23, C24, C25)
A6 = (C26, C27, C28, C29, C30)
A7 = (C31, C32, C33, C34, C35)
A8 = (C36, C37, C38, C39, C40)
A9 = (C41, C42, C43, C44, C45)
A10 = (C46, C47, C48, C49, C50)
A11 = (C51, C52, C53, C54, C55)
A12 = (C56, C57, C58, C59, C60)

Secondary initiators
A1 = (C61, C62, C63, C64, C65)
A2 = (C66, C67, C68, C69, C70)
A3 = (C71, C72, C73, C74, C75)
A4 = (C76, C77, C78, C79, C80)
A5 = (C81, C82, C83, C84, C85)
A6 = (C86, C87, C88, C89, C90)
A7 = (C91, C92, C93, C94, C95)
A8 = (C96, C97, C98, C99, C100)
A9 = (C101, C102, C103, C104, C105)
A10 = (C106, C107, C108, C109, C110)
A11 = (C111, C112, C113, C114, C115)
A12 = (C116, C117, C118, C119, C120)

Each assignment set has six permutations

B1 = (C1, C28, C9, C61, C88, C69)
B2 = (C2, C8, C14, C62, C68, C74)
B3 = (C3, C13, C19, C63, C73, C79)
B4 = (C4, C18, C24, C64, C78, C84)
B5 = (C5, C23, C29, C65, C83, C89)
B6 = (C10, C27, C41, C70, C87, C101)
B7 = (C6, C45, C37, C66, C105, C97)
B8 = (C7, C36, C15, C67, C96, C75)
B9 = (C11, C40, C32, C71, C100, C92)
B10 = (C12, C31, C20, C72, C91, C80)
B11 = (C16, C35, C52, C76, C95, C112)
B12 = (C17, C51, C25, C77, C111, C85)
B13 = (C21, C55, C47, C81, C115, C107)
B14 = (C22, C46, C30, C82, C106, C90)
B15 = (C26, C50, C42, C86, C110, C102)
B16 = (C34, C56, C53, C94, C116, C113)
B17 = (C33, C39, C57, C93, C99, C117)
B18 = (C38, C44, C58, C98, C104, C118)
B19 = (C43, C49, C59, C103, C109, C119)
B20 = (C48, C54, C60, C108, C114, C120)
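A bookkeeping check on this table is sketched below. It stores only the three primary labels of each assignment and derives the secondary labels by adding 60, an assumption based on the C1 and C61 example given earlier. It then confirms that the 120 labels are each used exactly once, and that the three primary labels in every assignment belong to three different initiators (labels C1 to C5 to A1, C6 to C10 to A2, and so on). The names `PRIMARY`, `OFFSET` and `initiator` are mine.

```python
# Primary labels per assignment, copied from the table above.
PRIMARY = {
    1: (1, 28, 9),    2: (2, 8, 14),    3: (3, 13, 19),   4: (4, 18, 24),
    5: (5, 23, 29),   6: (10, 27, 41),  7: (6, 45, 37),   8: (7, 36, 15),
    9: (11, 40, 32),  10: (12, 31, 20), 11: (16, 35, 52), 12: (17, 51, 25),
    13: (21, 55, 47), 14: (22, 46, 30), 15: (26, 50, 42), 16: (34, 56, 53),
    17: (33, 39, 57), 18: (38, 44, 58), 19: (43, 49, 59), 20: (48, 54, 60),
}
OFFSET = 60  # assumed: secondary label = primary label + 60

# Full six-label row for every assignment.
table = {b: prim + tuple(c + OFFSET for c in prim)
         for b, prim in PRIMARY.items()}

def initiator(label):
    """Nucleotide symbol (1..12) that initiates a primary label (1..60)."""
    return (label - 1) // 5 + 1

all_labels = sorted(c for row in table.values() for c in row)
print(all_labels == list(range(1, 121)))  # every label used exactly once?
print(table[12], table[17])
```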

Although we achieved an absolute reduction from 192 to 12 nucleotides, we also note a peculiar increase in the number of required permutations from 64 to 120. This is due to the model’s inability to distinguish between seemingly trivial permutations. However, this new model is not a two-dimensional, one-to-one, sequestering grid; it is a multidimensional inter-relation network, which we will call an identity network. It is not unreasonable to suspect that within a network the seemingly trivial permutations actually could have unique meanings depending on their context.
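The gap between 120 and 64 can be illustrated with a short sketch. It rests on an assumption: that the twenty assignments are identified with the twenty unordered triplets that can be drawn from four bases when repeats are allowed. Under that reading, six formal permutations per triplet give 120, but triplets with repeated bases produce duplicate arrangements, and only 64 distinct codons remain.

```python
from collections import Counter
from itertools import combinations_with_replacement, permutations

BASES = "UCAG"

# Unordered triplets of bases, repeats allowed (AAA, AAC, ..., GGG).
triplets = list(combinations_with_replacement(BASES, 3))

# Six formal permutations for every triplet, trivial or not.
formal = 6 * len(triplets)

# Distinct ordered arrangements once duplicates are merged.
distinct = {p for t in triplets for p in permutations(t)}

# Triplets grouped by how many different bases they contain.
by_kinds = Counter(len(set(t)) for t in triplets)

print(len(triplets), formal, len(distinct))
print(dict(by_kinds))
```

A triplet with one kind of base has a single distinct arrangement, one with two kinds has three, and one with three kinds has all six.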

We have a network capable of presenting any and all of the required permutations. It differs from the conventional grid on the important issue of bias; specifically it can present the data without weighting the nucleotides.

One glaring drawback: unlike a grid, the identity network does not lend itself to two-dimensional schematic representation. However, what it lacks in 2D it makes up for in 3D. We can easily use these relationships to generate a dodecahedron or an icosahedron, but they are dual to each other. In fact, the concept should be interpreted as a sphere, but polyhedrons are more effective when given a flat starting medium such as the paper.
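As a rough combinatorial check of the polyhedron claim, the sketch below treats the twelve nucleotide symbols as vertices and the twenty assignments as triangular faces, using the triplets from the B1 to B20 list. It counts the edges and tests Euler's formula V - E + F = 2. This is consistent with an icosahedron (and, by swapping the roles of vertices and faces, with its dual, the dodecahedron), though it is a sanity check rather than a proof. The variable names are mine.

```python
from collections import Counter
from itertools import combinations

# Assignment -> its three nucleotide symbols, copied from the B1..B20 list.
B_TO_A = {
    1: (1, 6, 2),     2: (1, 2, 3),     3: (1, 3, 4),     4: (1, 4, 5),
    5: (1, 5, 6),     6: (2, 6, 9),     7: (2, 9, 8),     8: (2, 8, 3),
    9: (3, 8, 7),     10: (3, 7, 4),    11: (4, 7, 11),   12: (4, 11, 5),
    13: (5, 11, 10),  14: (5, 10, 6),   15: (6, 10, 9),   16: (7, 12, 11),
    17: (7, 8, 12),   18: (8, 9, 12),   19: (9, 10, 12),  20: (10, 11, 12),
}

# Count how many faces share each edge (unordered pair of vertices).
edge_faces = Counter(
    frozenset(pair) for tri in B_TO_A.values() for pair in combinations(tri, 2)
)

V = len({a for tri in B_TO_A.values() for a in tri})
E = len(edge_faces)
F = len(B_TO_A)

# Number of edges meeting at each vertex.
degree = Counter(a for edge in edge_faces for a in edge)

print(V, E, F, V - E + F)
```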

Diagram of the symbolic relationships in the Rafiki model

A full appreciation of the relationships in this identity network requires the diagram be cut and folded.

Mike McNeil, a Rafiki sympathizer, brings up an interesting point about the above symbol identifiers. Mike is a bright guy, and a patent attorney, skilled in distilling an idea. He points out that since we are dealing with a known system of symbols with only four identities, the symbol identifiers should reflect this. In other words, perhaps A1 to A12 is sub-optimal. What about 4 sets of 3 inter-related identifiers: A1_(2,3,4), A2_(1,3,4), A3_(1,2,4) and A4_(1,2,3), where the numbers in parentheses are subscripts? Thanks for the additional burden, Mike. Fortunately, we can deal with this quite nicely, but both systems are informative to different circumstances. I feel that he has identified an actual truth in the new universe of yet underdeveloped combinatorial mathematics and the genetic code. In honor of this I will refer to these subscripts as McNeil subscripts. For now we will stick with our symbols of the non-McNeil variety, which is to say no subscripts at all.

We are finally able to return to the task at hand – making pretty patterns. We now have a weightless presentation format for our data. It is an unbiased permutation grid in three dimensions, and we can use it to see what kind of patterns nature has given us.

We start by examining some of the unexpected curiosities in the new model. Although nucleotides have become equal, triplets have become decidedly unequal. There are now three classes of triplets: primary, secondary and tertiary. Since color is king, let’s assign some colors to these classes. Unfortunately, there is only one rainbow, so we are going to have to re-use amino acid and nucleotide colors. Please try not to get confused by this.

If we add the nucleotide initials to each permutation we can see how each triplet generates six permutations, but the triplets are not homogenous in their behavior.

When all possible permutations are present, the structure contains 4 primary, 12 secondary and 4 tertiary triplets. If we stylize these - and we should because we can - they look like this:

Triplets can now assume one of three classes, and we notice that within each class there are different types of permutations based on the class of the triplet. The Rafiki model contains the following distinguishable permutations:

If we combine the model with the color-is-king style, we produce the following two-dimensional map of the identity network of permutations. This is merely an un-weighted mathematical treatment of a 4^3 permutation set.

We could do the same with a dodecahedron, but let’s go bold; let’s go 3D. In 3D the dodecahedron-icosahedron debate can be mooted by mapping to a sphere.

This is merely a structure for holding data. It is a receptacle into which we can place any appropriate permutation data, such as the assignment data of codons and amino acids. (The I Ching fits well also – by coincidence.) This is essentially an unbiased view of data in our search for nature’s patterns. Apparently, the demons were in full malicious mode when they capriciously scattered their arbitrary and meaningless assignments across our unanticipated new model. Look at our grid in terms of permutation class and type.

To my mind, this is the first visual glimpse of the idea that cyclic permutations of nucleotide triplets – codon types – can provide informative units of genetic translation. Patterns in codon types are everywhere, I suspect, because these units carry genetic information. A signal with six separate channels, so to speak.

We started with the genetic code, or I should say the codon table, a linear phenomenon that has been deemed arbitrary and meaningless. We arranged the line in a two-dimensional grid, theoretically a no-no, and started to notice some patterns. On the strength of this, we further arranged the grid into three dimensions and saw - guess what - more patterns. This opened a whole new space for investigation, the network space. In the network space several curious things happened. Nucleotides equalized and we jettisoned 180 that were no longer required. Triplets became combinatorial, and codons became differentiated based on their generative triplet and their location within that triplet. The formal recognition of a differentiated codon should have absolutely no meaning, and certainly no predictive value in the real world of codon assignment, right? There should be no pattern whatsoever based on such a ludicrous stratification, certainly no meaningful pattern. The whole thing is arbitrary and meaningless, so no pattern within it can be strategic or meaningful to the genetic code. Really?

Actually, the opposite is true - this is the only pattern that is not suspect. It is the only way to present the data in an un-biased, mathematically un-weighted format. The patterns seen in this presentation are the only ones that we can really trust. If we see a pattern here, some force of nature must have put it there.

Networking the assignment table generates patterns that correlate across seemingly hokey parameters, such as codon differentiation, because the assignment table is not linear, not one-dimensional, and not arbitrary, as dogma has insisted. Codons being assigned to amino acids is only a part of a larger system that is in fact a network, a system heretofore widely studied and cherished but poorly understood. We casually refer to codon-amino acid assignments as the genetic code, but this is incorrect. The code is more robust than the narrow dogmatic view.

Trying to look through the dark lenses of a linear model will destroy our ability to appreciate pretty colors making beautiful patterns. The Rafiki model treats the genetic code as a balancing network of inter-related components. Amino acids, nucleotides, and triplets are all inter-related. Amino acids cooperate with each other by, among other things, logically distributing themselves uniformly across the network of nucleotides.

Ironically, now that we have spent all this time un-weighting the data, the challenge is to re-weight it. It is useful to have a hierarchy of some type to all 64 codons so that we can ‘see’ how they all relate to each other. Regardless of the weighting strategy, it seems logical that the data should be viewed within the structure of triplet permutations. Let’s see what we can come up with.

The most logical first effort is to use the weighting formula of the standard table, but plug in different values, ones that are consistent with the empiric observations of assignment data.

This is the weighting of the data used to generate the classic table. What if we change the weighting? Plugging new nucleotide values into the formula gives us an entirely new table.
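To make the idea of "same formula, different values" concrete, here is a small Python sketch. The formula and values from the figures are not reproduced here, so this uses an assumed stand-in: a plain weighted sum of one value per nucleotide times one weight per position. The function names, the nucleotide values 0 to 3, and both sets of position weights are illustrative assumptions. With position weights (16, 4, 1) and the order U, C, A, G, the sum simply reproduces the row order of the conventional codon table; swapping the first two weights makes the second position dominate.

```python
from itertools import product

BASES = "UCAG"  # row/column order of the conventional codon table


def codon_weight(codon, base_values, position_weights):
    """Weighted sum of the three nucleotide values in a codon (assumed form)."""
    return sum(w * base_values[b] for w, b in zip(position_weights, codon))


def ranked_codons(base_values, position_weights):
    """All 64 codons sorted from lightest to heaviest under the given weighting."""
    codons = ["".join(p) for p in product(BASES, repeat=3)]
    return sorted(codons, key=lambda c: codon_weight(c, base_values, position_weights))


# ASSUMPTION: illustrative nucleotide values, not the values in the figures.
classic_values = {"U": 0, "C": 1, "A": 2, "G": 3}

# First position heaviest: the familiar table order.
classic_order = ranked_codons(classic_values, (16, 4, 1))

# Second position heaviest: the same formula yields a different table.
reweighted_order = ranked_codons(classic_values, (4, 16, 1))

print(classic_order[:8])
print(reweighted_order[:8])
```

Only the numbers fed to `ranked_codons` differ between the two orderings; the formula itself is unchanged.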

Merely by juggling the weighting we have created a rainbow of the water affinities, but we can do even more to them by re-proportioning the weights. We will use the following values relative to the classic table:

This data hardly looks more interesting than the data we had before, but it is. When we present it as a spectrum, according to the relative weight of every codon we clearly see the pattern of a rainbow.

I did not make this rainbow. The genetic code assignment process made the color progressions and packaged them into the white light we normally see. This is a water affinity rainbow within the genetic code. The conventional textbook grid is a similar, but less effective mathematical presentation of assignments. It is a partial filter that produces a stippled pattern of color. I merely acted as a prism to spread out the white light into its full spectrum.

It is true that humans will always see what they want to see, and I am human. Actually, we see what we must see under any given set of circumstances. Scientists are no different, and most models are constructed to show an anticipated result. This rainbow was created by a mathematical formula, but let’s be clear about the origin of the pattern and the mathematics behind it. Both the conventional table and the rainbow above were generated by the same, simple mathematical formula. The differences are due to the values inserted into the formula, and the final presentation format.

What good are they?

This rainbow is lots and lots and lots of fun. The first thing we can do is break it apart and see how it was put together. We can examine the components that make the weighting and make the rainbow.

See any patterns here?

What about now?

With this formula we can stretch out the classic table and see that each nucleotide provides a different channel to the overall signal that is a rainbow. There is an adenine channel, as well as cytosine, guanine and uracil channels in the data rainbow.

Four tiny rainbows are interwoven to make one large rainbow. It is almost impossible for me to believe that this level of assignment carries no meaning in the genetic code. Water affinity is a global force forming the pattern of the code, but must be only one of many. See folding a rainbow.

The bigger issue is in seeing multiple meanings in every nucleotide of every codon. This is probably what the code is doing, and probably how the code was formed. The weighting of the above rainbow is based on water affinities, primarily the second nucleotide in each codon. Is it possible that there are multiple dimensions of ‘genetic information’ in this system? Did the genetic code actually assign several amino acids to each of the four nucleotide types relative to all possible contexts within a nucleotide sequence?

As we can see from this table, it is actually more informative to know the nucleotide missing from an amino acid assignment pattern than it is to know the third nucleotide in a single codon. The third position is the final two bits of context, but what if we subtract it out of our weighting?

We have now created a hierarchy of sixteen multiplets that correlate to the sixteen multiplets of the Rafiki map.
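Dropping the third position can be sketched in a few lines of Python. This is an illustration under assumptions, not the weighting shown in the figures: it uses a weighted sum with placeholder nucleotide values 0 to 3, makes the second position heaviest, and sets the third position's weight to zero. The names `POSITION_WEIGHTS`, `BASE_VALUES` and `weight` are hypothetical.

```python
from collections import defaultdict
from itertools import product

BASES = "UCAG"

# ASSUMPTION: placeholder nucleotide values chosen only so that every
# first/second nucleotide pair receives a different weight.
BASE_VALUES = {"U": 0, "C": 1, "A": 2, "G": 3}

# Second position heaviest, first position next, third position removed.
POSITION_WEIGHTS = (4, 16, 0)


def weight(codon):
    """Weighted sum that ignores the third nucleotide."""
    return sum(w * BASE_VALUES[b] for w, b in zip(POSITION_WEIGHTS, codon))


# Codons that share a weight fall into the same multiplet.
multiplets = defaultdict(list)
for p in product(BASES, repeat=3):
    codon = "".join(p)
    multiplets[weight(codon)].append(codon)

for w in sorted(multiplets):
    print(w, multiplets[w])
```

With the third weight at zero, the 64 codons collapse into sixteen groups of four, each sharing its first two nucleotides, and the remaining weight gives the groups a rank.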

Each multiplet groups with three others to form a major axis around the four nucleotides in the code. All of the codon types are equally distributed in each major nucleotide axis.

With this weighting formula and by eliminating the weight of the third nucleotide in each codon we have assigned a numerical value to each multiplet in the code. We can apply a color to each multiplet to see how these multiplets are distributed within the range of assignments.

This table is an alternate view of the Rafiki map. We have merely recognized the multiplets and found a way to rank them. When we use the colors created by this ranking of multiplets to view the actual codon data we see the following pattern in the assignments.

The strength of this pattern is compelling, but it is merely another way of viewing the ‘clumping’ of assignments we are accustomed to seeing in the conventional table. The rainbow appears to have been diced up by this procedure, but this is merely an illusion caused by several complex factors. The rainbow returns when we arrange the major axes of the Rafiki map according to the rank of their homogenous multiplets.

The really interesting thing now is that we start to see some method to the madness of START, STOP, water affinity and the placement of the eight perfect multiplets in the very ‘center’ of this rainbow. Methionine can be seen at the far left, STOP is at the far right, and the most symmetry of assignments is in the very middle surrounding proline and glycine. Make no mistake about what we are looking at - I did not artificially create this pattern. I am not clever enough to make this pattern, but nature is.
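The count of eight perfect multiplets can be checked against the standard genetic code. The sketch below reads "perfect" as a multiplet (a shared first and second nucleotide) whose four codons all carry the same assignment; that reading is an assumption, as are the helper names. The assignment string is the standard code written in U, C, A, G order with one-letter amino acid codes and `*` for STOP.

```python
from itertools import product

BASES = "UCAG"

# Standard genetic code, codons enumerated in UCAG order for each position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def perfect_multiplets(code):
    """Map each first/second nucleotide pair whose four codons agree to its assignment."""
    perfect = {}
    for first, second in product(BASES, repeat=2):
        assigned = {code[first + second + third] for third in BASES}
        if len(assigned) == 1:  # ASSUMPTION: this is what "perfect" means
            perfect[first + second] = assigned.pop()
    return perfect


result = perfect_multiplets(CODE)
print(len(result), result)
```

Proline (CC) and glycine (GG) both appear among the eight, which matches their role in the description above.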

The most clever thing that nature did with this particular rainbow was use it to anticipate frameshifts in nucleotide sequences. A forward shift will be well behaved into a single multiplet. A backward shift is spread across four multiplets. We can visualize how a frameshift is anticipated by the code by rotating the reading convention on the Rafiki map (a huge advantage of the Rafiki map.) This is how the multiplet rainbow appears in anticipation of a backward frameshift.

What this means is that the code is laid out so that backward frameshifting anticipates water affinity to a remarkable degree. Any lingering doubts about the importance of water affinity in forming the structure of the code should be erased by this demonstration. This is also a perfect illustration of what I mean by symmetry in the code. Only symmetry will allow such a mathematical trick of assignments. Twenty amino acids is an optimized number from the standpoint of frameshifting.
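The asymmetry between forward and backward shifts follows directly from how multiplets are defined, and can be confirmed by enumeration. This Python sketch assumes that a multiplet is identified by a codon's first two nucleotides, that a forward shift moves the reading frame one nucleotide downstream, and that a backward shift moves it one nucleotide upstream; the function names are made up for the example.

```python
from itertools import product

BASES = "ACGU"


def multiplet(codon):
    """ASSUMPTION: a multiplet is named by the first two nucleotides."""
    return codon[:2]


def forward_shift_multiplets(codon):
    """Multiplets reachable when the frame slips one nucleotide downstream.

    The shifted codon is B2 B3 X, where X is the unknown next nucleotide.
    """
    return {multiplet(codon[1:] + nxt) for nxt in BASES}


def backward_shift_multiplets(codon):
    """Multiplets reachable when the frame slips one nucleotide upstream.

    The shifted codon is X B1 B2, where X is the unknown previous nucleotide.
    """
    return {multiplet(prev + codon[:2]) for prev in BASES}


codons = ["".join(p) for p in product(BASES, repeat=3)]
forward_sizes = {len(forward_shift_multiplets(c)) for c in codons}
backward_sizes = {len(backward_shift_multiplets(c)) for c in codons}
print(forward_sizes, backward_sizes)
```

For every codon the forward shift lands in exactly one multiplet (named by B2 and B3), while the backward shift can land in any of four, one per possible preceding nucleotide.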

Another fun thing to do with the weighting formula is to isolate the 20 assignment sets from which we generate all cyclic permutations. This is easy to do by plugging in equal values for all position values in the triplet.
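One way to see why equal position values isolate twenty sets: when every position counts the same, a codon's weight depends only on which nucleotides it contains, not on their order. The Python sketch below is an illustration with assumed numbers. The nucleotide values (powers of four) are hypothetical and were picked only so that two different compositions can never add up to the same total.

```python
from collections import Counter
from itertools import product

BASES = "ACGU"

# ASSUMPTION: hypothetical nucleotide values. Powers of four guarantee that
# each composition of three nucleotides has its own unique sum.
BASE_VALUES = {"A": 1, "C": 4, "G": 16, "U": 64}


def equal_position_weight(codon):
    """Codon weight when all three positions are weighted equally."""
    return sum(BASE_VALUES[b] for b in codon)


# Codons with the same weight are orderings of the same nucleotide triplet.
sets_by_weight = {}
for p in product(BASES, repeat=3):
    codon = "".join(p)
    sets_by_weight.setdefault(equal_position_weight(codon), []).append(codon)

set_sizes = Counter(len(group) for group in sets_by_weight.values())
print(len(sets_by_weight), dict(set_sizes))
```

The 64 codons fall into twenty sets: four of size one, twelve of size three and four of size six.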

Again, there doesn’t appear to be any pattern of interest here, but we can put them back into our modified assignment table. Since no amino acid is assigned more than six codons, we can distribute all of the codons into six channels within the table.

I have absolutely no idea what this means, but it seems that some force in nature has tried to organize this data in some meaningful way. Perhaps an alternate view will make this more apparent. Consider a signal represented by a three-axis graph where each axis can carry two channels.

We can place the weighted data from the genetic code into this graph and perhaps begin to get a feel for the pattern in the code. Again, we are doing nothing more than looking for a way to rearrange the data in the classic table by accepting the formula behind the table, but juggling the bias.

Having played with this codon weighting formula - too much - I searched for a more logical way to weight the codons. I want to try to ‘see’ a codon in the same way that the code can see differences between molecules. The genetic code is nothing more than a balancing act of relative forces. What are the forces that are balancing, and how does their appearance change from one molecule to the next? What are the physical toeholds available to make assignments in the code to begin with?

The disproportional nature of nucleotide position suggested an alternate way of weighting codons for visualization. We can move a step closer to seeing this by recognizing that there is a physical difference between a nucleotide as a molecule, and three nucleotides as a codon molecule.

We cannot merely string three nucleotides together in our minds, but we must logically combine them somehow to form a codon. The three different positions in the codon contribute disproportionately to the assignment of a codon: B2 > B1 > B3, so a type of codon weighting formula is needed to reflect this difference between nucleotide positions. We can mathematically find a nifty way to do this by interpreting every codon as a continued fraction. The weight of every codon is calculated as a continued fraction (CF) with the following formula:

As a continued fraction each codon receives three values: a numerator (N), a denominator (D) and a decimal value (Dec). By using a weighting scheme of continued fractions we get an absolute and relative size, and each codon takes on internal relative proportions. Rectangular icons that graphically depict a continued fraction can now illustrate these mathematical relationships between all codons and their parts.
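As a rough illustration of how a codon could yield N, D and Dec, here is a Python sketch that evaluates a three-term continued fraction exactly. It is not the formula from the figure, which is not reproduced here. Two things are assumptions: the nucleotide values 1 to 4, and the ordering of terms, which places the second nucleotide as the whole-number part, then the first, then the third, to mirror B2 > B1 > B3.

```python
from fractions import Fraction

# ASSUMPTION: hypothetical positive integer values for the four nucleotides.
BASE_VALUES = {"A": 1, "C": 2, "G": 3, "U": 4}


def continued_fraction(terms):
    """Evaluate [a0; a1, a2, ...] = a0 + 1/(a1 + 1/(a2 + ...)) exactly."""
    value = Fraction(terms[-1])
    for term in reversed(terms[:-1]):
        value = term + 1 / value
    return value


def codon_cf(codon):
    """Return (N, D, Dec) for a codon under the assumed term ordering."""
    b1, b2, b3 = codon
    # ASSUMPTION: second position leads, then first, then third.
    terms = [BASE_VALUES[b2], BASE_VALUES[b1], BASE_VALUES[b3]]
    value = continued_fraction(terms)
    return value.numerator, value.denominator, float(value)


print(codon_cf("ACG"))  # [2; 1, 3] = 2 + 1/(1 + 1/3) = 11/4
```

Because the leading term dominates the value, a later term can only nudge the result within the range set by the earlier ones, which is one way such a scheme can encode unequal contributions from the three positions.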

I have chosen a convention where the portion of the rectangle contributed by the first part of the fraction is yellow, the second part is blue and the third part is red. These icons provide a quick visualization of all sixty-four codons and all of their parts relative to each other. One last observation that ties codons back into the un-weighted Rafiki arrangement of them: the Rafiki map places a network of twelve nucleotides in a dodecahedron. Therefore, all of the components of the map can be related to each other by powers of the golden mean, which is an infinitely repeating continued fraction where all values in the fraction equal one, i.e. 1;1,1,1,1... So the logic of representing code components as continued fractions has more than one potential application.
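The statement about the golden mean is a standard fact and easy to verify numerically: truncating the all-ones continued fraction after more and more terms gives ratios of consecutive Fibonacci numbers that close in on (1 + √5)/2. A short Python check, with a function name chosen for the example:

```python
from fractions import Fraction
from math import sqrt

GOLDEN_MEAN = (1 + sqrt(5)) / 2


def all_ones_convergent(n_terms):
    """Value of the continued fraction [1; 1, 1, ...] cut off after n_terms terms."""
    value = Fraction(1)
    for _ in range(n_terms - 1):
        value = 1 + 1 / value
    return value


for n in (1, 2, 3, 5, 10, 20):
    convergent = all_ones_convergent(n)
    print(n, convergent, abs(float(convergent) - GOLDEN_MEAN))
```

The printed differences shrink rapidly as terms are added.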

The actual assignments are remarkable for their consistency with respect to water affinity, and for the similarity of patterns across all six codon classes and types. One must wonder whether there are distinguishable physical differences between triplet classes and codon types of which the code takes advantage. It appears as if the code logically assigned at least one amino acid from each part of the hydropathy spectrum to all six codon types. It certainly gives the appearance of a harmonic music scale played out in water affinities and codon classes.

This is another glimpse into how symmetry should appear in the code. From a coding perspective, each codon appears to have a meaning based on all of its properties: size, shape, and consistency of parts, which makes sense in a setting where a molecular code must act upon all available parameters. The code in some sense has the ability to say the same thing in six different ways, at least with respect to relative water affinities. As if by deftly parsing the fine physical details of codons, six discrete meanings become available.

I love these continued fractions as representations of codons. They are so fabulous that I intuitively feel the patterns they create should be a part of how we see the code. Of course the first step is to convince people that we are not seeing the code properly in the first place. That message, I think, is getting lost in all the anger that I generate by suggesting that the classic paradigm is flawed and stereochemistry should play a role. Clearly our traditional view of the code is incomplete, with or without stereochemistry, and new perspectives are warranted.

The most literal view of stereochemical visualization of the code that I have attempted involves a discrete symbolic approach to universal spatial parameters. I have called this system ‘quantum geometry’ (Why? Why not.) It is a poor man’s super symmetry, if you will. (If you haven’t had enough of this by now then you should link with quantum geometry wackiness here.)

Rigorous mathematical treatments (super-symmetry, hypercubes, etc.) have been applied to the data elsewhere, but I will not visit them here. In virtually all treatments, strong patterns emerge. The key questions are how should we see these patterns and what possible meaning could they have within the code?

The objection that I keep getting to these approaches is that the patterns they generate seem invalid because they involve a vigorous, subjective massage of the data. I agree, but people who lodge these complaints are missing the point entirely. The data is already being thoroughly massaged when we peek at it in the conventional view of the genetic code from the Watson-Crick codon table. The Rafiki map is the only pristine, un-weighted view of the data that I can imagine. Many of the subsequent visualization schemes here are efforts to explore how we should intelligently weight the data - if it needs to be weighted at all. I’m just calling into question whether we might legitimately conclude anything from any patterns generated by any weighting.



Material on this Website is copyright Rafiki, Inc. 2003 ©
Last updated September 10, 2003 1:37 PM