Knowledge of Information
by Randy Isaac
The popularity of social media has fueled a surging appetite for information, such as the status of our friends, both genuine and of the Facebook variety. We sometimes refer to the collection of all our information as our knowledge base. Yet we seldom stop to think about the very essence of information. What do we really know about information?
According to James Gleick in his comprehensive book The Information: A History, A Theory, A Flood, the earliest and most enduring definition of information was given by vicar and mathematician John Wilkins in 1641 as “…whatever is capable of a competent Difference, perceptible to any Sense…” In other words, information is any physical state that could be otherwise. For example, a coin lying on a table with heads up could be otherwise by having tails up. Hence, it is one bit of information. If both sides of the coin were identical, then there would be no information. Alternatively, the very presence of the coin on the table constitutes information since it could be absent. The presence or absence of a particle also constitutes information.
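The coin example can be made concrete with a short calculation. The sketch below is only illustrative: it assumes every possible state is equally likely and uses the base-2 logarithm as the measure of information, which is a standard convention rather than something Wilkins stated. The function name is hypothetical.

```python
import math

def bits_of_information(num_states: int) -> float:
    """Bits needed to single out one of num_states equally likely states.

    Assumption for illustration: all states are equally likely.
    """
    if num_states < 1:
        raise ValueError("a system needs at least one possible state")
    return math.log2(num_states)

# A coin that can show heads or tails "could be otherwise" in one way.
print(bits_of_information(2))  # 1.0

# A coin with two identical sides cannot be otherwise: no information.
print(bits_of_information(1))  # 0.0
```

Under the same assumption, the presence or absence of the coin on the table is one more two-state question, and so contributes one more bit.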
Information is therefore everywhere. IBM physicist Rolf Landauer, whose 1961 paper tied the handling of information to thermodynamics, famously insisted that “information is physical.” Every physical particle in the universe carries information through its existence, all of its properties, and its position and interaction with all other particles. For this reason, physicists conclude that the universe contains far more bits of information than it has elementary particles.
However, this perspective of information is a far cry from what we usually think of as information. Most of the information described above is inaccessible, unintelligible and irrelevant to the average person. To bridge the gap we need to turn to the communications engineer.
The communications engineer focuses attention on a very tiny subset of the physicists’ information. In this computer age, information is quantified and described by a series of 0’s and 1’s, using any physical object or device that has an “on” or “off” state and that can be transmitted rapidly from one location to another. Any physical state that represents a 0 could be otherwise by being a 1, and vice versa. By carefully defining a standardized set of conditions to represent those 0’s and 1’s, the communications engineer enables the transmission of a message. Claude Shannon was the Bell Labs engineer who did the most to define and quantify information, in his seminal 1948 paper. His work made it possible to maximize the amount of information that could be transmitted and to distinguish the “off/on” states that constitute desired information from those that constitute noise, or undesired 0’s and 1’s.
The engineer does not care about the message being transmitted, only about the integrity of the on/off states being transmitted. The term information for such an engineer relates to the stream of on/off states that is capable of transmitting a message. For example, a TV signal is a coherent digital signal that conveys some image, but the engineer is oblivious to the actual image being transmitted, whether it is a football game or a classic movie.
But we who are receivers of that message aren’t really interested in such a mechanical view of information. We only care about the meaning of the information. Who won the game? Who is the guilty party in the murder mystery? For us, information is all about meaning. Where does that meaning come from and how does it get connected with the physical states described above?
The information that we care about in our daily lives is usually encoded into digital form using some standard code of meaning. For example, the English language is a code that endows physical features, such as written English letters, or spoken English sounds, with meaning. Those physical features are then coded into a binary digital form that communications engineers can easily transmit. The receiver of such information can then decode those 0’s and 1’s using the same code as the sender used. If the sender were to use a unique code unknown to others, then the receiver could not decode the message unless the sender found some way to transmit the secret code to the receiver. The code that is used is independent of the message. Various physical systems and codes can be used but it is important to note that the sender assigns a meaning to the physical states using a code. The meaning is not determined by the physical parameters that transmit the message. Only those who know how the sender encoded the message can verify whether or not the receiver has properly decoded the message. If an intelligent agent encoded the message using some abstract relationship, then an intelligent agent with knowledge of that relationship is required to decode it. A detailed analysis of the physical states that carry out the transmission of the message cannot determine the meaning of the message without knowledge of the code.
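A small sketch may help show why the meaning lives in the code rather than in the transmitted bits. Everything here is hypothetical: the two four-symbol codes and the function names are invented for illustration and do not correspond to any real communication standard.

```python
def encode(message: str, code: dict) -> str:
    """Turn each symbol of the message into its bit pattern."""
    return "".join(code[symbol] for symbol in message)

def decode(bits: str, code: dict, width: int) -> str:
    """Split the bit stream into fixed-width chunks and look each one up."""
    reverse = {pattern: symbol for symbol, pattern in code.items()}
    chunks = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "".join(reverse[chunk] for chunk in chunks)

# Two hypothetical codes that assign different meanings to the same bit patterns.
SENDER_CODE = {"W": "00", "I": "01", "N": "10", "S": "11"}
OTHER_CODE = {"S": "00", "N": "01", "I": "10", "W": "11"}

bits = encode("WINS", SENDER_CODE)
print(bits)                          # 00011011
print(decode(bits, SENDER_CODE, 2))  # WINS
print(decode(bits, OTHER_CODE, 2))   # SNIW
```

The bit stream is identical in both decoding attempts. Only a receiver holding the sender's code recovers the intended message, and nothing in the bits themselves reveals which code is the right one.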
So far, we have discussed three perspectives of information. (1) The physicist sees information in every distinguishable physical state. (2) The communications engineer defines information as a set of well-defined physical states, usually electromagnetic waves, which can be easily transmitted by wires or wirelessly at the speed of light. (3) The receiver of such communications perceives information as the meaning of those transmitted physical states. These are three dramatically different perspectives. Other types of information can also be described, but these three are sufficient to help us understand the essential concepts. Since so much conversation about information today relates to the field of life science, it is worth thinking about the different ways in which biologists define and understand information.
All biological organisms have cells that contain one or more DNA molecules, each consisting of a chain of units called nucleotide base pairs. The collection of such molecules, constituting the genome, can be quite long: over 3 billion base pairs in humans. The DNA can additionally be modified in specific ways, which influences how it behaves. The DNA is very long yet also incredibly thin, so it can coil and pack into the comparatively small nucleus of our cells with the help of proteins such as histones. The very configuration and shape of the molecule is critical to much of the function of the DNA.
When the biologist studies DNA information from a physicist’s perspective, the amount of information contained in the genome is unfathomably large since so many configurations are possible. The basic sequence of nucleotides together with the shape of the folded DNA molecule constitute information. Most of this information is not meaningful. However, the importance of the physicist’s perspective is in the way that information is modified, generated, and retained. The biochemical and physical processes that modify the structure are vital in understanding what constitutes information and how it is established. Cell division requires that the genome be duplicated; a set of enzymes accomplishes this quite reliably, yet every copy that is made appears to have some small variation. The fact that these variations are extremely small contributes to the stability and reproducibility of key features of an organism or cell. The fact that these variations exist and are not zero means that change is possible and new information is generated at each reproductive step. It is not possible, however, at the time of DNA replication to ascertain the meaning of that information. That is, it cannot be determined with certainty which states are good, bad or neutral for the organism’s or cell’s survival. The effect depends not only on the DNA information itself but on the environment in which the organism exists.
The communications engineer’s perspective enables the biologist to focus on the small portion of the genome, less than 2% in humans, that codes for proteins and other functional biochemical molecules. These coding regions are called genes. The recipe for translating the nucleotide sequence into a protein or other functional molecule uses a bona fide genetic code, in close analogy to the way human messages are decoded from a digital form. Notably, however, this genetic code is not external to the DNA but is itself encoded in the DNA. In contrast to every human-designed information system, it is not a meaning that is assigned to the physical structure by an external source but is inherent in the physical structure of the DNA itself. This genetic code is not unique and can therefore differ, though ever so slightly. For example, the mitochondrial DNA that resides in the part of the cell that provides energy for the cell uses a slightly different genetic code from the DNA in the nucleus. Furthermore, more than 20 other genetic codes have been discovered in nature. Each one is physically encoded in the DNA in an elegant recursive fashion in which the code translates the very proteins that determine the code.
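The difference between the nuclear and mitochondrial codes can be sketched with a tiny lookup table. The tables below are deliberately partial and cover only six of the 64 codons. They are written in terms of messenger RNA codons rather than DNA, and the sample sequence is made up for illustration. The three reassignments shown (AUA, UGA, AGA) are well-documented differences between the standard code and the vertebrate mitochondrial code. Note that a Python dictionary is an external table, whereas in the cell the equivalent mapping is carried out by molecules that are themselves encoded in the DNA.

```python
# Partial codon tables (mRNA codons -> amino acid); only a few codons shown.
STANDARD = {
    "AUG": "Met", "UUU": "Phe", "AUA": "Ile",
    "UGA": "Stop", "AGA": "Arg", "UAA": "Stop",
}
# The vertebrate mitochondrial code reassigns a handful of codons.
VERTEBRATE_MITO = dict(STANDARD, AUA="Met", UGA="Trp", AGA="Stop")

def translate(rna: str, table: dict) -> list:
    """Read the sequence three letters at a time until a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = table[rna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

rna = "AUGAUAUGAUUUUAA"  # hypothetical sequence: AUG AUA UGA UUU UAA
print(translate(rna, STANDARD))         # ['Met', 'Ile']
print(translate(rna, VERTEBRATE_MITO))  # ['Met', 'Met', 'Trp', 'Phe']
```

The same sequence yields a shorter product under the standard table because UGA reads as a stop signal there, while the mitochondrial table reads it as tryptophan and continues.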
Information in the genome takes many forms. In addition to the sequence of nucleotides in the genes themselves, there is information in the very shape of the folded DNA molecule and in the so-called epigenome, where various small molecules may be attached to the DNA. Even the non-coding regions that make up about 98% of the DNA contain information that helps to govern the relative amounts of proteins to be produced. The amount of information contained in the genome is truly unfathomable.
The perspective of most interest to the biologist is the meaning of the information. What messages are conveyed by the information stored in this incredible molecule? The entire field of biochemistry is focused on deciphering the processes carried out by molecules coded in the DNA molecule. The primary category of meaning is the set of functions enabled by these molecules within the environment of the cell. Genes code for proteins and RNA molecules that together perform the myriad functions we recognize as life. These functions form a wide spectrum of activity which ultimately leads to the successful reproduction of the cell and eventually of the host organism. If reproduction fails, that particular DNA sequence ceases to exist and will never be repeated again. If reproduction succeeds, the original structure is replicated with only a slight modification. Since the modification is very small compared to the entire structure, the probability of survival is high. Since modifications always occur, there is just enough variation to provide a possibility of new or modified function.
A second category of meaning of DNA information is the ancestral history recorded in the molecule. Each cell is the product of a very long sequence of successful reproductive events with slight modification. Geneticists can therefore decipher many key features of the history of this ancestry by noting both similarities and differences with other DNA molecules.
It is noteworthy that the messages conveyed by the DNA molecule are not messages assigned to it by an external agent with an external code. The code itself is contained in the molecule, and the messages are the biochemical activity of the molecule and not a meaning assigned to the molecule. The meaning and effectiveness of the messages depend on the interaction of the DNA with its environment. In other words, no intelligent agent is needed to encode, decode, or verify the information in DNA.
The fascinating question is how such an incredibly complex and effective molecule came to exist. The mystery of the origin of life has not yet been solved and is the focus of much interest and research. The complex, hierarchical structure of the genome with its embedded code has led many to speculate that an external agent must have been involved in its origin. But no argument convincing to most scientists has surfaced for any such agent or for any characteristic of DNA information that could only have arisen from an external agent. Others seek to apply the so-called law of conservation of information, which claims that only intelligent agents can generate new information. However, that law applies only to certain kinds of information, such as the meaning assigned to physical states. It does not apply to information in the physical states of DNA molecules, which is simply the biochemical potential of the molecule. The DNA molecule can change through processes such as point mutations, gene duplication, and other mutations, thereby changing the biochemical activity and hence the information. No abstract meaning is or needs to be assigned to the DNA. The meaningful messages in the DNA are intrinsically transmitted and encoded in the physical activity of the molecule itself. No intelligent agent is needed to generate or interpret the code.
Some day we may learn the secrets of the origin of life. The more knowledge we gain of the information in the genome, the more awesome it seems that such marvelous biological systems could exist. God’s creative power is abundantly evident in the very structure of every cell in our body. Just as God created galaxies, solar systems, and planets through the force of gravity, it seems that he created the diverse biosphere through the efficiency of biochemical reproduction with variation. What a privilege it is to be able to study God’s work and to worship him.
Randy Isaac is a solid-state physics research scientist and executive director of the American Scientific Affiliation (ASA), where he has been a member since 1976 and a fellow since 1996. Isaac received his bachelor’s degree from Wheaton College in Illinois and his doctorate in physics from the University of Illinois at Urbana-Champaign.
He joined IBM to work at the Thomas J. Watson Research Center in 1977 and most recently served as the vice-president of systems technology and science for the company.