Survival of the Fittest: Darwin’s Evolutionary Theory Applied to Programming Languages

By: Eleni Apostolatos

Living organisms are physical manifestations of genetic data. The genetic code of living creatures is stored in deoxyribonucleic acid (DNA), a molecule composed of two strands with varying configurations of four bases: adenine, cytosine, guanine, and thymine. DNA is interpreted and expressed by native molecular machinery within cells. In transcribing and translating the bases, these molecular processes essentially create us.

Darwin’s theory of evolution, popularly summarized as “survival of the fittest,” describes a natural pattern of selection in biological systems: sequences essential for the survival of a species persist through time. Interestingly, research in recent years has shown that the same logic can be applied to computer code.

The similarities between DNA in bacterial genomes and programming code in large-scale computer software have been the subject of a research project conducted by computational biologist Sergei Maslov of Brookhaven National Laboratory and graduate student Tin Yau Pang of Stony Brook University.

The project’s aim was to elucidate why some genes or computer programs are more common than others, and understand why certain sequences cannot be eliminated over time. In an interview, Maslov explained: “If a bacteria genome doesn’t have a particular gene, it will be dead on arrival. . . The same goes for large software systems. They have multiple components that work together and the systems require just the right components working together to thrive” (1).

Programming languages, designed to communicate instructions to a computer, are generally first compiled to binary code, a sequence of 0s and 1s. The instructions are then executed by the host computer’s hardware to control the behavior of the machine and carry out the set of commands.

Maslov and Pang used data from the DOE Systems Biology Knowledgebase (KBase), which contains the sequencing of bacterial genomes, and focused on the frequency of important sequences in the metabolic processes of 500 bacterial species. They then compared their analysis to the frequency of installation of 200,000 Linux packages on more than 2 million computers (2). Linux is an open source software collaboration that grants programmers the ability to edit and modify source code to construct or enhance programs for public use.

The results indicate that the most frequently detected sequences in the biological and computer systems are those that promote the largest number of descendants. In short, the more heavily other elements rely on a given element, the more likely it is to be required for the system to function properly. Conceptually, this has been understood in biological systems since Darwin: certain traits are more useful for survival, and those traits persist.
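As a rough illustration of what it means for a component to be "heavily relied upon," the sketch below counts how many other components depend on each one, directly or indirectly, in a toy dependency map. The component names and the function count_dependents are hypothetical and are not taken from Maslov and Pang's analysis; this only shows the general idea.

```python
from collections import defaultdict


def count_dependents(depends_on):
    """For each component, count the other components that rely on it,
    directly or through a chain of dependencies."""
    # Reverse the map: component -> components that directly depend on it.
    reverse = defaultdict(set)
    components = set(depends_on)
    for component, requirements in depends_on.items():
        for requirement in requirements:
            reverse[requirement].add(component)
            components.add(requirement)

    counts = {}
    for component in components:
        seen = set()
        stack = [component]
        # Walk upward through everything that relies on this component.
        while stack:
            current = stack.pop()
            for dependent in reverse.get(current, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    stack.append(dependent)
        seen.discard(component)  # ignore self-reference if there is a cycle
        counts[component] = len(seen)
    return counts


# Hypothetical toy system: each key lists the components it requires.
toy_system = {
    "editor": ["gui_toolkit", "core_library"],
    "browser": ["gui_toolkit", "core_library"],
    "gui_toolkit": ["core_library"],
    "core_library": [],
}

counts = count_dependents(toy_system)
for name in sorted(counts, key=counts.get, reverse=True):
    print(name, counts[name])
```

In this toy system, core_library has the most dependents, so removing it would break everything else, which is the sense in which heavily relied-upon components tend to be the ones a system cannot lose.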

Maslov and Pang produced a formula that predicts and accurately reflects the number of essential components in bacterial or computing systems. The simplified equation takes the square root of the number of interdependent components to obtain the number of crucial components that the whole system requires. The calculation holds for both complex systems. Their results are published in the Proceedings of the National Academy of Sciences.
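The simplified square-root relationship described above can be tried out with a few lines of Python. This is only a sketch of the rule as summarized in this article, not the full model from the paper; the function name estimate_essential_components is an assumption made for illustration.

```python
import math


def estimate_essential_components(interdependent_components):
    """Estimate the number of crucial components as the square root of
    the number of interdependent components (simplified rule only)."""
    if interdependent_components < 0:
        raise ValueError("the number of components cannot be negative")
    return math.sqrt(interdependent_components)


# Example: a hypothetical system with 10,000 interdependent components.
print(estimate_essential_components(10_000))
```

Under this rule, a system of 10,000 interdependent components would have on the order of 100 crucial ones, so the essential core grows far more slowly than the system as a whole.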

One hypothesized reason for the similarities is that both the biological and the computer systems are open access systems composed of components that are installed independently. That is, bacteria share genetic data freely through a common pool of genes that can be exchanged via horizontal gene transfer. Linux operating systems similarly allow free installation of different components, which are built and shared by many programmers working independently. The results of Maslov and Pang’s study may not hold true for other operating systems, such as Mac OS, since those systems do not follow the open access approach that Linux does.

Future studies that examine the similarities between genetic and computer codes will help unveil features of both, shedding light on the potential applications and utility of the two in current research and technology. Parallels like these can have an impact in growing technological fields, such as genetic engineering and robotics.

Eleni Apostolatos ’18 is a sophomore in Leverett House.


[1] “Researchers Find Surprising Similarities Between Genetic and Computer Codes.” Brookhaven National Laboratory Newsroom. U.S. Department of Energy, 28 Mar. 2013. Web. 30 Dec. 2015.

[2] Pang, T. Y., and S. Maslov. “Universal Distribution of Component Frequencies in Biological and Technological Systems.” Proceedings of the National Academy of Sciences USA 110.15 (2013). Web.
