by Jennifer Walsh
Every human grows from a single-celled embryo whose genome contains the determinants of what that embryo will become. For each of us, this single cell became two, then four, and its genome became the genome of every cell in our body. However, over a lifetime of cell divisions and routine functioning, mutations accumulate in individual cells. The cells in our bodies, highly infectious viruses, the bacteria in our gut, and the cells of a deadly tumor all show genetic differences from one member of the population to the next. Current biology excels at finding a mutation shared across a host of cells, such as one that causes a genetic disorder or lactase persistence, but we still lack the ability to find differences between individual cells within a larger population. New technologies with single cell precision have the potential to transform research ranging from microbiology to disease genetics.
The ability to extract an entire genome from a single cell could revolutionize our ability to differentiate genotypes within a population of cells and pinpoint cells that have spontaneous mutations in their genome that separate them from the others around them. Without single cell sequencing, genetic variation among single cells is generally intractable because sequencing techniques require many cells to provide enough input to create a readout. Single cell sequencing has the potential to enhance the study of topics ranging from cancerous tumor development to neuronal differentiation in the brain.1 With this broad set of motivations, scientists in the last decade have undertaken the task of finding a method to accurately and reliably sequence the genomes of single cells as the next step in the sequencing revolution.
How Single Cell Sequencing Works
Sequencing the DNA of a single cell relies on cumulative advances in three techniques that have improved dramatically over the last couple of decades: the ability to (1) isolate a single cell, (2) amplify its genome efficiently and accurately, and (3) sequence the DNA. Both the difficulty and the advantage of sequencing individual cells come from being able to compare small and large differences between the genomes of distinct cells. Consequently, effective ways of sorting cells are critical to achieving this goal. Current sequencing techniques are not yet sensitive enough to sequence DNA directly from a cell without artificial amplification, in which additional copies of the DNA are made so that the sequencer has enough material to read. This amplification does not need to be perfect, and often involves multiple rounds of replication of the genome after it has been fragmented randomly. These fragments are then sequenced and a coherent, linear genome sequence is assembled computationally. Most sequencing methods rely on having these smaller fragments of DNA to analyze, and for most methods, numerous copies of each fragment are necessary. Because the technology is so new, there is no universal “best” method.1 However, single cell DNA sequencing, as it is happening in labs nationwide today, tends to involve some combination of the three steps outlined above and described in more detail below.2
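The idea of piecing overlapping fragments back into one linear sequence can be illustrated with a toy example. The sketch below is a deliberately simple greedy merge, not the method used by any real assembler: it assumes error-free fragments, ignores repeats and reverse-complement strands, and the function names (overlap, greedy_assemble) are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap.

    Returns the remaining contigs; a single contig means every fragment
    could be chained together.
    """
    # Drop exact duplicates and fragments fully contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; stop with several contigs
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Three overlapping fragments of a made-up 14-base "genome".
contigs = greedy_assemble(["GTTAGC", "ATGCGTAC", "GTACGTTA"])
print(contigs)  # ['ATGCGTACGTTAGC']
```

Real data add sequencing errors, repeated regions, and uneven coverage, which is why production assemblers use far more elaborate strategies than this greedy merge.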
Challenges Facing Single Cell Sequencing
The first problem is one of discovery – the primary goal of single cell sequencing is to find interesting differences in the DNA of individual cells within the same organism, system, or even tissue. Many inventive methods of modern DNA sequencing were developed long before the prospect of single cell sequencing was on the horizon.
Depending on the organism being examined, the most straightforward, yet unsustainably time-consuming, way to isolate a cell is often simply to pick it out by hand with a micropipette. Another widely used method is single-cell fluorescence activated cell sorting (FACS), which automates the selection process on the basis of specific cellular markers. By running the cells through a very thin column, barely larger than the cells themselves, the cells can be separated into a single-file line. Then, by vibrating the system, the cells separate into individual droplets that are sorted by their characteristic response to fluorescence from a laser. FACS is only one of many imaginative strategies, and these tools can be combined in countless ways, each with different advantages and drawbacks for a given research goal. Alternative strategies include microfluidics,2 and different methods can be optimized for the isolation of different classes of cells.
The second major obstacle that researchers face in attempting to develop single cell sequencing technologies is the inherently small amount of DNA present in a single cell. The accuracy of current sequencing machines depends on the number of copies of a given DNA fragment, and each cell has only one copy of the desired genome. Therefore, once a cell has been isolated, the most important step of single cell sequencing is to amplify the cell’s genome while minimizing the technical errors that cause inaccuracies in the DNA sequence.
This amplification problem has already been tackled for RNA sequencing, where complementary DNA (cDNA) is amplified with sufficient accuracy because every cell has multiple copies of every RNA transcript.3 RNA-sequencing processes have been optimized to require as few as 5-10 copies.2 As described below, PCR is the standard approach to amplification, but other recent advancements, like MDA, provide other advantages.
Polymerase Chain Reaction (PCR) has been the cornerstone of modern biology research and exponentially amplifies a DNA segment chosen by the researcher. Cycles of DNA strand separation and base pair addition can be repeated numerous times to achieve the desired level of amplification of the original DNA fragment.
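The exponential growth described above can be made concrete with a little arithmetic. The sketch below assumes an idealized reaction in which every cycle multiplies the number of copies by (1 + efficiency), where an efficiency of 1.0 means perfect doubling; the function names and the efficiency parameter are illustrative assumptions, and real reactions plateau as reagents run out.

```python
def pcr_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized copy number after `cycles` rounds of PCR.

    `efficiency` is the assumed fraction of templates copied per cycle
    (1.0 = perfect doubling). Plateau effects are ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


def cycles_needed(initial_copies: float, target_copies: float,
                  efficiency: float = 1.0) -> int:
    """Smallest number of cycles that reaches at least `target_copies`."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be greater than 0 and at most 1")
    copies, cycles = float(initial_copies), 0
    while copies < target_copies:
        copies *= 1.0 + efficiency
        cycles += 1
    return cycles


print(pcr_copies(1, 10))            # 1024.0
print(cycles_needed(1, 1_000_000))  # 20
```

Starting from the single copy available in one cell, about twenty perfect doublings are enough to pass a million copies, which is why even small per-cycle errors or biases matter so much.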
Presented in a 2005 paper, Multiple Displacement Amplification (MDA) is a foundational amplification method for single-cell DNA sequencing. The MDA process consists of amplifying the genome non-specifically through the elongation of random primers throughout the genome. These primers then create duplicate, overlapping DNA fragments for sequencing, which can be pieced together to recreate the entire genome.4 This method is better than PCR at replicating different parts of the genome at a more consistent amplification rate because the overlapping fragments remain attached to the DNA template strand and can therefore displace one another.5
The best way to get an accurate sequencing readout would be a process that mixed PCR and MDA to maximize the number of copies and maximize their fidelity with the original genome. Multiple Annealing and Looping-Based Amplification Cycles (MALBAC) integrates both of these previous methods and is currently the most successful amplification method for single cell sequencing. Developed by Professor Sunney Xie’s lab at Harvard University, MALBAC uses both MDA and PCR to amplify the genome in a way that minimizes the discrepancies in amplification rate of different DNA fragments.6,7 MALBAC performs five cycles of MDA such that the fragments loop together so they cannot be amplified again unless the temperature is increased to denature the DNA template and looped strands.
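Why uniform amplification matters can be seen in a toy calculation. The sketch below is not a model of MALBAC or MDA chemistry. It simply compares two assumed regimes for two genomic regions that are copied with slightly different per-cycle efficiencies: one in which every copy is itself copied again (exponential), and one in which only the original template is copied each cycle (linear). All names and numbers are illustrative assumptions.

```python
from typing import Callable


def exponential_copies(efficiency: float, cycles: int) -> float:
    """Copies of a region when every existing copy can be copied again."""
    return (1.0 + efficiency) ** cycles


def linear_copies(efficiency: float, cycles: int) -> float:
    """Copies of a region when only the original template is copied each cycle."""
    return 1.0 + efficiency * cycles


def bias_ratio(copy_fn: Callable[[float, int], float],
               eff_a: float, eff_b: float, cycles: int) -> float:
    """How over-represented region A is relative to region B after amplification."""
    return copy_fn(eff_a, cycles) / copy_fn(eff_b, cycles)


# Two regions with assumed per-cycle efficiencies of 0.9 and 0.8.
linear_bias = bias_ratio(linear_copies, 0.9, 0.8, cycles=5)
exponential_bias = bias_ratio(exponential_copies, 0.9, 0.8, cycles=25)
print(round(linear_bias, 2), round(exponential_bias, 2))
```

In this toy setting the small efficiency gap barely shows after a few linear cycles, but it compounds to a several-fold difference after 25 exponential cycles, which gives some intuition for why limiting re-amplification of fragments before PCR helps keep coverage even.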
MALBAC is currently the most effective method for detecting many genetic abnormalities, from cells having an extra chromosome to single DNA base pair changes, because it produces a relatively uniformly amplified genome that allows the sequencing results to be interpreted with specificity.2
New Discoveries from Single Cell Genomics
Many discoveries have already been made as the technology for single cell sequencing continues to improve. Genetic variation in cancerous tumors is a significant application for this technology: it is already known that tumors develop from spontaneous mutations and that tumors themselves are genetically heterogeneous.1 While introducing MALBAC, Zong et al. (2012) used the method to show that the base mutation rate of a cancer cell is ten times larger than the rate for a germline cell, a finding made possible by the reliable amplification rate. Furthermore, from analysis of the number of short, repeating DNA sequences (like a series of inserted G’s or repeated codons), scientists discovered that an early, genetically unstable state in tumor cells causes rapid tumor growth.8
Single cell sequencing can be highly valuable wherever genetic heterogeneity between cells is suspected, and other fascinating new opportunities for discovery lie in areas like neuroscience and the gastrointestinal system.
Improvements for Future Discoveries
The most pressing issue in the development of single cell DNA sequencing, perhaps obviously, is ensuring the accuracy of the resulting sequence. Fortunately, letting the cell grow and divide on its own, then sequencing the resulting population of cells, provides a reliable check of single cell sequencing accuracy. However, the problem of implementing a method that can replicate the entire genome nearly perfectly still remains. MDA and PCR are biased toward certain parts of the genome and neglect others, which skews the sequence read off after amplification. MALBAC is a first-rate attempt to suppress extra replication of some genomic regions, but there is always room for improvement. Professor Xie, in an interview with Nature Methods, said, “By no means is MALBAC the end game. We’re trying to do better.”1
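The accuracy check described above, comparing a single-cell result against the sequence obtained from the cell's own descendants, amounts to counting agreements position by position. The sketch below assumes the two sequences are already aligned and of equal length, and that "N" marks a position where no base could be called; the function names are hypothetical.

```python
def discordant_sites(single_cell: str, clonal_reference: str) -> list[tuple[int, str, str]]:
    """List (position, reference_base, single_cell_base) where the two calls differ.

    Positions where either sequence has an uncalled base ("N") are skipped.
    """
    if len(single_cell) != len(clonal_reference):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, ref, sc)
        for i, (sc, ref) in enumerate(zip(single_cell, clonal_reference))
        if sc != ref and "N" not in (sc, ref)
    ]


def concordance(single_cell: str, clonal_reference: str) -> float:
    """Fraction of called positions at which the two sequences agree."""
    compared = sum(
        1 for sc, ref in zip(single_cell, clonal_reference) if "N" not in (sc, ref)
    )
    if compared == 0:
        raise ValueError("no positions with calls in both sequences")
    mismatches = len(discordant_sites(single_cell, clonal_reference))
    return (compared - mismatches) / compared


single = "ACGTNACGT"     # made-up single-cell call with one uncalled base
reference = "ACGTAACTT"  # made-up sequence from the clonally expanded population
print(discordant_sites(single, reference))  # [(7, 'T', 'G')]
print(concordance(single, reference))       # 0.875
```

A disagreement in such a comparison could be an amplification error in the single-cell data or a mutation that arose during clonal growth, so the mismatches are candidates to investigate, not proof of error.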
One path forward is to improve DNA amplification so that sequencing machines can analyze many copies of the cell’s original DNA without too many mutations introduced during amplification. Research to this end could focus on DNA amplification techniques and use preexisting DNA sequencing technology. A second path forward is to streamline the entire process by removing the amplification step and determining the cell’s genome from its single copy of DNA. Instead of sequencing the result of extracting and replicating the cell’s DNA, scientists could glean the cell’s DNA sequence by taking advantage of its existing cellular DNA machinery and processes.
A sequencing technique relying on the cell to amplify its own genome has already been developed for short sections of the genome.9 This technique uses DNA bases that have been fluorescently tagged based on the identity of the nitrogenous base (ATGC). Unlike other sequencing methods, the tags are cleaved off the bases as they are synthesized instead of remaining in the DNA to be later recognized by a sequencing machine. The tracking of the release of these fluorescent molecules, leaving the synthesized DNA strand intact, can then be recorded and the sequence can be determined without significantly disturbing intracellular activity.
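Reading a sequence from released fluorescent tags is, at its simplest, a lookup from color to base in the order the flashes occur. The sketch below assumes each detected pulse is recorded as a (time, dye color) pair. The color assignments are hypothetical and chosen only for illustration, and real instruments must also deal with noise, missed pulses, and overlapping signals.

```python
# Hypothetical dye-to-base assignment, for illustration only.
DYE_TO_BASE = {"red": "A", "green": "T", "blue": "G", "yellow": "C"}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def call_synthesized_strand(pulses: list[tuple[float, str]]) -> str:
    """Translate fluorescent pulses into the newly synthesized strand.

    `pulses` holds (time, dye color) pairs; they are sorted by time so
    that bases appear in the order they were incorporated.
    """
    ordered = sorted(pulses, key=lambda pulse: pulse[0])
    return "".join(DYE_TO_BASE[dye] for _, dye in ordered)


def template_strand(synthesized: str) -> str:
    """Reverse complement of the synthesized strand, i.e. the template that was copied."""
    return "".join(COMPLEMENT[base] for base in reversed(synthesized))


pulses = [(0.3, "green"), (0.1, "red"), (0.7, "blue")]
new_strand = call_synthesized_strand(pulses)
print(new_strand)                   # ATG
print(template_strand(new_strand))  # CAT
```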
The ability to acquire and sequence the entire genome of an individual cell promises possibly transformative understandings of biological systems and single-celled organisms, of spontaneous, somatic mutations in the human body, and even of which genes we pass on to our offspring.
The potential to acquire this entire genome has already led to exciting new scientific progress. Antibodies in B cells are known to be genetically heterogeneous, so sequencing of antibodies could lead to breakthroughs in our understanding of the microscopic workings of the immune system. Efficient sequencing of a single cell also promises improvements in safety and accuracy in prenatal genetic screening. Though we anticipate a wealth of knowledge becoming available with an ideal technology, our increasingly effective attempts have already led to everything from disproven hypotheses to exciting new insights.
For applications to bacteria: Lasken, Roger S. & Jeffrey S. McLean (2014). Recent advances in genomic sequencing of microbial species from single cells. Nature Reviews: Genetics. 15(9): 577-584.
RNA Sequencing: Wang Z, Gerstein M, Snyder M (2009). RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews: Genetics, 10(1): 57-63.
Jennifer Walsh ’17 is a sophomore in Lowell House concentrating in Physics.
1. Chi, K.R. Nature Methods. 2014, 11(1): 13-17.
2. Macaulay I.C.; Voet T. PLOS Genetics. 10(1). Retrieved March 28, 2015, from http://journals.plos.org/plosgenetics/.
3. Brady G.; Iscove N.N. Methods Enzymol. 1993, 225: 611-623.
4. Lasken R. et al. Nat. Reviews Genetics. 2014, 15(9): 577-584.
5. Nawy, T. Nature Methods. 2014, 11(1): 18.
6. Zong C. et al. Science. 2012, 338: 1622-1626.
7. Reuell, P. One Cell is All You Need. Harvard Gazette. 2013.
8. Navin N. et al. Nature. 2011, 472(7341): 90-94.
9. Coupland P. et al. BioTechniques. 2012, 53(6): 365-372.