In an era of mass digitization, more data (text, photos, videos, etc.) has been generated in the last 2 years than in all of prior history, with over 3 million gigabytes added every day to our global data volume of 64×10^12 GB (Ceze, et al., 2019). By 2025, our data volume is predicted to reach 180×10^12 GB (Figure 1). Within the next 50 years, present-day optical and magnetic storage methods will reach their limit, because current media cannot sustain our rapid accumulation of information; novel and economically viable forms of long-term data storage will be needed (Hilbert & López, 2011). One alternative, demonstrated by Harvard University geneticist George Church in 2012, has proven most promising in the search for a definitive solution: DNA storage. That initial experiment encoded a 52,000-word book in snippets of DNA, and major advances have been made since this proof of concept (Church, et al., 2012).

In conventional computers, sequences of binary digital code (combinations of 1s and 0s) are used to store data. In DNA storage, sequences of the four nucleotide bases, adenine and thymine or guanine and cytosine in their complementary pairs, provide a medium analogous to binary code, with information encoded in the order of the bases (Ceze, et al., 2019). Most digital data is currently kept in exabyte-scale data centres, each storing 1 billion GB. These are massive sites that span multiple football fields and cost over $1 billion USD annually to build and sustain (Doricchi, et al., 2022). Using DNA, which can theoretically store 220,160,000 GB in one gram, all of our world’s digital information could be stored within the volume of a teacup with space left over. The DNA would weigh ~88 million times less than a solid-state drive (SSD) holding the equivalent amount of data. Further, DNA has a storage density 10 million times greater than current magnetic tape and SSDs while also consuming at least 1000 times less energy (Ceze, et al., 2019).
DNA storage comprises three stages: encoding, writing, and storing (Figure 2). First, a computational algorithm translates the 1s and 0s of digital data into corresponding nucleotide sequences (Ceze, et al., 2019). Second, DNA strands are synthesized in the computed base sequence. This is most commonly accomplished using phosphoramidite synthesis: DNA phosphoramidite nucleosides are coupled on a solid support, and nucleotides are added stepwise at the 5’-terminus until the desired sequence is produced. Afterwards, the finished oligonucleotide is cleaved from its support (Russell, et al., 2008). Synthesis can occur either in a column or in an array. Third, the encoded DNA is stored under regulated conditions in vivo or in vitro. In vitro, DNA can be stored as droplets, on silicon chips, or in solution (Ceze, et al., 2019). In vivo, DNA is often stored in bacteria or fungi, for example in synthetic yeast chromosomes, which also enables stable replication (Doricchi, et al., 2022). DNA can be preserved in these media for over 1000 years at room temperature, making it over 300 times more durable than our most stable magnetic tapes (Ceze, et al., 2019).
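The encoding stage can be illustrated with a minimal sketch. It assumes a simple mapping of two bits to each base (00 to A, 01 to C, 10 to G, 11 to T). This mapping and the function name `encode_bytes` are illustrative assumptions, not the scheme used in the cited studies, which add constraints and error correction.

```python
# Hypothetical 2-bits-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def encode_bytes(data: bytes) -> str:
    """Translate raw bytes into a nucleotide string, two bits per base."""
    # Write every byte as 8 binary digits, then map each pair of bits to a base.
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


if __name__ == "__main__":
    print(encode_bytes(b"Hi"))  # CAGACGGC
```

Under this assumed mapping each byte becomes four bases, so the two-byte input above yields an eight-base sequence.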
Retrieving data from DNA likewise involves three steps: random access, readout, and decoding (Figure 2). First, using polymerase chain reaction (PCR) techniques, the strands that carry a particular PCR primer sequence are selectively targeted within the stored DNA pool and amplified (Organick, et al., 2018). Next, fluorophores specific to each nucleotide are used to label the DNA in solution, and it is sequenced by detecting the fluorescence outputs (Doricchi, et al., 2022). Lastly, the computational algorithm decodes this sequence back into binary digital code to reproduce the data (Ceze, et al., 2019).
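The retrieval steps can be sketched in the same spirit. In this toy model, each strand is a string whose first bases act as its primer address, PCR selection is replaced by a simple prefix match, and sequencing is treated as reading the string directly. The primer sequences, the 2-bits-per-base mapping, and the function names are all assumptions made for illustration.

```python
# Hypothetical inverse of a 2-bits-per-base mapping (illustration only).
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def select_by_primer(pool, primer):
    """Stand-in for PCR random access: keep strands that begin with the
    primer and return their payloads with the primer removed."""
    return [strand[len(primer):] for strand in pool if strand.startswith(primer)]


def decode_bases(sequence: str) -> bytes:
    """Translate a nucleotide string back into bytes, two bits per base."""
    bits = "".join(BASE_TO_BITS[base] for base in sequence)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    # Two stored "files", each tagged with a made-up 8-base primer.
    pool = ["ACGTACGT" + "CAGACGGC", "TTGGCCAA" + "CGACCGTT"]
    payloads = select_by_primer(pool, "ACGTACGT")
    print(decode_bases(payloads[0]))  # b'Hi'
```

Real random access relies on primer hybridization chemistry and must tolerate sequencing errors, neither of which this sketch models.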

While DNA storage is extremely promising, it faces a variety of challenges that must be overcome before mainstream adoption. Phosphoramidite synthesis, like other data-writing methods, has difficulty producing sequences longer than 200 bases. Data must therefore be split across vast numbers of encoded DNA strands, which decreases the efficiency of the subsequent storage and retrieval steps (Doricchi, et al., 2022). During data retrieval, sequencing DNA is often slow and requires trained personnel, optical devices, and fluorophores, all of which are costly (Heckel, et al., 2019). Based on prevailing technology and methods, it is estimated that storing 1000 GB of data in DNA would cost $800 million USD, compared to $15 USD on magnetic tape (Ceze, et al., 2019). To compete with solid-state and magnetic storage, the price of DNA synthesis would have to fall by roughly 6 orders of magnitude (Doricchi, et al., 2022). Notably, DNA storage currently relies on methods and technology that were developed more broadly for the life sciences and have not been adapted for this express purpose (Ari & Arikan, 2016). As time progresses, new and specialized methodologies will emerge to allow for more efficient and cost-effective DNA storage. Researchers such as MIT professor of biological engineering Mark Bathe believe that, based on historical price trends in flash drive storage, this novel method will become viable within the next two decades (Trafton, 2021).
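A rough calculation shows how quickly the 200-base limit multiplies the number of strands. The sketch below assumes 2 bits per base and reserves a hypothetical 40 bases per strand for primers and an index. Neither figure comes from the cited sources, and the function name `strands_needed` is illustrative.

```python
import math


def strands_needed(num_bytes, max_strand_bases=200, overhead_bases=40, bits_per_base=2):
    """Estimate how many strands are required to hold num_bytes of data.

    overhead_bases (primers plus index) and bits_per_base are assumed
    values for illustration, not measured parameters.
    """
    payload_bases = max_strand_bases - overhead_bases
    if payload_bases <= 0:
        raise ValueError("overhead leaves no room for payload")
    payload_bits = payload_bases * bits_per_base
    # Round up: a partially filled strand still has to be synthesized.
    return math.ceil(num_bytes * 8 / payload_bits)


if __name__ == "__main__":
    print(strands_needed(10**9))  # 25000000 strands for 10^9 bytes
```

Under these assumptions each strand carries only 40 bytes, so even a single gigabyte is spread over tens of millions of strands.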
DNA storage must clear several hurdles before it reaches commercial use. Yet as research continues under the pressure of the increasing obsolescence of magnetic tapes and SSDs, this novel technique appears more than ever to be an inevitability. In the near future, as life and technology continue to merge, DNA may be the foundation not only of our lives, but also of technology itself.
Works Cited
Ari, Ş. and Arikan, M., 2016. Next-generation sequencing: Advantages, disadvantages, and future. Plant Omics: Trends and Applications, [online] pp.109–135. Available at: <link.springer.com/chapter/10.1007/978-3-319-31703-8_5> [Accessed 18 Mar. 2023].
Ceze, L., Nivala, J. and Strauss, K., 2019. Molecular digital data storage using DNA. Nature Reviews Genetics, 20(8), pp.456–466.
Church, G.M., Gao, Y. and Kosuri, S., 2012. Next-generation digital information storage in DNA. Science, [online] 337(6102), p.1628. 10.1126/science.1226355
Doricchi, A., Platnich, C.M., Gimpel, A., Horn, F., Earle, M., Lanzavecchia, G., Cortajarena, A.L., Liz-Marzán, L.M., Liu, N., Heckel, R., Grass, R.N., Krahne, R., Keyser, U.F. and Garoli, D., 2022. Emerging approaches to DNA data storage: Challenges and prospects. ACS Nano, 16(11), pp.17552–17571. 10.1021/acsnano.2c06748
Heckel, R., Mikutis, G. and Grass, R.N., 2019. A characterization of the DNA Data Storage Channel. Scientific Reports, 9(1), p.9663.
Hilbert, M. and López, P., 2011. The world’s technological capacity to store, communicate, and compute information. Science, 332(6025), pp.60–65. 10.1126/science.1200970
Liang, D., Wang, L.Z., Chen, F. and Guo, H.D., 2014. Scientific big data and Digital Earth. Chinese Science Bulletin, 59(12), pp.1047–1054.
Organick, L., Ang, S.D., Chen, Y.-J., Lopez, R., Yekhanin, S., Makarychev, K., Racz, M.Z., Kamath, G., Gopalan, P., Nguyen, B., Takahashi, C.N., Newman, S., Parker, H.-Y., Rashtchian, C., Stewart, K., Gupta, G., Carlson, R., Mulligan, J., Carmean, D., Seelig, G., Ceze, L. and Strauss, K., 2018. Random access in large-scale DNA data storage. Nature Biotechnology, 36(3), pp.242–248.
Russell, M.A., Laws, A.P., Atherton, J.H. and Page, M.I., 2008. The mechanism of the phosphoramidite synthesis of polynucleotides. Organic & Biomolecular Chemistry, 6(18), pp.3270–3275.
Trafton, A., 2021. Could all your digital photos be stored as DNA? MIT News. [online] 10 Jun. Available at: <news.mit.edu/2021/dna-data-storage-0610> [Accessed 17 Mar. 2023].