New technology makes DNA data storage possible

In a new study, scientists at the Massachusetts Institute of Technology have developed a technique for labeling and retrieving DNA data files, which could help make DNA data storage feasible.

At this point, there are about 10 trillion gigabytes of digital data on the planet, and every day humans produce another 2.5 million gigabytes in the form of emails, photos, social media posts and other digital files. Much of this data is stored in enormous facilities known as exabyte data centers (one exabyte is 1 billion gigabytes), which can be the size of several football fields and cost about $1 billion to build and maintain.

Many scientists believe another solution to the massive data storage problem lies in the biological macromolecule that contains our genetic information: deoxyribonucleic acid (DNA). Since the beginning of life on Earth, DNA has evolved to store huge amounts of information at extremely high densities. Mark Barth, a professor of bioengineering at the Massachusetts Institute of Technology, says a coffee cup filled with DNA could theoretically store all the world's data.

"We need new solutions to store the vast amount of data the world is accumulating, especially archival data," he says. "DNA is a thousand times denser than even flash memory. Another interesting property is that once the DNA polymer is made, it doesn't consume any more energy. You can write data into DNA and store it forever."

Scientists have shown that images and text can be encoded into DNA, but a simple way to pick out a desired file from a mixture of many DNA fragments is still needed. In the new study, Mark Barth and colleagues demonstrated a way to encapsulate each data file in a six-micron spherical silica "capsule" that is labeled with short DNA sequences indicating the file's contents.

Using this method, the researchers accurately retrieved individual images stored as DNA sequences from a set of 20 images. Given the number of tags available, the method could scale up to 10^20 files.

Stable storage medium

Digital storage systems encode text, photos and other types of information as a series of zeros and ones, and the same information can be encoded in DNA using the four nucleotides that make up the genetic code: A, T, G and C (adenine, thymine, guanine and cytosine). For example, G and C can stand for 0, while A and T stand for 1.
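The mapping described above can be sketched in a few lines of Python. This is only an illustration of the "G or C means 0, A or T means 1" idea from the article, not the encoding used in the study. The function names and the choice to alternate between the two bases available for each bit are assumptions made for the example, and practical schemes would typically add error correction and other safeguards.

```python
# Illustrative one-bit-per-base scheme: G or C encodes 0, A or T encodes 1.
# The alternation between the two candidate bases is an assumption made
# for this sketch, not something described in the article.
ZERO_BASES = "GC"
ONE_BASES = "AT"


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, one base per bit (MSB first)."""
    bases = []
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            pool = ONE_BASES if bit else ZERO_BASES
            # Pick the first or second base of the pool by position parity.
            bases.append(pool[len(bases) % 2])
    return "".join(bases)


def decode(dna: str) -> bytes:
    """Turn a DNA string back into bytes; G/C read as 0, A/T read as 1."""
    if len(dna) % 8 != 0:
        raise ValueError("sequence length must be a multiple of 8 bases")
    out = bytearray()
    for start in range(0, len(dna), 8):
        byte = 0
        for base in dna[start:start + 8]:
            if base in ONE_BASES:
                bit = 1
            elif base in ZERO_BASES:
                bit = 0
            else:
                raise ValueError(f"unexpected base: {base!r}")
            byte = (byte << 1) | bit
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand, decode(strand))
```

Because either base of a pair decodes to the same bit, this scheme stores one bit per nucleotide, half of the two-bit maximum mentioned below.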

DNA has several other features that make it attractive as a storage medium. For one thing, it is very stable, and it is relatively easy to synthesize and sequence (though currently expensive). Second, it has a very high storage density: each nucleotide, equivalent to up to two bits, occupies about one cubic nanometer. As a result, a vast amount of data stored in the form of DNA could easily fit in the palm of a hand.
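As a rough check on these density figures, the following sketch computes the ideal volume of DNA needed for a given amount of data. It takes the article's numbers at face value (two bits per nucleotide, one cubic nanometer per nucleotide) and assumes perfect packing with no capsules, redundancy or error correction, so real volumes would be larger. The function name is hypothetical.

```python
BITS_PER_NUCLEOTIDE = 2      # upper bound quoted in the article
NM3_PER_NUCLEOTIDE = 1.0     # approximate volume quoted in the article
NM3_PER_MM3 = 1e18           # (1e6 nanometers per millimeter) cubed


def ideal_dna_volume_mm3(num_bytes: float) -> float:
    """Ideal (perfectly packed) DNA volume in cubic millimeters."""
    nucleotides = num_bytes * 8 / BITS_PER_NUCLEOTIDE
    return nucleotides * NM3_PER_NUCLEOTIDE / NM3_PER_MM3


EXABYTE = 1e18               # bytes
WORLD_DATA = 1e13 * 1e9      # 10 trillion gigabytes, in bytes

if __name__ == "__main__":
    print(ideal_dna_volume_mm3(EXABYTE))            # cubic millimeters
    print(ideal_dna_volume_mm3(WORLD_DATA) / 1000)  # cubic centimeters
```

Under these idealized assumptions, an exabyte comes to about 4 cubic millimeters and the 10 trillion gigabytes cited earlier to about 40 cubic centimeters, which is in line with the coffee-cup estimate.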

This new way of storing data faces a number of obstacles, starting with the cost of synthesizing such large amounts of DNA. Currently, it would cost $1 trillion to write one petabyte (1 million gigabytes) of data. To compete with magnetic tape, which is commonly used to store archival data, Barth estimates that the cost of DNA synthesis needs to fall by about six orders of magnitude. He noted that this goal could be achieved within a decade or two, just as the cost of storing information on flash memory has fallen dramatically over the past few decades.

In addition to the cost, another major bottleneck to using DNA to store data is the difficulty of picking out the file you want from among all the others.

"What would happen if the technology for writing DNA were so advanced that it was cost-effective to write an exabyte or a zettabyte (ZB) of data into DNA? You'd have a whole pile of DNA, which is countless documents or images or movies and other things, and you'd need to find a particular image or movie in it," Barth said. "It's like looking for a needle in a haystack."
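The cost figures above can be turned into per-gigabyte terms with simple arithmetic. The sketch below uses only the numbers quoted in the article ($1 trillion per petabyte, a drop of six orders of magnitude); the variable names are made up for the example.

```python
COST_PER_PETABYTE_USD = 10**12    # current synthesis cost quoted in the article
GIGABYTES_PER_PETABYTE = 10**6
REQUIRED_DROP = 10**6             # "six orders of magnitude"

# Cost per gigabyte today, and after the drop said to be needed
# for DNA to compete with magnetic tape.
cost_per_gb_today = COST_PER_PETABYTE_USD // GIGABYTES_PER_PETABYTE
target_cost_per_gb = cost_per_gb_today // REQUIRED_DROP
target_cost_per_pb = COST_PER_PETABYTE_USD // REQUIRED_DROP

if __name__ == "__main__":
    print(f"today:  ${cost_per_gb_today:,} per GB")
    print(f"target: ${target_cost_per_gb:,} per GB "
          f"(${target_cost_per_pb:,} per PB)")
```

In other words, writing costs about $1 million per gigabyte today, and the target is roughly $1 per gigabyte.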

Currently, DNA files are usually retrieved using PCR (polymerase chain reaction). Each DNA data file includes a sequence that binds to a specific PCR primer. To read a particular file, that primer is added to the sample to find and amplify the desired sequence. However, a disadvantage of this approach is that there may be crosstalk between the primer and DNA sequences other than the target, so that unwanted files are pulled out as well. In addition, the PCR retrieval process relies on enzymes and ends up consuming most of the DNA in the pool.

"You're kind of burning the haystack to find the needle, because all the other DNA isn't amplified, so basically it's thrown away," Barth said.

Solving the DNA file retrieval problem

The MIT team has developed a new retrieval technique that it hopes will replace the PCR method. The researchers encapsulated each DNA file in a tiny silica capsule, and each capsule is labeled with single-stranded DNA "barcodes" that correspond to the file's contents. To demonstrate this approach in a cost-effective way, the researchers encoded 20 different images into DNA fragments about 3,000 nucleotides long, which is roughly equivalent to 100 bytes. (Their study also showed that the capsules could hold DNA files of up to 1 GB.)

Each file in the study was labeled with barcodes corresponding to tags such as "cat" or "plane." When the researchers want to extract a specific image, they take a sample of the DNA and add primers corresponding to the tags they are looking for. For example, an image of a tiger carries the labels "cat," "orange," and "wild," while an image of a domestic cat carries "cat," "orange," and "domestic."

These primers are labeled with fluorescent or magnetic particles, which makes it easy to identify and pull out matching capsules from the sample. In this way, the researchers can remove the desired file while leaving the rest of the DNA intact, so that it can be returned to storage. Their search process allows Boolean logic statements such as "president AND 18th century" to return "George Washington" as a result, much like a Google image search.
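The tag-based lookup described above behaves much like a set query. The sketch below is a software analogy only, assuming an in-memory dictionary in place of the pool of capsules. The file names, the `select` function and the `exclude` (NOT) option are hypothetical and are not taken from the study.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical stand-in for a pool of capsules: file name -> barcode tags.
CAPSULES: Dict[str, FrozenSet[str]] = {
    "tiger.jpg": frozenset({"cat", "orange", "wild"}),
    "house_cat.jpg": frozenset({"cat", "orange", "domestic"}),
    "airplane.jpg": frozenset({"plane"}),
}


def select(
    capsules: Dict[str, FrozenSet[str]],
    require: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> List[str]:
    """Return files carrying every required tag (AND) and no excluded tag (NOT)."""
    required = frozenset(require)
    excluded = frozenset(exclude)
    return sorted(
        name
        for name, tags in capsules.items()
        if required <= tags and not (excluded & tags)
    )


if __name__ == "__main__":
    print(select(CAPSULES, require=["cat", "wild"]))
    print(select(CAPSULES, require=["cat"], exclude=["wild"]))
```

A query for "cat" AND "wild" picks out only the tiger image, while "cat" alone matches both cat images. Unlike PCR retrieval, nothing in the pool is consumed by the query, which mirrors the idea of returning the unselected capsules to storage.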

"At the current proof-of-concept stage, our search rate is 1 kilobyte (1 KB) per second," said James Baner of MIT, another lead author of the paper. "The search rate of our file system is determined by the amount of data per capsule, which is currently limited by the prohibitive cost of writing even 100 megabytes of data onto DNA, and by the number of sorters that can be used in parallel. If DNA synthesis becomes cheap enough, we can use this method to maximize the amount of data stored per file."

The barcodes the researchers used, which are single-stranded DNA sequences, were taken from a library of 100,000 sequences developed by Stephen Elledge, a professor of genetics and medicine at Harvard Medical School. If two of these tags are attached to each file, 10^10 different files can be uniquely labeled; with four tags on each file, 10^20 files can be uniquely labeled.
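The scaling figures follow from simple counting: with a library of 100,000 barcodes, k tags per file give at most 100,000^k combinations, which is 10^10 for two tags and 10^20 for four. The short sketch below reproduces this. The second function, which counts unordered sets of distinct tags, is an added assumption shown only to illustrate that the article's figures are upper bounds.

```python
import math

LIBRARY_SIZE = 100_000  # barcode sequences in the library cited by the article


def max_labels(tags_per_file: int) -> int:
    """Upper bound: each tag slot can hold any barcode from the library."""
    return LIBRARY_SIZE ** tags_per_file


def distinct_unordered_labels(tags_per_file: int) -> int:
    """Stricter count (an assumption): tags are distinct and order is ignored."""
    return math.comb(LIBRARY_SIZE, tags_per_file)


if __name__ == "__main__":
    for k in (2, 4):
        print(k, max_labels(k), distinct_unordered_labels(k))
```

Even under the stricter count, two tags still distinguish roughly 5 billion files.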

George Church, a professor of genetics at Harvard Medical School who was not involved in the research, described the technology as a "giant leap forward in knowledge management and search technology."

"Rapid advances in writing, copying, reading and low-energy archival storage of data in DNA form have left the precise retrieval of data files from huge (10^21-byte, zetta-scale) databases poorly explored," Church said. "What's striking about this new study is that it addresses this problem using a completely separate outer layer of DNA, exploiting a different property of DNA (hybridization rather than sequencing), and using existing instruments and chemistries."

Barth envisions this DNA encapsulation technology being used to store "cold" data, that is, data kept in archives but not often accessed. His lab has founded a startup called Cache DNA, which is developing technology for the long-term storage of DNA, both for DNA data storage in the long term and for clinical and other existing DNA samples in the near term.

"While it may be some time before we can use DNA as a data storage medium, there is already a pressing need for low-cost, large-scale storage solutions for DNA and RNA samples from COVID-19 testing, human genome sequencing and other areas of genomics," Barth said.

