DNA data storage closer to becoming reality

DNA could one day replace hard drives, tape and other storage media. It’s not as far-fetched as it sounds, and significant progress is being made, including new, zero-error retrieval of individual files.


Scientists have encoded hundreds of megabytes of data in DNA over the last few years. More recently, not only has data been stored perfectly in a synthetic variant of the genetic material that carries the instructions for all organic life, but archived files have also been individually retrieved with zero errors.

It appears that Microsoft Research’s target of having a DNA storage system operating within a data center by the turn of the decade, as reported by MIT Technology Review a year ago, might be becoming increasingly viable.

We know of organic DNA (deoxyribonucleic acid) through the study of the genes of living organisms. It holds large amounts of information, and it lasts a long time: a 45,000-year-old human femur was DNA-sequenced, or decoded, a few years ago, for example.

It’s for those two principal reasons, data density and longevity, that researchers want to figure out ways to use a reimagined, synthetic form of DNA to store our ever-increasing quantities of data. DNA should hold more data at higher densities than traditional data center storage, and data kept in it should last longer than data on solid-state drives, tape, or hard drives. A lifespan of 45,000 years or more could keep books, historical facts, art, and so on alive effectively indefinitely.
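For a rough sense of where the density comes from: DNA has four bases (A, C, G and T), so each base can in principle carry two bits. The Python sketch below uses a naive two-bits-per-base mapping. That mapping is an illustrative assumption, not the encoding the researchers used, and practical schemes add redundancy on top.

```python
# Hypothetical mapping for illustration only: each pair of bits becomes one base.
BASES = "ACGT"  # A=00, C=01, G=10, T=11


def encode(data: bytes) -> str:
    """Turn bytes into a string of bases, four bases per byte."""
    out = []
    for byte in data:
        # Walk the byte from its two highest bits down to its two lowest.
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(strand: str) -> bytes:
    """Reverse encode(): every four bases become one byte again."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)
```

With this mapping, encode(b"Hi") gives the eight-base string CAGACGGC, and decode() turns it back into the original two bytes.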

Random access in large-scale DNA data storage

Scientists from Microsoft and the University of Washington say they’re making progress.

In 2015, I reported on Swiss experiments in which researchers said they were making advances in error correction, since gaps in DNA sequences had blighted tests. False encoding of information, along with chemical degradation, had caused failures.

The new development, announced by Microsoft in February, involves 35 distinct files totaling 200 MB of data that have been flawlessly written and, importantly, individually recovered. The tests “demonstrate a viable, large-scale system for DNA data storage and retrieval,” according to the abstract of the group’s paper in Nature Biotechnology.

The fact that the files were individually recovered is the big deal here, they say. In previous experiments, all of the data had to be pulled in order to rebuild just a subset, or even one individual file. In other words, the entire mass of DNA had to be decoded. That’s time-consuming, because sequences need to be read multiple times for error correction.
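One simple way to see why repeated reads help is a majority vote: if the same strand is read several times, an occasional misread base is outvoted by the correct ones. The sketch below is a simplified illustration, not the team’s method. It assumes every read has the same length and that errors only swap one base for another, whereas real reads can also gain or lose bases.

```python
from collections import Counter


def consensus(reads):
    """Majority vote per position across several reads of the same strand.

    Assumption for this sketch: all reads are the same length and already
    lined up, so position i in one read matches position i in the others.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must have the same length")
    # zip(*reads) yields one column of bases per position.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))
```

Given the three reads ACGT, ACGA and TCGT, each of the two misread bases is outvoted and the result is ACGT.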

This current set of experiments, however, uses a kind of random access, much as a PC does. That solves the issue, the team says: “We can recover each file individually and with no errors, using a random access approach.”
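To make the idea of random access concrete, here is a minimal Python sketch. It assumes each strand starts with a short address that identifies its file, followed by a chunk index, so that only the strands for one file need to be picked out and decoded. The strand layout, the address values and the function names are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
BASES = "ACGT"
INDEX_LEN = 2  # hypothetical: two bases are enough to number 16 chunks


def index_to_bases(n):
    """Write a chunk number 0..15 as two bases, so sorting restores the order."""
    if not 0 <= n < len(BASES) ** INDEX_LEN:
        raise ValueError("chunk number out of range for the index")
    return BASES[n // 4] + BASES[n % 4]


def write_file(address, payload, chunk=4):
    """Split a payload into strands laid out as address + index + chunk."""
    chunks = [payload[i:i + chunk] for i in range(0, len(payload), chunk)]
    return [address + index_to_bases(n) + c for n, c in enumerate(chunks)]


def read_file(pool, address):
    """Pick out only the strands carrying this address and rebuild the payload."""
    start = len(address)
    selected = [s for s in pool if s.startswith(address)]
    # Sort by the index bases; A < C < G < T matches the numbering above.
    selected.sort(key=lambda s: s[start:start + INDEX_LEN])
    return "".join(s[start + INDEX_LEN:] for s in selected)
```

If two files are written into one mixed pool under the addresses AACC and GGTT, read_file(pool, "AACC") touches only the first file’s strands. The rest of the pool is never decoded, which is the point of the contrast with earlier experiments.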

The key to the advance has been in data validation and in writing an algorithm to speed up reading of the thread- or chain-like DNA.

“DNA data storage has the potential to complement or eventually replace tape, the densest [currently] commercially available storage medium for archival storage,” the paper concludes.
