Data is a massive problem for the intelligence community. From the satellite images produced by the National Reconnaissance Office to the bulk communications data swept up by the National Security Agency, the intelligence community is collecting more information than ever before. But where to store it?
Data centers require massive warehouses and megawatts of power. The resource-intensive nature of these facilities makes them difficult to scale, leaving them ill-prepared for a growing torrent of data.
Now, the Intelligence Advanced Research Projects Activity, the organization charged with tackling some of the intelligence community’s most difficult problems, thinks it has a solution: synthetic DNA. On Jan. 15, IARPA officially launched the Molecular Information Storage (MIST) program, an effort to use synthetic DNA to store exabytes of data (an exabyte is one million terabytes).
IARPA has awarded multi-phase contracts to two teams pursuing a solution: up to $23 million for the Molecular Encoding Consortium and up to $25 million for Georgia Tech Research Institute.
“The MIST program is a data storage moonshot to develop technologies that allow us to shrink an exabyte-scale data warehouse down to a tabletop form factor, with equally large reductions in operation and maintenance costs,” said IARPA Program Manager David Markowitz. “This would be a transformative capability for big data stakeholders in government and industry.”
If successful, MIST will result in new devices capable of both writing data to and reading data from synthetic DNA media at scale. The goal is to make this technology commercially viable within three to five years.
To store data on synthetic DNA, digital files must first be converted from binary code into DNA's four-letter alphabet: sequences of the bases A, C, T and G. The DNA is then synthesized in short segments, each tagged with a sort of barcode that records where the segment sits within the entire sequence, allowing readers to retrieve or copy only the data they need.
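The article does not describe the encoding scheme the MIST teams will use, so the following is only a minimal sketch of the general idea under simplifying assumptions. It maps every two bits to one base (00 to A, 01 to C, 10 to G, 11 to T), splits a file into short segments, and prefixes each segment with an index that plays the role of the "barcode." The mapping, the 2-byte index, the 4-byte payload size and all function names (`encode`, `decode`, `read_segment`) are hypothetical choices for illustration. Practical DNA storage codecs also add error correction and avoid sequences that are hard to synthesize or read, none of which is modeled here.

```python
import random

# Hypothetical 2-bit-per-base mapping (illustrative, not the MIST scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

INDEX_BYTES = 2    # hypothetical: 2-byte segment index ("barcode")
PAYLOAD_BYTES = 4  # hypothetical: 4 bytes of file data per segment


def bytes_to_bases(data: bytes) -> str:
    """Convert bytes to a base string, two bits per nucleotide."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_bases: every four nucleotides become one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def encode(data: bytes) -> list[str]:
    """Split data into segments, each prefixed with its index."""
    segments = []
    for n, start in enumerate(range(0, len(data), PAYLOAD_BYTES)):
        index = n.to_bytes(INDEX_BYTES, "big")
        chunk = data[start:start + PAYLOAD_BYTES]
        segments.append(bytes_to_bases(index + chunk))
    return segments


def parse_segment(seq: str) -> tuple[int, bytes]:
    """Split one segment back into its index and its payload bytes."""
    raw = bases_to_bytes(seq)
    return int.from_bytes(raw[:INDEX_BYTES], "big"), raw[INDEX_BYTES:]


def decode(segments: list[str]) -> bytes:
    """Reassemble the file by sorting segments on their index."""
    parsed = sorted(parse_segment(s) for s in segments)
    return b"".join(payload for _, payload in parsed)


def read_segment(segments: list[str], wanted: int) -> bytes:
    """Random access: return only the payload whose index matches."""
    for seq in segments:
        index, payload = parse_segment(seq)
        if index == wanted:
            return payload
    raise KeyError(wanted)


if __name__ == "__main__":
    message = b"Hello, MIST!"
    strands = encode(message)
    random.shuffle(strands)  # strands in a pool carry no inherent order
    print(decode(strands))            # b'Hello, MIST!'
    print(read_segment(strands, 1))   # b'o, M'
```

Because each segment carries its own index, the reader can restore the original order after shuffling and can pull out a single segment without decoding the rest, which is the property the barcode description above points to. At two bits per base, each byte costs four nucleotides, so every segment here is 24 bases long.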
“With digital data growing at an exponential rate, there is increasing interest and excitement about using nature’s storage medium, DNA, to store digital data,” said Emily Leproust, CEO and co-founder of Twist Bioscience, one of the companies working with the Georgia Tech Research Institute. “With the government’s commitment to fund this exciting new area of storage, we believe that as part of this consortium of specialists, we can truly revolutionize the DNA synthesis process, and reduce the cost of synthesis for DNA data storage by many orders of magnitude.”
According to Twist Bioscience, the goal of its effort is to create a device capable of writing enough synthetic DNA per day to bring data storage costs as low as $1 per gigabyte.
“Fifty years ago, DNA data storage was considered science fiction – today, it is science with a path toward broad implementation,” Leproust said. “We expect in the next three to five years, with the proper amount of government and industry investment, it will become a reality for long-term storage.”
The Molecular Encoding Consortium is led by the Broad Institute of MIT and Harvard and includes DNA Script and Professor Donhee Ham’s research group at Harvard University.
Meanwhile, the Georgia Tech Research Institute is teaming with Twist Bioscience Corp., the University of Washington, Microsoft and Roswell Biotechnologies. The resulting systems will be tested independently by Los Alamos National Laboratory, Sandia National Laboratories and the U.S. Army Research Laboratory.
Nathan Strout was the staff editor at C4ISRNET, where he covered the intelligence community.