Before the military can harness data, it helps to understand exactly what kind of resource data is.

Metaphors can convert complexity into easily understandable phrases. But leading thinkers and the executors of military modernization have wrestled with the question of whether data is more like oil, ore or ice.

“Data is like ice,” said John Ferrari, chief administrative officer of QOMPLX, an analytics and insurance software company. “If you put one cube in your glass out on the summer on your patio, it’ll melt. But if you put four bags of ice cubes together around a keg it will stay cold the entire time. Being together makes it exponentially more valuable.”

Ferrari’s remarks came as part of the Association of the United States Army symposium on AI and Autonomy in Detroit, Nov. 20-21. While the ice metaphor may be a stretch, it captures well that part of what makes data useful is its relation to other data: that big data is built out of the vast array of connections found when machines process tremendous data sets.

“The Defense Department is in the business of data destruction,” said Ferrari, who retired as a major general in 2019 after 32 years in the Army. He contrasted the baseline data available for free to any Gmail user, 15 gigabytes, with the 4 gigabytes granted to the most senior general officers and civilians in the Pentagon. “We are also in the business of being enormously afraid of aggregating data. If you put that data set together, yeah they’re both unclassified, maybe they’ll be classified. Whatever you do, don’t bring the data together, keep it siloed.”

One way for the Pentagon to better harness its data would be to reduce the hurdles to aggregating it, like those related to classification. Another challenge is figuring out where to process it.

Matt Benigni, chief data scientist for Joint Special Operations Command, described a scenario where a network team weighed how to send captured enemy data back to the United States. The team had developed an algorithm to process the data and was trying to determine how to share that work with others in the field.

This is “a fundamentally flawed way to go with that problem when you can just have server-side compute,” Benigni said. “The task force that operates in Iraq and Syria just crossed the threshold where more data is generated inside the tactical bubble than outside.”

To Benigni’s eye, the problem was a matter of extraction: instead of bringing the algorithm to the data, the teams could send the data back stateside where it would be refined and processed. He likened this to ore, where the breaking apart of a mass, one that’s found together, is required to discover the useful insights hidden inside. To this end, Benigni cited an effort with MIT’s Lincoln Laboratory to process that data and fuse datasets derived from field extraction.

But assembling existing data in a useful form is only part of the problem facing the Department of Defense. Another component is making sure that the data it chooses to collect going forward is actually useful.

Consider the potential for data collection from a new tank.

“Is the data coming off that tank? Is the data architecture designed from when we put that tank together or is it made to be used in AI machine learning?” Ferrari asked. “My guess is it’s not because we didn’t put a requirement in there. We don’t even know what that requirement is.”

Ferrari added that, for the next five to 10 years, the United States can get by without knowing exactly how that data will be collected and used, but future designs need to take that into account. In 20 years, for example, tanks need to be designed to collect data with the explicit goal of running that data through AI processes. This way Army leaders can better understand how the tank functioned in battle and observed its environment.

Understanding how to design with AI in mind doesn’t require graduate school level technical knowledge, but as the struggle over metaphors for data showcased, it needs people who can get to a shared understanding. And that means building skills throughout the force.

“The decision maker needs to clearly understand the limitations of the modeling behind the analytics. And that’s on the data scientist [to] explain that. But there’s going to be varying levels of expertise on either side. They need to know the limitations,” Benigni said.

Algorithms, after all, are just the black boxes into which scientists plug available data, and the tooling of the algorithm is at least as responsible as the numbers that go into it for what results it produces. Troubleshooting AI, especially data responsible for military decisions, will take insight into both what information the algorithm was fed and how it was told to weight what parts of it.

So does everybody in the Pentagon need to know how to wrangle data?

“That answer is emphatically ‘no,’” said Brian Miloski, president of Alqimi Technology, an IT solutions company. “Data scientists have a responsibility to communicate. There’s too much black box, too much pixie dust stuff going on. And data scientists have to be taught how to communicate the results more effectively to the layman.”

Because, ultimately, the final products are going to be used in the field by people with basic training, rather than the refinement of graduate or professional education.

“The people who designed those tools need to design them so that mortals can use them,” said Ferrari. “So if we can only use these tools if we have a PhD from Carnegie Mellon, hey, great, not gonna work.”

Kelsey Atherton blogs about military technology for C4ISRNET, Fifth Domain, Defense News, and Military Times. He previously wrote for Popular Science, and also created, solicited, and edited content for a group blog on political science fiction and international security.
