Battle does not lend itself to data collection.

Conflict is a mess of confounding variables, of disruption and interference, and it often happens spontaneously, without warning or preparation. While the modern battlefield is a sensor-rich environment, those sensors themselves are targets and subject to assaults of their own. Sometimes, malfunctions can be caused by something as ubiquitous as glare from the sun.

But that creates a problem when it comes to designing autonomous vehicles, which need data to train and function.

Enter synthetic data: virtually created sensor inputs that reflect scenarios that could occur but were never recorded outside a simulation.

For rare events, or scenarios where data is especially hard to collect, synthetic data can be folded into a training set so the AI learns from both real and simulated experiences. One example is generating training images for a space robot designed to spear orbital debris, a task for which no real-world data existed.

Another, more terrestrial, example is training self-driving cars on realistic-but-rare scenarios, such as a reflective flatbed crossing a highway at dusk, the sun's glare rendering it unintelligible to visual sensors trained only on daylight at noon.

“Calling on our roots as a gaming company, we’ve created a completely synthetic environment in which we can test and train these cars,” said Margaret Amori, of NVIDIA Federal. NVIDIA makes graphics cards, the powerful processors originally built to render video games that now underpin much of modern high-powered image processing.

“It’s just not humanly possible to drive the millions and millions of miles needed in order to consider a car trained and ready to go,” Amori said. “We can create all sorts of crazy niche conditions and dangerous scenarios that we wouldn’t want to replicate in the real world. I mean, imagine a world that is like ‘Grand Theft Auto’ without the crime.”

Amori’s remarks came as part of the Association of the United States Army symposium on AI and Autonomy held in Detroit in November. Why might the Army be interested in algorithms learning to drive cars through the streets of a pacified Grand Theft Auto game?

“It’s very physics based and very realistic and accurate,” Amori said. “So you can literally move the sun around, you can change it from summer to winter with a click of a button and create all sorts of conditions like sun glare, which is very problematic for our sensors.”
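What Amori describes is a form of domain randomization: each synthetic scene gets its own randomized environmental conditions, so the training set spans sun angles and seasons a test driver could never schedule. The sketch below is purely illustrative; the parameter names are assumptions, not part of any NVIDIA tool.

```python
import random

def random_scene_params(rng: random.Random) -> dict:
    """Draw one set of environment conditions for a synthetic scene.

    All fields are hypothetical examples of the knobs a simulator
    exposes: sun position, season and glare level.
    """
    return {
        "sun_elevation_deg": rng.uniform(0, 90),   # 0 = horizon glare, 90 = noon
        "sun_azimuth_deg": rng.uniform(0, 360),
        "season": rng.choice(["summer", "autumn", "winter", "spring"]),
        "glare_intensity": rng.uniform(0.0, 1.0),  # fraction of sensor saturation
    }

rng = random.Random(42)  # fixed seed so scene sets are reproducible
scenes = [random_scene_params(rng) for _ in range(1000)]

# Near-horizon sun is exactly the hard case Amori mentions: glare
# conditions that are rare in real logged driving data.
low_sun = [s for s in scenes if s["sun_elevation_deg"] < 15]
print(f"{len(low_sun)} of {len(scenes)} scenes have near-horizon sun")
```

Because every parameter is drawn independently, rare combinations (winter sun low on the horizon with heavy glare) appear in the training set at whatever rate the engineer chooses, rather than at the rate nature provides.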

Training autonomous systems to operate in less-than-ideal conditions is a great use-case for virtual environments, though synthetic data is not the whole of the project.

Figuring out the right mix of synthetic and real data on which to train AI is still a work in progress for companies in the field. NVIDIA has so far noted improved AI navigation when training with at least 50 percent synthetic data, though the ideal ratio likely varies from task to task.
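Mixing the two sources at a chosen ratio is mechanically simple, as the hedged sketch below shows. The function and dataset names are illustrative placeholders, not any company's pipeline; the 50 percent default just echoes the figure NVIDIA reported.

```python
import random

def mix_datasets(real, synthetic, synthetic_fraction=0.5, seed=0):
    """Combine real and synthetic samples at a target synthetic fraction.

    Keeps every real sample and draws enough synthetic samples (with
    replacement) so they make up `synthetic_fraction` of the result.
    """
    if not 0 < synthetic_fraction < 1:
        raise ValueError("synthetic_fraction must be strictly between 0 and 1")
    n_synth = int(len(real) * synthetic_fraction / (1 - synthetic_fraction))
    rng = random.Random(seed)
    drawn = rng.choices(synthetic, k=n_synth)
    mixed = list(real) + drawn
    rng.shuffle(mixed)  # interleave so batches see both sources
    return mixed

# Toy stand-ins for logged drives and simulator output.
real = [("real", i) for i in range(100)]
synthetic = [("synthetic", i) for i in range(500)]
training_set = mix_datasets(real, synthetic, synthetic_fraction=0.5)
print(len(training_set))  # 200 samples: 100 real, 100 synthetic
```

Sampling synthetic data with replacement reflects one advantage Amori points to: the simulator can always generate more of a rare condition, whereas the real-world pool is fixed.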

And when it comes to military scenarios, synthetic data may be the only data available.

“You can just create that menu of driving data or the Army contents scenarios,” said Frank Schirrmeister of Cadence Design Systems. There’s a space for startups to “basically automate the creation of synthetic data based on requirements.”

Sensors for Army autonomous vehicles will, after all, need to function in the glare of the sun, but they will also need to handle the sudden dust, flashes of light and obstructions of explosions and other battlefield chaos. Modeling that in the lab, to train real-world robots for the battlefields of the future, is likely a growth area.

Consider it a matter of, ahem, demand and control.

Kelsey Atherton blogs about military technology for C4ISRNET, Fifth Domain, Defense News, and Military Times. He previously wrote for Popular Science, and also created, solicited, and edited content for a group blog on political science fiction and international security.
