The first offset strategy centered on nuclear capabilities, the second on stealth technology. Now, with competing powers once again approaching parity with the U.S. in critical military capabilities, the Department of Defense (DoD) is pursuing a third offset strategy, one that puts artificial intelligence at the forefront of U.S. defense policy.

This new offset strategy aims to improve the U.S.’s information management capabilities through the adoption of cutting-edge technologies, including AI systems. In a data-driven military landscape, the strategic advantage lies with entities that can collect, process, and respond to information faster and more accurately than their adversaries.

Speed and accuracy are often trade-offs with AI systems, particularly for complex, multi-faceted enterprises. DoD efforts to integrate Joint Force networks for greater operational speed and reliability are underway. However, legacy AI technologies struggle to process information effectively across domains and data sources, especially at scale.

So, how can the U.S. advance its AI capabilities and maintain a strategic edge? Recent developments in AI point to multimodal capability as a key differentiator.

Past limitations, future potential

Although there’s debate about whether the U.S. still boasts better AI capabilities than its peers and near-peers, it’s undeniable that competitors are closing the gap. And even if the DoD still holds the upper hand in AI, that advantage is no longer large enough for technological superiority alone to serve as a peacekeeping measure.

The problem isn’t that the DoD lacks the personnel or computing power to outpace other nations. The problem is that U.S. AI networks struggle to convert massive amounts of data into usable insights with enough speed and accuracy to project superiority.

This challenge is difficult to overcome with conventional AI and machine learning systems, which lack the general-purpose capabilities to integrate unlike data types and seamlessly manage information across domains. The objectives of the U.S.’s third offset require the elimination of these barriers in support of more efficient data processing and, subsequently, more informed decision-making on the part of military personnel.

Up to now, the DoD has worked to improve its information management capabilities through the Joint All-Domain Command and Control (JADC2) network, which addresses long-standing problems caused by data silos and stovepipe systems. However, a consolidated Joint Force network can only operate as fast and as accurately as its AI systems allow, hence the pursuit of superior AI capabilities.

A model for long-term AI leadership

The emergence of multimodal AI, typically built on the large pretrained models known as foundation models, represents a significant breakthrough in AI technology for both the private sector and the military.

While past generations of AI systems relied on task-centric infrastructure — where each use case required its own model and associated training — multimodal AI eliminates those rigidities through in-context learning. This learning structure gives multimodal AI the flexibility to process various data types with a combination of algorithms, accelerating information collection and processing across networks for more sophisticated data analysis and decision-making.

Put simply, this multimodal structure generates relevant insights from multiple data sources much faster — and on a much larger scale — than previously possible.

Multimodal AI as the first line of defense

The DoD’s ability to use AI to gain full situational awareness through a multi-domain defense strategy becomes much more robust with the versatility of multimodal AI. It can be more accurate than conventional models and is capable of zero-shot and few-shot learning, meaning it can take on a new task with no task-specific examples or only a handful of them. For example, a Contrastive Language-Image Pre-Training (CLIP) model can classify images into a given set of categories expressed in natural language without needing fine-tuning.
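To make the zero-shot idea concrete, here is a minimal sketch of the scoring step such a model performs: an image embedding is compared against the embeddings of several text labels, and the closest label wins. The three-dimensional vectors, the label prompts, and the temperature value are all hypothetical stand-ins chosen for illustration; a real model would produce high-dimensional embeddings from its own image and text encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(image_vec, label_vecs, temperature=0.07):
    """Pick the text label whose embedding is closest to the image embedding.

    The temperature is an assumed value; it only sharpens the softmax.
    Returns (best_label, probabilities_by_label).
    """
    logits = {
        label: cosine(image_vec, vec) / temperature
        for label, vec in label_vecs.items()
    }
    # Subtract the max logit before exponentiating for numerical stability.
    top = max(logits.values())
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical toy embeddings; real ones would come from trained encoders.
LABEL_VECS = {
    "a photo of a tank": [0.9, 0.1, 0.0],
    "a photo of a ship": [0.1, 0.9, 0.1],
    "a photo of an aircraft": [0.0, 0.2, 0.9],
}
IMAGE_VEC = [0.8, 0.2, 0.1]

best_label, probabilities = zero_shot_classify(IMAGE_VEC, LABEL_VECS)
print(best_label)
```

Because the categories are supplied as text at query time, adding a new category in this sketch means adding one more entry to the label dictionary rather than retraining a classifier.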

The adaptability of multimodal models allows them to cut through the complexity of the data that’s generated and integrated across domain networks to help operators understand all available options and inform the best course of action. If an adversary launches an attack by sea, AI can rapidly determine if the proper response is to fire missiles, launch fighters, or execute a cyberattack.

Additionally, the development of greater AI functionality will feature an iterative process. Wide-ranging applications for multimodal AI promise to enhance human-machine collaboration across all fronts to support more vital mission capabilities — for personnel on the front lines and in the data centers.

It’s important to note, however, that multimodal systems need a large amount of processing power, sizable on-chip memory, and enough attached memory to handle data efficiently. So, an integrated hardware-software systems approach is necessary to create the right balance of computing, memory, and communication for data-intensive dataflow operations. Ideally, these systems should be flexible enough to handle both inference and the incremental training needed for superior model creation.

Taking outdated AI to task

Consider the value of multimodal AI for intelligence, surveillance, and reconnaissance (ISR) data collection in a Joint Force network. Satellite systems generate immense amounts of audio and visual data for ISR. While task-centric AI models struggle to interpret unlike data inputs or recognize meaningful patterns across various data sources, a foundation model functions as an overarching data processing hub. Within this hub, scalability and contextual learning capabilities mean multimodal AI can operate with the same computing productivity as hundreds of task-centric models.

In the case of ISR data, a multimodal system can recognize patterns across audio and visual inputs and flag when, for example, satellite video footage of an adversary’s tank movements coincides with related radio-frequency activity, indicating a mass military mobilization. The AI can make this connection quickly and provide its operators with the relevant insight they need to craft the best response.
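The cross-source matching in the tank and radio example can be illustrated with a deliberately simple sketch. This is a hand-written rule, not a learned multimodal model: it flags a region when a video detection and a radio-frequency detection fall within the same time window. The `Event` type, the region names, and the five-minute window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    source: str       # "video" or "rf" (hypothetical source tags)
    region: str       # where the activity was observed
    timestamp: float  # seconds since an arbitrary reference time


def flag_correlated(events, window_s=300.0):
    """Return regions where video and RF activity occur within window_s seconds."""
    videos = [e for e in events if e.source == "video"]
    rf_hits = [e for e in events if e.source == "rf"]
    flagged = set()
    for video in videos:
        for rf in rf_hits:
            same_region = video.region == rf.region
            close_in_time = abs(video.timestamp - rf.timestamp) <= window_s
            if same_region and close_in_time:
                flagged.add(video.region)
    return sorted(flagged)


# Hypothetical toy data: sector-a has both signals two minutes apart,
# sector-b has both signals but nearly two hours apart.
EVENTS = [
    Event("video", "sector-a", 1000.0),
    Event("rf", "sector-a", 1120.0),
    Event("video", "sector-b", 2000.0),
    Event("rf", "sector-b", 9000.0),
]

flagged_regions = flag_correlated(EVENTS)
print(flagged_regions)
```

A multimodal model would learn such associations from data rather than rely on a fixed window, but the sketch shows the kind of connection an operator would want surfaced.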

The U.S. has used rising military parity as an impetus to seek superior technological advantages since the Cold War. And in the era of the third offset strategy, where AI capabilities represent the latest proving ground for conventional military deterrence, multimodal AI is the cutting-edge innovation at the center of it all.

Col. Doug Drakeley (Ret.) is an advisory board member and industrial specialist at SambaNova Systems, a supplier of AI platforms and services based in Palo Alto, California.
