Can Novel Foundation Models Play a Role in National Security?
January 17, 2023

Devaki Raj, CEO of CrowdAI, and Robert Miller, Head of Government Solutions

As CEO of a company building computer vision tools for the US government (USG), when ChatGPT news started springing up everywhere, I wanted to understand how large foundation models* might serve dual civil-military uses, and the opportunities and risks that they pose. The potential impact, at both ends of that spectrum, is great; therefore, the purpose of this piece is to discuss the national security implications of foundation models for policymakers and our defense and intelligence (D&I) partners. (In my last blog (link), I wrote about how foundation models may impact the AI startup landscape.)

*Note: A "foundation model" is just a fancy way of saying "large self-supervised model", which is becoming state-of-the-art for many tasks. More formally, a foundation model is a model trained on a broad set of unlabeled data that can adapt to novel situations. What makes foundation models unique is that they can apply information learned in one situation to another.

Further Technical Note: GPT is trained on a large corpus of text with the task of predicting the next word in a sentence. Another category, diffusion models, is trained on image-caption pairs to generate new images from text descriptions.
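The next-word objective behind GPT-style models can be illustrated with a toy bigram counter. This is a minimal sketch of the *idea* only; real models use neural networks over vast corpora, and the corpus below is invented:

```python
from collections import Counter, defaultdict

# Toy illustration of the next-word objective: count which word follows
# which in a tiny corpus, then predict the most frequent successor.
corpus = "the model predicts the next word in the sentence".split()

successors = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev][nxt] += 1

def predict_next(word):
    """Return the most frequently observed next word, or None if unseen."""
    counts = successors.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("next"))  # → "word"
```

A real language model replaces the count table with learned parameters, but the training signal is the same: given the words so far, score candidates for the next one.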

Foundation models can be valuable to a variety of military and intelligence tasks, for example: 

  1. Text to Image - Creating synthetic images (scenes) of foreign locations and military equipment to train novel computer vision models. This technique may one day help generate highly performant deep-learning models, especially for rare or exotic objects (e.g., the North Korean KN-08); but a limitation of the text2img mode is that the specific class or vernacular must have been used during training, which may not always be the case. The D&I communities are famous (notorious?) for their unique lexicon and ever-changing flourishes. (An Army colonel once welcomed a group of us to a table of “lick-ems and suck-ums” to describe a continental breakfast. Not many people know what lick-ems and suck-ums are, but far more know a continental breakfast.) Word choice can vary even within an organization: the differences between the 10th Mountain and the 101st Airborne, ostensibly both the same Army, couldn’t be more stark. Models can be fine-tuned, using even a small labeled dataset, to learn new concepts; but the ubiquity of colloquialisms and jargon that embody the D&I experience will be a constant source of learning.
  2. Image to Image - Creating synthetic images to match commercial, Allied, or national sensors that show military equipment to train novel computer vision models. Similar to the above example, these models augment training data to build hyper-specific models where general models lack enough examples to be performant. The img2img mode can be used to create synthetic examples, but it’s critical that the generated images match their real-world twin, known as domain matching. (It is our assessment that, generally, this capability remains nascent for most GEOINT sensors and is over-reliant on generative adversarial networks to close technical gaps.)
  3. Natural Language Processing - Content translation and summarization of communications intelligence (COMINT), sensitive site exploitation (SSE), or online chats in foreign languages. And yes, ChatGPT is already good at this (at least for commonly spoken languages), but it learned from the “dirty” internet, not from JWICS, SIPRNet, or other government systems.
  4. Image to Text - Image captioning can be used to label data, but it could also be used to translate computer vision output into human-readable prose. The process of exploiting imagery and generating text descriptions is the core task of imagery analysts, but one that could be augmented by image captioning without sacrificing context or probabilistic caveat language.

These remarkable tools could deliver significant mission impact, especially at a time when the common criticism is that the US is falling behind its geopolitical adversaries in AI implementation. But beyond the well-trodden procurement and security barriers to entry, why aren’t these tools already ubiquitous? There are still greater challenges that developers will have to contend with in order to be successful in national security:

  1. Algorithms need to translate from commercial to D&I-specific tasks;
  2. Air-gapped and edge environments constrain deployment; and
  3. IP and data provenance raise trust questions.

Algorithms need to translate to USG-specific tasks

Foundation models are trained on lakes of publicly available information (PAI): massive datasets found readily online. However, USG missions, colloquialisms, and data sources are unique. In places, the USG is already building toward algorithms that generalize to hundreds of different tasks within their mission sets. But it is our assessment that optimizing for analytic performance is still being prioritized over nearly all else. While analysts and operators may accept the need for, and even the benefit of, AI, many (if not most) remain skeptical. Operators and analysts care primarily about mission impact and care little about which tool gets the job done or how.

Anecdotally, analysts express fatigue from learning new software. So, it will take effort to make foundation models accessible and functional for analysts and operators. To make foundation models useful, techniques such as fine-tuning, transfer learning, and few-shot learning will increasingly become requirements to leverage the government’s vast stores of exquisite data.
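The adaptation idea behind those techniques, starting from parameters learned on broad data and nudging them with a handful of labeled domain examples, can be sketched with a toy nearest-centroid classifier. All numbers and class names here are invented for illustration; real fine-tuning updates neural network weights by gradient descent:

```python
# Toy sketch of fine-tuning: a nearest-centroid "model" pretrained on
# generic data is adapted with a few labeled domain-specific examples.

def centroid(points):
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]

def classify(x, centroids):
    # Predict the label of the nearest centroid (squared Euclidean distance).
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

# "Pretraining": centroids learned from broad, generic imagery features.
centroids = {"vehicle": [1.0, 1.0], "building": [5.0, 5.0]}

# Few-shot adaptation: blend in a small set of labeled domain examples.
domain_examples = {"vehicle": [[2.0, 2.2], [2.4, 1.8]]}
for label, points in domain_examples.items():
    update = centroid(points)
    # Move the pretrained centroid partway toward the new domain data.
    centroids[label] = [(old + new) / 2 for old, new in zip(centroids[label], update)]

print(classify([2.1, 2.0], centroids))  # → "vehicle"
```

The point of the sketch is the workflow, not the algorithm: the generic model is the starting point, and a small amount of mission-specific labeled data moves it toward the domain.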

No Edge Deployment

Foundation models are huge. This means that mobile devices, such as ATAK handhelds or remote cameras (think low Earth orbit), which aren’t broadband connected, will be unable to run foundation models at the edge. We believe that this will be a short- to mid-term issue, however. Foundation models trained on narrow, mission-specific data, combined with the right pruning and model weight distillation, will drive down the size of models. Eventually, with cheaper and more performant hardware, edge and continuous deployments will be more common, even in space.
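Magnitude pruning, one of the size-reduction techniques mentioned above, can be sketched in a few lines. The weight values are invented, and real pruning operates on full networks and is typically followed by retraining to recover accuracy:

```python
# Toy magnitude pruning: zero out the smallest weights, then store the
# result sparsely so the "model" takes less space on an edge device.

weights = [0.91, -0.03, 0.47, 0.002, -0.88, 0.01, 0.65, -0.004]

def prune(ws, threshold):
    """Zero every weight whose magnitude falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in ws]

pruned = prune(weights, threshold=0.05)
# Sparse representation: keep only (index, value) pairs for nonzero weights.
sparse = [(i, w) for i, w in enumerate(pruned) if w != 0.0]

print(f"kept {len(sparse)} of {len(weights)} weights")  # kept 4 of 8 weights
```

Distillation is the complementary technique: a small "student" model is trained to mimic the large model's outputs, so the deployed artifact is a fraction of the original size.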

Note: The two issues above are technical challenges that, with adequate time and investment, will be overcome. The third issue, however, also raises policy concerns, which I want to spend some time addressing.

IP and Data Provenance

There are just a handful of major players building these large foundation models. Universally, the models are data hungry, trained on publicly available information, and expensive to train. The organizations building them are hyper-focused and heavily funded (internally or externally) to develop these models. These organizations fall into three groups:

  1. Research organizations, whose IP is a black box (e.g. OpenAI)
  2. Research organizations, whose IP is open (e.g. Stability AI)
  3. Large cloud companies, whose IP is fairly closed (e.g. Google, Microsoft, IBM, Amazon)

But there are two primary issues: IP ownership and data provenance. This is where competition can benefit national security. Some companies, like OpenAI, neither publish their weights nor provide insight into what data their models were exposed to in training. Stability AI provides the model weights for the “Stable Diffusion” model as open source, and they are transparent about the training set they use (the LAION-5B dataset).

Companies whose IP and model weights are open are positioned to work more closely with D&I organizations. However, those very same open-sourced foundation models may not only prove to be of diminishing value, but potentially dangerous to our national security. Google Maps, for example, has been a boon to the world for getting from point A to point B. However, in 2007, coalition forces operating in Southern Iraq recovered hardcopy Google printouts used in planning attacks on British military bases. It is inevitable that foundation models will one day serve adversarial, dual civil-military purposes; so, a comprehensive analysis by D&I professionals, with industry’s help, must be forthcoming to understand more clearly these risks and the AI supply chain that supports their development and use.

Provenance of AI models matters. In the public space, where generating kitten art and college term papers are the norm, the IP and provenance of the contributing model don’t even register with users. However, in the D&I community, where decisions impact the lives and livelihoods of people, those considerations (and more) are paramount. 

Eventually, for training data, we expect to see hardware manufacturers start cryptographically signing data created by their equipment. But, well before that can happen, there are tradecraft considerations that will need to be addressed.
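Hardware-level signing could look roughly like the HMAC sketch below. This is a stand-in for the public-key signatures real sensor hardware would more likely use, and the device key and payload are invented for illustration:

```python
import hashlib
import hmac

# Sketch of sensor-data provenance: the device signs each capture with a
# key provisioned into its hardware; downstream consumers verify before use.
DEVICE_KEY = b"example-key-provisioned-in-sensor"  # invented for illustration

def sign_capture(payload: bytes) -> str:
    """Produce a hex signature binding the payload to this device's key."""
    return hmac.new(DEVICE_KEY, payload, hashlib.sha256).hexdigest()

def verify_capture(payload: bytes, signature: str) -> bool:
    # Constant-time comparison to avoid timing side channels.
    return hmac.compare_digest(sign_capture(payload), signature)

image = b"...raw sensor bytes..."
sig = sign_capture(image)
print(verify_capture(image, sig))            # True: untampered data verifies
print(verify_capture(image + b"!", sig))     # False: any modification fails
```

With signatures like these attached at capture time, a training pipeline could reject imagery whose provenance cannot be verified before it ever influences a model.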

Intelligence Community Directive 203, for example, dictates the Analytic Standards that authors must follow in their published works. Not only must they use caveated language, but they must identify assumptions, as well as share complete bibliographies to support reverse engineering their assessments. First published in 2007, this directive was the result of the intelligence failures surrounding the September 11 attacks and Iraq’s weapons of mass destruction program, the latter of which was our casus belli. In particular, the WMD Commission Report stated:

“Perhaps most troubling, we found an Intelligence Community in which analysts had a difficult time stating their assumptions up front, explicitly explaining their logic, and, in the end, identifying unambiguously for policy makers what they do not know...”

Today, artificial intelligence broadly challenges the lessons from the Report and the resulting policy improvements. Foundation models can neither be backed out to their component pieces nor surface the assumptions made during inference. The breadth of what users do not know about the underlying models is total, and this should be cause for deep concern among analytic users. At CrowdAI, we can trace a model back to each and every image used for its training, and we can and do answer these questions with every model we put into operation. This should be table stakes for working in the D&I spaces.

Beyond the general lack of understanding of how these foundation models are trained and operate, their PAI sources leave ample room for corruption. Adversarial images, for example, can be perturbed in such a way that models trained from them are unable to correctly identify specific objects. A potentially corrupted foundation model could undermine multiple, very real intelligence missions ranging from strategic indications and warning (I&W) to tactical combat support. A computer vision model used by the IC that cannot find its target is precisely the outcome an adversary would welcome, if not pursue with vigor. 
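The adversarial-perturbation risk can be illustrated with a deliberately simple classifier: a small, targeted change to the input flips the decision. Everything here is a toy; real attacks craft perturbations against deep networks, often using gradient information:

```python
# Toy adversarial example: a "detector" that flags an image as a target
# when its mean brightness exceeds a threshold. A small perturbation,
# barely visible per pixel, pushes the input across the decision boundary.

def classify(pixels, threshold=0.5):
    return "target" if sum(pixels) / len(pixels) > threshold else "background"

image = [0.52, 0.55, 0.51, 0.53]          # genuine target, mean ~0.53
print(classify(image))                     # → "target"

# Adversary subtracts a small epsilon from every pixel.
epsilon = 0.04
perturbed = [p - epsilon for p in image]   # mean ~0.49

print(classify(perturbed))                 # → "background": target missed
```

A deep network's decision boundary is vastly more complex than a brightness threshold, but the failure mode is the same: inputs engineered to sit just on the wrong side of it, whether at inference time or planted in the training data itself.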

Risk and opportunity come in many colors. While the national security community grapples with the rapid march of AI, we at CrowdAI remain committed to helping our partners bridge policy and technology, and to understanding what it really means to enable a national security workforce with cutting-edge AI.

Devaki Raj is CEO and co-founder of CrowdAI. She is a University of Oxford-educated data scientist and former Googler. Robert Miller is Head of Government Solutions at CrowdAI. He is a former federal civilian employee, having worked on Capitol Hill and at The White House, NGA, and CIA.


2. Commission on the Intelligence Capabilities of the United States Regarding Weapons of Mass Destruction, Report to the President of the United States (Washington, DC: Government Printing Office, 2005), p. 389. Accessed on 1/11/2023, URL:
