Can Novel Foundation Models Play a Role in National Security?
January 17, 2023

Devaki Raj, CEO of CrowdAI, and Robert Miller, Head of Government Solutions

As CEO of a company building computer vision tools for the US government (USG), when ChatGPT news started springing up everywhere, I wanted to understand how large foundation models* might serve dual civil-military uses and the opportunities and risks they pose. The potential impact, at both ends of that spectrum, is great; therefore, the purpose of this piece is to discuss the national security implications of foundation models for policymakers and our defense and intelligence (D&I) partners. (In my last blog (link), I wrote about how foundation models may impact the AI startup landscape.)

*Note: A "foundation model" is just a fancy way of saying "large, self-supervised model", an approach that is becoming state-of-the-art for many tasks. More formally, a foundation model is trained on a broad set of unlabeled data and can be adapted to novel situations. What makes foundation models unique is that they can apply information learned in one situation to another.

Further Technical Note: GPT is trained on a large corpus of text with the task of anticipating the next word in a sentence. Another category, diffusion models, is trained on image-caption pairs to generate images from text descriptions.
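For readers who want a concrete picture of that next-word objective, here is a minimal sketch using the openly available GPT-2 model through the Hugging Face transformers library; the library and model are our choices for illustration, nothing above prescribes them.

```python
# Minimal sketch: a language model trained on the next-word objective simply
# keeps predicting the most likely next token given everything before it.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # small, open model for illustration

prompt = "A foundation model is trained on a broad set of unlabeled data"
result = generator(prompt, max_new_tokens=25, num_return_sequences=1)
print(result[0]["generated_text"])  # the prompt plus the model's predicted continuation
```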

Foundation models can be valuable to a variety of military and intelligence tasks, for example: 

  1. Text to Image - Creating synthetic images (scenes) of foreign locations and military equipment to train novel computer vision models (a minimal sketch follows this list). This technique may one day help generate highly performant deep-learning models, especially for rare or exotic objects (e.g., the North Korean KN-08); but a limitation of the text2img mode is that the specific class or vernacular must have appeared during training, which may not always be the case. The D&I communities are famous (notorious?) for their unique lexicon and ever-changing flourishes. (An Army colonel once welcomed a group of us to a table of “lick-ems and suck-ums” to describe a continental breakfast. Not many people know what lick-ems and suck-ums are, but a lot more know a continental breakfast.) Word choice can vary even within an organization: the differences between the 10th Mountain and 1st Airborne, ostensibly both the same Army, couldn’t be more stark. Models can be fine-tuned using even a small labeled dataset to learn new concepts, but the ubiquity of colloquialisms and jargon that embody the D&I experience will be a constant source of learning.
  2. Image to Image - Creating synthetic images that match commercial, Allied, or national sensors and show military equipment to train novel computer vision models. Similar to the example above, these models augment training data to build hyper-specific models where general models lack enough examples to be performant. The img2img mode can be used to create synthetic examples, but it is critical that the generated images match their real-world twin, known as domain matching. (It is our assessment that, generally, this capability remains nascent for most GEOINT sensors and is over-reliant on generative adversarial networks to close technical gaps.)
  3. Natural Language Processing - Content translation and summarization of communications intelligence (COMINT), sensitive site exploitation (SSE), or online chats in foreign languages. And yes, ChatGPT is already good at this (at least for commonly spoken languages), but on the “dirty” internet, not necessarily on JWICS, SIPRNet, or other government systems.
  4. Image to Text - Image captioning can be used to label data, but it could also be used to translate computer vision output into human-readable prose. Exploiting imagery and generating text descriptions is the core task of imagery analysts, one that could be augmented by image captioning without sacrificing context or probabilistic caveat language.
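To make the first two items concrete, here is a hedged sketch of generating synthetic overhead imagery from text prompts with the open Stable Diffusion weights via the diffusers library. The model ID and prompts are illustrative assumptions, not a recommendation, and a real synthetic-data pipeline would still face the fine-tuning and domain-matching caveats noted above.

```python
# Sketch: text-to-image generation of synthetic training scenes (assumes a GPU
# and the open Stable Diffusion v1.5 weights; both are illustrative choices).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Vernacular only works if it appeared in the training data; rare or
# classified object classes would likely require fine-tuning first.
prompts = [
    "overhead satellite view of a mobile missile launcher in a desert",
    "overhead satellite view of military trucks parked beside a runway",
]

for i, prompt in enumerate(prompts):
    image = pipe(prompt).images[0]
    image.save(f"synthetic_{i:03d}.png")  # candidate training image for a CV model
```

The same library exposes an image-to-image pipeline (StableDiffusionImg2ImgPipeline) that starts from a real source image, which is the mode item 2 refers to; the domain-matching concern is identical.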

These remarkable tools could deliver significant mission impact, especially amid the common criticism that the US is falling behind its geopolitical adversaries in AI implementation. But beyond the well-trodden procurement and security barriers to entry, why aren’t these tools already ubiquitous? There are still greater challenges that developers will have to contend with to be successful in national security:

  1. Algorithms need to translate from commercial to D&I-specific tasks;
  2. Air-gapped or edge environments; and
  3. IP and Data Provenance.

Algorithms need to translate to USG-specific tasks

Foundation models are trained on lakes of publicly available information (PAI). This data is massive and readily found online. However, USG missions, colloquialisms, and data sources are unique. In places, the USG is already building toward algorithms that generalize to hundreds of different tasks within its mission sets. But it is our assessment that optimizing for analytic performance is still prioritized over nearly everything else. While analysts and operators may accept the need for, and even the benefit of, AI, many (if not most) remain skeptical. Operators and analysts care primarily about mission impact and care little about which tool gets the job done or how.

Anecdotally, analysts express fatigue from learning new software, so it will take effort to make foundation models accessible and functional for analysts and operators. To make foundation models useful, techniques such as fine-tuning, transfer learning, and few-shot learning will increasingly be required to leverage the government’s vast stores of exquisite data.
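As a rough illustration of what that adaptation looks like in practice, here is a minimal transfer-learning sketch: freeze a pretrained vision backbone and train only a new classification head on a small, mission-specific labeled set. The class count and dataset are hypothetical placeholders, not government data.

```python
# Sketch: adapt a general-purpose pretrained backbone to a small labeled
# dataset by training only the final classification layer.
import torch
import torch.nn as nn
from torchvision import models

NUM_MISSION_CLASSES = 5  # hypothetical: a handful of domain-specific object classes

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False  # keep the general-purpose features frozen

# Replace the final layer; only this small head is learned from the new data.
model.fc = nn.Linear(model.fc.in_features, NUM_MISSION_CLASSES)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
# ...a short training loop over the small labeled dataset would follow here...
```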

No Edge Deployment

Foundation models are huge. That means edge deployments to mobile devices running ATAK or to remote cameras (think low Earth orbit) that aren’t broadband connected cannot use foundation models today. We believe this will be a short- to mid-term issue, however. Foundation models trained on narrow, specific data, combined with the right pruning and model-weight distillation, will drive down model size (a sketch of these techniques follows). Eventually, with cheaper and more performant hardware, edge and continuous deployments will become more common, even in space.
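Here is a hedged sketch of that shrinking, using magnitude pruning and dynamic quantization in PyTorch; a real edge pipeline would add distillation and hardware-specific compilation, and the model here is just a stand-in.

```python
# Sketch: prune low-magnitude weights, then quantize to int8 to cut model size.
import torch
import torch.nn.utils.prune as prune
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Zero out 40% of the smallest-magnitude weights in every convolutional layer.
for module in model.modules():
    if isinstance(module, torch.nn.Conv2d):
        prune.l1_unstructured(module, name="weight", amount=0.4)
        prune.remove(module, "weight")  # make the pruning permanent

# Dynamically quantize the linear layers to int8 for a smaller footprint.
small_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
torch.save(small_model.state_dict(), "edge_model.pt")
```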

Note: The two issues above are technical challenges that, with adequate time and investment, will be overcome. The last issue, however, also raises policy concerns, which I want to spend some time addressing.

IP and Data Provenance

There are just a handful of major players building these large foundation models. Universally, the models are data hungry, trained on publicly available information, and expensive to train. The organizations building them are hyper-focused and heavily funded (internally or externally) to develop these models. These organizations fall into three groups:

  1. Research organizations, whose IP is a black box (e.g. OpenAI)
  2. Research organizations, whose IP is open (e.g. Stability AI)
  3. Large cloud companies, whose IP is fairly closed (e.g. Google, Microsoft, IBM, Amazon)

But there are two primary issues: IP ownership and data provenance. This is where competition can benefit national security. Some companies, like OpenAI, neither publish their model weights nor provide insight into what data their models were exposed to in training. Stability AI, by contrast, releases the weights of its Stable Diffusion model as open source and is transparent about the training set it used (the LAION-5B dataset).

Companies whose IP and model weights are open are positioned to work more closely with D&I organizations. However, those very same open-sourced foundation models may not only prove to be of diminishing value but potentially dangerous to our national security. Google Maps, for example, has been a boon to the world for getting from point A to point B. However, in 2007, coalition forces operating in Southern Iraq recovered hardcopy Google Earth printouts used in planning attacks on British military bases. It is inevitable that foundation models will one day serve adversarial, dual civil-military purposes; a comprehensive analysis by D&I professionals, with industry’s help, must be forthcoming to understand these risks and the AI supply chain that supports these models’ development and use.

Provenance of AI models matters. In the public space, where generating kitten art and college term papers are the norm, the IP and provenance of the contributing model don’t even register with users. However, in the D&I community, where decisions impact the lives and livelihoods of people, those considerations (and more) are paramount. 

Eventually, for training data, we expect to see hardware manufacturers start cryptographically signing data created by their equipment. But well before that can happen, there are tradecraft considerations that will need to be addressed.
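As a speculative sketch of what sensor-side signing could look like, here is an Ed25519 sign-and-verify flow using the Python cryptography library; the key handling, file name, and omission of any metadata standard are all assumptions for illustration.

```python
# Sketch: a sensor signs each capture; a training pipeline verifies it later.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# In practice the manufacturer would provision this key inside the device.
device_key = Ed25519PrivateKey.generate()
device_public_key = device_key.public_key()

image_bytes = open("capture_0001.png", "rb").read()  # hypothetical capture
signature = device_key.sign(image_bytes)  # shipped alongside the image

# Downstream, verify provenance before the image enters a training set;
# raises cryptography.exceptions.InvalidSignature if the data was altered.
device_public_key.verify(signature, image_bytes)
```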

Intelligence Community Directive 203, for example, dictates the Analytic Standards that authors must follow in their published works. Not only must they use caveated language, they must also identify assumptions and share complete bibliographies so their assessments can be reverse engineered. First published in 2007, the directive was a result of the intelligence failures surrounding the September 11 attacks and the Iraqi weapons of mass destruction program that was our casus belli. In particular, the WMD Commission Report stated:

“Perhaps most troubling, we found an Intelligence Community in which analysts had a difficult time stating their assumptions up front, explicitly explaining their logic, and, in the end, identifying unambiguously for policy makers what they do not know...”

Today, artificial intelligence broadly challenges the lessons of the Report and the resulting policy improvements. Foundation models can neither be backed out to their component pieces nor reveal what assumptions were made during inference. The breadth of what users do not know about the underlying models is total, and this should be cause for deep concern among analytic users. At CrowdAI, we can trace a model back to each and every image used in its training, and we can and do answer these questions with every model we put into operation. This should be table stakes for working in the D&I spaces.
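A minimal sketch of that kind of training-data lineage, assuming a simple directory of images and a JSON manifest; this is illustrative, not CrowdAI’s actual tooling.

```python
# Sketch: record a content hash for every training image so a fielded model
# can be traced back to its exact inputs.
import hashlib
import json
from pathlib import Path

manifest = []
for image_path in sorted(Path("training_images").glob("*.png")):
    digest = hashlib.sha256(image_path.read_bytes()).hexdigest()
    manifest.append({"file": image_path.name, "sha256": digest})

Path("model_manifest.json").write_text(json.dumps(manifest, indent=2))
# Ship the manifest with the model so every training image is auditable.
```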

Beyond the general lack of understanding of how these foundation models are trained and operate, their PAI sources leave ample room for corruption. Adversarial images, for example, can be perturbed in such a way that models trained on them are unable to correctly identify specific objects. A potentially corrupted foundation model could undermine multiple, very real intelligence missions ranging from strategic indications and warning (I&W) to tactical combat support. A computer vision model used by the IC that cannot find its target is precisely the outcome an adversary would welcome, if not pursue with vigor.
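For readers unfamiliar with how such perturbations work, here is a hedged sketch of the classic fast gradient sign method (FGSM); the model and label are placeholders, and real poisoning or evasion attacks against GEOINT models would be more involved.

```python
# Sketch: nudge pixels in the direction that most increases the model's loss,
# producing an image that looks unchanged to a human but fools the model.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

def fgsm_perturb(image: torch.Tensor, true_label: int, epsilon: float = 0.01) -> torch.Tensor:
    """Return an adversarially perturbed copy of a (3, H, W) image tensor."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image.unsqueeze(0)), torch.tensor([true_label]))
    loss.backward()
    return (image + epsilon * image.grad.sign()).detach()
```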

Risk and opportunity come in many colors. While the national security community grapples with the rapid march of AI, we at CrowdAI remain committed to helping our partners bridge policy and technology and understand what it really means to equip a national security workforce with cutting-edge AI.

Devaki Raj is CEO and co-founder of CrowdAI. She is a University of Oxford-educated data scientist and former Googler. Robert Miller is Head of Government Solutions at CrowdAI. He is a former federal civilian employee, having worked on Capitol Hill and at The White House, NGA, and CIA.

1 https://foreignpolicy.com/2007/02/06/tuesday-map-iraqi-insurgents-use-google-earth-to-target-brits

2 Commission on the Intelligence Capabilities of the United States Regarding Weapons of Mass Destruction, Report to the President of the United States (Washington, DC: Government Printing Office, 2005), p389; Accessed on 1/11/2023, URL: https://irp.fas.org/offdocs/wmd_chapter8.pdf
