“Small Devices, Big Impacts: Streaming Computer Vision Models at the Edge”
May 22, 2023

Zeke Foppa and Taylor Maggos

Computer Vision models have become widespread: they now power a broad variety of everyday applications, from identifying plants, foods, and landmarks to recognizing friends and family. Running these models on consumer devices, such as mobile phones, makes real-time analysis of images and videos widely accessible.

However, these advancements are not free: running a computer vision model directly on a mobile device is challenging due to the limited compute, memory, and battery available.

In this blog, we’ll look at the future of Computer Vision in Cloud-Enabled contexts, i.e. ones where network access is readily available, such as in warehouses, production facilities, and other large institutions and urbanized contexts. In an upcoming post, we’ll cover the future of Computer Vision in Network-Limited Environments such as submarines and satellites, as well as air-gapped networks and VPCs.

Real-Time Model Results from CrowdAI’s Cloud Vision Infrastructure

CrowdAI’s Vision platform enables users to deploy trained Computer Vision models onto CrowdAI’s proprietary cloud infrastructure without writing a single line of code. Models can be run from anywhere, and the results are accessible everywhere. The platform’s Vision Pipeline functionality lets users stream trained model predictions in near real-time: they create Pipelines connected to data sources anywhere on the web, attach a trained model, and export the results wherever they are needed. Once a Pipeline is running, everything is automatic, and a rich variety of preconfigured data sources and export locations keeps setup quick and simple.

This powerful functionality enables any user with a network-connected, camera-enabled device, such as a smartphone, to tap into the vast power of Computer Vision at their fingertips. This can be further combined with conventional web results or Large Language Models to provide a rich experience directly on-device.

In grocery stores, shoppers can use CrowdAI-powered technology to access information about hard-to-pronounce ingredients, or to compare prices for similar products at other stores. In that same store, employees can leverage inexpensive camera-enabled devices to note which shelves are empty and automatically extract the product names from the tags (Figure 1). And, perhaps most importantly of all, they can make self-checkout less frustrating by reducing the need for error-prone barcode scanning and finicky product-bagging requirements.

Figure 1: An employee can walk through aisles at the store to quickly scan shelves using a small device, send the media through the cloud to the trained model, and then receive near real-time outputs from the model showing which shelves are empty, matched to the product tags on those shelves.

But we’re not limited to everyday scenarios: clients with industrial production lines can leverage this same highly accessible technology to provide early warnings about machinery wearing out, cracked safety infrastructure, or dangerous spills (Figure 2). Insurance companies can assess underlying damage and risk more effectively, offering more accurate pricing to everyone. Farmers wanting to detect infection and disease in their crops can walk through the fields with their phones and see real-time results, on the same device, showing where disease is spreading. The reasons to put a computer vision model on a smartphone operating at the edge are nearly endless.

Figure 2: In a larger factory setting, end-users can gather video or imagery of their product in production, send the media through their trained CrowdAI model, and receive model predictions back on the same device to visualize defects on products in the production line, in this case dents on cans.

Easily Invoke Inference Through an API 

Invoking inference through an API means making a request to an API endpoint to obtain predictions from a machine learning model. Inference is the process of using a trained model to make predictions on input data the model has never seen before.

When you invoke inference through an API, you send input data to the API endpoint, and the API server passes that data to the machine learning model. The model processes the data and generates predictions in the form of mask assets over the new media. The results are returned in the API response, which the end user sees displayed on their small device operating at the edge.
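To make this flow concrete, here is a minimal sketch of invoking inference over HTTP from Python. The endpoint URL, credential, and response fields below are hypothetical placeholders, not CrowdAI’s actual API.

```python
# Minimal sketch of invoking inference through an HTTP API.
# The endpoint URL, API key, and response schema are hypothetical
# placeholders standing in for whatever the deployed model exposes.
import requests

API_URL = "https://api.example.com/v1/models/my-model/infer"  # hypothetical endpoint
API_KEY = "YOUR_API_KEY"                                      # hypothetical credential

def invoke_inference(image_path: str) -> dict:
    """Upload an image to the inference endpoint and return its predictions."""
    with open(image_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"media": f},   # image sent as multipart form data
            timeout=30,
        )
    response.raise_for_status()   # surface HTTP errors early
    return response.json()        # e.g. {"masks": [...], "labels": [...]}

predictions = invoke_inference("shelf_photo.jpg")
```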

By using an API to invoke inference, CrowdAI can integrate machine learning models into smartphones and small devices. This allows end-users to leverage the power of the model's predictions or capabilities without having to directly interact with or manage the underlying model implementation.

The task is simple: end-users collect data at the edge with their phone, invoke inference through an API, and visualize the outputs, which are returned directly to the device that captured them (Figure 3). This capability allows for real-time monitoring at the edge.

Figure 3: Using any small device, such as a smartphone, users can collect media (in this case, rust on machinery), invoke inference on the media through an API to their trained model, and receive the outputs on the same device in near real-time, as a mask asset of the rust overlaid on their media.
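As an illustration of the visualization step, the sketch below blends a returned mask over the captured photo. It assumes each mask arrives as a base64-encoded single-channel PNG; that encoding is an assumption about the response format, and the actual schema may differ.

```python
# Sketch of overlaying a returned mask asset on the original photo.
# Assumes the mask is a base64-encoded single-channel PNG (an assumed
# response format, not a documented one). Uses only Pillow.
import base64
import io
from PIL import Image

def overlay_mask(image_path: str, mask_b64: str,
                 color=(255, 0, 0), alpha=128) -> Image.Image:
    """Blend a semi-transparent colored mask over the original photo."""
    photo = Image.open(image_path).convert("RGBA")
    mask = Image.open(io.BytesIO(base64.b64decode(mask_b64))).convert("L")
    mask = mask.resize(photo.size)                 # align mask to the photo
    mask = mask.point(lambda v: min(v, alpha))     # cap opacity so the photo shows through
    tint = Image.new("RGBA", photo.size, color + (255,))
    photo.paste(tint, (0, 0), mask)                # paint the tint only where the mask is set
    return photo

# overlay_mask("machine_photo.jpg", predictions["masks"][0]).show()
```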

Mobile Device Bonus: Geospatial Analytics

Building a model using media from a cellular device offers some hidden advantages when it comes to geospatial models. The majority of cell phones today record the geospatial location of each photo and video taken, and virtually all offer location-based user experience (UX) services. Leveraging this metadata, CrowdAI can display map views of datasets and pin each piece of media in Google Maps or Mapbox (Figure 4). Knowing exactly where the media in a dataset came from makes it easier to apply transfer learning from pre-trained models built in similar environments. It is also useful for keeping track of what data has already been collected and used in the training set. Once a model has been trained and optimized, geospatial tooling can also be used to monitor objects of interest.

Figure 4: Shown here is an example of satellite data from Rodanthe, North Carolina, which was used to map building damage after a hurricane in 2020. When geospatial metadata is associated with media, our Mapbox and Google Maps integrations allow the user to visualize the image footprint, giving context to the location of the media.
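The map pinning described above relies on the GPS metadata that phones embed in each photo. Below is a short sketch of how those coordinates can be read from an image file using the standard EXIF GPS tags; only Pillow is required.

```python
# Sketch of extracting GPS coordinates from a phone photo's EXIF metadata.
# 34853 is the standard EXIF tag ID for the GPSInfo block.
from PIL import Image
from PIL.ExifTags import GPSTAGS

def photo_location(image_path: str):
    """Return (latitude, longitude) in decimal degrees, or None if untagged."""
    exif = Image.open(image_path).getexif()
    gps_raw = exif.get_ifd(34853)                  # GPSInfo IFD; empty dict if absent
    if not gps_raw:
        return None
    gps = {GPSTAGS.get(k, k): v for k, v in gps_raw.items()}

    def to_degrees(dms, ref):
        degrees, minutes, seconds = (float(x) for x in dms)
        value = degrees + minutes / 60 + seconds / 3600
        return -value if ref in ("S", "W") else value  # south/west are negative

    return (to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"]),
            to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"]))

# print(photo_location("field_photo.jpg"))  # e.g. (35.59, -75.47)
```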

Running a computer vision model on a cell phone or mobile device is a powerful tool that enables real-time analysis of images and videos across a wide variety of applications. While there are challenges to streaming computer vision models on small devices, CrowdAI has developed a roadmap of techniques and tools to overcome them. By leveraging cloud-driven API connections for invoking inference from a trained model, CrowdAI sees a pathway to real-time analysis of imagery and video on small devices operating at the edge. Additionally, the geospatial benefits of building models from media captured on cell phones offer unique advantages for training, monitoring, and analyzing objects of interest. As mobile technology continues to advance, the possibilities for running computer vision models on small devices and cell phones will only continue to grow, and CrowdAI is ready to be at the forefront of this venture.

Find out more about our platform and capabilities at Crowdai.com.
