Video Annotation Best Practices with Annika Deurlington
In this post, we sit down with Annika Deurlington, Commercial Program Manager at CrowdAI. She shares best practices for video annotation and reveals hidden industry tricks.
Data-centric AI is all the buzz. Today, everyone agrees that you need better-labeled data for better-performing models. However, deep learning models are extremely data hungry. The data labeling market alone is worth $4.1 billion and is set to grow to more than $21 billion by 2027. Models keep asking for the two things organizations don’t have in today’s race for AI adoption: more time and more resources.
This is where automated and AI-assisted labeling promise the most value. Business leaders are turning to automated labeling in the hope of getting more accurate labels faster while using fewer resources. In fact, Cognyltica predicts that by 2027 over 50% of current labeling tasks will be automated or performed by AI systems.
Today, most modern labeling platforms offer some level of automated label generation. Some of these capabilities use AI, while others use rule-based systems built for particular tasks. For example, at CrowdAI, our video platform only requires users to identify an object of interest in a few frames; it then automates the labeling of that object throughout the rest of the video (ML engineers call this linear interpolation).
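To make linear interpolation concrete, here is a minimal Python sketch. It is an illustration only, not CrowdAI's implementation: the `Box` class, the `interpolate_boxes` function, and the example coordinates are all hypothetical, and the sketch assumes the object moves and resizes at a constant rate between the frames a user has labeled.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Box:
    """Hypothetical bounding box: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


def interpolate_boxes(keyframes: Dict[int, Box]) -> Dict[int, Box]:
    """Fill in a box for every frame between consecutive labeled keyframes."""
    frames = sorted(keyframes)
    result: Dict[int, Box] = {}
    for start, end in zip(frames, frames[1:]):
        a, b = keyframes[start], keyframes[end]
        for f in range(start, end):
            # t runs from 0 at the first keyframe toward 1 at the next one.
            t = (f - start) / (end - start)
            result[f] = Box(
                a.x + t * (b.x - a.x),
                a.y + t * (b.y - a.y),
                a.w + t * (b.w - a.w),
                a.h + t * (b.h - a.h),
            )
    if frames:
        # The last keyframe is kept exactly as the user labeled it.
        result[frames[-1]] = keyframes[frames[-1]]
    return result


# A user labels frames 0 and 10; frames 1 through 9 are filled in automatically.
labels = interpolate_boxes({0: Box(0, 0, 10, 10), 10: Box(100, 50, 20, 10)})
print(labels[5])  # Box(x=50.0, y=25.0, w=15.0, h=10.0)
```

If the object speeds up, turns, or gets occluded between keyframes, a straight-line estimate like this drifts, which is why adding a keyframe at each change of motion usually helps.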
Hype aside, AI-assisted labeling is still a fast-changing space. It is far from a standardized technology that provides predictable performance to all organizations. To use automated labeling to your advantage, it is important to develop a nuanced understanding of how it works. One category of ML-assisted labeling involves automated annotation of media using public or proprietary pre-trained models. In this post, CrowdAI looks at a few examples to explore the advantages and limits of using pre-trained models for automated labeling.
Quick hint: computers can only see what you teach them to see. It is no surprise that pre-trained models do a very good job on images with known object categories, even when those images look somewhat different from what the model was trained on. The power of deep learning enables models to see the same object fairly well in a number of novel environments. Even in some experimental photos with unnatural color distributions or unexpected compositions, known object types can still be identified fairly well. In fact, models even perform well on human faces masked in the COVID era, as well as on religious and fashionable clothing.
Pre-trained models are taught to see a finite number of object types. When shown an image of a dog, a model only taught to see cats will not be able to properly detect our canine friend. This, of course, is because humans only train models to see a limited number of things based on their needs. For example, it is crucial for a self-driving system to detect a human crossing the road. Conversely, a model in a warehouse tracking packed boxes will likely ignore humans as background. One model’s foreground is another’s background. Thus, if one uses a model pre-trained on street-view datasets to automatically annotate warehouse imagery, those annotations will not be very useful. In short, a pre-trained model can’t know everything; it only knows the objects it has been taught.
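One quick way to check this before running any auto-annotation is to compare the classes a pre-trained model knows with the classes your task needs. The Python sketch below is a simplified illustration: the `label_coverage` function and both class lists are hypothetical, and it assumes class names can be matched by simple case-insensitive comparison, which real taxonomies often do not allow.

```python
from typing import Iterable, List, Tuple


def label_coverage(
    model_classes: Iterable[str], task_classes: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Split the task's classes into those the model knows and those it does not."""
    known = {c.lower() for c in model_classes}
    task = sorted({c.lower() for c in task_classes})
    covered = [c for c in task if c in known]
    missing = [c for c in task if c not in known]
    return covered, missing


# Hypothetical class lists for a street-view model and a warehouse task.
street_view_model = ["person", "car", "bicycle", "traffic light"]
warehouse_task = ["box", "pallet", "forklift"]

covered, missing = label_coverage(street_view_model, warehouse_task)
print(covered)  # []
print(missing)  # ['box', 'forklift', 'pallet']
```

An empty `covered` list means the model has nothing to offer for this task as is, no matter how well it performs on the data it was trained for.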
You can’t expect a trained doctor to excel at being a lawyer without formal training. Similarly, pre-trained models are taught to do well in a limited set of domains, which makes domain shift a real limit on their utility. Specialized domains are often so technically complex that there is no true replacement for solid domain expertise; the devil truly is in the details. Publicly available pre-trained models, by contrast, are trained on benchmark datasets that tend to cover a few commonly used domains. While these datasets are extremely valuable to the advancement of AI, they are also known to limit or bias the types of problems and domains that get tackled. This limit is reflected in auto-annotation applications, where any domain that is underrepresented or absent in publicly available datasets complicates putting a model into production. For example, medicine is a highly specialized domain where models pre-trained on benchmarks like ImageNet won’t work. Imagine a model trained to detect dogs and cats trying to catch the smallest pixel-level tumor growth on a CT scan. It just isn’t a solid strategy for success. For this reason, there are large efforts under way to collect open-source medical imagery and to build pre-trained models for that particular use case. Similar efforts are being made for geospatial pre-trained models. It is crucial to embed domain expertise in model development to ensure that a model will perform well in an operational environment.
Models are only as good as the data they have seen. People often view commonly used commercial models as a panacea but forget that those models are trained to see certain objects, under certain conditions, using specific image capture devices. It is right to be excited about the utility of pre-trained models, especially domain-specific models that are being trained on increasingly large publicly available datasets: they can save you time and money and speed up the path to production. However, determining the utility and limits of a pre-trained model for a highly specialized task at your enterprise is difficult. If you’re unsure about how best to use a pre-trained model to develop an automation solution customized for your enterprise, get in touch with CrowdAI today. That is precisely how an AI partner like us can supercharge your path to automating visual tasks.
For those specialized tasks that require further training, don’t forget to create your CrowdAI Explorer account and start labeling data today for free, no credit card required.