Video Annotation Best Practices with Annika Deurlington
In this post, we sit down with Annika Deurlington, Commercial Program Manager at CrowdAI. She shares best practices for video annotation and reveals hidden industry tricks.
Image annotation is crucial to creating successful computer vision models. You have to teach a model what a tree is before you can expect it to recognize one in an image on its own. However, it can often be difficult to create quality annotations when you don’t know where to start. Here are four industry secrets to labeling images for AI. Skip to the bottom to find out why image annotation is important to successful computer vision models!
Yes, annotating for computer vision is often just drawing boxes or polygons around the object of interest in hundreds of photos until the computer can accurately detect that object on its own in new media. But it’s also ensuring consistency, accuracy, and precision so that you get the best performing model possible. You probably keep hearing the saying “garbage in, garbage out”, meaning a model is truly only as good as the annotated data shown to it. So, here are four best practices to start using today when annotating images for computer vision!
A model excels with properly labeled data. Let’s look at the pills below. If we only label the pills that are facing upright but want our model to identify pills in all orientations, we will run into trouble: we would effectively be teaching the computer that “pills upside down aren’t considered pills”. The same goes for labeling the object in some of our photos but not in others. Instead, every single pill that appears in the image should be labeled. Then, our model can easily identify pills in new media with precision.
Bounding boxes and polygon masks are some of the most common tools used in image annotation. Sometimes, a box around the object you’re looking for is good enough, and other times, you need a pixel-perfect detection for every image. Object detection models use bounding boxes (which are really just rectangles) to understand whether a certain object is present in an image and where it is located.
On the other hand, image segmentation is where we want to understand not only if an object is in an image and where it is, but also the precise shape of the object itself. For segmentation, we want to trace the exact contours of the object of interest by drawing a polygon mask. This is a much more resource-intensive annotation task, but it yields far richer training data that can be used in a variety of ways.
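To make the distinction concrete, here is a minimal sketch of how the two annotation types are often recorded in the widely used COCO JSON format. The file name, coordinates, and category are made up for illustration:

```python
import json

# A COCO-style annotation file pairs each image with a list of annotations.
# A bounding box is stored as [x, y, width, height]; a polygon mask is a
# flat list of vertex coordinates [x1, y1, x2, y2, ...].
annotations = {
    "images": [{"id": 1, "file_name": "tarmac.jpg", "width": 1920, "height": 1080}],
    "categories": [{"id": 1, "name": "plane"}],
    "annotations": [
        {   # object detection: a rectangle is enough
            "id": 1, "image_id": 1, "category_id": 1,
            "bbox": [412.0, 230.0, 350.0, 140.0],
        },
        {   # segmentation: trace the exact contour with a polygon
            "id": 2, "image_id": 1, "category_id": 1,
            "segmentation": [[412.0, 300.0, 500.0, 230.0, 762.0, 290.0,
                              700.0, 370.0, 450.0, 365.0]],
        },
    ],
}

# Serialize the way an annotation tool would when exporting labels.
exported = json.dumps(annotations, indent=2)
print(len(annotations["annotations"]))
```

Note that the bounding box takes four numbers no matter how complex the object is, while the polygon grows with every vertex you trace — one reason segmentation annotation costs more effort per image.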
Whether we’re using bounding boxes or polygons, we always want our annotations to catch as much detail about an object as possible. We have to teach the model which pixels are important and which aren’t.
This can be hard when zoomed out, so don’t be afraid to get close and zoom in! Get close to the boundary between the object and the background so you can clearly see where to stop your polygon or bounding box. One thing to keep in mind, though: don’t overcorrect and make your annotations too small, or you might accidentally cut off part of the feature you’re trying to label.
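One way to reason about “too loose” versus “too tight” is intersection-over-union (IoU), the standard overlap score between two boxes. This sketch uses made-up coordinates to show how both kinds of sloppiness lower the score against a carefully drawn reference box:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

reference = (100, 100, 200, 200)   # the object's true extent
too_loose = (80, 80, 230, 230)     # includes background pixels
too_tight = (120, 120, 180, 180)   # cuts off part of the object

print(round(iou(reference, too_loose), 2))   # 0.44
print(round(iou(reference, too_tight), 2))   # 0.36
```

A perfectly matching box scores 1.0; both the loose and the tight box score well under 0.5, which is why precision at the object boundary matters so much.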
Occluded objects are objects that are not fully in view, such as some of the airplanes in the photo below. You can see that for a few of them, the nose of the plane is covered by a jet bridge. So how should we annotate the planes that are only partially visible?
It all starts from our initial objective. What are our goals for the model? Whether you should annotate occluded objects or not depends on what you want your model outputs to look like. For example, if the objective of our model is to count planes, we do want to annotate occluded planes so that our model can learn to recognize them and include them in the total count.
There are a few typical approaches for dealing with occluded objects:
As we mentioned earlier, if we wanted to count how many total planes were in a photo, we would probably annotate each occluded plane as if it were fully visible.
Another example is the photo below. Imagine we are building training data for a model that detects cracks on a wall. Here we might trace all cracks except those in the area behind the tree, because we can’t predict what the hidden part of the crack looks like.
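Whichever approach you choose, it helps to record the decision in the data itself. Some public datasets (KITTI, for example) store an occlusion flag on every object; the sketch below borrows that idea, with illustrative field names and coordinates, so downstream users can tell which boxes were drawn over hidden parts:

```python
# Each record notes whether the object was partially hidden and how we
# handled it, keeping the annotation policy visible alongside the labels.
planes = [
    {"label": "plane", "bbox": [50, 40, 300, 120], "occluded": False},
    {"label": "plane", "bbox": [400, 60, 280, 110], "occluded": True,
     "note": "nose hidden by jet bridge; box drawn as if fully visible"},
]

# If the model's job is counting planes, occluded instances still count.
total_planes = len(planes)
fully_visible = sum(1 for p in planes if not p["occluded"])
print(total_planes, fully_visible)   # 2 1
```

The flag costs almost nothing to record, and it lets you later filter occluded objects out (or keep them in) without re-annotating anything.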
What’s the moral of the story? Always think about the business outcome you want your model to achieve. As long as we build with this goal in mind and are consistent in our annotation process, we’re on the right track to producing a successful model.
Most organizations rely on distributed teams to annotate data. These teams are often given master instructions by a manager whose job is to ensure data is being annotated accurately and consistently.
Let’s keep it simple. If you have instructions that outline what you should and should not annotate, follow them. Otherwise, every image you annotate will be different from what others annotate. If you’re annotating on your own, setting standards helps ensure you’re streamlining the quality of your work. Consider drafting a set of annotation rules for yourself!
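If you write your annotation rules down, you can also enforce some of them automatically. Below is a minimal sketch of a consistency check; the allowed labels and the rules themselves are examples, not a standard:

```python
ALLOWED_LABELS = {"pill", "plane", "crack"}  # taken from your instructions doc

def check_annotation(ann, img_w, img_h):
    """Return a list of rule violations for one annotation (empty = OK)."""
    problems = []
    if ann["label"] not in ALLOWED_LABELS:
        problems.append(f"unknown label: {ann['label']!r}")
    x, y, w, h = ann["bbox"]  # [x, y, width, height]
    if w <= 0 or h <= 0:
        problems.append("box has zero or negative size")
    elif x < 0 or y < 0 or x + w > img_w or y + h > img_h:
        problems.append("box extends outside the image")
    return problems

# A clean annotation passes; a misspelled label on an out-of-bounds box
# produces two separate violations to fix.
print(check_annotation({"label": "pill", "bbox": [10, 10, 50, 50]}, 640, 480))
print(check_annotation({"label": "tablet", "bbox": [600, 400, 100, 100]}, 640, 480))
```

Running a check like this over every exported annotation catches the easy mistakes (typos, stray boxes) before they ever reach training, leaving human review time for the judgment calls.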
There you have it: a short but necessary guide to proper image annotation. It may seem like common sense, but it’s really important that you follow these best practices to ensure that your model is producing the best results. After all, what is the use of a model that doesn’t produce your intended result?
Want to see more AI tips and tricks like these in your inbox? Sign up for the CrowdAI newsletter or use the form below to get in touch with an AI expert today. As always, don’t forget to jumpstart your AI journey and start annotating your imagery and video with our powerful, free media annotation suite by signing up for a CrowdAI Explorer account today. Don’t worry, you won’t need a credit card.
A picture is worth a thousand words, they say. For human eyes, it is easy to parse out familiar objects in a picture: in the picture below, most of us can recognize a squirrel and a tree. But how can we teach a computer to see the way humans do? This is where image annotation plays a huge role, training computer vision models to pick out anything from everyday objects to pixel-level details in images and videos.
In short, image annotation is adding information about what’s in an image. This information could be simple text, like “squirrel”, or an actual shape you draw on top of the image, like a box or polygon. In our case, we annotate images with the intent of training a CV model. Like humans, computer vision models learn through example. We need to provide examples of what we want our model to find so that it can learn to identify those things. It’s not too complicated to start—think of just drawing a box around everything that looks like a tree. There you go, you now have your first annotated image.
Does your photo show cracks on a roof? Is there a faulty soda can along the production line? In order to find answers to these questions, we need to create annotated data. A machine can only learn from what you teach it, which is why properly labeled data is important. When you hear people say “data is the new code” this is what they mean—model performance is directly tied to the quality of the data fed to it. If we teach a machine to look for raisins when we are only trying to look for nuts, the machine will not give us the results we want.