Video Annotation Best Practices with Annika Deurlington
April 18, 2022

Casey Chang

In this post, we sit down with Annika Deurlington, Commercial Program Manager at CrowdAI, who manages enterprise engagements with Fortune 500 companies. Below, she shares best practices for video annotation and a few lesser-known tricks of the trade.

Casey: Video annotation is known to be complex. How can we make sure that we aren’t making mistakes? 

Annika: Once you’ve determined that video annotation is suitable for your use case, the first thing to think about isn’t annotation itself but project setup. Shorter videos (fewer than 50 frames) lower the chance of making mistakes. This is especially important when you’re annotating multiple types of objects that appear and disappear; in those situations, we aim for videos of around 20 frames. In long tasks, it becomes difficult to keep track of hundreds or thousands of annotations, so when you make a mistake, it’s easy to make more mistakes while trying to correct the original one. Long tasks can also take hours to annotate, which is less satisfying because you don’t get to submit tasks as quickly.

Casey: And how can we create these shorter tasks? Do we just split up our videos into a bunch of 30-second videos?

Annika: Essentially! To create shorter tasks on the CrowdAI platform, you need to know your camera’s frames-per-second (fps) rate. Fps is like how fast you flip through a book: the faster you flip, the more fluid the motion in the video. Once you know the fps, you can “chunk” a video into shorter videos. Fewer frames mean fewer mistakes!
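To make the chunking step concrete, here is a minimal Python sketch. It is not CrowdAI's API: the function names `duration_to_frames` and `chunk_frame_ranges` are hypothetical, and the default chunk size of 20 frames is simply the target mentioned earlier in the interview. It converts a clip length to a frame count using fps, then splits the frames into consecutive ranges.

```python
def duration_to_frames(seconds, fps):
    """Convert a clip length in seconds to a frame count (hypothetical helper)."""
    if seconds < 0 or fps <= 0:
        raise ValueError("seconds must be >= 0 and fps must be > 0")
    return round(seconds * fps)


def chunk_frame_ranges(total_frames, frames_per_chunk=20):
    """Split frames 0..total_frames-1 into consecutive (start, end) ranges.

    `end` is exclusive, and the last chunk may be shorter than the rest.
    """
    if total_frames < 0 or frames_per_chunk <= 0:
        raise ValueError("total_frames must be >= 0 and frames_per_chunk > 0")
    return [
        (start, min(start + frames_per_chunk, total_frames))
        for start in range(0, total_frames, frames_per_chunk)
    ]


# Example: a 50-frame video split into tasks of at most 20 frames each.
print(chunk_frame_ranges(50))  # [(0, 20), (20, 40), (40, 50)]
```

Under these assumptions, a 30-second clip recorded at an assumed 30 fps has 900 frames and would split into 45 chunks of 20 frames each, which is why knowing the fps matters when you decide how to cut your footage.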

Casey: If it’s so easy to make mistakes, why do people do video annotation?

Annika: The big advantage of video tasks is that you can annotate many instances of an item of interest quickly. We do this by drawing a label in one frame, copying and pasting that label into a later frame, and then adjusting the labels in the intermediate frames, which are auto-annotated by interpolating between the original label’s location and the pasted label’s location. For example, imagine a car driving from left to right: I can annotate the first frame, when the car is on the left, and then annotate the frame where the car reaches the right, and all the frames in between will be auto-annotated for me. I just need to review them and make tweaks so the car stays inside the rectangle. If we were annotating individual images instead of video, we would have to draw a rectangle around the car every time it moved.

Because the CrowdAI platform uses interpolation in the video annotation process, you maximize efficiency by minimizing the number of times you manually annotate the video. If you manually draw a label on every frame, interpolation won’t propagate, and you’ll have to go back and edit each frame instead of letting the platform do the work for you.
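The interpolation idea can be sketched in a few lines of Python. This is an illustration only: it assumes simple linear interpolation between two keyframe boxes, which may not be how the CrowdAI platform computes intermediate labels, and the function name `interpolate_boxes` and the `(x, y, width, height)` box layout are hypothetical.

```python
def interpolate_boxes(start_frame, start_box, end_frame, end_box):
    """Linearly interpolate a bounding box between two keyframes.

    Boxes are (x, y, width, height) tuples. Returns a dict mapping every
    frame index from start_frame to end_frame (inclusive) to its box.
    """
    if end_frame <= start_frame:
        raise ValueError("end_frame must come after start_frame")
    span = end_frame - start_frame
    boxes = {}
    for frame in range(start_frame, end_frame + 1):
        t = (frame - start_frame) / span  # 0.0 at the first keyframe, 1.0 at the last
        boxes[frame] = tuple(a + t * (b - a) for a, b in zip(start_box, end_box))
    return boxes


# A car moving left to right: only frames 0 and 4 are drawn by hand.
boxes = interpolate_boxes(0, (0, 0, 10, 10), 4, (40, 0, 10, 10))
print(boxes[2])  # (20.0, 0.0, 10.0, 10.0)
```

In this sketch, only the two keyframes are drawn manually; the three frames in between are filled in automatically and would still need a quick review, as in the car example.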

Casey: So everything you’ve talked about makes annotating video faster? 

Annika: Yes. You can generate more training data for your model more quickly by using video annotations. However, it’s important that the training data is diverse, which means we still aim to annotate at least 200 unique videos (not chunks of the same video). Think of it like this: if I train a model with 200 videos instead of 200 images, I am actually training my model with ~4,000 images if my videos are ~20 frames long. It might take me a little longer to annotate 200 videos than 200 images, but I may get more information from that extra time invested.
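The arithmetic behind that estimate is easy to check. The sketch below uses hypothetical helper names, and the figure of two hand-drawn keyframes per video is an assumed best case for a single object (like the car example), not a number from the interview.

```python
def labeled_frame_count(num_videos, frames_per_video):
    """Total frames that end up with labels across all videos."""
    return num_videos * frames_per_video


def manual_box_count(num_videos, keyframes_per_video):
    """Boxes drawn by hand, assuming a fixed number of keyframes per video."""
    return num_videos * keyframes_per_video


print(labeled_frame_count(200, 20))  # 4000
print(manual_box_count(200, 2))      # 400 (assumed two keyframes per video)
```

Under that assumption, 200 videos of about 20 frames yield roughly 4,000 labeled frames from about 400 hand-drawn boxes, before counting any review tweaks to the interpolated frames.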

Casey: And not just faster but more consistent too?

Annika: In general, yes. Annotations can be more consistent because (1) one person annotates an entire video, so they’re likely to label the same objects in the same way, and (2) interpolation makes it easy to apply the same label across many frames. Quality control (QC) checks on your video annotations are still important, though! Because video annotation moves quickly, QC early and often to prevent runaway mistakes.


Video annotation can be intimidating, but these simple tips can make your annotation process much easier. Remember, the reason we use machine learning and video annotation is to be efficient and effective and to give you the results you need.

Interested in learning more? Chat with one of our AI experts today. Who knows, maybe you’ll get to speak with Annika!
