While machine learning systems have gotten much better at identifying objects within still frames, the next stage is identifying individual objects within video, which could open up new considerations in brand placement, visual effects, accessibility features and more.
Google has been developing its tools on this front for some time, which has now led to new advances in YouTube’s options, including the capacity to tag products displayed within video clips and provide direct shopping options, facilitating broader eCommerce opportunities in the app.
And now, Facebook too is taking the next steps, with a new process that’s much better at singling out individual objects within video frames.
As explained by Facebook:
“Working in collaboration with researchers at Inria, we have developed a new method, called DINO, to train Vision Transformers (ViT) with no supervision. Besides setting a new state of the art among self-supervised methods, this approach leads to a remarkable result that is unique to this combination of AI techniques. Our model can discover and segment objects in an image or a video with absolutely no supervision and without being given a segmentation-targeted objective.”
That effectively automates object segmentation, with no labeled training data required, which is a major advance in computer vision technology.
And as noted, that could open up a range of new opportunities.
“Segmenting objects helps facilitate tasks ranging from swapping out the background of a video chat to…