Machine learning offers new insights for analyzing video images

Machine learning is a rapidly growing field of study in which computer algorithms are developed to perform a specific task without relying on explicit instructions. There are several machine learning projects underway at MBARI that take advantage of the institute’s extensive video and image management and analysis capabilities. Two of these projects, FathomNet and Video Annotation and Reference System (VARS) Annotation Assistance, aim to use machine learning to expedite the time-consuming process of manually identifying animals and other objects of interest captured in underwater video recordings. These expert identifications, or annotations, make it possible for scientists to search for trends among marine organisms and cross-reference those observations with other environmental measurements.

Annotation and localization of a giant larvacean, Bathochordaeus mcnutti, as it leaves its mucus house. Localizations like this one, generated with MBARI’s VARS Localizer software tool, are added to the FathomNet database, which will be used to train machine learning models to automate the detection and classification of objects in underwater imagery and video.

For decades, MBARI Video Lab staff have manually annotated video recordings for research purposes, repeatedly demonstrating the value of this ocean-observing technique. Recently, the platforms at MBARI that collect video imagery have expanded to include three remotely operated vehicles (ROVs); the i2MAP autonomous underwater vehicle (AUV); the Benthic Rover; the Mesobot; and time-lapse camera systems. This “data deluge” presents both a challenge and an opportunity: how do we capitalize on the valuable information it contains without requiring an armada of highly skilled video annotators?

Diagram illustrating the bottleneck that FathomNet aims to address: a lack of labeled (annotated and localized) image data. Labeled data are required to train and test machine learning models, which can then be applied to unlabeled data.

To manage this deluge, MBARI is working with partners at CVision AI and the Massachusetts Institute of Technology (MIT) Media Lab to build FathomNet, a publicly available database that draws on data gathered from a number of sources, including MBARI’s VARS observations database. The FathomNet team will be largely responsible for generating training data for machine learning algorithms. For each taxon of interest, training data must be generated that the algorithms can use to “learn” how to differentiate one taxonomic group from another. These images need to be annotated in such a way that both the object of interest and its location within the frame are identified. The millions of images currently in the VARS database can be used to generate machine learning training data; however, these annotated images still require localization of the objects of interest. The FathomNet project team is also developing the tools needed to localize images, generate more images from digital video, and export images and localization data for algorithm development and testing. The long-term goal is for FathomNet to be used broadly across the ocean community via an externally accessible web portal.
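To make the localization step concrete, the sketch below shows one common way labeled detection data are packaged: each image is paired with taxon labels and bounding boxes in a COCO-style JSON file, a de facto standard accepted by many object detection frameworks. The `Localization` record and `export_coco` helper are illustrative assumptions, not FathomNet’s or VARS Localizer’s actual schema.

```python
import json
from dataclasses import dataclass

@dataclass
class Localization:
    """One labeled object in one image: a taxon plus a bounding box."""
    image_file: str
    taxon: str        # e.g., "Bathochordaeus mcnutti"
    x: float          # top-left corner of the box, in pixels
    y: float
    width: float
    height: float

def export_coco(localizations, out_path):
    """Write a list of Localizations as a minimal COCO-style dataset."""
    taxa = sorted({loc.taxon for loc in localizations})
    cat_id = {t: i + 1 for i, t in enumerate(taxa)}
    images, annotations, img_id = [], [], {}
    for i, loc in enumerate(localizations):
        # Register each image once, assigning it a numeric id.
        if loc.image_file not in img_id:
            img_id[loc.image_file] = len(img_id) + 1
            images.append({"id": img_id[loc.image_file],
                           "file_name": loc.image_file})
        # COCO expresses boxes as [x, y, width, height] in pixels.
        annotations.append({
            "id": i + 1,
            "image_id": img_id[loc.image_file],
            "category_id": cat_id[loc.taxon],
            "bbox": [loc.x, loc.y, loc.width, loc.height],
        })
    dataset = {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": cat_id[t], "name": t} for t in taxa],
    }
    with open(out_path, "w") as f:
        json.dump(dataset, f, indent=2)
```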

Existing VARS annotations record what objects are in a frame, but not where they are within it. Localization, the process of associating position data with object annotations by drawing bounding boxes, is required to train machine learning models. Tools to localize objects within VARS image and video data, like VARS Localizer, have been developed to address this need.

The VARS Annotation Assistance project will develop tools for localizing and tracking objects within video and integrate them into a new VARS video analysis workflow. This new workflow will include the ability to generate training data for FathomNet within VARS and, eventually, the ability to automate the analysis and annotation of objects within video. Once the automated analysis is complete, annotators will be able to view the machine-generated object proposals within the VARS annotation user interface. Annotators can then correct the machine-generated proposals as needed, and the verified results will be fed back into model training for further refinement.
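The sketch below outlines one plausible shape for that human-in-the-loop cycle. The interfaces here (`model.detect`, `model.retrain`, and a `review` callback standing in for the VARS annotation user interface) are placeholder assumptions, not the project’s actual API.

```python
def annotation_assistance_cycle(model, video_frames, training_set, review):
    """One iteration of the assisted-annotation loop described above.

    model        -- current detection/classification model
    video_frames -- frames from a new dive to analyze
    training_set -- accumulated verified localizations
    review       -- callable that shows one proposal to a human annotator
                    and returns the (possibly corrected) localization,
                    or None if the proposal is rejected
    """
    # 1. Automated analysis: the model proposes objects in each frame.
    proposals = [(frame, det) for frame in video_frames
                 for det in model.detect(frame)]

    # 2. Human verification: annotators accept, correct, or reject each
    #    machine-generated proposal in the annotation interface.
    verified = []
    for frame, det in proposals:
        result = review(frame, det)
        if result is not None:
            verified.append(result)

    # 3. Feedback: verified annotations grow the training set, and the
    #    model is retrained so later proposals need fewer corrections.
    training_set.extend(verified)
    model.retrain(training_set)
    return model, training_set
```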

MBARI’s ML-tracking GUI software tool is used to quickly validate machine-generated annotation and localization proposals from analyzed ROV dive footage. MBARI engineers have developed tools like these to speed up the curation of machine learning training data and the validation of machine learning models.

In addition to speeding up the analysis of MBARI’s deep-sea video and the development of training data and algorithms for automated detection and classification of marine organisms, these tools can be used for real-time, in situ object detection and tracking. An MBARI project called ML-Tracking, involving members of MBARI’s Bioinspiration Lab and CVision AI and funded in part by the National Science Foundation (NSF), has already demonstrated the use of machine learning (ML) models in situ to seek out specific organisms and to classify and track them in real time. The outputs of these classification and tracking models are fed into the vehicle control system, enabling acquisition and tracking of targets of interest with little to no pilot interaction. These technologies will eventually be shared with the broader oceanographic community to assist in their video analysis projects. Thanks to the ever-increasing use of video recordings for ocean science and engineering, the future looks bright for machine learning at MBARI and beyond.
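As a rough illustration of how model outputs might drive a vehicle, the sketch below closes the loop with a simple proportional controller that keeps a tracked animal centered in the camera frame. Every interface here (`camera`, `detector`, `tracker`, `vehicle`) is a hypothetical placeholder, and the real ML-Tracking integration with vehicle control is considerably more involved.

```python
import time

def track_and_follow(camera, detector, tracker, vehicle, target_taxon,
                     gain=0.5):
    """Keep a detected animal centered in the camera frame.

    target_taxon -- species label the detector should lock onto
    gain         -- proportional gain mapping pixel error to a
                    thruster command (illustrative units)
    """
    while True:
        frame = camera.read()

        # Run the trained detector; keep only the taxon of interest.
        detections = [d for d in detector.detect(frame)
                      if d.label == target_taxon]

        # The tracker associates detections across frames so the target
        # keeps its identity even through occasional missed detections.
        target = tracker.update(detections)

        if target is None:
            vehicle.hold_position()  # no target: hover and keep searching
        else:
            # Proportional control: steer so the target's bounding-box
            # center moves toward the center of the image.
            err_x = target.center_x - frame.width / 2
            err_y = target.center_y - frame.height / 2
            vehicle.command(yaw=-gain * err_x, heave=-gain * err_y)

        time.sleep(0.05)  # ~20 Hz control loop
```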

This video shows machine learning proposals of the giant larvacean species, Bathochordaeus mcnutti, and its mucous inner filter. To perform this automated tracking and classification, a machine learning model was first trained using localized images of this species from other observations contained within MBARI’s VARS database and aggregated into FathomNet. Models like these are being used as part of the ML-Tracking project, which aims to integrate machine learning with underwater vehicle control algorithms to automate the acquisition and long-duration tracking of animals in situ.
