2018 Annual Report
Photo by Knute Brekke

Developing a data processing system to hear what the ocean is really saying

Our ocean is alive with sound, most of which we are never lucky enough to hear. Acoustic energy, in the form of sound waves, propagates fast and far across Earth’s largest habitat. Highly evolved life forms in the ocean produce, receive, and interpret sound for the essential life activities of communication, navigation, and foraging. This invisible auditory landscape gives scientists the opportunity to learn about the oceanic realm by listening carefully. It also presents a responsibility to recognize how human noise pollution affects marine life and what can be done to mitigate those impacts.

MBARI embarked on a listening journey under the leadership of Senior Research Specialist John Ryan in the summer of 2015. This foray was made possible by the technological infrastructure of the institute’s cabled observatory that connects the deep sea to the shore, the Monterey Accelerated Research System—also known as MARS. MBARI remotely operated vehicle pilots gracefully connected a hydrophone (an underwater microphone) to the cabled observatory, spooled out its extension cord, and set it to capture the sounds of the sea. Since that moment, sounds that propagate through the ocean off Monterey Bay have been streaming through the hydrophone to a shore-side computer, enabling many unanticipated discoveries.

We are constantly flooded by sound in our daily lives, from the gentlest breeze to the loud din of a construction project. If we were to record all the sound that reaches our ears, the amount of data would quickly become overwhelming. Sound in the ocean presents an even greater data challenge because it requires recording within and beyond the range of human hearing. For example, dolphin echolocation clicks can reach frequencies more than five times the upper limit of human hearing. To capture this high end, the MARS hydrophone samples sound more than a quarter million times each second. The resulting volume of information is massive, about two terabytes per month, requiring advanced technology to identify relevant signals within the mountain of raw data.
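Those figures are easy to check with back-of-the-envelope arithmetic. The sketch below assumes a sample rate of 256,000 samples per second and 24-bit samples; the bit depth is an assumption for illustration, since the text states only the sample rate and the monthly total.

```python
# Back-of-the-envelope check of the MARS hydrophone data rate.
# The 24-bit (3-byte) sample size is an assumption for illustration.
SAMPLE_RATE_HZ = 256_000      # "more than a quarter million times each second"
BYTES_PER_SAMPLE = 3          # assumed 24-bit samples
SECONDS_PER_MONTH = 60 * 60 * 24 * 30

bytes_per_month = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * SECONDS_PER_MONTH
terabytes_per_month = bytes_per_month / 1e12
print(f"{terabytes_per_month:.1f} TB per month")  # ~2.0 TB
```

Under these assumptions the raw stream works out to roughly two terabytes per month, consistent with the figure above.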

One approach to sound analysis begins with translating audio data into a visual representation called a spectrogram, which contains a tremendous amount of information. While a person can readily learn to detect and classify sounds using spectrograms, the vast amount of information pouring in from the MARS hydrophone is overwhelming for manual analysis. To address this challenge, MBARI software engineers, led by Danelle Cline, are turning to machine-learning techniques to sift through the data and harness the power of cloud computing to process it quickly.
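That first translation step, audio to spectrogram, can be sketched with a short-time Fourier transform. The toy example below (pure NumPy, not MBARI's pipeline) uses a synthetic 440 Hz tone in place of hydrophone audio; the sample rate, window, and hop sizes are arbitrary choices for illustration.

```python
import numpy as np

# Minimal STFT-based power spectrogram: slice the signal into
# overlapping windows, apply a taper, and take the FFT of each slice.
fs = 8_000                                   # sample rate (Hz), illustrative
t = np.arange(fs) / fs                       # one second of audio
audio = np.sin(2 * np.pi * 440 * t)          # stand-in for hydrophone data

win = 256                                    # window length (samples)
hop = 128                                    # hop between windows
frames = [audio[i:i + win] * np.hanning(win)
          for i in range(0, len(audio) - win, hop)]
spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # power spectrogram
freqs = np.fft.rfftfreq(win, d=1 / fs)            # frequency axis (Hz)

# The strongest frequency bin should sit near the 440 Hz tone.
peak_hz = freqs[spec.mean(axis=0).argmax()]
print(f"peak energy near {peak_hz:.0f} Hz")  # close to 440 Hz
```

The resulting `spec` array is exactly the kind of time-frequency image, with time on one axis and frequency on the other, that analysts and models alike inspect.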

Example of a spectrogram from a single day of MARS recordings (1 November 2016) with time across the horizontal axis and frequency range of 5 to 100,000 Hz on the vertical axis. This is far greater than the range of human hearing (about 20 to 20,000 Hz). The intensity of sound is represented by color, with warmer colors indicating higher intensity. In this one day of the Monterey Bay soundscape, whales, dolphins, earth processes (wind and earthquakes), and human activities (boats) were recorded.

The current state of the art in processing images, video, speech, and audio uses a form of machine learning called deep learning, in which computers learn by example. Similar to the way humans solve problems by drawing on accumulated knowledge from many experiences, deep learning uses “convolutional neural networks” that are pre-trained on massive archives of images, including everything from dogs to cars. Using a method called transfer learning, the computer models can then apply that training to something new, such as visualizations of whale calls.
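In outline, transfer learning reuses a frozen, pre-trained feature extractor and trains only a new classification head. The toy sketch below illustrates that structure with scikit-learn stand-ins, a PCA "extractor" and a logistic-regression "head" on synthetic data; it is a conceptual illustration, not the team's actual convolutional network.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# "Pre-train" a feature extractor on generic data, then freeze it.
generic = rng.normal(size=(500, 64))          # stands in for a big image corpus
extractor = PCA(n_components=8).fit(generic)  # frozen "pre-trained" layers

# Target task: two synthetic "call types" separated along one direction.
labels = rng.integers(0, 2, size=200)
calls = rng.normal(size=(200, 64)) + labels[:, None] * 2.0

# Only the new head is trained; the extractor's parameters never change.
head = LogisticRegression().fit(extractor.transform(calls), labels)
accuracy = head.score(extractor.transform(calls), labels)
print(f"head accuracy: {accuracy:.2f}")
```

The design point is the same one transfer learning exploits at scale: features learned on one large dataset can be reused cheaply for a new task with far less labeled data.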

The computer “learning” can be supervised, in which humans provide examples so that the system can recognize certain patterns, or it can be unsupervised, in which the system learns by itself from the data. In both of these forms, sound is presented as a visual representation, the spectrogram.

Building on a neural network that was trained using many and diverse images, transfer learning adds network layers to enable classification of images representing whale vocalizations.

Consider the example of two blue whale calls. The supervised output might indicate whether a sound is a blue whale “A” call, a “D” call, or neither. To train the model to differentiate between the blue whale calls using a supervised method, two types of data are needed: the input data (in our case, a collection of spectrograms, each localized around a single call) and the output label (a blue whale A call or D call). Given enough data, the model will learn to generalize whether a spectrogram represents one of these specific calls. The team has found deep-learning models that can very accurately classify sounds in spectrograms, even those that are ambiguous to a human, because of their ability to learn which features distinguish complex structures (such as the whale sounds). The model can classify whale calls with an average accuracy of 95 percent.
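The shape of that supervised problem can be shown with a toy classifier. In the sketch below (illustrative only; everything, including the call shapes, is synthetic), an A call is idealized as a steady low tone and a D call as a downsweep, and a simple nearest-template rule stands in for the trained deep network.

```python
import numpy as np

rng = np.random.default_rng(1)
F, T = 32, 32                                  # frequency bins x time frames

# Idealized spectrogram templates for the two call types.
a_template = np.zeros((F, T))
a_template[8, :] = 1.0                          # A call: steady tone
d_template = np.zeros((F, T))
d_template[np.linspace(20, 4, T).astype(int), np.arange(T)] = 1.0  # D: downsweep

def make_example(template):
    # A labeled training example: template plus background noise.
    return template + rng.normal(scale=0.3, size=template.shape)

def classify(spec):
    # Nearest-template rule standing in for a trained classifier.
    scores = {"A": (spec * a_template).sum(), "D": (spec * d_template).sum()}
    return max(scores, key=scores.get)

labels = ["A", "D"] * 50
preds = [classify(make_example(a_template if y == "A" else d_template))
         for y in labels]
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print(f"accuracy: {accuracy:.2f}")
```

A real deep network replaces the fixed templates with learned features, which is what lets it handle calls that vary in pitch, duration, and background noise.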

While training the models is very rapid, collecting the information to train these models is time-consuming because it requires human effort to verify that the training data are correct. While supervised learning is proving highly effective for relatively simple calls, some marine mammal vocalizations are much more complex and require different methods. For example, male humpback whales vocalize in complex sequences of units (analogous to musical notes) that are organized into phrases. Phrases are repeated to create themes, and a collection of themes comprises a song. A single song can last for more than 30 minutes, and can be repeated with varying degrees of improvisation in song sessions that can last more than a day. In the Monterey Bay region, humpback whale songs are prominent in the winter soundscape, occurring as much as 86 percent of the time during a month.

Unsupervised learning is showing great potential in the analysis of such complex humpback songs. Here a spectrogram is again used as the input, but the output is not specified; the goal is for the computer to learn what naturally occurring sounds are present in the data, from the data themselves. These methods do not require prior labeling of the information; instead, the themes arise from the natural structure of the data.

One method, called “topic modeling”, has proven useful for finding patterns in genetic data, images, and social networks. Topic modeling is also effective for analyzing ocean sounds. An algorithm automatically analyzes features in slices of acoustic data from the original humpback song recordings to discover the themes that run through them, how those themes are connected to each other, and how they change over time. This automated method has the potential to find song patterns not easily detected by humans—and to do it faster and with less subjectivity than humans.
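The flavor of topic modeling can be conveyed with a small synthetic example. Below, each "document" is a slice of song summarized by counts of four made-up acoustic features, generated from two hidden themes; a standard latent Dirichlet allocation model (via scikit-learn, a stand-in for the team's actual method) recovers the theme structure without any labels.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(2)

# Two hidden "themes", each a profile of expected feature counts.
theme_a = np.array([5, 5, 0, 0])
theme_b = np.array([0, 0, 5, 5])

# 40 song slices drawn from each theme (feature-count "documents").
slices = np.vstack([rng.poisson(theme_a, size=(40, 4)),
                    rng.poisson(theme_b, size=(40, 4))])

# Fit an unsupervised two-topic model; no labels are provided.
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(slices)
weights = lda.transform(slices)     # per-slice theme mixture
dominant = weights.argmax(axis=1)   # dominant theme per slice

# Slices generated from the same theme should share a dominant topic.
print(dominant[:40].mean(), dominant[40:].mean())
```

The themes emerge purely from co-occurrence structure in the counts, which is the property that makes the approach attractive for unlabeled song recordings.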

Sample topic model output shows the ability to detect humpback song units. Above, the spectrogram from a short segment of humpback song; dark blue features represent individual song units. Below, the topic model output shows how it was able to distinguish different types of song units (represented by different colors; blue corresponds with the quiet background in the spectrogram above where song is absent).

Different unsupervised learning techniques can analyze sounds of other marine mammals. Several species of dolphins, beaked whales, and other toothed cetaceans use Monterey Bay as foraging habitat, locating their prey underwater by echolocation. Over the past three years, the MARS hydrophone has recorded over 250 million echolocation clicks. The MBARI researchers are using unsupervised methods to detect different types of echolocation clicks, many of which can be identified to species and used to track the feeding activity of the animals making them. Although supervised learning has produced excellent results, the team sees unsupervised methods as part of a more comprehensive plan to help discover sounds and understand the broad soundscape in the future.
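Grouping clicks without labels is, at heart, a clustering problem. The sketch below (illustrative only; the features and values are invented, and k-means stands in for whatever method the team uses) separates two synthetic click types described by peak frequency and duration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)

# Each click summarized by two made-up features:
# [peak frequency (kHz), duration (microseconds)].
type1 = rng.normal([40, 200], [2, 10], size=(100, 2))   # one synthetic source
type2 = rng.normal([25, 400], [2, 10], size=(100, 2))   # another source

clicks = np.vstack([type1, type2])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(clicks)

# Clicks from each synthetic source should land in a single cluster.
print(labels[:100].mean(), labels[100:].mean())
```

With clusters in hand, each group can then be compared against known species' click characteristics, turning an unlabeled stream of 250 million clicks into a record of which animals were foraging and when.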

Together, the cabled observatory that provides a persistent presence in the deep sea and the software developed to analyze the information it conveys are enabling development of an auditory system for the sea. Understanding the ocean’s sonic dimension is critical to protecting the health and biodiversity of marine ecosystems. MBARI’s investments in infrastructure—MARS, ships, and ROVs—and data processing tools are key to facilitating science and engineering partnerships, which in turn inform resource management and conservation efforts. Listening to what the ocean is telling us—revealing the wonders of its soundscape—is both a source of inspiration and a call to be good stewards of the ocean.

Giving the ocean the attention it deserves

MBARI strengthens its education, outreach, and data-management efforts toward solving real-world conservation problems.