
Audio Data

hover supports bulk-labeling audios through their URLs.

💡 Let's do a quickstart for audios and note what's different from texts.

This page assumes that you know the basics

i.e. simple usage of dataset and annotator. Please visit the quickstart tutorial if you haven't done so.


Dataset for audios

hover handles audios through their URL addresses. URLs are plain strings that are easy to store, hash, and look up. They are also convenient for rendering tooltips in the annotation interface.

Similarly to SupervisableTextDataset, we can build one for audios:

from hover.core.dataset import SupervisableAudioDataset
import pandas as pd

# this is a table of audio-MNIST (spoken digits 0-9) URLs, 100 audios per digit
example_csv_path = "https://raw.githubusercontent.com/phurwicz/hover-gallery/main/0.7.0/audio_mnist.csv"
df = pd.read_csv(example_csv_path).sample(frac=1).reset_index(drop=True)
df["SUBSET"] = "raw"
df.loc[500:800, 'SUBSET'] = 'train'
df.loc[800:900, 'SUBSET'] = 'dev'
df.loc[900:, 'SUBSET'] = 'test'

dataset = SupervisableAudioDataset.from_pandas(df, feature_key="audio", label_key="label")

# each subset can be accessed as its own DataFrame
dataset.dfs["raw"].head(5)

Vectorizer for audios

We can follow a URL -> content -> audio array -> vector path.

import requests
from functools import lru_cache

@lru_cache(maxsize=10000)
def url_to_content(url):
    """
    Turn a URL to response content.
    """
    response = requests.get(url)
    return response.content

import librosa
from io import BytesIO

@lru_cache(maxsize=10000)
def url_to_audio(url):
    """
    Turn a URL to audio data.
    """
    data, sampling_rate = librosa.load(BytesIO(url_to_content(url)))
    return data, sampling_rate
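
As a quick check, we can decode a single clip and inspect it (reusing the "audio" column of the example table):

# fetch and decode one clip end-to-end
example_url = df["audio"].iloc[0]
waveform, sampling_rate = url_to_audio(example_url)
print(waveform.shape, sampling_rate)  # librosa resamples to 22050 Hz by default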

Caching and reading from disk

This guide uses @wrappy.memoize in place of @functools.lru_cache for caching.

  • The benefit is that wrappy.memoize can persist the cache to disk, speeding up code across sessions.

Cached values for this guide have been pre-computed, making it much faster to run the guide.

import wrappy

@wrappy.memoize(cache_limit=10000, persist_path='custom_cache/audio_url_to_vector.pkl')
def vectorizer(url):
    """
    Averaged MFCC over time.
    Resembles word-embedding-average-as-doc-embedding for texts.
    """
    y, sr = url_to_audio(url)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=32)
    return mfcc.mean(axis=1)
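
Each clip, regardless of duration, maps to a fixed-length vector. For instance, reusing the first URL from the table:

# the time axis is averaged out, leaving one value per MFCC coefficient
vec = vectorizer(df["audio"].iloc[0])
print(vec.shape)  # (32,)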

Embedding and Plot

This is exactly the same as in the quickstart, just switching to audio data:

# any kwargs will be passed onto the corresponding reduction
# for umap: https://umap-learn.readthedocs.io/en/latest/parameters.html
# for ivis: https://bering-ivis.readthedocs.io/en/latest/api.html
reducer = dataset.compute_nd_embedding(vectorizer, "umap", dimension=2)

from hover.recipes.stable import simple_annotator

interactive_plot = simple_annotator(dataset)

# ---------- SERVER MODE: for the documentation page ----------
# because this tutorial is remotely hosted, we need explicit serving to expose the plot to you
from local_lib.binder_helper import binder_proxy_app_url
from bokeh.server.server import Server
server = Server({'/my-app': interactive_plot}, port=5007, allow_websocket_origin=['*'], use_xheaders=True)
server.start()
# visit the URL printed in the cell output to see the interactive plot; locally it would just be "http://localhost:5007/my-app"
binder_proxy_app_url('my-app', port=5007)

# ---------- NOTEBOOK MODE: for your actual Jupyter environment ---------
# this code will render the entire plot in Jupyter
# from bokeh.io import show, output_notebook
# output_notebook()
# show(interactive_plot, notebook_url='http://localhost:8888')

What's special for audios?

Tooltips

For text, the tooltip shows the original value.

For audios, the tooltip embeds the audio based on URL.

  • audio files on the local file system can be served through python -m http.server.
  • they can then be accessed through http://localhost:<port>/relative/path/to/file, as sketched below.
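
Below is a minimal sketch of building such URLs, assuming your clips sit in ./audio_files and you have run python -m http.server 8000 from that directory (the directory name and port are illustrative, not part of hover):

from pathlib import Path
import pandas as pd

# hypothetical local setup: `python -m http.server 8000` serving ./audio_files
port = 8000
local_df = pd.DataFrame({
    "audio": [
        f"http://localhost:{port}/{path.name}"
        for path in sorted(Path("audio_files").glob("*.wav"))
    ]
})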

Search

For text, the search widget is based on regular expressions.

For audios, the search widget is based on vector cosine similarity (see the sketch after this list).

  • the dataset remembers the vectorizer under the hood and passes it to the annotator.
  • please let us know if you think there's a better way to search audios in this case.
    • dynamic time warping, due to its running time (> 10ms per pair for small 100x10 MFCC arrays), is too slow for search.
      • we are experimenting with subsampled signals and pre-selected data points (by vector similarity, for example).
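
For intuition, such a search boils down to ranking clips by a score like the one below. This is a minimal sketch using the vectorizer defined earlier, not hover's internal implementation:

import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# rank all audios against a query clip in vectorizer space
query_vec = vectorizer(df["audio"].iloc[0])
scores = df["audio"].apply(lambda url: cosine_similarity(query_vec, vectorizer(url)))
print(df.loc[scores.nlargest(5).index, "audio"])  # the 5 closest clips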