Soft Label & Joint Filters

hover filters can stack together.

This lets you combine several criteria, such as a search pattern and a score range, to narrow a selection down precisely.

Running Python right here

Think of this page as almost a Jupyter notebook. You can edit code and press Shift+Enter to execute.

Behind the scenes is a Binder-hosted Python environment.

This page addresses single components of hover

We are using code snippets to pick out parts of the annotation interface, so that the documentation can explain what they do.

Please be aware that this is NOT how one would typically use hover.
Typical usage deals with recipes where the individual parts have been tied together.

Dependencies for local environments

When you run the code locally, you may need to install additional packages.

To run the text embedding code on this page, you need:

```shell
pip install spacy
python -m spacy download en_core_web_md
```

To render bokeh plots in Jupyter, you need:

```shell
pip install jupyter_bokeh
```

If you are using JupyterLab older than 3.0, use this instead ([reference](https://pypi.org/project/jupyter-bokeh/)):
```shell
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @bokeh/jupyter_bokeh
```

Preparation

As always, start with a ready-for-plot dataset:

```python
from hover.core.dataset import SupervisableTextDataset
import pandas as pd

raw_csv_path = "https://raw.githubusercontent.com/phurwicz/hover-gallery/main/0.5.0/20_newsgroups_raw.csv"
train_csv_path = "https://raw.githubusercontent.com/phurwicz/hover-gallery/main/0.5.0/20_newsgroups_train.csv"

# for fast, low-memory demonstration purpose, sample the data
df_raw = pd.read_csv(raw_csv_path).sample(400)
df_raw["SUBSET"] = "raw"
df_train = pd.read_csv(train_csv_path).sample(400)
df_train["SUBSET"] = "train"
df_dev = pd.read_csv(train_csv_path).sample(100)
df_dev["SUBSET"] = "dev"
df_test = pd.read_csv(train_csv_path).sample(100)
df_test["SUBSET"] = "test"

# build overall dataframe and ensure feature type
df = pd.concat([df_raw, df_train, df_dev, df_test])
df["text"] = df["text"].astype(str)

# this class stores the dataset through the labeling process
dataset = SupervisableTextDataset.from_pandas(df, feature_key="text", label_key="label")
```

```python
import spacy
import re
from functools import lru_cache

# use your preferred embedding for the task
nlp = spacy.load("en_core_web_md")

# raw data (str in this case) -> np.array
@lru_cache(maxsize=int(1e+4))
def vectorizer(text):
    clean_text = re.sub(r"[\s]+", r" ", str(text))
    return nlp(clean_text, disable=nlp.pipe_names).vector

# any kwargs will be passed on to the corresponding reduction
# for umap: https://umap-learn.readthedocs.io/en/latest/parameters.html
# for ivis: https://bering-ivis.readthedocs.io/en/latest/api.html
reducer = dataset.compute_nd_embedding(vectorizer, "umap", dimension=2)
```

Soft-Label Explorer

Active learning works by predicting labels and scores (i.e. soft labels) and using those predictions to guide further labeling. An intuitive way to plot soft labels is to color-code the labels and use opacity ("alpha" in bokeh terminology) to represent the scores.
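To make the encoding concrete, here is a plain-Python sketch of the idea, independent of hover. The palette, the function name `soft_label_styles`, and the rule of clipping scores to [0, 1] are assumptions for illustration; this is not how SoftLabelExplorer is implemented internally.

```python
from typing import Dict, List, Tuple

# Hypothetical palette: one hex color per label (not hover's actual palette).
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728"]


def soft_label_styles(
    labels: List[str], scores: List[float]
) -> List[Tuple[str, float]]:
    """Map each (label, score) pair to a (color, alpha) pair.

    Color encodes the predicted label; alpha encodes the score,
    clipped to [0, 1] so that it is a valid opacity.
    """
    color_of: Dict[str, str] = {}
    styles = []
    for label, score in zip(labels, scores):
        if label not in color_of:
            # assign colors in order of first appearance, cycling if needed
            color_of[label] = PALETTE[len(color_of) % len(PALETTE)]
        alpha = min(max(score, 0.0), 1.0)
        styles.append((color_of[label], alpha))
    return styles


styles = soft_label_styles(["sci.space", "rec.autos", "sci.space"], [0.9, 0.4, 1.2])
print(styles)
# [('#1f77b4', 0.9), ('#ff7f0e', 0.4), ('#1f77b4', 1.0)]
```

Points with the same predicted label share a color, and low-confidence predictions fade out, so uncertain regions of the embedding stand out visually.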

SoftLabelExplorer delivers this functionality:

```python
from bokeh.io import show, output_notebook

output_notebook()

# normally you would skip notebook_url or use Jupyter address
notebook_url = 'localhost:8888'

# special configuration for this remotely hosted tutorial
from local_lib.binder_helper import remote_jupyter_proxy_url
notebook_url = remote_jupyter_proxy_url
```

```python
from hover.recipes.subroutine import standard_softlabel
from bokeh.layouts import row, column

softlabel = standard_softlabel(dataset)
show(softlabel.figure, notebook_url=notebook_url)
```

Filter Selection by Score Range

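Similarly to finder, a softlabel plot has its own selection filter. The difference lies in the filter condition: the score filter keeps only the selected points whose soft-label score falls within a chosen range.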

Showcase widgets here are not interactive

Widgets plotted on this page are not interactive; they are for illustration only.

Widgets will be interactive when you actually use them (in your local environment or in server apps, as in the quickstart).

- Be sure to use a whole recipe rather than individual widgets.
- If you really want to plot interactive widgets on their own, try `from hover.utils.bokeh_helper import show_as_interactive as show` instead of `from bokeh.io import show`.
    - This works in your own environment but still not on the documentation page.
    - `show_as_interactive` is a simple tweak of `bokeh.io.show` that turns a standalone `LayoutDOM` into an application.

```python
show(softlabel.score_filter, notebook_url=notebook_url)
```
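The condition itself is simple. Below is a minimal plain-Python sketch of it, assuming the selection is a set of row indices (e.g. from a lasso tool) and the range comes from a slider like the one above; `filter_by_score` is a hypothetical name, not part of hover.

```python
from typing import Sequence, Set


def filter_by_score(
    selected: Set[int], scores: Sequence[float], low: float, high: float
) -> Set[int]:
    """Keep only selected indices whose score lies in [low, high].

    `selected` holds the row indices chosen by a selection tool;
    `scores` holds the soft-label score of every row.
    """
    return {i for i in selected if low <= scores[i] <= high}


scores = [0.95, 0.30, 0.55, 0.80, 0.10]
lasso_selection = {0, 1, 2, 3}
print(sorted(filter_by_score(lasso_selection, scores, 0.5, 0.9)))
# [2, 3]
```

Row 4 is never considered because it was not selected in the first place: the filter only narrows an existing selection, it never adds to it.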

Linked Selections & Joint Filters

When we plot multiple explorers for the same dataset, it makes sense to synchronize selections between those plots. hover recipes take care of this synchronization.

This also works with cumulative selections. Consequently, the cumulative toggle is synchronized too.
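hover handles this wiring inside its recipes. As a rough mental model only, the toy class below keeps one shared selection and one shared cumulative flag for several explorers. `SelectionHub` and its methods are hypothetical and do not reflect how hover implements synchronization.

```python
from typing import Dict, Set


class SelectionHub:
    """Toy model of linked selections across explorers (hypothetical).

    Every registered explorer sees the same selected indices. When
    `cumulative` is on, a new selection is unioned with the existing
    one instead of replacing it; the flag is shared, so toggling it
    affects all explorers at once.
    """

    def __init__(self) -> None:
        self.views: Dict[str, Set[int]] = {}
        self.cumulative = False

    def register(self, name: str) -> None:
        self.views[name] = set()

    def select(self, source: str, indices: Set[int]) -> None:
        # compute the new shared selection from the source explorer's state
        current = self.views[source]
        new = current | indices if self.cumulative else set(indices)
        # push a copy to every explorer so they stay synchronized
        for name in self.views:
            self.views[name] = set(new)


hub = SelectionHub()
for name in ("softlabel", "annotator", "finder"):
    hub.register(name)

hub.select("finder", {1, 2})
hub.cumulative = True
hub.select("annotator", {5})
print(hub.views["softlabel"])
# {1, 2, 5}
```

Because the cumulative flag lives in one place, a selection made in any explorer extends the same shared set.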

Since each filter narrows down the selection we make, joint filtering is just set intersection, extended:

- from two sets (original selection + filter)
- to N sets (original selection + filter A + filter B + ...)
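To see why stacking filters reduces to intersection, here is a plain-Python sketch. It models each filter as a predicate on row indices; the texts, scores, and filter functions are made-up stand-ins for a finder search and a softlabel score range, not hover's actual filter objects.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical representation: each filter is a predicate on a row index.
RowFilter = Callable[[int], bool]


def joint_filter(selection: Set[int], filters: Sequence[RowFilter], n_rows: int) -> Set[int]:
    """Intersect the selection with the rows accepted by every filter.

    With one filter this is ordinary filtering (two sets); with N
    filters it is the same intersection, just extended to N + 1 sets.
    """
    accepted: List[Set[int]] = [{i for i in range(n_rows) if f(i)} for f in filters]
    return set(selection).intersection(*accepted)


texts = ["rocket launch", "car engine", "rocket fuel", "space station", "engine oil"]
scores = [0.9, 0.7, 0.4, 0.8, 0.6]


def search_filter(i: int) -> bool:
    # like a finder search: keep rows mentioning rockets or space
    return "rocket" in texts[i] or "space" in texts[i]


def score_filter(i: int) -> bool:
    # like a softlabel score range: keep rows scored in [0.5, 1.0]
    return 0.5 <= scores[i] <= 1.0


selection = {0, 1, 2, 3}
print(sorted(joint_filter(selection, [search_filter], len(texts))))
print(sorted(joint_filter(selection, [search_filter, score_filter], len(texts))))
# [0, 2, 3]
# [0, 3]
```

Adding a filter can only shrink the result, and the order in which filters are applied does not matter, since set intersection is commutative and associative.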

The active_learning recipe is built from softlabel + annotator + finder, plus a few widgets for iterating on the model in the loop.

In the next tutorial(s), we will see more recipes taking advantage of linked selections and joint filters. Powerful indeed!