Annotator & Plot Tools
`annotator` is an `explorer` that provides a map of your data colored by labels. Let's walk through its components and how they interact with the `dataset`.
- You will find many of these components again in other `explorer`s.
Running Python right here
Think of this page as almost a Jupyter notebook: you can edit code and press Shift+Enter to execute it.
Behind the scenes is a Binder-hosted Python environment.
This page addresses single components of hover
We use code snippets to pick out parts of the annotation interface so that the documentation can explain what each part does.
- Please be aware that this is NOT how one would typically use hover.
- Typical usage deals with recipes, where the individual parts have been tied together.
Dependencies for local environments
When you run the code locally, you may need to install additional packages.
To run the text embedding code on this page, you need:
```shell
pip install spacy
python -m spacy download en_core_web_md
```
To render bokeh plots in Jupyter, you need:
```shell
pip install jupyter_bokeh
```
If you are using JupyterLab older than 3.0, use this instead ([reference](https://pypi.org/project/jupyter-bokeh/)):
```shell
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @bokeh/jupyter_bokeh
```
Preparation
As always, start with a ready-for-plot dataset:
```python
from hover.core.dataset import SupervisableTextDataset
import pandas as pd

raw_csv_path = "https://raw.githubusercontent.com/phurwicz/hover-gallery/main/0.5.0/20_newsgroups_raw.csv"
train_csv_path = "https://raw.githubusercontent.com/phurwicz/hover-gallery/main/0.5.0/20_newsgroups_train.csv"

# for fast, low-memory demonstration purposes, sample the data
df_raw = pd.read_csv(raw_csv_path).sample(400)
df_raw["SUBSET"] = "raw"
df_train = pd.read_csv(train_csv_path).sample(400)
df_train["SUBSET"] = "train"
df_dev = pd.read_csv(train_csv_path).sample(100)
df_dev["SUBSET"] = "dev"
df_test = pd.read_csv(train_csv_path).sample(100)
df_test["SUBSET"] = "test"

# build overall dataframe and ensure feature type
df = pd.concat([df_raw, df_train, df_dev, df_test])
df["text"] = df["text"].astype(str)

# this class stores the dataset throughout the labeling process
dataset = SupervisableTextDataset.from_pandas(df, feature_key="text", label_key="label")
```
```python
import spacy
import re
from functools import lru_cache

# use your preferred embedding for the task
nlp = spacy.load("en_core_web_md")

# raw data (str in this case) -> np.array
@lru_cache(maxsize=int(1e+4))
def vectorizer(text):
    clean_text = re.sub(r"[\s]+", r" ", str(text))
    return nlp(clean_text, disable=nlp.pipe_names).vector

# any kwargs will be passed onto the corresponding reduction
# for umap: https://umap-learn.readthedocs.io/en/latest/parameters.html
# for ivis: https://bering-ivis.readthedocs.io/en/latest/api.html
reducer = dataset.compute_nd_embedding(vectorizer, "umap", dimension=2)
```
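Conceptually, the embedding step has two stages: turn each raw text into a vector, then reduce those vectors to a few dimensions so they can be plotted. The sketch below only illustrates that data flow under stated assumptions. It is not hover's implementation: `toy_vectorizer` is a hypothetical stand-in for the spaCy vectorizer (a letter histogram), and `pca_reduce` swaps UMAP for plain PCA computed with NumPy's SVD so that nothing beyond NumPy is needed.

```python
import numpy as np


def toy_vectorizer(text: str) -> np.ndarray:
    """Hypothetical stand-in for the spaCy vectorizer: a 26-dim letter histogram."""
    vec = np.zeros(26)
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1
    return vec


def pca_reduce(vectors: np.ndarray, dimension: int = 2) -> np.ndarray:
    """Project row vectors onto their top principal components (PCA, not UMAP)."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dimension].T


texts = ["the cat sat", "a dog barked", "cats and dogs", "space shuttle launch"]
vectors = np.stack([toy_vectorizer(t) for t in texts])  # shape (4, 26)
coords = pca_reduce(vectors, dimension=2)                # shape (4, 2): one (x, y) per text
```

PCA is linear, while UMAP is a nonlinear method that tends to preserve local neighborhoods, which is what makes semantically similar points land close together in the real plot; the sketch only shows where the vectorizer output goes.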
Scatter Plot: Semantically Similar Points are Close Together
hover labels data points in bulk, which requires selecting groups of homogeneous data.
The core of the annotator consists of a scatter plot and labeling widgets:
Showcase widgets here are not interactive
Plotted widgets on this page are for illustration only and are not interactive.
Widgets will be interactive when you actually use them (in your local environment or in server apps, as in the quickstart).
- Be sure to use a whole `recipe` rather than individual widgets.
- If you really want to plot interactive widgets on their own, try `from hover.utils.bokeh_helper import show_as_interactive as show` instead of `from bokeh.io import show`.
    - This works in your own environment but still not on the documentation page.
    - `show_as_interactive` is a simple tweak of `bokeh.io.show` that turns a standalone LayoutDOM into an application.
```python
from bokeh.io import show, output_notebook

output_notebook()

# normally you would skip notebook_url or use your Jupyter address
notebook_url = 'localhost:8888'

# special configuration for this remotely hosted tutorial
from local_lib.binder_helper import remote_jupyter_proxy_url
notebook_url = remote_jupyter_proxy_url

from hover.recipes.subroutine import standard_annotator
from bokeh.layouts import row, column

annotator = standard_annotator(dataset)
show(column(
    row(annotator.annotator_input, annotator.annotator_apply),
    annotator.figure,
), notebook_url=notebook_url)
```
Select Points on the Plot
On the right of the scatter plot, you can find tap, polygon, and lasso tools which can select data points.
View Tooltips with Mouse Hover
Embeddings are helpful but rarely perfect. This is why we have tooltips that show the detail of each point on mouse hover, allowing us to inspect points, discover patterns, and come up with new labels on the fly.
Show & Hide Subsets
Showing labeled subsets can tell you which parts of the data have been explored and which have not. With toggle buttons, you can turn the display of any subset on or off.
```python
show(annotator.data_key_button_group, notebook_url=notebook_url)
```
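Conceptually, toggling a subset decides which rows get drawn. A minimal stdlib sketch, assuming each point is a dict carrying the `SUBSET` field built in the preparation step; `visible_points` is a hypothetical helper, not part of hover's API, and hover may implement the toggle differently (for example by hiding whole glyphs).

```python
from typing import Dict, Iterable, List, Set


def visible_points(points: Iterable[Dict], active_subsets: Set[str]) -> List[Dict]:
    """Keep only the points whose SUBSET is currently toggled on (hypothetical helper)."""
    return [p for p in points if p["SUBSET"] in active_subsets]


points = [
    {"text": "a", "SUBSET": "raw"},
    {"text": "b", "SUBSET": "train"},
    {"text": "c", "SUBSET": "dev"},
    {"text": "d", "SUBSET": "test"},
]

# Hide the unlabeled "raw" subset to look only at labeled data.
shown = visible_points(points, {"train", "dev", "test"})
```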
Make Consecutive Selections
Ever selected multiple (non-adjacent) files in your file system using Ctrl/Command?
Similarly but more powerfully, you can make consecutive selections with a "keep selecting" option.
```python
show(annotator.selection_option_box, notebook_url=notebook_url)
```
Selection option values: what do they do?
Basic set operations on your old selection A and new selection B:
- none: the default, where a new selection B simply replaces the old one A.
- union: A ∪ B, the new selection gets unioned with the old one.
    - This resembles the Ctrl/Command behavior mentioned above.
- intersection: A ∩ B, the new selection gets intersected with the old one.
    - This is particularly useful when going beyond simple 2D plots.
- difference: A ∖ B, the new selection gets subtracted from the old one.
    - This is for de-selecting outliers.
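The four options map directly onto Python's set operators. A minimal sketch, treating each selection as a set of point indices; `update_selection` is a hypothetical helper for illustration, not hover's internal code.

```python
from typing import Set


def update_selection(old: Set[int], new: Set[int], option: str = "none") -> Set[int]:
    """Combine the old selection A with a new selection B (hypothetical helper)."""
    if option == "none":
        return set(new)        # B replaces A
    if option == "union":
        return old | new       # A ∪ B
    if option == "intersection":
        return old & new       # A ∩ B
    if option == "difference":
        return old - new       # A ∖ B
    raise ValueError(f"unknown selection option: {option!r}")


A = {1, 2, 3, 4}
B = {3, 4, 5}
update_selection(A, B, "none")          # {3, 4, 5}
update_selection(A, B, "union")         # {1, 2, 3, 4, 5}
update_selection(A, B, "intersection")  # {3, 4}
update_selection(A, B, "difference")    # {1, 2}
```

For instance, intersecting a selection made under one pair of plot axes with a selection made after switching axes keeps only the points that cluster together in both views.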
Change Plot Axes
hover supports dynamically choosing which embedding dimensions to use for your 2D plot. This becomes nontrivial, and sometimes very useful, when we have a 3D embedding (or higher):
```python
reducer = dataset.compute_nd_embedding(vectorizer, "umap", dimension=3)
annotator = standard_annotator(dataset)
show(column(
    row(annotator.dropdown_x_axis, annotator.dropdown_y_axis),
    annotator.figure,
), notebook_url=notebook_url)
```
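Each axis dropdown picks one embedding dimension, so a 2D view is a projection of the higher-dimensional embedding. A NumPy sketch, assuming the embedding is an `(n_points, n_dims)` array; `project_to_axes` and the toy coordinates are hypothetical and only illustrate why switching axes can reveal structure.

```python
import numpy as np


def project_to_axes(embedding: np.ndarray, x_dim: int, y_dim: int) -> np.ndarray:
    """Return (x, y) coordinates taken from two chosen embedding dimensions."""
    return embedding[:, [x_dim, y_dim]]


# Hypothetical 3D embedding of four points.
embedding_3d = np.array([
    [0.0, 0.0, 0.0],
    [0.1, 0.0, 5.0],
    [5.0, 5.0, 0.0],
    [5.1, 5.0, 5.0],
])

xy_01 = project_to_axes(embedding_3d, x_dim=0, y_dim=1)  # points 0 and 1 nearly overlap
xy_02 = project_to_axes(embedding_3d, x_dim=0, y_dim=2)  # ...but are far apart along dim 2
```

Points that overlap under one pair of axes may be well separated along a third dimension, which is why the axis choice matters once the embedding has more than two dimensions.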
Text Search Widget: Include/Exclude
Keywords or regular expressions can be great starting points for identifying a cluster of similar points based on domain expertise.
You may specify a positive regular expression to look for and/or a negative one to not look for.
The `annotator` will amplify the sizes of positive-match data points and shrink those of negative matches.
```python
show(row(annotator.search_pos, annotator.search_neg), notebook_url=notebook_url)
```
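The sizing rule described above can be sketched with Python's `re` module. This is a minimal illustration, not hover's code: `point_sizes`, the base size, and both scaling factors are hypothetical values chosen for readability.

```python
import re
from typing import List

BASE_SIZE = 10.0  # hypothetical default glyph size
AMPLIFY = 1.5     # hypothetical factor for positive matches
SHRINK = 0.5      # hypothetical factor for negative matches


def point_sizes(texts: List[str], pos_regex: str = "", neg_regex: str = "") -> List[float]:
    """Enlarge points matching pos_regex and shrink points matching neg_regex."""
    # An empty pattern would match every string, so treat it as "no search".
    pos = re.compile(pos_regex) if pos_regex else None
    neg = re.compile(neg_regex) if neg_regex else None
    sizes = []
    for text in texts:
        size = BASE_SIZE
        if pos is not None and pos.search(text):
            size *= AMPLIFY
        if neg is not None and neg.search(text):
            size *= SHRINK
        sizes.append(size)
    return sizes


texts = ["NASA launched a shuttle", "the hockey game", "shuttle bus schedule"]
sizes = point_sizes(texts, pos_regex=r"shuttle", neg_regex=r"\bbus\b")
# sizes == [15.0, 10.0, 7.5]
```

In this sketch a text matching both patterns receives both adjustments; hover's exact handling of that case may differ.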
Preview: Use Search for Selection in Finder
In a particular kind of plot called `finder` (see later in the tutorials), the search widget can directly operate on your selection as a filter.
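Unlike resizing, a filter removes non-matching points from the selection. A rough sketch of that idea, assuming a selection is a set of indices into a list of texts; `filter_selection` is a hypothetical helper and may not match the finder's exact rules.

```python
import re
from typing import List, Set


def filter_selection(selection: Set[int], texts: List[str],
                     pos_regex: str = "", neg_regex: str = "") -> Set[int]:
    """Keep selected indices whose text matches pos_regex and does not match neg_regex."""
    kept = set()
    for idx in selection:
        text = texts[idx]
        # An empty pattern means "no constraint" rather than "match everything".
        if pos_regex and not re.search(pos_regex, text):
            continue
        if neg_regex and re.search(neg_regex, text):
            continue
        kept.add(idx)
    return kept


texts = ["NASA launched a shuttle", "the hockey game", "shuttle bus schedule"]
filtered = filter_selection({0, 1, 2}, texts, pos_regex=r"shuttle", neg_regex=r"\bbus\b")
# filtered == {0}
```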
The Plot and The Dataset
When we apply labels through the annotator plot, it is actually the `dataset` behind the plot that gets updated immediately. The plot itself is not in direct sync with the dataset, which is a design choice for performance. Instead, we use a trigger called PUSH to send the updated data entries to the plot.
PUSH: Synchronize from Dataset to Plots
Below is the full interface of the `dataset`, where you can find a green "Push" button:
```python
show(dataset.view(), notebook_url=notebook_url)
```
In a built-in `recipe`, the "Push" button sends the latest data to every `explorer` linked to the `dataset`.
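The relationship between the dataset, its plots, and PUSH resembles an observer pattern with an explicit sync step. The sketch below is a toy model of the behavior described in this section, not hover's implementation. The classes `ToyDataset` and `ToyExplorer` and their methods are hypothetical, and labels live in a plain list rather than a dataframe. Labeling writes only to the dataset, and each explorer's copy stays stale until `push()` sends the latest labels to every linked explorer.

```python
from typing import List


class ToyExplorer:
    """Hypothetical plot: keeps its own snapshot of the labels it displays."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.labels: List[str] = []

    def receive(self, labels: List[str]) -> None:
        # Copy, so later edits to the dataset do not leak in before a push.
        self.labels = list(labels)


class ToyDataset:
    """Hypothetical dataset: the single source of truth for labels."""

    def __init__(self, labels: List[str]) -> None:
        self.labels = list(labels)
        self.explorers: List[ToyExplorer] = []

    def link(self, explorer: ToyExplorer) -> None:
        self.explorers.append(explorer)
        explorer.receive(self.labels)

    def apply_label(self, indices: List[int], label: str) -> None:
        # Updates the dataset immediately; no explorer is redrawn here.
        for i in indices:
            self.labels[i] = label

    def push(self) -> None:
        # Explicit sync from the dataset to every linked explorer.
        for explorer in self.explorers:
            explorer.receive(self.labels)


toy_dataset = ToyDataset(["unlabeled"] * 3)
toy_annotator = ToyExplorer("annotator")
toy_finder = ToyExplorer("finder")
toy_dataset.link(toy_annotator)
toy_dataset.link(toy_finder)

toy_dataset.apply_label([0, 2], "sci.space")
stale = list(toy_annotator.labels)  # still ["unlabeled", "unlabeled", "unlabeled"]
toy_dataset.push()
# both explorers now show ["sci.space", "unlabeled", "sci.space"]
```

Deferring the redraw lets many label edits accumulate before a single update, which fits the performance rationale for keeping plots out of direct sync with the dataset.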