Finder & Selection Filter
Finder is an explorer focused on search.
It can help you select points using a filter based on search results.
This page addresses single components of hover
We are using code snippets to pick out parts of the annotation interface, so that the documentation can explain what they do.
- Please be aware that this is NOT how one would typically use hover.
- Typical usage deals with recipes where the individual parts have been tied together.
Dependencies for local environments
When you run the code locally, you may need to install additional packages.
To run the text embedding code on this page, you need:
pip install spacy
python -m spacy download en_core_web_md
To render bokeh plots in Jupyter, you need:
pip install jupyter_bokeh
If you are using JupyterLab older than 3.0, use this instead ([reference](https://pypi.org/project/jupyter-bokeh/)):
```shell
jupyter labextension install @jupyter-widgets/jupyterlab-manager
jupyter labextension install @bokeh/jupyter_bokeh
```
More Angles -> Better Results
Explorers other than annotator specialize in finding additional insight to help us understand the data. With them placed side by side with annotator, we can label more accurately, more confidently, and even faster.
Preparation
As always, start with a ready-for-plot dataset:
from hover.core.dataset import SupervisableTextDataset
import pandas as pd

raw_csv_path = "https://raw.githubusercontent.com/phurwicz/hover-gallery/main/0.5.0/20_newsgroups_raw.csv"
train_csv_path = "https://raw.githubusercontent.com/phurwicz/hover-gallery/main/0.5.0/20_newsgroups_train.csv"

# for fast, low-memory demonstration purpose, sample the data
df_raw = pd.read_csv(raw_csv_path).sample(400)
df_raw["SUBSET"] = "raw"
df_train = pd.read_csv(train_csv_path).sample(400)
df_train["SUBSET"] = "train"
df_dev = pd.read_csv(train_csv_path).sample(100)
df_dev["SUBSET"] = "dev"
df_test = pd.read_csv(train_csv_path).sample(100)
df_test["SUBSET"] = "test"

# build overall dataframe and ensure feature type
df = pd.concat([df_raw, df_train, df_dev, df_test])
df["text"] = df["text"].astype(str)

# this class stores the dataset throughout the labeling process
dataset = SupervisableTextDataset.from_pandas(df, feature_key="text", label_key="label")

import spacy
import re
from functools import lru_cache
# use your preferred embedding for the task
nlp = spacy.load("en_core_web_md")
# raw data (str in this case) -> np.array
@lru_cache(maxsize=int(1e+4))
def vectorizer(text):
    clean_text = re.sub(r"[\s]+", r" ", str(text))
    return nlp(clean_text, disable=nlp.pipe_names).vector

# any kwargs will be passed onto the corresponding reduction
# for umap: https://umap-learn.readthedocs.io/en/latest/parameters.html
# for ivis: https://bering-ivis.readthedocs.io/en/latest/api.html
reducer = dataset.compute_nd_embedding(vectorizer, "umap", dimension=2)
Filter Toggles
When we use lasso or polygon select, we are describing a shape. Sometimes that shape is not accurate enough -- we need extra conditions to narrow down the data.
Just like annotator, finder has search widgets. But unlike annotator, finder has a filter toggle which can directly intersect what we selected with what meets the search criteria.
Showcase widgets here are not interactive
Plotted widgets on this page are not interactive, but only for illustration.
Widgets will be interactive when you actually use them (in your local environment or server apps like in the quickstart).
- be sure to use a whole recipe rather than individual widgets.
- if you really want to plot interactive widgets on their own, try `from hover.utils.bokeh_helper import show_as_interactive as show` instead of `from bokeh.io import show`.
    - this works in your own environment but still not on the documentation page.
    - show_as_interactive is a simple tweak of bokeh.io.show that turns a standalone LayoutDOM into an application.
from bokeh.io import show, output_notebook
output_notebook()
# normally you would skip notebook_url or use your Jupyter address
notebook_url = 'localhost:8888'
# special configuration for this remotely hosted tutorial
from local_lib.binder_helper import remote_jupyter_proxy_url
notebook_url = remote_jupyter_proxy_url
from hover.recipes.subroutine import standard_finder
from bokeh.layouts import row, column
finder = standard_finder(dataset)
show(row(
column(finder.search_pos, finder.search_neg),
finder.search_filter_box,
), notebook_url=notebook_url)
Next to the search widgets is a checkbox. The filter stays active as long as the checkbox is checked.
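To make the effect of the toggle concrete, here is a minimal sketch in plain Python. It is not hover's actual implementation: it assumes the two search widgets hold regular expressions (one that must match, one that must not), and the names `meets_search` and `apply_filter` are hypothetical helpers made up for this illustration.

```python
import re


def meets_search(text, pos_pattern="", neg_pattern=""):
    """True if text matches pos_pattern and does not match neg_pattern.

    Assumption of this sketch: an empty pattern means "no condition".
    """
    if pos_pattern and not re.search(pos_pattern, text):
        return False
    if neg_pattern and re.search(neg_pattern, text):
        return False
    return True


def apply_filter(selected, texts, filter_active, pos_pattern="", neg_pattern=""):
    """Intersect the selected indices with those meeting the search criteria."""
    if not filter_active:
        # checkbox unchecked: the selection passes through untouched
        return set(selected)
    return {
        i for i in selected
        if meets_search(texts[i], pos_pattern, neg_pattern)
    }


texts = [
    "the rocket launch was delayed",
    "hockey playoffs start tonight",
    "rocket engines and orbital mechanics",
    "a new graphics card was released",
]
# indices picked by a (hypothetical) lasso selection
lasso_selection = {0, 1, 2}

filtered = apply_filter(
    lasso_selection, texts, True, pos_pattern=r"rocket", neg_pattern=r"delayed"
)
unfiltered = apply_filter(
    lasso_selection, texts, False, pos_pattern=r"rocket", neg_pattern=r"delayed"
)
print(filtered)    # only index 2 matches "rocket" without "delayed"
print(unfiltered)  # the original lasso selection
```

The point of the sketch is that the filter never adds points: it can only narrow whatever shape was drawn.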
How the filter interacts with selection options
Selection options apply before filters.
hover memorizes your pre-filter selections, so you can keep selecting without having to tweak the filter toggle.
- Example:
    - suppose you have previously selected a set of points called A.
    - then you toggled a filter f, giving you A ∩ F where F is the set satisfying f.
    - now, with selection option "union", you select a set of points called B.
    - your current selection will be (A ∪ B) ∩ F, i.e. (A ∩ F) ∪ (B ∩ F).
        - similarly, you would get (A ∩ B) ∩ F for "intersection" and (A ∖ B) ∩ F for "difference".
    - if you untoggle the filter now, your selection would be A ∪ B.
- In the later tutorials, we shall see multiple filters in action together.
    - spoiler: F = F1 ∩ F2 ∩ ... and that's it!
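The set algebra in the example above can be checked with Python's built-in sets. This is only an illustration of the rule "selection options apply before filters", not hover's internal code; the point IDs and the helper `current_selection` are made up for the sketch.

```python
# hypothetical point IDs
A = {1, 2, 3, 4}   # earlier selection
B = {3, 4, 5, 6}   # new selection
F = {2, 3, 5, 7}   # points satisfying filter f


def current_selection(pre_filter, filter_set, filter_active):
    """The pre-filter selection is memorized; the filter is applied on top."""
    return pre_filter & filter_set if filter_active else set(pre_filter)


# selection options combine the pre-filter sets first, then the filter applies
union_sel = current_selection(A | B, F, True)
inter_sel = current_selection(A & B, F, True)
diff_sel = current_selection(A - B, F, True)

# (A ∪ B) ∩ F equals (A ∩ F) ∪ (B ∩ F)
assert union_sel == (A & F) | (B & F)

# untoggling the filter restores the memorized pre-filter selection A ∪ B
untoggled = current_selection(A | B, F, False)

# multiple filters simply intersect: F = F1 ∩ F2
F1, F2 = {2, 3, 5, 7}, {3, 5, 8}
multi_sel = current_selection(A | B, F1 & F2, True)

print(union_sel, inter_sel, diff_sel, untoggled, multi_sel)
```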
Stronger Highlight for Search
finder also colors data points based on search criteria, making them easier to find.
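As a rough mental model of search-based highlighting, the sketch below assigns one color to points whose text matches a search pattern and another to the rest. It assumes a regular-expression search; the function `highlight_colors` and the color names are hypothetical and do not reflect hover's actual palette or implementation.

```python
import re


def highlight_colors(texts, pattern, hit_color="red", miss_color="gray"):
    """Return one color per text: hit_color on a match, miss_color otherwise.

    Assumption of this sketch: an empty pattern highlights nothing.
    """
    if not pattern:
        return [miss_color] * len(texts)
    return [
        hit_color if re.search(pattern, text) else miss_color
        for text in texts
    ]


texts = [
    "the rocket launch was delayed",
    "hockey playoffs start tonight",
    "rocket engines and orbital mechanics",
    "a new graphics card was released",
]
colors = highlight_colors(texts, r"rocket")
print(colors)
```

A list like `colors` could then back a per-point color column in a plot's data source, so that matching points stand out.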
show(column(
row(finder.search_pos, finder.search_neg),
finder.figure,
), notebook_url=notebook_url)