Material facet label gaps: OpenContext URIs in facet_summaries missing from vocab_labels #161

@rdhyee

Problem

While reconciling material facets between progressive_globe and notebook queries, I found two material URIs that appear in facet counts but have no matching pref_label in vocab_labels.parquet.

This causes blank/None friendly labels in diagnostics and forces fallback to URI-tail rendering.
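
For illustration, the URI-tail fallback mentioned above can be sketched roughly as follows. This is a minimal sketch, not the code used by progressive_globe or the notebooks: the function name friendly_label and the plain dict of uri -> pref_label are assumptions made for the example.

```python
def friendly_label(uri, labels):
    """Return the pref_label for uri, or the last path segment if none exists.

    `labels` is a hypothetical dict of uri -> pref_label, as it might be
    built from vocab_labels.parquet; real consumers may store this differently.
    """
    label = labels.get(uri)
    if label:
        return label
    # Fallback: render the URI tail, e.g. ".../0.1/plantmaterial" -> "plantmaterial".
    return uri.rstrip("/").rsplit("/", 1)[-1]


labels = {}  # empty on purpose: neither affected URI has a label today
uri = "https://w3id.org/isample/opencontext/material/0.1/organicanimalproduct"
print(friendly_label(uri, labels))  # prints "organicanimalproduct"
```

The tail is readable here only because these two URIs happen to end in a word-like segment; the fallback still hides the fact that the label is missing, which is why a coverage check is more useful than relying on it.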

Repro (DuckDB)

-- URIs present in material facet counts
SELECT facet_value, count
FROM read_parquet('https://data.isamples.org/isamples_202601_facet_summaries.parquet')
WHERE facet_type='material'
  AND (facet_value ILIKE '%organicanimalproduct%'
       OR facet_value ILIKE '%plantmaterial%');

Returns:

  • https://w3id.org/isample/opencontext/material/0.1/organicanimalproduct (261)
  • https://w3id.org/isample/opencontext/material/0.1/plantmaterial (1)

-- No matching labels in vocab_labels
WITH facet AS (
  SELECT DISTINCT facet_value AS uri
  FROM read_parquet('https://data.isamples.org/isamples_202601_facet_summaries.parquet')
  WHERE facet_type='material'
), labels AS (
  SELECT DISTINCT uri
  FROM read_parquet('https://data.isamples.org/vocab_labels.parquet')
  WHERE lang='en'
)
SELECT f.uri
FROM facet f
LEFT JOIN labels l USING (uri)
WHERE l.uri IS NULL
ORDER BY f.uri;

Returns exactly those same 2 URIs.

Additional context

scripts/build_vocab_labels.py currently builds labels from the TTL list including:

  • opencontext_material_extension.ttl

But that TTL appears to contain:

  • .../organicplantmaterial
  • .../organicanimalmaterial

and not the two URIs present in facet data (.../plantmaterial, .../organicanimalproduct).

So this looks like a vocabulary/data-term drift (or legacy aliases), not a rendering bug.

Why this issue fits here

This repo owns:

  • scripts/build_vocab_labels.py
  • tutorial consumers (progressive_globe.qmd, isamples_explorer.qmd)
  • substrate docs (SERIALIZATIONS.md, data.qmd, how-to-use.qmd)

Even if the canonical fix is upstream (vocabulary terms/aliases), this repo is the right integration point to track and guard against missing label coverage.

Proposed actions

  1. Add a CI/data check: every facet_value in facet_summaries with facet_type IN ('material', 'context', 'object_type') must resolve to a uri in vocab_labels.
  2. Decide canonical handling for these two OpenContext URIs:
    • map to canonical terms during export, or
    • add alias/deprecated concept coverage in upstream vocab, and ensure vocab_labels includes them.
  3. Add a temporary fallback map in the UI or the label-build pipeline so users do not see missing labels.
  4. Document label-coverage expectation in SERIALIZATIONS.md.
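
The coverage check in action 1 could look roughly like the sketch below. It is written against in-memory rows so it runs without network access; in CI the rows would presumably come from the two parquet files. The function name missing_label_uris, the constant GUARDED_FACET_TYPES, and every example.org URI are hypothetical and not part of this repo.

```python
# Facet types whose values are expected to resolve to a label (from action 1).
GUARDED_FACET_TYPES = {"material", "context", "object_type"}


def missing_label_uris(facet_rows, label_uris, facet_types=GUARDED_FACET_TYPES):
    """Return the sorted facet URIs that have no entry among label_uris.

    facet_rows: iterable of (facet_type, facet_value) pairs, e.g. rows read
                from the facet_summaries parquet file.
    label_uris: iterable of URIs present in vocab_labels.
    """
    known = set(label_uris)
    missing = {
        value
        for facet_type, value in facet_rows
        if facet_type in facet_types and value not in known
    }
    return sorted(missing)


# Toy data: the two URIs from this issue plus made-up rows for contrast.
base = "https://w3id.org/isample/opencontext/material/0.1/"
facet_rows = [
    ("material", base + "organicanimalproduct"),
    ("material", base + "plantmaterial"),
    ("material", "https://example.org/vocab/labelled"),     # made-up, has a label
    ("other_type", "https://example.org/vocab/unguarded"),  # made-up, type not checked
]
label_uris = ["https://example.org/vocab/labelled"]

missing = missing_label_uris(facet_rows, label_uris)
for uri in missing:
    print("missing label:", uri)
```

A CI job could fail whenever the returned list is non-empty, which matches the acceptance criterion of 0 missing URIs. Unlike the repro query, this sketch does not filter labels by lang='en'; that filter would have to be applied when loading label_uris.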

Acceptance criteria

  • LEFT JOIN coverage query above returns 0 missing URIs for material facets.
  • Both Search Explorer and Progressive Globe render friendly labels for all material facet values.
