pretext: vectorize field on pipeline + PUT /pipelines/{id}/vectorize #151

Open: beckyconning wants to merge 1 commit into master from bc-v4-vectorize

Conversation

@beckyconning
Contributor

Summary

vectorization-spec-v4 / LD13 + LD15 API surface:

  • Pipeline schema gains vectorize: boolean (nullable, optional). When true, loads of datasets in the pipeline produce 384-dim embeddings landed in a sibling PRETEXT table on the destination. Absent or false preserves pre-feature behaviour byte-identically.
  • New endpoint PUT /pipelines/{id}/vectorize for runtime opt-in/out at pipeline scope (mirrors the pii-config pattern).
  • POST /pipelines and the new PUT both return 400 when vectorize=true is set against a destination that does not support the Snowflake VECTOR column type (LD15 — Snowflake-family destinations only).
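The destination rule in the last bullet can be sketched as a small guard. This is only an illustration under stated assumptions: the function name `vectorize_rejection`, the `destination_kind` string, and the contents of `SNOWFLAKE_FAMILY` are hypothetical and are not taken from the spec or from precog/services#3478.

```python
from typing import Optional

# Assumption: illustrative set of destination kinds that support the
# Snowflake VECTOR column type. The real list is not given in this PR.
SNOWFLAKE_FAMILY = {"snowflake"}


def vectorize_rejection(vectorize: Optional[bool], destination_kind: str) -> Optional[int]:
    """Return 400 if the request should be rejected, otherwise None.

    Hypothetical helper mirroring the rule described above for both
    POST /pipelines and PUT /pipelines/{id}/vectorize.
    """
    # Absent (None) and false both leave the feature off, so the
    # destination is not checked at all.
    if not vectorize:
        return None
    # vectorize=true is only accepted for Snowflake-family destinations.
    if destination_kind.lower() not in SNOWFLAKE_FAMILY:
        return 400
    return None
```

Returning only the rejection status keeps the sketch silent on the success code, which this description does not state.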

Companion PRs

  • precog/services#3478 — implementation
  • precog/precog-destination-snowflake#493 — public SnowflakeKind.parseConfigValues
  • precog/devops#1536 — staging pin

Verification

Verified end to end on staging with a new pipeline, "customer service vectorize" (Intercom Conversations source, vectorize: true):

SELECT COUNT(*) FROM procurement_staging.CUSTOMER_SERVICE_VECTORIZE.PRETEXT;
-- 180

SELECT VECTOR_COSINE_SIMILARITY(vec, vec) FROM PRETEXT LIMIT 1;
-- ~1.0
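The second query is a sanity check: the cosine similarity of any non-zero vector with itself should be 1.0 up to floating-point error. A minimal sketch of the same check in plain Python follows. The 384-element test vector is made-up data, not an embedding from the staging table, and `cosine_similarity` is a hypothetical helper rather than Snowflake's implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 384-dimensional vector standing in for one row's `vec` column.
vec = [0.1 * (i % 7) - 0.3 for i in range(384)]

# Self-similarity should be 1.0 within rounding error.
print(cosine_similarity(vec, vec))
```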

Merge-readiness summary: Linear PRD-212.

Test plan

  • Confirm OpenAPI spec validates clean.
  • Confirm the new vectorize field is correctly typed as nullable boolean.
  • Confirm PUT /pipelines/{id}/vectorize request/response shapes match what the implementation in precog/services#3478 ships.
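The "nullable, optional boolean" typing in the second item could be spot-checked against sample payloads roughly as follows. This is a sketch, not the project's test suite: `vectorize_field_ok` is a hypothetical helper, and the payloads are plain dictionaries rather than real pipeline objects.

```python
from typing import Any, Mapping


def vectorize_field_ok(pipeline: Mapping[str, Any]) -> bool:
    """Check that `vectorize`, if present, is a boolean or null."""
    # Optional: a payload without the key is valid.
    if "vectorize" not in pipeline:
        return True
    value = pipeline["vectorize"]
    # Nullable: None is allowed. isinstance(value, bool) rejects strings
    # such as "true" and integers such as 1, which are not booleans.
    return value is None or isinstance(value, bool)
```

For example, `{"vectorize": "true"}` and `{"vectorize": 1}` both fail this check, while `{}`, `{"vectorize": None}`, and `{"vectorize": False}` pass.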

🤖 Generated with Claude Code

vectorization-spec-v4 / LD13 + LD15:
- pipeline schema gains `vectorize: boolean` (nullable, optional). When
  true, loads of datasets in the pipeline produce 384-dim IBM Granite-30M
  embeddings landed in a sibling `PRETEXT` table on the destination.
  Absent or false preserves pre-feature behaviour byte-identically.
- New endpoint PUT /pipelines/{id}/vectorize for runtime opt-in/out at
  pipeline scope (mirrors the pii-config pattern).
- POST /pipelines and the new PUT both return 400 when vectorize=true is
  set against a destination that does not support the Snowflake VECTOR
  column type (LD15 — Snowflake-family destinations only).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>