feat: knowledge data standard + configs/ skeleton (PR3 of OpenAI Agents SDK migration)#74

Merged
keli-wen merged 11 commits into master from feat/knowledge-and-configs on Apr 27, 2026

keli-wen commented on Apr 26, 2026

Summary

Lands quantmind/knowledge/ as a data standard with three shapes plus the quantmind/configs/ skeleton. The knowledge schema is the contract every downstream module (flows, utils, retrieval, future KnowledgeStore) builds on; this PR ships schema only — the storage layer is specified but lands separately.

quantmind/knowledge/ — three-shape data standard

  • BaseKnowledge — root for every shape. Carries id (auto UUID), item_type, schema_version, as_of (mandatory), created_at, source: SourceRef (typed provenance — no bare strings), extraction: ExtractionRef | None, confidence, citations, tags, disclaimers, plus an embedding_text() contract that subclasses MUST override so the future store knows what to embed.
  • FlattenKnowledge — atomic-card shape. Subclasses: News, Earnings, PaperKnowledgeCard.
  • TreeKnowledge — hierarchical-artifact shape. Holds root_node_id plus a flat nodes: dict[UUID, TreeNode] map. Helpers: root(), children_of(), walk_dfs(), find_path(). Default embedding_text() delegates to root. Subclass: Paper (whole paper as section tree).
  • GraphKnowledge — placeholder. The class exists for type-hinting BaseKnowledge | FlattenKnowledge | TreeKnowledge | GraphKnowledge, but __init_subclass__ raises NotImplementedError until the shape is finalised in a later PR.
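
The tree shape described above (a `root_node_id` plus a flat `nodes` map, with navigation helpers) can be sketched with the standard library alone. This is a minimal illustration, not the PR's Pydantic models: the `title` and `parent_id` fields and the class name `Tree` are assumptions made for the example, and the real `TreeNode` may link children differently.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional
from uuid import UUID, uuid4


@dataclass
class TreeNode:
    title: str                          # hypothetical payload field
    parent_id: Optional[UUID] = None    # assumed link; None marks the root
    node_id: UUID = field(default_factory=uuid4)


@dataclass
class Tree:
    root_node_id: UUID
    nodes: dict[UUID, TreeNode]         # flat map, no nested objects

    def root(self) -> TreeNode:
        return self.nodes[self.root_node_id]

    def children_of(self, node_id: UUID) -> list[TreeNode]:
        # Children come back in the insertion order of the flat map.
        return [n for n in self.nodes.values() if n.parent_id == node_id]

    def walk_dfs(self) -> Iterator[TreeNode]:
        # Pre-order depth-first walk using an explicit stack.
        stack = [self.root()]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(self.children_of(node.node_id)))

    def find_path(self, node_id: UUID) -> list[TreeNode]:
        # Follow parent links upward, then reverse to get root -> node.
        path = []
        current = self.nodes.get(node_id)
        while current is not None:
            path.append(current)
            if current.parent_id is None:
                break
            current = self.nodes.get(current.parent_id)
        return list(reversed(path))


root = TreeNode("Paper")
intro = TreeNode("1 Introduction", parent_id=root.node_id)
method = TreeNode("2 Method", parent_id=root.node_id)
loss = TreeNode("2.1 Loss", parent_id=method.node_id)
tree = Tree(root.node_id, {n.node_id: n for n in (root, intro, method, loss)})

titles = [n.title for n in tree.walk_dfs()]
path = [n.title for n in tree.find_path(loss.node_id)]
```

Keeping the nodes in a flat map keyed by UUID means any node can be addressed directly (for example from a citation anchor) without walking the tree first.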

Citation gains optional tree_id / node_id anchors for tree-rooted citations. The full design rationale (PageIndex-style navigation, embedding as pre-filter not replacement, future SQLite + sqlite-vec backend, Python-interface contract for KnowledgeStore) is captured in the local design spec.

quantmind/configs/

Unchanged from the original PR3: BaseFlowCfg + BaseInput plus per-flow <Name>FlowCfg and <Name>Input discriminated unions for paper / news / earnings.

Other changes

  • openai-agents>=0.14 added as a hard dep so BaseFlowCfg.model_settings: ModelSettings | None is honoured.
  • import-linter contracts: knowledge is a leaf; configs may only depend on knowledge.
  • quantmind/config/registry._discover_flows_in_path made resilient to OSError from Path.rglob (transitional code; deleted in PR5).
  • basedpyright's reportIncompatibleVariableOverride disabled so the Pydantic v2 idiom of narrowing a str field to Literal[...] in subclasses (used throughout the discriminator pattern) does not produce spurious errors.
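
The narrowing idiom that the disabled check would otherwise flag looks roughly like this. The sketch uses plain classes rather than Pydantic models, and the `REGISTRY` lookup and `class_for` helper are hypothetical, added only to show why a `Literal` tag is useful as a discriminator.

```python
from typing import Literal


class KnowledgeItem:
    # The base class declares the tag as a plain string.
    item_type: str = "item"


class Paper(KnowledgeItem):
    # Narrowing str to Literal["paper"]: valid at runtime, but a strict
    # checker reports it because a mutable attribute's type is invariant.
    item_type: Literal["paper"] = "paper"


class News(KnowledgeItem):
    item_type: Literal["news"] = "news"


# Hypothetical dispatch table keyed by the literal tag.
REGISTRY = {cls.item_type: cls for cls in (Paper, News)}


def class_for(payload: dict) -> type:
    """Pick the concrete class from the payload's item_type tag."""
    return REGISTRY[payload["item_type"]]
```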

Old quantmind/models/{content,paper,analysis}.py and quantmind/config/* stay in place; transitional parsers/, sources/, flow/, llm/ still depend on them and they get deleted in PR4-PR5.

Part of #71.

Test plan

  • bash scripts/verify.sh passes locally (259 tests, coverage 68.43% ≥ 60%)
  • CI verify workflow green on the PR
  • python -c "from quantmind.knowledge import Paper, PaperKnowledgeCard, News, Earnings, FlattenKnowledge, TreeKnowledge, GraphKnowledge; print('OK')" prints OK

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

keli-wen added 10 commits April 26, 2026 16:54
- Disable `reportIncompatibleVariableOverride` globally so the Pydantic v2
  pattern (`Paper.item_type: Literal["paper"]` narrowing
  `KnowledgeItem.item_type: str`) does not trigger spurious type errors.
- Make `quantmind/config/registry._discover_flows_in_path` resilient to
  `OSError` from `Path.rglob` so the local verify loop is reliable on macOS
  (AppTranslocation tmpdirs can crash mid-scan). Code is transitional and
  gets deleted in PR5.
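
One way to make a recursive scan tolerate an `OSError` raised mid-iteration is sketched below. The function name `discover_python_files` and the choice to keep partial results are assumptions for illustration; the actual `_discover_flows_in_path` change is not shown in this PR description.

```python
from pathlib import Path


def discover_python_files(root: Path) -> list[Path]:
    """Collect *.py files under root, tolerating errors during the scan."""
    found: list[Path] = []
    try:
        # rglob is lazy, so an OSError can surface on any iteration step,
        # not only when the generator is created.
        for path in root.rglob("*.py"):
            found.append(path)
    except OSError:
        # Assumed policy: keep whatever was collected before the failure.
        pass
    return sorted(found)
```

Wrapping the whole loop, rather than only the `rglob` call, matters because the generator does its directory reads as it is consumed.
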
Replaces the flat KnowledgeItem hierarchy with BaseKnowledge + three sibling
shapes:

- FlattenKnowledge: atomic cards (News, Earnings, PaperKnowledgeCard)
- TreeKnowledge:    hierarchical artifacts (Paper)
- GraphKnowledge:   placeholder; subclassing blocked until shape finalises
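
Blocking subclassing while keeping the class importable for type hints can be done with `__init_subclass__`, as the description says. The sketch below uses plain classes instead of the PR's Pydantic models; the error message and the `describe` helper are illustrative assumptions.

```python
from typing import Union


class BaseKnowledge:
    """Stand-in for the real root class."""


class GraphKnowledge(BaseKnowledge):
    """Placeholder shape: usable in annotations, not yet subclassable."""

    def __init_subclass__(cls, **kwargs):
        # Runs when a subclass is being created, so the class statement
        # itself fails instead of a later instantiation.
        raise NotImplementedError(
            f"GraphKnowledge is a placeholder; cannot define {cls.__name__}"
        )


def describe(item: Union[BaseKnowledge, GraphKnowledge]) -> str:
    # The placeholder still works as a type hint.
    return type(item).__name__


try:
    class SupplyChainGraph(GraphKnowledge):  # hypothetical subclass
        pass
except NotImplementedError:
    subclass_blocked = True
else:
    subclass_blocked = False
```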

BaseKnowledge gains typed provenance (SourceRef / ExtractionRef instead of
bare strings), an auto UUID id, schema_version, created_at, and an
embedding_text() contract that subclasses MUST override so the future store
layer knows what to embed. Citations grow optional tree_id / node_id anchors
for tree-rooted citations.

Paper is now a TreeKnowledge (sectioned paper); the previous flat Paper
becomes PaperKnowledgeCard (the distilled summary card pointing at a paper
via paper_id). News and Earnings remain flat and reparent to
FlattenKnowledge unchanged in domain payload.

Tests cover the new shapes end-to-end (45 assertions across base, tree,
graph, paper, news, earnings); the wider verify loop stays green at 259
tests, 68.4% coverage.

Storage layer (KnowledgeStore Protocol + SQLite + sqlite-vec backend) is
specified but lands separately so this PR remains schema-only.
keli-wen changed the title from "feat: knowledge/ + configs/ skeleton (PR3 of OpenAI Agents SDK migration)" to "feat: knowledge data standard + configs/ skeleton (PR3 of OpenAI Agents SDK migration)" on Apr 26, 2026
…dtrip tests

Polishes PR3 ahead of merge with four small additions surfaced in review:

- `BaseKnowledge.is_extracted()` / `freshness(now=None)` / `with_tags(*tags)`
  shared helpers (every shape benefits): provenance check, staleness
  measurement, and frozen-friendly tag append. `with_tags` is idempotent on
  duplicates so callers do not have to dedup themselves.
- `Citation.tree_id` / `node_id` are now exercised by a JSON round-trip
  test that proves UUID anchors survive serialisation.
- `Factor` and `Thesis` ship as stubs (FlattenKnowledge subclasses) so
  ``from quantmind.knowledge import Factor, Thesis`` works today; the full
  payloads land with their respective flows.
- New ``test_roundtrip.py`` exercises ``model_dump_json`` →
  ``model_validate_json`` on every concrete subclass, including
  ``Paper.nodes: dict[UUID, TreeNode]`` whose JSON-stringified keys must
  rehydrate back to ``UUID`` keys for the SDK ``output_type=`` contract.
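
The key-type issue behind the round-trip test can be illustrated without Pydantic: JSON object keys are always strings, so a `dict[UUID, ...]` has to be stringified on the way out and rehydrated on the way in. The helper names `dump_nodes` and `load_nodes` are hypothetical; in the PR this conversion is what `model_dump_json` and `model_validate_json` are being tested for.

```python
import json
from uuid import UUID, uuid4


def dump_nodes(nodes: dict[UUID, dict]) -> str:
    # json.dumps rejects UUID keys, so convert them to strings first.
    return json.dumps({str(key): value for key, value in nodes.items()})


def load_nodes(payload: str) -> dict[UUID, dict]:
    # Rehydrate the string keys back into UUID objects.
    return {UUID(key): value for key, value in json.loads(payload).items()}


nodes = {uuid4(): {"title": "Abstract"}, uuid4(): {"title": "Method"}}
restored = load_nodes(dump_nodes(nodes))
```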

`FlattenKnowledge` itself stays empty intentionally: its subclasses share
no payload fields, so any "common method" would be hollow. Cross-shape
helpers belong on `BaseKnowledge` instead.

`typing_extensions` (already pulled in transitively by Pydantic) is used
for ``Self`` so ``with_tags`` returns the correct subclass type on Python
3.10.
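
A rough sketch of the three shared helpers, using a frozen dataclass in place of the PR's Pydantic model. The field types (`tags` as a tuple, `extraction` as an optional string standing in for `ExtractionRef`), the `headline` field, and the exact return type of `freshness` are assumptions; only the method names and the idempotent behaviour of `with_tags` come from the commit message.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:
    # Only needed by type checkers; the annotation below is a string.
    from typing_extensions import Self


@dataclass(frozen=True)
class BaseKnowledge:
    as_of: datetime
    tags: tuple[str, ...] = ()
    extraction: Optional[str] = None  # stand-in for ExtractionRef | None

    def is_extracted(self) -> bool:
        return self.extraction is not None

    def freshness(self, now: Optional[datetime] = None) -> timedelta:
        # Age of the item relative to `now` (defaults to current UTC time).
        current = now if now is not None else datetime.now(timezone.utc)
        return current - self.as_of

    def with_tags(self, *tags: str) -> "Self":
        # Append only unseen tags, preserving order; return a new instance
        # because the dataclass is frozen.
        merged = list(self.tags)
        for tag in tags:
            if tag not in merged:
                merged.append(tag)
        return replace(self, tags=tuple(merged))


@dataclass(frozen=True)
class News(BaseKnowledge):
    headline: str = ""  # hypothetical payload field


as_of = datetime(2026, 4, 26, tzinfo=timezone.utc)
news = News(as_of=as_of, tags=("macro",), headline="Rates held")
tagged = news.with_tags("macro", "fed", "fed")
```

Because `replace` builds an instance of the caller's own class, `tagged` is a `News`, which is the behaviour the `Self` annotation describes to the type checker.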
keli-wen self-assigned this on Apr 27, 2026
keli-wen added the enhancement label on Apr 27, 2026
keli-wen merged commit 6276872 into master on Apr 27, 2026
2 checks passed
keli-wen deleted the feat/knowledge-and-configs branch on April 27, 2026 13:17