
feat: Support tracking issue impact over time (#156)#177

Open
duncanleo wants to merge 120 commits into main from duncanleo/data-overhaul

Conversation

Member

@duncanleo duncanleo commented Mar 21, 2026

Summary

Overhauls mrtdown-data into the canonical reviewed data repository for MRTDown, with typed packages, deterministic file-backed tooling, and a migrated append-only issue dataset.

What Changed

  • Migrated canonical data into the new data/ layout:
    • static entities under data/{station,line,service,operator,town,landmark}
    • issues under data/issue/YYYY/MM/<issue_id>/
    • append-only evidence.ndjson and impact.ndjson per issue
  • Introduced the monorepo package structure:
    • @mrtdown/core for schemas and shared period/state helpers
    • @mrtdown/fs for file-backed repositories and writers
    • @mrtdown/cli for creation, validation, manifest, listing, show, and repair tooling
    • @mrtdown/triage for LLM-assisted evidence triage and replay utilities
  • Removed the old API/database runtime from this repo; runtime serving and Postgres import now belong in mrtdown-site.
  • Added GitHub Pages publishing for generated manifests and downloadable data archives.
  • Added Changesets/npm publishing workflow for shared packages.
  • Added architecture docs describing the canonical-data vs runtime-data split and the two-repo mrtdown-data / mrtdown-site model.
  • Replayed and normalized issue impact data, including repairs for empty impacts, degraded-service extraction, recurring maintenance periods, and legacy issue normalization.
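The append-only evidence/impact layout can be sketched as follows. This is a hypothetical illustration of the NDJSON pattern described above, not the actual `@mrtdown/fs` writer API; the `ImpactRecord` shape and helper names are invented for the example.

```typescript
import { appendFileSync, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical record shape; the real schemas live in @mrtdown/core.
interface ImpactRecord {
  issueId: string;
  recordedAt: string;
  state: string;
}

// Append one record as a single NDJSON line; existing lines are never rewritten,
// which is what makes the log append-only and replayable.
function appendImpact(issueDir: string, record: ImpactRecord): void {
  appendFileSync(join(issueDir, "impact.ndjson"), JSON.stringify(record) + "\n");
}

// Read the log back; each line is an independent JSON document.
function readImpacts(issueDir: string): ImpactRecord[] {
  return readFileSync(join(issueDir, "impact.ndjson"), "utf8")
    .split("\n")
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as ImpactRecord);
}

// Demo against a throwaway directory standing in for data/issue/YYYY/MM/<issue_id>/.
const dir = mkdtempSync(join(tmpdir(), "mrtdown-issue-"));
appendImpact(dir, { issueId: "demo-1", recordedAt: "2026-03-21T08:00:00Z", state: "disrupted" });
appendImpact(dir, { issueId: "demo-1", recordedAt: "2026-03-21T09:30:00Z", state: "resolved" });
console.log(readImpacts(dir).length); // prints 2
```

Because records are only ever appended, later tooling (repair, replay, normalization) can reconstruct state at any point in time by reading a prefix of the log.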

Breaking Changes

  • This repo no longer ships the old Hono/API/DuckDB runtime.
  • Consumers should use the new packages and/or the generated GitHub Pages data artifacts instead of importing from the previous src/api, src/db, or legacy schema paths.

Review Notes

This is a large structural migration. The most important review areas are:

  • package boundaries and exports
  • canonical data layout compatibility with mrtdown-site
  • CLI validation behavior
  • generated manifest/archive workflow
  • issue impact replay correctness

Validation

TODO before merge:

  • npm ci
  • npm run build
  • npm run test
  • npm run cli -- -- validate
  • Confirm Pages artifact generation with npm run cli -- -- manifest and npm run cli -- -- pages-index

Fixes #156

@duncanleo duncanleo self-assigned this Mar 21, 2026
@duncanleo duncanleo changed the title feat: Support tracking issue impact over time feat: Support tracking issue impact over time (#156) Mar 21, 2026
@duncanleo duncanleo force-pushed the duncanleo/data-overhaul branch 13 times, most recently from ca15f1a to 965bee3 on March 22, 2026 16:43
@duncanleo duncanleo force-pushed the duncanleo/data-overhaul branch from 0b0ff52 to 78cdc9d on March 23, 2026 16:07
@duncanleo duncanleo force-pushed the duncanleo/data-overhaul branch from 2e64332 to 46a53bb on April 4, 2026 10:27
@duncanleo
Member Author

@codex review

@chatgpt-codex-connector

💡 Codex Review

const DATA_DIR = resolve(import.meta.dirname, '../../../../data');

P1: Point ingestContent to the repository data directory

ingestContent resolves DATA_DIR with four .. segments from packages/triage/src/util/ingestContent, which lands at <repo>/packages/data instead of <repo>/data; that directory does not exist in this repo, so the first repository read (via FileStore.listDir in triage flow) will throw ENOENT and break webhook ingestion. This is production-impacting because ingestViaWebhook imports and executes this code path for every incoming message.
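The off-by-one can be reproduced with plain path resolution. This is a hypothetical reconstruction assuming the module directory is `<repo>/packages/triage/src/util/ingestContent` as the review states; the fix shown (one extra `..` segment) is one plausible remedy, not the actual patch.

```typescript
import { posix } from "node:path";

// Assumed module directory, per the review comment.
const moduleDir = "/repo/packages/triage/src/util/ingestContent";

// Four ".." segments climb ingestContent -> util -> src -> triage,
// landing inside packages/ rather than at the repo root.
const wrong = posix.resolve(moduleDir, "../../../../data");
console.log(wrong); // prints /repo/packages/data

// A fifth ".." climbs past packages/ to the repo root.
const fixed = posix.resolve(moduleDir, "../../../../../data");
console.log(fixed); // prints /repo/data
```

Anchoring the path to the repo root (e.g. via a package-level constant) would be more robust than counting `..` segments, since the count silently breaks whenever a file moves.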


'impact.scopeItems.serviceId',
'impact.scopeItems.stationId',
'impact.scopeItems.fromStationId',
'impact.scopeItems.toStationId',

P2: Query against actual impact event fields in issue search

The Fuse search keys target impact.scopeItems.*, but IssueRepository.get() stores impact data under impactEvents with event-specific fields (serviceScopes, entity.serviceId, etc.), so these clauses never match and searches by affected services/stations silently fail. This degrades FindIssuesTool matching and can cause triage to miss existing related issues.
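The silent-mismatch failure mode can be shown with a minimal dot-path walker of the kind Fuse-style keys rely on. The `IssueRecord`/`ImpactEvent` shapes below are hypothetical, inferred only from the field names the review mentions (`impactEvents`, `serviceScopes`, `entity.serviceId`); this is a sketch of the bug, not the repository's real types.

```typescript
// Hypothetical shapes based on the review comment.
interface ImpactEvent {
  entity?: { serviceId?: string };
  serviceScopes?: string[];
}
interface IssueRecord {
  id: string;
  impactEvents: ImpactEvent[];
}

// Minimal dot-path getter mimicking how search keys walk a record,
// flattening arrays along the way. A missing segment yields no values,
// which is why a wrong key fails silently instead of erroring.
function getPath(value: unknown, path: string[]): unknown[] {
  if (value === undefined || value === null) return [];
  if (path.length === 0) return [value];
  if (Array.isArray(value)) return value.flatMap((v) => getPath(v, path));
  if (typeof value === "object") {
    return getPath((value as Record<string, unknown>)[path[0]], path.slice(1));
  }
  return [];
}

const issue: IssueRecord = {
  id: "demo-1",
  impactEvents: [{ entity: { serviceId: "NSL" }, serviceScopes: ["NSL"] }],
};

// The key used in the search config never matches the stored shape:
console.log(getPath(issue, "impact.scopeItems.serviceId".split("."))); // prints []

// A key aligned with the stored shape does:
console.log(getPath(issue, "impactEvents.entity.serviceId".split("."))); // prints [ 'NSL' ]
```

The fix is to make the search keys mirror whatever `IssueRepository.get()` actually returns, and ideally to derive both from the same typed schema so they cannot drift apart again.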


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>