Site tier-1 follow-up: per-model deep-dive page #23
Closed
MaxGhenis wants to merge 1 commit into
Statically generates a dedicated page for each of the 12 models in `data.json`, using `generateStaticParams` so the entire site stays a pure static export.

Each page renders:

- Headline strip: provider mark, model name, global/US/UK scores, and parse-rate pill, all sourced from `globalStat.countryScores`.
- Hardest outputs: top-5 lowest-scoring output groups (country × outputGroup), computed by reusing `buildAllRows`/`scorePrediction` from `lib/sensitivity.ts` and `lib/scoring.ts` and aggregated the same way as the headline scorer.
- Sample wrong predictions: up to 10 (scenario, variable) cells where relErr > 10% and score < 0.75, sorted by largest relative error, with prediction / ground-truth / error columns plus a collapsible model explanation and a link back to `/#scenarios`.
- Back to leaderboard link.

Reuses `SiteHeader` (`alwaysExpanded` + `actionLink` back to `/`), the `Badge` color scheme from `ModelLeaderboard`, and Tailwind v4 design-token classes throughout.

Build smoke-test: `bun run build` produces the `/model/[id]` SSG route with all 12 model paths; `bun run lint` is clean.

https://claude.ai/code/session_01DS3KJmEye7o7ff18RdthTC
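The "sample wrong predictions" rule above (relErr > 10%, score < 0.75, largest relative error first, at most 10 cells) can be sketched roughly as follows. This is an illustration only, not the PR's code: the `PredictionCell` shape, the `sampleWrongPredictions` name, and the choice to skip zero ground-truth values are assumptions.

```typescript
// Hypothetical cell shape; the real rows built by buildAllRows may differ.
interface PredictionCell {
  country: string;
  scenarioId: string;
  variable: string;
  prediction: number;
  truth: number;
  score: number; // assumed to lie in [0, 1]
}

interface WrongPrediction extends PredictionCell {
  relErr: number;
}

// Select up to `limit` distinct cells with relErr > 10% and score < 0.75,
// sorted by largest relative error first.
function sampleWrongPredictions(
  cells: PredictionCell[],
  limit = 10,
): WrongPrediction[] {
  const seen = new Set<string>();
  const wrong: WrongPrediction[] = [];
  for (const cell of cells) {
    const key = `${cell.country}|${cell.scenarioId}|${cell.variable}`;
    if (seen.has(key)) continue; // keep each cell at most once
    seen.add(key);
    // Assumption: relative error is undefined for a zero reference, so skip it.
    if (cell.truth === 0) continue;
    const relErr = Math.abs(cell.prediction - cell.truth) / Math.abs(cell.truth);
    if (relErr > 0.1 && cell.score < 0.75) {
      wrong.push({ ...cell, relErr });
    }
  }
  return wrong.sort((a, b) => b.relErr - a.relErr).slice(0, limit);
}
```

Requiring both conditions keeps near-misses that still score well out of the sample, so the table shows only cells that are wrong by both measures.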
MaxGhenis added a commit that referenced this pull request on Jun 10, 2026
Every model gets a statically generated page at /model/[id]: country ranks and headline pills, per-program scores sorted hardest-first, binary eligibility-flag accuracy computed from prediction rows, and the model's worst misses on positive references, each linking into the scenario explorer. Pages are server components over the bundled summary, so they ship no additional client JS, and each carries its own metadata for social previews. Leaderboard rows and the explorer's detail dialog link to them; the sitemap lists them.

The explorer now mirrors its state into the URL (?scenario=..., ?cell=variable~model) via replaceState, applies deep links on mount (turning off the frontier-only filter when the linked model needs it), and clears both params on country switch since ids are country-specific.

Deep-linking exposed a latent UX bug: the explanation sidecar fetch was viewport-gated, but an open dialog makes the background inert, so a deep-linked dialog could never trigger the fetch and showed "Loading explanation text" forever. Opening any detail dialog now starts the fetch directly.

Supersedes the stale draft #23, rebuilt on the split-data stack.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
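The URL mirroring described in this commit message can be illustrated with a few pure helpers. This is a minimal sketch under assumptions: the `ExplorerState` shape and all three function names are hypothetical. Only the parameter names (`scenario`, `cell`) and the `variable~model` encoding come from the message.

```typescript
// Hypothetical explorer state; field names are assumptions for illustration.
interface ExplorerState {
  scenario?: string;
  cell?: { variable: string; model: string };
}

// Serialize state into a query string. URLSearchParams percent-encodes "~"
// as %7E; it decodes back to "~" when the string is parsed again.
function buildExplorerSearch(state: ExplorerState): string {
  const params = new URLSearchParams();
  if (state.scenario) params.set("scenario", state.scenario);
  if (state.cell) params.set("cell", `${state.cell.variable}~${state.cell.model}`);
  const query = params.toString();
  return query ? `?${query}` : "";
}

// Parse a deep link on mount; malformed `cell` values are ignored.
function parseExplorerSearch(search: string): ExplorerState {
  const params = new URLSearchParams(search);
  const state: ExplorerState = {};
  const scenario = params.get("scenario");
  if (scenario) state.scenario = scenario;
  const cell = params.get("cell");
  if (cell) {
    const sep = cell.indexOf("~");
    if (sep > 0 && sep < cell.length - 1) {
      state.cell = { variable: cell.slice(0, sep), model: cell.slice(sep + 1) };
    }
  }
  return state;
}

// On country switch, drop both deep-link params (ids are country-specific)
// while leaving any other query params untouched.
function clearDeepLinkParams(search: string): string {
  const params = new URLSearchParams(search);
  params.delete("scenario");
  params.delete("cell");
  const query = params.toString();
  return query ? `?${query}` : "";
}
```

In a browser, the result of `buildExplorerSearch` would be combined with the current pathname and passed to `window.history.replaceState(null, "", url)`, which updates the address bar without adding a history entry.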
MaxGhenis added a commit that referenced this pull request on Jun 10, 2026
* Add per-model pages and explorer deep links

  Every model gets a statically generated page at /model/[id]: country ranks and headline pills, per-program scores sorted hardest-first, binary eligibility-flag accuracy computed from prediction rows, and the model's worst misses on positive references, each linking into the scenario explorer. Pages are server components over the bundled summary, so they ship no additional client JS, and each carries its own metadata for social previews. Leaderboard rows and the explorer's detail dialog link to them; the sitemap lists them.

  The explorer now mirrors its state into the URL (?scenario=..., ?cell=variable~model) via replaceState, applies deep links on mount (turning off the frontier-only filter when the linked model needs it), and clears both params on country switch since ids are country-specific.

  Deep-linking exposed a latent UX bug: the explanation sidecar fetch was viewport-gated, but an open dialog makes the background inert, so a deep-linked dialog could never trigger the fetch and showed "Loading explanation text" forever. Opening any detail dialog now starts the fetch directly.

  Supersedes the stale draft #23, rebuilt on the split-data stack.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* Trigger CI after retarget to main

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Summary

Follow-up to #8 and #9. Adds a statically generated per-model deep-dive page at `/model/[id]`, one page per model present in `data.json`.

Rendered sections

1. Headline strip (inside `SiteHeader` `expandedContent`, `alwaysExpanded`)

- Provider mark (`ProviderMark`) + model name + provider label
- Global / US / UK scores (from `globalStat.countryScores`), parse rate (`nParsed / n`)

2. Hardest outputs: top 5 lowest-scoring output groups for this model

- Scored at the `(country, outputGroup)` level using `buildAllRows` + `scorePrediction` from `lib/sensitivity.ts` / `lib/scoring.ts`
- Aggregated the same way as `scoresPerCountryModel`: per-row scores → output-group mean → displayed score
- Each row shows the output label (`getVariableLabel`), country tag, and a `Badge` (same color thresholds as `ModelLeaderboard`)

3. Sample wrong predictions: up to 10 distinct `(country, scenario, variable)` cells where relErr > 10% and score < 0.75

- Collapsible `<details>` block with the model's explanation text
- Link to `/#scenarios` for the scenario explorer, plus the scenario ID

4. Back to leaderboard link at page bottom
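The "hardest outputs" aggregation (per-row scores → output-group mean → displayed score, then the five lowest) can be sketched as below. The `ScoredRow` and `GroupScore` shapes and the `hardestOutputs` name are assumptions for illustration. The real `ScoreRow` type and the `scorePrediction` signature are not shown in this PR text, so the sketch takes already-computed per-row scores as input.

```typescript
// Hypothetical per-row shape; the real ScoreRow from buildAllRows may carry more fields.
interface ScoredRow {
  country: string;
  outputGroup: string;
  score: number; // per-row score, assumed to lie in [0, 1]
}

interface GroupScore {
  country: string;
  outputGroup: string;
  score: number; // mean of per-row scores × 100
  n: number; // number of rows in the group
}

// Mean per-row score (× 100) for each (country, outputGroup) pair,
// returned lowest-first and truncated to the `topN` hardest groups.
function hardestOutputs(rows: ScoredRow[], topN = 5): GroupScore[] {
  const groups = new Map<string, GroupScore & { sum: number }>();
  for (const row of rows) {
    const key = `${row.country}|${row.outputGroup}`;
    const group = groups.get(key) ?? {
      country: row.country,
      outputGroup: row.outputGroup,
      score: 0,
      n: 0,
      sum: 0,
    };
    group.sum += row.score * 100;
    group.n += 1;
    groups.set(key, group);
  }
  return Array.from(groups.values())
    .map(({ country, outputGroup, sum, n }) => ({
      country,
      outputGroup,
      score: sum / n,
      n,
    }))
    .sort((a, b) => a.score - b.score)
    .slice(0, topN);
}
```

Grouping on the `(country, outputGroup)` pair rather than on `outputGroup` alone keeps a US and a UK group with the same name from being averaged together.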
Static routes generation

`generateStaticParams` collects all model IDs from `dashboard.global.modelStats` and the union of country-level `modelStats`, returning one `{ id }` entry per model. The current data produces 12 static routes.

Library reuse

- `lib/scoring.ts`: `scorePrediction`, `metricTypeForVariable`
- `lib/sensitivity.ts`: `buildAllRows`, producing `ScoreRow[]` for all countries, filtered to the model
- `lib/bootstrap.ts`

Scoring math

For each `(country, outputGroup)` pair, the displayed score is the mean of per-row scores (each `scorePrediction` result × 100) across all scenarios and person-expanded variables that map to that output group. This is equivalent to the inner two levels of the 3-level mean in `scoresPerCountryModel`.

Smoke test

Build output excerpt:

Test plan

- `/model/gpt-5.5`: headline shows Global / US / UK scores, parse rate
- `<details>` block
- `/#scenarios`
- `/`
- `/model/nonexistent-model`: returns 404

🤖 Generated with Claude Code