62 changes: 41 additions & 21 deletions artifacts/venue_correction_validation_latest.md
@@ -1,6 +1,6 @@
# Venue Correction Validation Scorecard

Generated: 2026-05-03T17:06:52+00:00
Generated: 2026-05-05T00:24:36+00:00

Correction method: `distance_mean_shrinkage_v1 (latest prior-season only)`

@@ -10,27 +10,27 @@ Training snapshot: `schema=v5; seasons=20092010-20252026; rows=1,854,812; adjust

| Gate | Result | Metric |
|------|--------|--------|
| Held-out log loss non-worse | PASS | delta = -0.000017 |
| Held-out log loss non-worse | PASS | delta = -0.000015 |
| Home-ice over-correction guardrail | PASS | removed = -0.013, max = 0.500 |
| Distance/location residual z-scores | FAIL | blocking regimes = 24, supported regimes = 4, max abs(z) = 4.067, limit < 2.000 |
| Event-frequency residual z-scores | FAIL | blocking regimes = 4, supported regimes = 23, max abs(z) = 3.572, limit < 2.000 |
| Distance/location residual z-scores | FAIL | blocking regimes = 10, supported regimes = 18, max abs(z) = 4.067, limit < 2.000 |
| Event-frequency residual z-scores | FAIL | blocking regimes = 5, supported regimes = 22, max abs(z) = 3.572, limit < 2.000 |

## Summary Metrics

- Overall pass: FAIL
- Holdout rows: 1,525,907
- Distance residual venue-seasons evaluated: 532
- Distance residual gate mode: `regime_aware`
- Distance blocking regimes: 24
- Distance supported regimes: 4
- Distance blocking regimes: 10
- Distance supported regimes: 18
- Event-frequency residual venue-seasons evaluated: 525
- Event-frequency residual gate mode: `regime_aware`
- Event-frequency blocking regimes: 4
- Event-frequency supported regimes: 23
- Baseline log loss: 0.229272
- Corrected log loss: 0.229255
- Event-frequency blocking regimes: 5
- Event-frequency supported regimes: 22
- Baseline log loss: 0.229270
- Corrected log loss: 0.229254
- Baseline home advantage: 0.001848
- Corrected home advantage: 0.001872
- Corrected home advantage: 0.001873
- Worst distance/location residual: `20092010:Madison Square Garden`
- Worst event-frequency residual: `20112012:Prudential Center`

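As a rough illustration of how the first two gates relate to the summary metrics above, the sketch below recomputes them from the rounded values. The function names, the sign convention for `delta`, and the definition of `removed` as the fraction of baseline home advantage removed are assumptions, not the project's confirmed formulas. Because the inputs are rounded to six decimals, the recomputed delta can differ from the reported one in the last digit.

```python
def log_loss_gate(baseline: float, corrected: float, tolerance: float = 0.0):
    """Return (delta, passed). Assumed convention: delta = corrected - baseline,
    and the gate passes when the corrected log loss is not worse."""
    delta = corrected - baseline
    return delta, delta <= tolerance


def home_ice_gate(baseline_adv: float, corrected_adv: float, max_removed: float = 0.5):
    """Return (removed, passed). Assumed convention: `removed` is the share of
    baseline home advantage that the correction takes away; a negative value
    means the corrected home advantage grew slightly."""
    if baseline_adv == 0:
        raise ValueError("baseline home advantage must be non-zero")
    removed = (baseline_adv - corrected_adv) / baseline_adv
    return removed, removed <= max_removed


# Values copied from the summary metrics (rounded as published).
delta, log_loss_ok = log_loss_gate(0.229270, 0.229254)
removed, home_ice_ok = home_ice_gate(0.001848, 0.001873)
```

Both gates pass under these assumptions, which matches the PASS rows in the gate table; the overall FAIL comes from the residual gates.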
@@ -39,27 +39,47 @@ Training snapshot: `schema=v5; seasons=20092010-20252026; rows=1,854,812; adjust

| Metric | Venue-season | z | Classification | Prior roll | Centered roll | Population anomaly share | Evidence | Known prior |
|--------|--------------|---|----------------|------------|---------------|--------------------------|----------|-------------|
| `distance_location` | `20092010:Madison Square Garden` | -4.067 | `persistent_bias` | n/a | -3.114 | 0.032 | NO | YES |
| `distance_location` | `20172018:Bell MTS Place` | 3.123 | `unexplained_or_confounded` | n/a | 1.456 | 0.091 | NO | NO |
| `distance_location` | `20192020:United Center` | -3.121 | `unexplained_or_confounded` | -1.237 | -1.840 | 0.062 | NO | NO |
| `distance_location` | `20222023:SAP Center at San Jose` | -2.885 | `unexplained_or_confounded` | 0.700 | 0.009 | 0.062 | NO | NO |
| `distance_location` | `20092010:Madison Square Garden` | -4.067 | `temporary_supported_regime` | n/a | -3.114 | 0.032 | YES | YES |
| `distance_location` | `20172018:Bell MTS Place` | 3.123 | `temporary_supported_regime` | n/a | 1.456 | 0.091 | YES | NO |
| `distance_location` | `20192020:United Center` | -3.121 | `temporary_supported_regime` | -1.237 | -1.840 | 0.062 | YES | NO |
| `distance_location` | `20222023:SAP Center at San Jose` | -2.885 | `temporary_supported_regime` | 0.700 | 0.009 | 0.062 | YES | NO |
| `distance_location` | `20182019:NYCB Live/Nassau Coliseum` | -2.838 | `unexplained_or_confounded` | n/a | -1.383 | 0.031 | NO | NO |
| `distance_location` | `20202021:Amalie Arena` | -2.801 | `unexplained_or_confounded` | 0.476 | -0.896 | 0.065 | NO | NO |
| `distance_location` | `20122013:Wells Fargo Center` | 2.690 | `unexplained_or_confounded` | -0.618 | 0.947 | 0.067 | NO | NO |
| `distance_location` | `20112012:American Airlines Center` | 2.640 | `unexplained_or_confounded` | -0.110 | -0.144 | 0.031 | NO | NO |
| `distance_location` | `20222023:Little Caesars Arena` | 2.635 | `unexplained_or_confounded` | -0.547 | 0.596 | 0.062 | NO | NO |
| `distance_location` | `20212022:Enterprise Center` | -2.628 | `unexplained_or_confounded` | -0.049 | -0.280 | 0.061 | NO | NO |
| `distance_location` | `20122013:Wells Fargo Center` | 2.690 | `temporary_supported_regime` | -0.618 | 0.947 | 0.067 | YES | NO |
| `distance_location` | `20112012:American Airlines Center` | 2.640 | `temporary_supported_regime` | -0.110 | -0.144 | 0.031 | YES | NO |
| `distance_location` | `20222023:Little Caesars Arena` | 2.635 | `temporary_supported_regime` | -0.547 | 0.596 | 0.062 | YES | NO |
| `distance_location` | `20212022:Enterprise Center` | -2.628 | `temporary_supported_regime` | -0.049 | -0.280 | 0.061 | YES | NO |
| `event_frequency` | `20112012:Prudential Center` | -3.572 | `persistent_bias` | -3.033 | -3.103 | 0.033 | YES | NO |
| `event_frequency` | `20152016:Prudential Center` | -3.485 | `persistent_bias` | -2.771 | -2.592 | 0.067 | YES | NO |
| `event_frequency` | `20102011:Prudential Center` | -3.155 | `persistent_bias` | -2.910 | -3.212 | 0.033 | YES | NO |
| `event_frequency` | `20182019:Scotiabank Arena` | 2.982 | `temporary_supported_regime` | n/a | 2.445 | 0.031 | YES | NO |
| `event_frequency` | `20132014:Prudential Center` | -2.967 | `persistent_bias` | -3.103 | -2.771 | 0.033 | YES | NO |
| `event_frequency` | `20092010:Prudential Center` | -2.910 | `persistent_bias` | n/a | -3.033 | 0.033 | YES | NO |
| `event_frequency` | `20092010:Prudential Center` | -2.910 | `temporary_supported_regime` | n/a | -3.033 | 0.033 | YES | NO |
| `event_frequency` | `20202021:Amalie Arena` | -2.845 | `unexplained_or_confounded` | -0.292 | -1.640 | 0.032 | NO | NO |
| `event_frequency` | `20252026:American Airlines Center` | -2.785 | `temporary_supported_regime` | 0.085 | -1.350 | 0.062 | YES | NO |
| `event_frequency` | `20142015:Prudential Center` | -2.765 | `persistent_bias` | -3.040 | -3.073 | 0.067 | YES | NO |
| `event_frequency` | `20232024:Nationwide Arena` | 2.607 | `temporary_supported_regime` | 0.538 | 1.465 | 0.062 | YES | NO |

## Distance-Location Paired Diagnostics

- Primary distance gate: venue-season corrected-distance residuals with visiting-team paired evidence stratified by shot type and manpower state.

- Candidate distance residuals: 28
- Supported paired distance regimes: 17

| Venue-season | z | Paired diff | 95% CI | d | Pairs | Evidence | Evidence classification | Regime classification |
|--------------|---|-------------|--------|---|-------|----------|-------------------------|-----------------------|
| `20092010:Madison Square Garden` | -4.067 | -8.167 | [-9.935, -5.944] | -1.647 | 23 | YES | `real_scorekeeper_regime_supported` | `temporary_supported_regime` |
| `20172018:Bell MTS Place` | 3.123 | 1.529 | [0.249, 2.838] | 0.421 | 30 | YES | `real_scorekeeper_regime_supported` | `temporary_supported_regime` |
| `20192020:United Center` | -3.121 | -3.010 | [-4.433, -1.513] | -0.760 | 27 | YES | `real_scorekeeper_regime_supported` | `temporary_supported_regime` |
| `20222023:SAP Center at San Jose` | -2.885 | -2.810 | [-3.981, -1.618] | -0.821 | 31 | YES | `real_scorekeeper_regime_supported` | `temporary_supported_regime` |
| `20182019:NYCB Live/Nassau Coliseum` | -2.838 | -0.598 | [-2.050, 0.838] | -0.185 | 18 | NO | `hockey_context_confounded` | `unexplained_or_confounded` |
| `20202021:Amalie Arena` | -2.801 | -3.881 | [-5.373, -2.479] | -1.649 | 9 | NO | `insufficient_evidence` | `unexplained_or_confounded` |
| `20122013:Wells Fargo Center` | 2.690 | 2.319 | [0.910, 3.648] | 0.849 | 14 | YES | `real_scorekeeper_regime_supported` | `temporary_supported_regime` |
| `20112012:American Airlines Center` | 2.640 | 1.273 | [0.396, 2.218] | 0.561 | 23 | YES | `real_scorekeeper_regime_supported` | `temporary_supported_regime` |
| `20222023:Little Caesars Arena` | 2.635 | 2.150 | [0.885, 3.461] | 0.584 | 31 | YES | `real_scorekeeper_regime_supported` | `temporary_supported_regime` |
| `20212022:Enterprise Center` | -2.628 | -3.291 | [-4.394, -2.089] | -0.986 | 31 | YES | `real_scorekeeper_regime_supported` | `temporary_supported_regime` |

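The paired columns above (paired diff, 95% CI, d, pairs) can be approximated with a sketch like the following. It is not the project's implementation: the scorecard does not state how the interval is computed, so a percentile bootstrap is assumed here, `d` is assumed to be the mean paired difference divided by the standard deviation of the differences, and `min_pairs` is a hypothetical sample-adequacy threshold.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class PairedResult:
    mean_diff: float
    ci_low: float
    ci_high: float
    cohens_d: float
    n_pairs: int
    supported: bool


def paired_distance_evidence(at_venue, elsewhere, min_pairs=10, n_boot=2000, seed=0):
    """Compare corrected shot distance at one venue with the same visiting
    team's away shots elsewhere. at_venue[i] and elsewhere[i] are assumed to be
    mean distances for the same (team, shot type, manpower state) stratum."""
    if len(at_venue) != len(elsewhere):
        raise ValueError("paired inputs must have equal length")
    diffs = [a - b for a, b in zip(at_venue, elsewhere)]
    n = len(diffs)
    if n < 2:
        raise ValueError("need at least two pairs")

    mean_diff = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    cohens_d = mean_diff / sd if sd > 0 else 0.0

    # Percentile bootstrap over pairs (an assumption; see lead-in).
    rng = random.Random(seed)
    boot = sorted(statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    ci_low = boot[int(0.025 * n_boot)]
    ci_high = boot[int(0.975 * n_boot) - 1]

    # Evidence is treated as supportive only when there are enough pairs and
    # the interval excludes zero.
    supported = n >= min_pairs and (ci_low > 0 or ci_high < 0)
    return PairedResult(mean_diff, ci_low, ci_high, cohens_d, n, supported)
```

Under this reading, a row such as `20182019:NYCB Live/Nassau Coliseum` is unsupported because its interval spans zero, and `20202021:Amalie Arena` is unsupported because it has only 9 pairs even though its interval excludes zero.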
## Event-Frequency Diagnostics

Primary frequency gate: sample-adequate `regular_season:training_attempts`
@@ -82,4 +102,4 @@ Primary frequency gate: sample-adequate `regular_season:training_attempts`

## Notes

Generated from live SQLite data with forward-chaining temporal CV. Each shot uses the latest venue distance adjustment from a season before the shot's season; same-season venue corrections are not used for holdout rows. Distance residual z-scores are venue-season corrected-distance mean z-scores. Rolling venue-regime diagnostics use prior-only rolling estimates for production-safe context and centered rolling estimates only for exploratory historical-spike labeling. Event-frequency residual z-scores use sample-adequate regular-season training attempts as the primary gate; blocked-shot and all-attempt frequencies are reported as diagnostics and remain outside the current shot-level xG training contract.
Generated from live SQLite data with forward-chaining temporal CV. Each shot uses the latest venue distance adjustment from a season before the shot's season; same-season venue corrections are not used for holdout rows. Distance residual z-scores are venue-season corrected-distance mean z-scores. Distance/location candidates are annotated with paired visiting-team evidence stratified by shot type and manpower state; this diagnostic uses the in-memory prior-corrected distances and does not mutate shot_events or venue_bias_corrections. Rolling venue-regime diagnostics use prior-only rolling estimates for production-safe context and centered rolling estimates only for exploratory historical-spike labeling. Event-frequency residual z-scores use sample-adequate regular-season training attempts as the primary gate; blocked-shot and all-attempt frequencies are reported as diagnostics and remain outside the current shot-level xG training contract.
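
The prior-season-only rule in the notes above can be sketched as a lookup. The function name, the dictionary layout, and the `0.0` fallback for a venue with no earlier season are assumptions for illustration; the sketch relies only on season keys such as `20092010` sorting chronologically as integers.

```python
from bisect import bisect_left


def latest_prior_adjustment(adjustments_by_season: dict, shot_season: int) -> float:
    """Return the venue distance adjustment from the latest season strictly
    before shot_season, so a shot never sees a same-season correction."""
    seasons = sorted(adjustments_by_season)
    # bisect_left gives the index of the first season >= shot_season, so the
    # entry just before it is the latest strictly earlier season.
    idx = bisect_left(seasons, shot_season)
    if idx == 0:
        return 0.0  # assumed fallback: no prior season means no correction
    return adjustments_by_season[seasons[idx - 1]]


# Hypothetical adjustments (feet) for one venue.
venue_adjustments = {20092010: -1.5, 20112012: 0.8}
```

With these hypothetical values, a 20112012 shot uses the 20092010 adjustment rather than its own season's estimate, which is the leakage the forward-chaining setup is meant to avoid.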
30 changes: 19 additions & 11 deletions docs/xg_model_components/04_scorekeeper_bias.md
@@ -15,26 +15,34 @@ Estimate and correct rink/venue scorer effects that distort event recording and
- `scripts/export_venue_correction_validation.py` exports the Phase 2.5.4
scorecard once a metrics JSON has been generated from a current database.
The scorecard gates are held-out log-loss non-worsening, home-ice
over-correction, max distance/location residual venue z-score, and max
sample-adequate event-frequency residual venue z-score.
over-correction, distance/location residuals, and sample-adequate
event-frequency residuals. Residual z-scores mark candidate venue-seasons
for regime-aware review rather than acting as automatic vetoes.
- `scripts/export_venue_correction_validation_from_db.py` generates that
metrics payload directly from SQLite with forward-chaining temporal CV and
prior-season-only venue distance corrections under the shared model-training
contract. It also computes normalized event-frequency diagnostics by
venue-season, event group, and game-type scope. The primary frequency gate
uses sample-adequate regular-season training attempts; blocked-shot and
all-attempt frequencies are diagnostic only. The 2026-05-01 live v5 refresh
passes held-out log-loss and home-ice guardrails but fails the residual
corrected-distance z-score gate (`max |z| = 4.067`) and event-frequency
residual gate (`max |z| = 3.572`), so the current correction remains
exploratory rather than a production xG training feature.
- The 2026-05-03 rolling venue-regime extension adds a less brittle
venue-season, event group, and game-type scope plus paired distance-location
diagnostics from in-memory prior-corrected distances. The distance diagnostic
compares each visiting team's corrected shot distance at a venue against that
same team's away shots elsewhere in the same season, stratified by shot type
and manpower state. The primary frequency gate uses sample-adequate
regular-season training attempts; blocked-shot and all-attempt frequencies
are diagnostic only. The 2026-05-05 live v5 refresh uses the regime-aware
residual gate. It passes held-out log-loss and home-ice guardrails but still
fails the residual corrected-distance gate (`max |z| = 4.067`, 10 blocking
regimes) and event-frequency residual gate (`max |z| = 3.572`, 5 blocking
regimes), so the current correction remains exploratory rather than a
production xG training feature.
- The 2026-05-03 rolling venue-regime extension, expanded with paired
distance evidence on 2026-05-05, adds a less brittle
acceptance path for historically real scorer spikes. `src/venue_bias.py`
now computes prior-only rolling residual estimates for production-safe
context, centered rolling estimates for exploratory historical diagnosis,
and regime labels: `persistent_bias`, `temporary_supported_regime`, and
`unexplained_or_confounded`. `evaluate_venue_correction_scorecard()` can
use those labels so supported temporary or persistent regimes are reported
use those labels so `|z| >= 2` is a candidate residual rather than an
automatic veto. Supported temporary or persistent regimes are reported
without automatically failing the correction layer. Unexplained/confounded
residuals, population-wide shifts, insufficient evidence, held-out log-loss
harm, and home-ice over-correction remain blocking.
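
A minimal sketch of the regime-aware residual gate described above, assuming each residual is a dict with hypothetical `z`, `classification`, and `evidence` fields. This is not the body of `evaluate_venue_correction_scorecard()`; it omits the population-wide-shift check and only illustrates how `|z| >= 2` marks a candidate while the regime label and evidence decide whether it blocks.

```python
# Labels that do not block on their own, assuming supporting evidence exists.
NON_BLOCKING_LABELS = {"persistent_bias", "temporary_supported_regime"}


def regime_aware_gate(residuals, z_limit=2.0):
    """Classify candidate venue-season residuals as supported or blocking."""
    candidates = [r for r in residuals if abs(r["z"]) >= z_limit]
    blocking = [
        r for r in candidates
        if not (r["classification"] in NON_BLOCKING_LABELS and r["evidence"])
    ]
    return {
        "passed": not blocking,
        "blocking_regimes": len(blocking),
        "supported_regimes": len(candidates) - len(blocking),
        "max_abs_z": max((abs(r["z"]) for r in residuals), default=0.0),
    }


# Two rows taken from the scorecard plus one hypothetical sub-threshold row.
example = [
    {"z": -4.067, "classification": "temporary_supported_regime", "evidence": True},
    {"z": -2.838, "classification": "unexplained_or_confounded", "evidence": False},
    {"z": 1.2, "classification": "unexplained_or_confounded", "evidence": False},
]
result = regime_aware_gate(example)
```

Here the gate still fails because one candidate is unexplained, even though the largest residual is a supported regime; this mirrors how the scorecard can report `max abs(z) = 4.067` while counting only some regimes as blocking.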