
[Perf] Split feed repost branch by entity type#794

Merged
raymondjacobson merged 1 commit into main from ray/perf-feed-query-rewrite on May 8, 2026

@raymondjacobson
Member

Summary

Rewrite the feed query so the repost branch handles track-type and playlist-type reposts separately. Each side INNER JOINs against only the table it needs (probed via tracks_pkey or playlists_pkey), eliminating the upfront 94k-row hash that the planner was building over every public playlist on every call.
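As a rough illustration of the rewritten repost branch, here is a toy in-memory model in Go. The `Repost` and `FeedEntity` types, `splitRepostBranch`, and the `liveTracks`/`livePlaylists` maps are hypothetical stand-ins, not code from this PR: each map holds the entity IDs that survive the deleted/unlisted/private filters, and each map lookup plays the role of one tracks_pkey or playlists_pkey probe.

```go
package main

import "fmt"

// Repost mirrors the two repost columns the feed query joins on.
// (Hypothetical type for illustration only.)
type Repost struct {
	RepostType   string // "track", "playlist", or "album"
	RepostItemID int
}

// FeedEntity is one surviving (entity_type, entity_id) row.
type FeedEntity struct {
	EntityType string
	EntityID   int
}

// splitRepostBranch is a toy model of the split repost branch.
// liveTracks/livePlaylists are assumed stand-ins for the filtered tracks and
// playlists tables; each lookup models a single per-row primary-key probe.
func splitRepostBranch(reposts []Repost, liveTracks, livePlaylists map[int]bool) []FeedEntity {
	var out []FeedEntity
	// Branch 1a: track-type reposts, inner-joined against tracks only.
	for _, r := range reposts {
		if r.RepostType == "track" && liveTracks[r.RepostItemID] {
			out = append(out, FeedEntity{"track", r.RepostItemID})
		}
	}
	// Branch 1b: all other reposts, inner-joined against playlists only;
	// entity_type keeps the original repost_type ("playlist" or "album").
	for _, r := range reposts {
		if r.RepostType != "track" && livePlaylists[r.RepostItemID] {
			out = append(out, FeedEntity{r.RepostType, r.RepostItemID})
		}
	}
	return out
}

func main() {
	reposts := []Repost{{"track", 1}, {"track", 2}, {"playlist", 10}, {"album", 11}, {"playlist", 12}}
	liveTracks := map[int]bool{1: true}                // track 2 is deleted or private
	livePlaylists := map[int]bool{10: true, 11: true} // playlist 12 is unlisted
	fmt.Println(splitRepostBranch(reposts, liveTracks, livePlaylists))
	// prints [{track 1} {playlist 10} {album 11}]
}
```

Neither branch needs the other table, so a reposted track never triggers any playlist work, and vice versa.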

Why

The old query did:

LEFT JOIN tracks    ON repost_type = 'track'  AND repost_item_id = track_id    AND ... 
LEFT JOIN playlists ON repost_type != 'track' AND repost_item_id = playlist_id AND ...
WHERE (tracks.track_id IS NOT NULL OR playlists.playlist_id IS NOT NULL)

To satisfy the playlist LEFT JOIN, Postgres scanned the idx_playlist_status partial index for every public playlist (94,306 rows on prod, a 4.3 MB hash, 2,420 buffers) on every feed call, even though the average user's repost set contains only a handful of playlist-type reposts.
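A toy Go model of the playlist half of the old plan makes the cost shape explicit. It assumes a single `IsPublic` flag stands in for the idx_playlist_status predicate (the real filter columns are not shown here), and `oldPlanPlaylistSide` and its types are hypothetical names. The build phase touches every public playlist before any repost is examined, so its cost does not shrink when the followee set has few playlist-type reposts.

```go
package main

import "fmt"

// Repost mirrors repost_type / repost_item_id (hypothetical type).
type Repost struct {
	RepostType   string // "track", "playlist", or "album"
	RepostItemID int
}

// Playlist is a minimal stand-in; IsPublic is an assumed proxy for the
// partial-index predicate behind idx_playlist_status.
type Playlist struct {
	ID       int
	IsPublic bool
}

// oldPlanPlaylistSide models a hash join on the playlist side of the old
// LEFT JOIN. It returns the reposts that matched and how many rows went into
// the hash table during the build phase.
func oldPlanPlaylistSide(reposts []Repost, playlists []Playlist) (matched []Repost, rowsHashed int) {
	// Build phase: runs over every public playlist, independent of the reposts.
	hash := make(map[int]struct{})
	for _, p := range playlists {
		if p.IsPublic {
			hash[p.ID] = struct{}{}
			rowsHashed++
		}
	}
	// Probe phase: mirrors repost_type != 'track' AND repost_item_id = playlist_id.
	for _, r := range reposts {
		if r.RepostType == "track" {
			continue
		}
		if _, ok := hash[r.RepostItemID]; ok {
			matched = append(matched, r)
		}
	}
	return matched, rowsHashed
}

func main() {
	playlists := make([]Playlist, 0, 1000)
	for id := 1; id <= 1000; id++ {
		playlists = append(playlists, Playlist{ID: id, IsPublic: true})
	}
	reposts := []Repost{{"track", 7}, {"playlist", 3}, {"album", 2000}, {"track", 9}}
	matched, hashed := oldPlanPlaylistSide(reposts, playlists)
	fmt.Println(len(matched), hashed) // prints "1 1000"
}
```

One playlist-type repost matched, yet all 1,000 public playlists were hashed; on prod the equivalent build phase was the 94,306-row hash.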

Per pg_stat_statements, this query had two variants with mean execution times of 860ms and 4,478ms. In Axiom, /v1/users/:userId/feed shows p50 4.4s / p95 13s, making it the worst signed-in endpoint by total time.

Impact

EXPLAIN ANALYZE on prod read replica, user 20 (1,752 follows):

Metric | Before | After
Public-playlist hash build | 94,306 rows / 2,420 buffers | gone
Per-row entity lookup | n/a | tracks_pkey + playlists_pkey

Warm-cache timings are similar in my tests (PG plan caching obscures the win on the read replica). The savings show up with a cold cache and at the tail; production tail latency should drop noticeably now that the upfront hash is gone.

Risk

  • Logic-preserving rewrite. Branch 1a returns 'track'-typed entities; branch 1b returns whatever the repost_type was ('playlist' or 'album'), same as before. The outer GROUP BY (entity_type, entity_id) and max(created_at) semantics are unchanged.
  • New TestUsersFeed covers both repost branches and the owned-track/owned-playlist branches.

Test plan

  • go test -count=1 ./api/... (full suite, all green)
  • TestUsersFeed exercises track-repost, playlist-repost, owned-track, and owned-playlist branches plus the no-followees empty case
  • Local server hits /v1/users/Wem1e/feed?limit=20 (Phuture, 1752 follows): 500-750ms warm

🤖 Generated with Claude Code

The feed query LEFT JOINed both tracks and playlists onto every
repost row to filter out reposts pointing at deleted/unlisted/
private entities. Postgres satisfied the playlist side by hashing
*every* public playlist (~94k rows) on every call, regardless of
how few playlist-type reposts the followee set contained.

Splitting the branch by repost_type lets each side use a per-row
INNER JOIN against the entity (tracks_pkey or playlists_pkey),
removing the upfront 94k-row hash entirely. Cold-cache and tail
latency benefit even when warm timings look similar.

Adds a regression test exercising both repost branches plus the
owned-track and owned-playlist branches; no prior coverage existed.
@dylanjeffers
Contributor

love it

raymondjacobson merged commit 150e4b8 into main on May 8, 2026
5 checks passed
raymondjacobson deleted the ray/perf-feed-query-rewrite branch on May 8, 2026, 01:20
2 participants