Avoid reloading job rows after dispatch and schedule #742

Open
ImDineshSaini wants to merge 1 commit into rails:main from ImDineshSaini:claude/review-security-issue-MTKxa


@ImDineshSaini

In `Job.dispatch_all` and `Job.schedule_all` (the hot path of
`enqueue_all` / `ActiveJob.perform_all_later`), the post-insert step
rebuilt the returned set with:

    where(id: <Execution>.where(job_id: jobs.map(&:id)).pluck(:job_id))

which always issues two statements per execution table -- a `pluck`
on the execution table and a follow-up `SELECT ... FROM
solid_queue_jobs WHERE id IN (...)` -- to re-read rows we already
hold in memory. The `Job` instances passed in are the ones just
returned by `create_all_from_active_jobs`, so they are persisted and
have ids.

Filter `jobs` in memory against the plucked execution ids instead.
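
A minimal plain-Ruby sketch of that in-memory filter, runnable outside Rails. `Job` is a hypothetical Struct standing in for `SolidQueue::Job`, `executed_ids` stands in for the result of the execution-table `pluck`, and `select_dispatched` is a hypothetical helper name, not necessarily how the patch names or structures it.

```ruby
require "set"

# Hypothetical stand-in for a persisted SolidQueue::Job; only `id` matters here.
Job = Struct.new(:id, :class_name)

# Keep the in-memory jobs whose ids appear in the plucked execution ids.
# `executed_ids` plays the role of, e.g.,
#   ReadyExecution.where(job_id: jobs.map(&:id)).pluck(:job_id)
# A Set makes each membership check constant-time.
def select_dispatched(jobs, executed_ids)
  lookup = executed_ids.to_set
  jobs.select { |job| lookup.include?(job.id) }
end

jobs = [Job.new(1, "MailerJob"), Job.new(2, "ReportJob"), Job.new(3, "CleanupJob")]

# Pluck order does not matter: the result keeps the order of `jobs`.
dispatched = select_dispatched(jobs, [3, 1])
puts dispatched.map(&:id).inspect # prints [1, 3]
```

The `Set` is a design choice of this sketch: with `Array#include?` the filter's cost would grow with batch size times the number of plucked ids, which matters for large `enqueue_all` batches.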

Queries in the post-insert step, per `enqueue_all` batch:

  * `dispatch_all`:  4 queries -> 2 (drops the two job-table reloads)
  * `schedule_all`:  2 queries -> 1 (drops the job-table reload)
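
To make the counts concrete, here is a self-contained Ruby sketch that models each SQL statement as one call on a hypothetical `CountingDb` class. `CountingDb`, `with_reload`, and `in_memory` are invented for illustration: they mimic only the shape of the pluck-then-reload pattern, not Solid Queue's real classes, and they assume one statement per pluck and per reload, as described above.

```ruby
# Hypothetical in-memory "database" that counts each statement it runs.
# Table names mirror Solid Queue's, but no real database is involved.
class CountingDb
  attr_reader :statements

  def initialize(job_rows, executions)
    @job_rows = job_rows.to_h { |row| [row[:id], row] } # solid_queue_jobs by id
    @executions = executions # { table => [job_id, ...] }
    @statements = 0
  end

  # Models: SELECT job_id FROM <table> WHERE job_id IN (...)
  def pluck_job_ids(table, ids)
    @statements += 1
    @executions.fetch(table, []) & ids
  end

  # Models: SELECT * FROM solid_queue_jobs WHERE id IN (...)
  def load_jobs(ids)
    @statements += 1
    ids.map { |id| @job_rows.fetch(id) }
  end
end

# Before: pluck from each execution table, then reload those job rows.
def with_reload(db, jobs, tables)
  ids = jobs.map { |job| job[:id] }
  tables.flat_map { |table| db.load_jobs(db.pluck_job_ids(table, ids)) }
end

# After: pluck from each execution table, then filter the in-memory jobs.
def in_memory(db, jobs, tables)
  ids = jobs.map { |job| job[:id] }
  tables.flat_map do |table|
    plucked = db.pluck_job_ids(table, ids)
    jobs.select { |job| plucked.include?(job[:id]) }
  end
end

jobs = [{ id: 1 }, { id: 2 }, { id: 3 }]
# Each job sits in at most one execution table.
executions = { ready_executions: [1], blocked_executions: [2], scheduled_executions: [3] }

[
  ["dispatch_all", %i[ready_executions blocked_executions]],
  ["schedule_all", %i[scheduled_executions]]
].each do |name, tables|
  before = CountingDb.new(jobs, executions)
  after = CountingDb.new(jobs, executions)
  with_reload(before, jobs, tables)
  in_memory(after, jobs, tables)
  puts "#{name}: #{before.statements} -> #{after.statements}"
end
# prints "dispatch_all: 4 -> 2" then "schedule_all: 2 -> 1"
```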

The avoided job-table `SELECT`s are the most expensive of the four
statements: each reads full, wide `solid_queue_jobs` rows, including
the serialized `arguments` payload, so the saving is in bytes over the
wire and not just round trips.

Semantic equivalence:

  * Each job is inserted into at most one of ready_executions /
    blocked_executions / scheduled_executions in this code path, so
    the in-memory filter selects exactly the same job ids the prior
    `where(id: ...)` would have returned.
  * Both call sites (`prepare_all_for_execution`, which concatenates
    with `+`, and `Execution::Dispatching#dispatch_jobs`, which calls
    `.map(&:id)`) already coerce the result to an Array and do not
    depend on it being an `ActiveRecord::Relation` or on row order.
  * `jobs` is the same collection that was just queried back in
    `create_all_from_active_jobs`, so attribute freshness is
    unchanged versus the previous reload.
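
The first two equivalence points can be spot-checked with a small randomized sketch in plain Ruby. `JobRow`, `reload_style`, and `filter_style` are hypothetical names; `reload_style` models the old `where(id: ...)` reload by reading rows sorted by id, which is only an assumption about what an unordered `WHERE id IN (...)` might return. The check asserts equality of id sets, not order, because neither call site depends on order.

```ruby
# Hypothetical job row; in the real code these are SolidQueue::Job records.
JobRow = Struct.new(:id)

# Old shape: re-read rows by id from the "database", once per execution table,
# then concatenate the per-table results.
def reload_style(db_rows, id_lists)
  id_lists.flat_map { |ids| db_rows.select { |row| ids.include?(row.id) } }
end

# New shape: filter the jobs already held in memory, once per execution table.
def filter_style(jobs, id_lists)
  id_lists.flat_map { |ids| jobs.select { |job| ids.include?(job.id) } }
end

rng = Random.new(1)
100.times do
  jobs = (1..20).map { |id| JobRow.new(id) }.shuffle(random: rng)
  # Each job lands in at most one bucket: ready (0), blocked (1), or neither (2).
  buckets = jobs.group_by { rng.rand(3) }
  ready = (buckets[0] || []).map(&:id)
  blocked = (buckets[1] || []).map(&:id)

  old_ids = reload_style(jobs.sort_by(&:id), [ready, blocked]).map(&:id)
  new_ids = filter_style(jobs, [ready, blocked]).map(&:id)

  # Same ids and no duplicates; only the order may differ.
  raise "mismatch" unless old_ids.sort == new_ids.sort
  raise "duplicate" unless new_ids.uniq == new_ids
end
puts "ok"
```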

Verified against the existing `job_test`, `ready_execution_test`,
`dispatcher_test`, and `concurrency_controls_test` suites on SQLite
(55 runs, 0 failures).
@ImDineshSaini
Author

ImDineshSaini commented Apr 30, 2026

Hi @rosa and @jeremy — would you mind taking a quick look at this when you get a chance? It's a small, isolated change in Job::Executable and Job::Schedulable that drops the redundant solid_queue_jobs reload after dispatch/schedule by filtering the already-in-memory jobs array against the plucked execution ids (4→2 queries on dispatch_all, 2→1 on schedule_all).

Happy to add benchmarks, an `assert_no_queries` / `assert_queries_count` test around `enqueue_all`, or anything else the contributor guidelines expect. Just let me know what would make this easiest to review.

FYI - @dhh

Thanks!
