Avoid reloading job rows after dispatch and schedule #742

Open

ImDineshSaini wants to merge 1 commit into rails:main

Conversation
In `Job.dispatch_all` and `Job.schedule_all` (the hot path of
`enqueue_all` / `ActiveJob.perform_all_later`), the post-insert step
rebuilt the returned set with:

```ruby
where(id: <Execution>.where(job_id: jobs.map(&:id)).pluck(:job_id))
```

which always issues two statements per execution table -- a `pluck`
on the execution table and a follow-up `SELECT ... FROM
solid_queue_jobs WHERE id IN (...)` -- to re-read rows we already
hold in memory. The `Job` instances passed in are the ones just
returned by `create_all_from_active_jobs`, so they are persisted and
have ids.
Filter `jobs` in memory against the plucked execution ids instead.
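A minimal before/after sketch of that filter, with `Job` stubbed as a Struct and the plucked ids hard-coded (the real code plucks them from the execution table; these names are illustrative, not SolidQueue's actual API):

```ruby
# Stub standing in for a persisted solid_queue_jobs row (illustrative only).
Job = Struct.new(:id, :arguments)

jobs = [Job.new(1, "a"), Job.new(2, "b"), Job.new(3, "c")]

# Ids that `pluck(:job_id)` on the execution table would return
# (job 2 is absent, e.g. it went to blocked_executions instead).
dispatched_ids = [1, 3]

# Before: where(id: dispatched_ids) issued a second SELECT to re-read
# these rows. After: filter the instances we already hold in memory.
dispatched = jobs.select { |job| dispatched_ids.include?(job.id) }

p dispatched.map(&:id) # => [1, 3]
```

For large batches, converting `dispatched_ids` to a `Set` first keeps the filter linear in the number of jobs instead of quadratic.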
Per `enqueue_all` batch:
* dispatch_all: 4 queries -> 2 (drops the two job-table reloads)
* schedule_all: 2 queries -> 1 (drops the job-table reload)
The avoided `SELECT` is the most expensive of the four: it scans the
wide `solid_queue_jobs` rows, including the serialized `arguments`
payload, so the saving is in bytes over the wire and not just a round
trip.
Semantic equivalence:
* Each job is inserted into at most one of ready_executions /
blocked_executions / scheduled_executions in this code path, so
the in-memory filter selects exactly the same job ids the prior
`where(id: ...)` would have returned.
* Both call sites (`prepare_all_for_execution` which concatenates
with `+`, and `Execution::Dispatching#dispatch_jobs` which calls
`.map(&:id)`) already coerce the result to an Array and do not
depend on it being an `ActiveRecord::Relation` or on row order.
* `jobs` is the same collection that was just queried back in
`create_all_from_active_jobs`, so attribute freshness is
unchanged versus the previous reload.
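A toy check of the equivalence claim (ids are invented for illustration): because each job id lands in at most one execution table on this path, filtering in memory selects exactly the ids the old `where(id: ...)` reload would have returned.

```ruby
Job = Struct.new(:id)
jobs = (1..6).map { |i| Job.new(i) }

# Each job went to exactly one execution table in this code path.
ready_ids     = [1, 4, 6]
blocked_ids   = [2, 5]
scheduled_ids = [3]

# The tables are disjoint, which is what makes the two paths agree.
raise unless (ready_ids & blocked_ids).empty? && (ready_ids & scheduled_ids).empty?

# Old path: a second SELECT for jobs whose id is in the plucked set.
reloaded_ids = jobs.map(&:id).select { |id| ready_ids.include?(id) }

# New path: in-memory filter over the instances already on hand.
filtered_ids = jobs.select { |job| ready_ids.include?(job.id) }.map(&:id)

p reloaded_ids == filtered_ids # => true
```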
Verified against the existing `job_test`, `ready_execution_test`,
`dispatcher_test`, and `concurrency_controls_test` suites on SQLite
(55 runs, 0 failures).
Author

Hi @rosa and @jeremy — would you mind taking a quick look at this when you get a chance? It's a small, isolated change in `Job::Executable` and `Job::Schedulable` that drops the redundant `solid_queue_jobs` reload after dispatch/schedule by filtering the already-in-memory jobs array against the plucked execution ids (4 → 2 queries on `dispatch_all`, 2 → 1 on `schedule_all`). Happy to add benchmarks, an `assert_no_queries` / `assert_queries_count` test around `enqueue_all`, or anything else the contributor guidelines expect — just let me know what would make this easiest to review. FYI @dhh. Thanks!