gh-133672: Allow LOAD_FAST to be optimized to LOAD_FAST_BORROW#133721
ljfp wants to merge 2390 commits into python:main
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Thanks for working on this! This isn't safe to do unconditionally and will require an analysis that operates on the entire CFG, rather than per basic block. It's only safe to optimize a LOAD_FAST instruction that leaves a value on the stack at the end of a basic block if the other two conditions (the ones tracked by the SUPPORT_KILLED and STORED_AS_LOCAL flags) hold along all paths between the end of the basic block and the point where the value is popped from the stack. For example, it's not safe to optimize the first LOAD_FAST in bb0 below, because the local is overwritten in bb1 before the value is consumed from the stack:
bb0:
LOAD_FAST 0
LOAD_FAST 1
TO_BOOL
POP_JUMP_IF_FALSE <bb2>
bb1:
LOAD_CONST 0
STORE_FAST 0
RETURN_VALUE
bb2:
RETURN_VALUE
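The hazard in the listing above can be reproduced with a small model. The following Python sketch is not the code in flowgraph.c; it is a toy abstract interpreter, assuming a hypothetical `Block` type and hand-written stack effects for only the opcodes used in the example. It walks every path through the CFG and reports each LOAD_FAST whose value is still on the stack when its local is overwritten, or whose value is itself stored into a local. It also assumes well-formed bytecode, so the set of (block, stack) states is finite.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """Hypothetical basic block: (opname, arg) pairs plus successor names."""
    instrs: list
    successors: list


def unsafe_loads(cfg, entry):
    """Return the (block, index) sites of LOAD_FASTs that must not borrow."""
    unsafe = set()
    seen = set()
    worklist = [(entry, ())]  # (block name, abstract stack on entry)
    while worklist:
        name, entry_stack = worklist.pop()
        if (name, entry_stack) in seen:
            continue
        seen.add((name, entry_stack))
        stack = list(entry_stack)
        for idx, (op, arg) in enumerate(cfg[name].instrs):
            if op == "LOAD_FAST":
                stack.append((name, idx, arg))  # remember site and local
            elif op == "LOAD_CONST":
                stack.append(None)  # not a candidate for borrowing
            elif op == "TO_BOOL":
                stack.pop()
                stack.append(None)  # the result is a fresh value
            elif op in ("POP_JUMP_IF_FALSE", "RETURN_VALUE"):
                stack.pop()
            elif op == "STORE_FAST":
                stored = stack.pop()
                if stored is not None:
                    unsafe.add(stored[:2])  # loaded value stored into a local
                for ref in stack:
                    if ref is not None and ref[2] == arg:
                        unsafe.add(ref[:2])  # local killed while value is live
            else:
                raise ValueError(f"opcode not modelled in this sketch: {op}")
        for succ in cfg[name].successors:
            worklist.append((succ, tuple(stack)))
    return unsafe


# The three blocks from the review comment.
cfg = {
    "bb0": Block([("LOAD_FAST", 0), ("LOAD_FAST", 1), ("TO_BOOL", None),
                  ("POP_JUMP_IF_FALSE", "bb2")], ["bb1", "bb2"]),
    "bb1": Block([("LOAD_CONST", 0), ("STORE_FAST", 0),
                  ("RETURN_VALUE", None)], []),
    "bb2": Block([("RETURN_VALUE", None)], []),
}

print(unsafe_loads(cfg, "bb0"))  # {('bb0', 0)}
```

Only the first LOAD_FAST in bb0 is flagged: its value survives past the end of bb0 and local 0 is overwritten on the bb1 path. The second LOAD_FAST is consumed by TO_BOOL inside bb0, so a per-block check is enough for it.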
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
@mpage Thanks for your feedback! I've pushed an update that should address cases like the one you mentioned. The main change is, as suggested, a CFG-wide analysis (a DFS implemented in the is_borrow_safe and check_borrow_safety_globally functions). This analysis now:
I think this way we properly handle scenarios like the one you provided and ensure the LOAD_FAST_BORROW optimization is applied safely. Could you please take another look when you have a moment? To be honest, this went deeper into flowgraph.c than I initially anticipated, and it's a bit outside my usual comfort zone, so any further guidance would be much appreciated if I've missed something or if there are better ways to approach this. |
|
Could you please check out the test failures? I'll try to give this PR a look if Matt doesn't get to it first :). |
|
The following commit authors need to sign the Contributor License Agreement: |
|
Why was this PR closed? I was just synching my branch with the latest changes from main, after receiving an email saying my PR was marked "stale". |
|
Your rebase messed something up, which ended up tagging every codeowner, so everyone is going to get pings in their inbox for all activity on this PR. Please open a new PR. |
The LOAD_FAST_BORROW instruction (opcode 86) loads a borrowed reference onto the operand stack, which is a performance optimization that avoids unnecessary reference counting operations.
Previously, we were only applying this optimization when the reference was consumed within the same basic block. If the value was still on the stack at the end of a basic block (indicated by the REF_UNCONSUMED flag), we wouldn't perform the optimization.
However, (if I understood this correctly) there are cases where it's perfectly safe to use LOAD_FAST_BORROW even when the value is still on the stack at the end of a basic block. The optimization is safe as long as:
- … (SUPPORT_KILLED flag)
- … (STORED_AS_LOCAL flag)
This fix allows us to optimize more cases, which seems to be particularly important for the virtual iterators implementation (PR #132555), where the iterable for a loop is often live at basic block end.
LOAD_FAST_BORROW not being used even when safe to do so, if value is live at BB end. #133672
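The difference between the previous rule and the proposed one can be sketched as two predicates over the three flags named in the description. This is an illustration only: the `LoadFastFlag` class, the numeric values, and the function names are assumptions made for the example, and the one-line flag descriptions are a reading of the flag names rather than a quotation from flowgraph.c.

```python
from enum import IntFlag


class LoadFastFlag(IntFlag):
    # Hypothetical values; only the names come from the PR description.
    SUPPORT_KILLED = 1   # assumed: local overwritten while the value is on the stack
    STORED_AS_LOCAL = 2  # assumed: loaded value is stored into a local
    REF_UNCONSUMED = 4   # value still on the stack at the end of the basic block


def can_borrow_previous(flags):
    """Previous rule: borrow only if no flag at all was set in the block."""
    return flags == 0


def can_borrow_proposed(flags):
    """Proposed rule: REF_UNCONSUMED alone no longer blocks the optimization."""
    blocking = LoadFastFlag.SUPPORT_KILLED | LoadFastFlag.STORED_AS_LOCAL
    return not (flags & blocking)


live_at_block_end = LoadFastFlag.REF_UNCONSUMED
print(can_borrow_previous(live_at_block_end))  # False
print(can_borrow_proposed(live_at_block_end))  # True
```

As the review comment in this thread points out, flags computed for a single basic block cannot justify the proposed rule on their own: once a value outlives its block, the two blocking conditions have to be checked along every path until the value is popped.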