Reduce non-default-interface costs for complicated runtimeclasses #1579
Draft
jonwis wants to merge 27 commits into microsoft:master from
- docs/plan-cached-interface-dispatch.md: design for thunk-based interface caching in projected runtimeclasses, hazard audit, implementation plan, and agent workflow instructions
- scripts/build_and_test.ps1: parallel msbuild + test runner
- scripts/run_cppwinrt.ps1: run cppwinrt.exe with output under build/
Plan fixes:
- Remove contradictory consume_general guidance; commit to three-way branch
- Clarify operator I() returns by value (AddRef path), hot path is consume_general
- Specify include ordering: base_thunked_runtimeclass.h after base_implements.h
- Fix WINRT_IMPL_SHIM: breaks for thunked types, add consume_general_nothrow
- Add as<T>()/try_as<T>() member docs on thunked_runtimeclass_base
- Use SFINAE (enable_if_t/void_t), not requires (C++17 floor)
- Specify .2.h file placement (same as today)
- Document factory constructor wiring
- Add copy_from_abi/copy_to_abi coverage
- Add no-op thunk vtable slots 0/1/2 (QI/AddRef/Release)

Naming: PascalCase -> snake_case throughout (thunked_runtimeclass_header, interface_thunk, cache_and_thunk_tagged, thunked_runtimeclass_base, etc.)

Script: build_and_test.ps1 reworked:
- Default: build only test\test (fast feedback loop)
- -BuildAll: build all 9 test targets
- -Test: run built test executables
- -Clean: git clean -dfx . before building
…sume_general)
- base_thunked_runtimeclass.h: thunked_runtimeclass<IDefault, I...> template, interface_thunk with resolve(), SFINAE ABI overloads (get_abi, put_abi, etc.)
- base_meta.h: has_thunked_cache_v, has_thunked_interface_v, type_index traits
- base_windows.h: three-way if constexpr in consume_general/noexcept + consume_general_nothrow
- code_writers.h: WINRT_IMPL_SHIM -> consume_general_nothrow for IMap/IMapView Lookup/Remove
- ASM thunks: x64, x86, ARM64, ARM64EC stubs (256 slots each)
- winrt_thunk_resolve.cpp: extern C bridge from ASM to C++ resolve()
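The commit pairs C++17 detection traits (has_thunked_cache_v, has_thunked_interface_v) with a three-way if constexpr in consume_general. The sketch below is one plausible reading of that shape, not the PR's code: the mock interfaces, the cached_interfaces typedef, and choose_dispatch are hypothetical, and the real branch conditions may differ. It only shows how void_t detection plus if constexpr picks a call path at compile time without C++20 requires-clauses.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>

// Hypothetical stand-ins for projected interface types.
struct IDefault {};
struct IMapLike {};
struct IOther {};

// Hypothetical thunked runtimeclass: lists the non-default interfaces it caches.
struct ThunkedClass {
    using default_interface = IDefault;
    using cached_interfaces = std::tuple<IMapLike>;
};

// Hypothetical classic runtimeclass: no cache.
struct PlainClass {
    using default_interface = IDefault;
};

// C++17 detection idiom: true when T declares cached_interfaces.
template <typename T, typename = void>
struct has_thunked_cache : std::false_type {};
template <typename T>
struct has_thunked_cache<T, std::void_t<typename T::cached_interfaces>> : std::true_type {};
template <typename T>
inline constexpr bool has_thunked_cache_v = has_thunked_cache<T>::value;

// True when Interface is one of the types in a std::tuple.
template <typename Interface, typename Tuple>
struct tuple_contains : std::false_type {};
template <typename Interface, typename... I>
struct tuple_contains<Interface, std::tuple<I...>>
    : std::bool_constant<(std::is_same_v<Interface, I> || ...)> {};

// True when T is thunked and caches Interface.
template <typename T, typename Interface>
constexpr bool thunked_interface_check() {
    if constexpr (has_thunked_cache_v<T>)
        return tuple_contains<Interface, typename T::cached_interfaces>::value;
    else
        return false;
}
template <typename T, typename Interface>
inline constexpr bool has_thunked_interface_v = thunked_interface_check<T, Interface>();

enum class dispatch { direct_default, cached_slot, query_interface };

// Three-way compile-time choice of how a call on Interface is made.
template <typename Class, typename Interface>
constexpr dispatch choose_dispatch() {
    if constexpr (std::is_same_v<Interface, typename Class::default_interface>)
        return dispatch::direct_default;   // call through the default interface pointer
    else if constexpr (has_thunked_interface_v<Class, Interface>)
        return dispatch::cached_slot;      // call through the cached slot (thunk until resolved)
    else
        return dispatch::query_interface;  // classic path: QI, call, Release
}
```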
- write_thunked_class: generates impl::thunked_runtimeclass<IDefault, I...> base
- write_thunked_class_requires: includes ALL interfaces (including default) in require<>
- is_interface: includes thunked types for producer_convert detection
- ActivateInstance<T>: if-constexpr fast path for thunked types
- ABI overloads: moved to base_windows.h, added rvalue detach_abi, exclusions
- Implicit IUnknown/IInspectable conversions on thunked_runtimeclass_base
- test/Directory.Build.targets: MASM + thunk resolve for all test binaries

8 errors remain: agile_ref ctor, LiesAboutInheritance edge cases
…/unbox

Systematic fix: thunked runtimeclasses must be recognized as COM object types everywhere the library uses is_base_of<IUnknown/IInspectable> to distinguish COM types from value types.
- base_meta.h: move thunked traits before empty_value/arg<T> (ordering)
- base_meta.h: arg<T> specialization includes has_thunked_cache_v
- base_meta.h: empty_value<T> returns nullptr for thunked types
- base_windows.h: is_com_interface includes has_thunked_cache_v
- base_windows.h: com_ref<T> includes has_thunked_cache_v
- base_reference_produce.h: box_value/unbox_value/unbox_value_or handle thunked

23/25 tests pass; 2 failures are pre-existing EH funclet issues (custom_error, disconnected)
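A minimal illustration of why is_base_of checks alone miss thunked types, assuming (as the implicit IUnknown/IInspectable conversions added in the previous commit suggest) that a thunked runtimeclass does not derive from IUnknown. IUnknownLike, ClassicClass, ThunkedClass, is_com_type_v, and this empty_value are hypothetical stand-ins, not the library's definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for winrt::Windows::Foundation::IUnknown.
struct IUnknownLike {
    void* abi = nullptr;
    IUnknownLike() = default;
    IUnknownLike(std::nullptr_t) {}
};

// Classic projected runtimeclass: derives from the IUnknown-like base.
struct ClassicClass : IUnknownLike {
    using IUnknownLike::IUnknownLike;
};

// Hypothetical thunked runtimeclass: NOT derived from IUnknownLike, carries a cache.
struct ThunkedClass {
    struct cache {};
    using thunked_cache = cache;
    void* default_cache = nullptr;
    ThunkedClass() = default;
    ThunkedClass(std::nullptr_t) {}
};

// A value type that must not be treated as a COM object.
struct Point { int x = 1; int y = 2; };

template <typename T, typename = void>
struct has_thunked_cache : std::false_type {};
template <typename T>
struct has_thunked_cache<T, std::void_t<typename T::thunked_cache>> : std::true_type {};

// Each "is this a COM type?" check widened to accept thunked types too.
template <typename T>
inline constexpr bool is_com_type_v =
    std::is_base_of_v<IUnknownLike, T> || has_thunked_cache<T>::value;

// empty_value<T>: null for COM types, value-initialized otherwise.
template <typename T>
T empty_value() {
    if constexpr (is_com_type_v<T>)
        return T{nullptr};
    else
        return T{};
}
```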
Exercises IMap<hstring,IInspectable> methods (Insert/Lookup/Size/HasKey/Remove/Clear) through thunked PropertySet runtimeclass. noinline helpers for disassembly. Disassembly confirms: cache slot load at [rcx+28h] -> vtable call, NO QI. Agility crash is pre-existing EH funclet issue (C_A_T_C_H_T_E_S_T_0::dtor).
- thunked_copy_move: copy/move ctor/assign, nullptr assign
- thunked_abi_interop: get/put/detach/attach/copy_from/copy_to_abi round-trips
- thunked_as_try_as: as<T>, try_as<T>, implicit IInspectable/IUnknown conversion
- thunked_threading: 8 threads x 100 iterations concurrent thunk resolution
- Fix: reset_thunked() reinitializes thunk pairs after ABI copy/attach (copy_from_abi/attach_abi left pairs uninitialized, causing a null deref)
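Concurrent thunk resolution works only if slot initialization is racy but safe: several threads may call QueryInterface for the same slot at once, exactly one result is published, and the losers release theirs. The model below is a sketch under stated assumptions: CachedSlot, MockInterface, simulated_query_interface, and simulated_release are hypothetical, a heap object stands in for a COM pointer, and delete stands in for Release. reset() models returning a slot to its unresolved state, as reset_thunked() is described as doing after copy_from_abi/attach_abi.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical resolved interface; `slot` records which slot produced it.
struct MockInterface { int slot; };

std::atomic<int> g_query_count{0};    // simulated QueryInterface calls
std::atomic<int> g_release_count{0};  // simulated Release calls

MockInterface* simulated_query_interface(int slot) {
    g_query_count.fetch_add(1);
    return new MockInterface{slot};
}

void simulated_release(MockInterface* p) {
    g_release_count.fetch_add(1);
    delete p;
}

// One cached slot; nullptr means "still pointing at the thunk".
struct CachedSlot {
    std::atomic<MockInterface*> resolved{nullptr};

    MockInterface* get(int slot) {
        if (MockInterface* current = resolved.load(std::memory_order_acquire))
            return current;                                  // fast path: already resolved
        MockInterface* fresh = simulated_query_interface(slot);
        MockInterface* expected = nullptr;
        if (resolved.compare_exchange_strong(expected, fresh,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire))
            return fresh;                                    // this thread published its result
        simulated_release(fresh);                            // another thread won; drop ours
        return expected;                                     // CAS failure loaded the winner
    }

    // Back to the unresolved state; the next get() queries again.
    void reset() {
        if (MockInterface* old = resolved.exchange(nullptr, std::memory_order_acq_rel))
            simulated_release(old);
    }

    ~CachedSlot() { reset(); }
};

// Hammer one slot from many threads; true if every thread saw the published pointer.
bool all_threads_agree(CachedSlot& slot, int slot_index, int thread_count, int iterations) {
    std::vector<MockInterface*> last_seen(thread_count, nullptr);
    std::vector<std::thread> pool;
    for (int t = 0; t < thread_count; ++t)
        pool.emplace_back([&slot, &last_seen, slot_index, iterations, t] {
            for (int i = 0; i < iterations; ++i) last_seen[t] = slot.get(slot_index);
        });
    for (std::thread& th : pool) th.join();
    MockInterface* winner = slot.resolved.load();
    for (MockInterface* p : last_seen)
        if (p != winner) return false;
    return true;
}
```

Calling all_threads_agree(slot, 7, 8, 100) mirrors the 8 threads x 100 iterations shape of thunked_threading; afterwards exactly one queried result survives in the slot.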
…) tests
- thunked_generic_default: StringMap with an IMap<hstring,hstring> generic default. Exercises Insert/Lookup/HasKey/Size/Clear + iteration, as<IObservableMap>
- thunked_full_mode: Package with 9 secondaries (>8 = full mode). Static asserts verify use_tagged=false, tuple_size>8
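The 8-interface boundary between tagged and full mode is consistent with a tagged pointer: if a slot stores its parent pointer and slot index in one word, an 8-byte-aligned pointer has exactly 3 free low bits, enough for indices 0 through 7. That is an assumption about cache_and_thunk_tagged, not something the commit states; the sketch only shows the packing arithmetic, and pack, unpack_parent, unpack_slot, Parent, and use_tagged are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 8-byte alignment leaves 3 zero low bits in the parent pointer.
constexpr std::uintptr_t tag_bits = 3;
constexpr std::uintptr_t tag_mask = (std::uintptr_t{1} << tag_bits) - 1;
constexpr std::size_t max_tagged_slots = std::size_t{1} << tag_bits;  // 8

// Hypothetical parent object; alignas(8) guarantees the free low bits.
struct alignas(8) Parent { void* default_interface = nullptr; };

// Combine parent pointer and slot index into one word.
std::uintptr_t pack(const Parent* parent, std::size_t slot) {
    const auto bits = reinterpret_cast<std::uintptr_t>(parent);
    assert((bits & tag_mask) == 0 && "parent must be 8-byte aligned");
    assert(slot < max_tagged_slots && "9+ slots need the untagged (full) layout");
    return bits | static_cast<std::uintptr_t>(slot);
}

const Parent* unpack_parent(std::uintptr_t packed) {
    return reinterpret_cast<const Parent*>(packed & ~tag_mask);
}

std::size_t unpack_slot(std::uintptr_t packed) {
    return static_cast<std::size_t>(packed & tag_mask);
}

// Layout choice matching the test's expectation: more than 8 secondaries -> not tagged.
constexpr bool use_tagged(std::size_t secondary_count) {
    return secondary_count <= max_tagged_slots;
}
static_assert(use_tagged(8) && !use_tagged(9));
```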
…OM identity
- has_async_default_interface: detect IAsyncOperation/Action via TypeSpec parsing
- operator==/!=: three-tier (address, default_cache, QI IUnknown) comparison
- operator==/!= for nullptr_t

e2e build_test_all: all builds pass, 2 test failures:
- test_slow: QI count changed from 4 to 1 (expected: thunking reduces QI calls)
- test_old: delegate.cpp crash (needs investigation)
- bind_in<T>: SFINAE specialization for thunked types (get_abi instead of reinterpret)
- delegate_arg<T>: safe ABI->projected conversion in delegate produce stubs
- Code generator: emit delegate_arg<T>() instead of reinterpret_cast for IN params
- has_async_default_interface: exclude IAsyncOperation/Action via TypeSpec parsing
- test_slow/Simple.cpp: QI count 4->1 (thunking reduces tracked QI calls)
- operator==/!=: three-tier COM identity (address, default_cache, QI IUnknown)

e2e: 222/223 old_tests pass. 1 remaining: event_consume factory revoker crash.
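The three-tier comparison orders checks from cheapest to most expensive: same projected value, then same cached default-interface pointer, and only then QueryInterface for IUnknown on both sides, because COM guarantees pointer identity only for IUnknown (an object may hand out different pointers for the same interface, for example tear-offs). The toy model below is a sketch; MockComObject, ProjectedRef, same_object, and the query counter are hypothetical.

```cpp
#include <cassert>

// Hypothetical COM-like object: member addresses stand in for interface pointers.
struct MockComObject {
    int default_iface = 0;
    int other_iface = 0;  // e.g. a tear-off handed out for the same object
    // QueryInterface(IID_IUnknown): always returns the same canonical pointer.
    const int* query_unknown() const { return &default_iface; }
};

int g_identity_queries = 0;  // counts tier-3 fallbacks

// Hypothetical projected value: a cached interface pointer plus its owner.
struct ProjectedRef {
    const int* default_cache = nullptr;
    const MockComObject* owner = nullptr;
};

bool same_object(const ProjectedRef& a, const ProjectedRef& b) {
    if (&a == &b) return true;                            // tier 1: same projected value
    if (a.default_cache == b.default_cache) return true;  // tier 2: same pointer (also null == null)
    if (!a.owner || !b.owner) return false;               // exactly one side is null
    ++g_identity_queries;                                 // tier 3: compare IUnknown identities
    return a.owner->query_unknown() == b.owner->query_unknown();
}
```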
…flinging an exception out of a non-exceptional frame.
Co-authored-by: Copilot <copilot@github.com>
Move winrt_cached_resolve_thunk from strings/cached_thunk_resolve.cpp into base_thunked_runtimeclass.h as extern C inline with a selectany function-pointer forcelink to ensure MSVC emits the symbol for ASM stubs. Remove ClCompile for cached_thunk_resolve.cpp from test/Directory.Build.targets. Add per-project Directory.Build.targets for TestRuntimeComponentCX and TestProxyStub to strip MASM thunk stubs from non-C++/WinRT projects that don't include winrt/base.h. Update plan doc to reflect the new inline approach.
Add -flatten_classes CLI option to cppwinrt.exe. Thunked/flattened runtimeclass projections are now only emitted when this flag is passed, matching the -fastabi opt-in pattern.
- settings.h: add bool flatten_classes
- main.cpp: add CLI option and parse it
- code_writers.h: gate write_thunked_class on settings.flatten_classes
- nuget .targets: wire to -flatten_classes
- build_projection.cmd, test_component.vcxproj, test/CMakeLists.txt, CI scripts: pass -flatten_classes for test builds
- Add status header: implementation complete, gated on -flatten_classes
- Fix header layout diagram: default_cache first, iid_table second
- Fix P0 hazard description: layout is already correct
- Add async default interface exclusion to categories table
- Fix Phase 1 items 3/4: note write_abi_args revert, bind_out removal
- Update phase status markers: all phases complete with commit refs
- Remove stale 'Phase 2 not started' note
- runtimeclass-caching.md: add -flatten_classes requirement, fix criteria
- Remove duplicate comment in test/Directory.Build.targets
Replace ~350 lines of session-by-session development history with a clean 'Development Notes' section: 7 key architectural decisions, async exclusion rationale, COM identity approach, and a commit reference table.
Create cached_thunks/cached_thunks.vcxproj (StaticLibrary, MASM-only)
that builds cppwinrt_cached_thunks.lib per architecture, mirroring the
fast_fwd pattern.
- Add project to cppwinrt.sln with all 6 platform configs
- build_test_all.cmd: build cached_thunks alongside fast_fwd
- build_nuget.cmd: build all 3 arches and pass lib paths to nuget pack
- nuspec: package libs at build/native/lib/{platform}/
- .targets: link cppwinrt_cached_thunks.lib when CppWinRTFlattenClasses=true
Simple is a fast ABI type, not a thunked type. Restore the original 4-QI expectation that was incorrectly changed during thunked development.
oldnewthing reviewed on May 9, 2026
    mov rax, [r11 + r10 * 8]            ; rax = method at vtable[slot]

    ; Verify indirect call target (preserves all GPRs except rax/flags)
    call [__guard_check_icall_fptr]
I don't think you can issue call instructions on a non-paragraph aligned stack.
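"Paragraph aligned" means 16-byte aligned. Under the x64 Windows calling convention, RSP must be a multiple of 16 at each call instruction and the caller reserves 32 bytes of home (shadow) space; on entry to any function, a thunk included, RSP % 16 == 8 because the caller's call pushed an 8-byte return address. The arithmetic behind the review comment, as a sketch that assumes the stub makes an ordinary x64 call (stack_adjust_before_call and rsp_at_call are hypothetical helper names):

```cpp
#include <cassert>
#include <cstdint>

// Home (shadow) space the caller must reserve for the callee's four register args.
constexpr std::uint64_t kShadowSpace = 32;

// Bytes to subtract from RSP after `pushes` 8-byte pushes so the next call
// sees a 16-byte-aligned stack with shadow space reserved.
constexpr std::uint64_t stack_adjust_before_call(std::uint64_t pushes) {
    // RSP % 16 after the pushes is (8 - 8*pushes) mod 16; written with + to stay unsigned.
    const std::uint64_t rsp_mod_16 = (8 + 8 * pushes) % 16;
    return kShadowSpace + rsp_mod_16;
}

// RSP at the call instruction, given RSP on entry to the stub.
constexpr std::uint64_t rsp_at_call(std::uint64_t rsp_at_entry, std::uint64_t pushes) {
    return rsp_at_entry - 8 * pushes - stack_adjust_before_call(pushes);
}

static_assert(stack_adjust_before_call(0) == 0x28);  // the familiar "sub rsp, 28h"
static_assert(stack_adjust_before_call(1) == 0x20);
```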
Super Summary
"Thunked" or "flattened" runtimeclasses decrease binary size and increase throughput by reducing the work required to call members of non-default interfaces on runtimeclass types. When a method on a non-default interface is called, a placeholder vtable proxy replaces itself with the "real" interface, obtained by calling QueryInterface on the default interface. Those "real" interfaces are cached inside the runtimeclass instance until the instance is destroyed.
Certain kinds of types are excluded: classes with fast_abi metadata and classes that are unsealed. These limitations may be addressed at a later time.
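The cost difference the summary describes can be modeled by counting simulated QueryInterface and Release calls. This is a toy sketch, not generated code: MockObject, ClassicProjection, and FlattenedProjection are hypothetical, and a std::map<std::string, int> stands in for an IMap<String, Object>. The classic projection does QI, call, and Release on every non-default call; the flattened one resolves its slot on first use and keeps it for the instance's lifetime.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical COM-like object that counts QI and Release calls.
struct MockObject {
    int query_interface_calls = 0;
    int release_calls = 0;
    std::map<std::string, int> storage;

    // Stand-in for QueryInterface(IID of the map interface).
    MockObject* query_map() { ++query_interface_calls; return this; }
    void release() { ++release_calls; }
    void insert(const std::string& key, int value) { storage[key] = value; }
};

// Classic projection: every non-default call is QI, call, Release.
struct ClassicProjection {
    MockObject* default_interface;
    void Insert(const std::string& key, int value) {
        MockObject* map = default_interface->query_map();
        map->insert(key, value);
        map->release();
    }
};

// Flattened projection: the slot starts unresolved (the "thunk"),
// is resolved once, and is released when the instance is destroyed.
// Copy handling is omitted for brevity.
struct FlattenedProjection {
    MockObject* default_interface;
    MockObject* map_slot = nullptr;  // nullptr models "still the thunk"
    void Insert(const std::string& key, int value) {
        if (!map_slot) map_slot = default_interface->query_map();  // resolve once
        map_slot->insert(key, value);
    }
    ~FlattenedProjection() { if (map_slot) map_slot->release(); }
};
```

Three Insert calls cost three QI/Release pairs in the classic model and a single QI (plus one Release at destruction) in the flattened one.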
Use
- Run cppwinrt.exe in the usual way.
- The *.asm thunk files are precompiled as per-architecture libraries (cppwinrt_cached_thunks.lib), similar to the per-architecture fast_abi .lib files.

This feature is turned on with the -flatten_classes switch, which is off by default.

Background
A type like ValueSet implements a number of interfaces. Most developers write code that inserts values into a ValueSet and then walks it with a range-based for loop. Under the covers, each call to .Insert, and the call to .First hidden behind the range-based for, is a sequence of:
- QueryInterface for the interface that declares the method (IMap<String, Object> for Insert)
- the call itself, such as IMap<String, Object>::Insert
- Release of the queried interface

This produces a nontrivial amount of code for composite objects, for objects whose interfaces require other interfaces, and for types that have gone through multiple versioning cycles. While QI and Release are fast-ish, they still perform interlocked operations (which contend for cache lines) and are code the processor has to execute. Windows relies heavily on runtimeclass contract versioning, producing types that grow in complexity over time.

Mechanism
See docs/runtimeclass-caching.md for the full details. Projected runtimeclasses contain up to 8 "cached slots" for their non-default interfaces. Each slot acts like another COM instance: a pointer to a thunk vtable, plus a pointer to the parent class instance combined with a slot index.

When any method on the thunk vtable is invoked, the stub calls out to the parent class's cache to resolve its slot. "Resolve" means calling defaultInterface->QueryInterface(iid, &result), then using racy initialization to swap the result of the QI into the projected type's interface pointers. Future calls on the same runtimeclass instance for the same interface then go through the "real" interface rather than the thunk stub.

Notable Changes
This mechanism is a departure from the sizeof(runtimeclass) == sizeof(void*) model. A runtimeclass instance is no longer just a container for a winrt::com_ptr<abi_t<TDefaultInterface>>: instances grow to hold up to 8 slots, and copying them takes more code than copying a single pointer. ValueSet v = otherValueSet copies only the default interface, so every slot must be re-queried.

Many C++/WinRT functions that expected this pointer-is-the-default-interface model changed to accommodate