
Reduce non-default-interface costs for complicated runtimeclasses #1579

Draft
jonwis wants to merge 27 commits into microsoft:master from jonwis:user/jonwis/fast-runtimeclasses

@jonwis jonwis commented May 9, 2026

Super Summary

"Thunked" or "flattened" runtimeclasses decrease binary size and increase throughput by reducing the work required to call non-default interface members on runtimeclass types. When a method on a non-default interface is called, a placeholder vtable proxy swaps itself for the "real" interface obtained from the default interface. Those "real" interfaces are cached inside the runtimeclass instance until the instance is destroyed.

Certain kinds of types are excluded: classes with fast_abi metadata and classes that are unsealed. These limitations may be addressed at a later time.

Use

  1. Use cppwinrt.exe in the usual way
  2. For NuGet consumers, the *.asm thunk files are pre-compiled as libraries similar to the fast_abi.lib per-architecture files
  3. Rebuild clean

This feature is turned on with the -flatten_classes switch, which is off by default.

Background

A type like ValueSet implements a number of interfaces. Most developers write code like this:

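PropertySet values;
values.Insert(L"kittens", box_value(true));
values.Insert(L"puppies", box_value(false));
// Distinct names avoid shadowing the map; const& binds to the by-value iterator result.
for (auto const& [key, value] : values) {
    tell_people(key, value.as<bool>());
}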

Under the covers, each call to .Insert and the use of .First behind range-based for is a sequence of:

  1. QueryInterface from IPropertySet (the default marker interface)
  2. Call IMap<String, Object>::Insert
  3. Release the temporary from step 1
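
The three steps above can be sketched in portable C++. This is a minimal model, not C++/WinRT code: MockObject, query_map_interface, and projected_insert are hypothetical names invented for illustration, and the mock returns itself from its "QueryInterface" purely so the call counts are easy to observe.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a COM object; none of these names are C++/WinRT APIs.
struct MockObject {
    int refs = 1;       // reference held by the projected variable
    int qi_calls = 0;   // how many times "QueryInterface" ran
    std::unordered_map<std::string, bool> items;

    // Stand-in for QueryInterface: returns the object with one extra reference.
    MockObject* query_map_interface() { ++qi_calls; ++refs; return this; }

    // Stand-in for Release.
    void release() { assert(refs > 0); --refs; }

    void insert(std::string const& key, bool value) { items[key] = value; }
};

// Models what one projected call to a non-default interface member costs.
inline void projected_insert(MockObject& default_interface,
                             std::string const& key, bool value) {
    MockObject* map = default_interface.query_map_interface(); // 1. QueryInterface
    map->insert(key, value);                                   // 2. call the member
    map->release();                                            // 3. Release the temporary
}
```

In this model, two calls to projected_insert perform two queries and two releases, which is the per-call overhead the cached slots are meant to remove.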

This produces a nontrivial amount of code for composite objects, for those whose interfaces use requires, and for those that have gone through multiple versioning cycles. While QI and Release are fast-ish, they still involve interlocked operations (which contend for cache lines) and add code the processor has to execute. Windows relies heavily on runtimeclass contract versioning, producing types that grow in complexity over time.

Mechanism

See docs/runtimeclass-caching.md for the full details. Projected runtimeclasses contain up to 8 "cached slots" for their non-default interfaces. Each slot acts like another COM instance - a pointer to a thunk vtable, plus a pointer to the parent class type combined with a slot index.

When any method on the thunk vtable is invoked, the stub calls out to the parent class's cache to resolve its slot. "Resolve" means calling defaultInterface->QueryInterface(iid, &result), then using racy-init to swap the result of the QI into the projected type's interface pointers. Future calls on the same runtimeclass for the same interface then go through the "real" interface instance, rather than the thunk stub.
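
A racy-init cache of this kind might look roughly like the sketch below. It is an assumption-laden model in standard C++, not the PR's implementation: CachedSlot, Source, and Interface are hypothetical names, a null pointer plays the role of the unresolved thunk, and reference counting is left out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of one cached slot; the names are illustrative only.
struct Interface { int id; };

struct Source {
    int qi_calls = 0;        // plain counter; the sketch is exercised on one thread
    Interface real{42};
    Interface* query() { ++qi_calls; return &real; } // stand-in for QueryInterface
};

struct CachedSlot {
    std::atomic<Interface*> resolved{nullptr}; // nullptr means "still a thunk"

    Interface* get(Source& source) {
        Interface* current = resolved.load(std::memory_order_acquire);
        if (current != nullptr) {
            return current;                    // fast path: already resolved, no QI
        }
        Interface* fresh = source.query();     // slow path: resolve via the default interface
        assert(fresh != nullptr);
        Interface* expected = nullptr;
        if (resolved.compare_exchange_strong(expected, fresh,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
            return fresh;                      // this caller published the result
        }
        // Another caller published first. A real implementation would
        // release `fresh` here; the sketch has no reference counts.
        return expected;
    }
};
```

The compare-exchange lets several threads resolve the same slot concurrently without a lock: every caller ends up using whichever pointer was published first.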

Notable Changes

This mechanism is a departure from the sizeof(runtimeclass)==sizeof(void*) model. A runtimeclass instance is no longer just a container for a winrt::com_ptr<abi_t<TDefaultInterface>>. Instances grow to contain up to 8 slots, and making full-value copies requires more code. ValueSet v = otherValueSet copies only the default interface, so the copy must re-query all of its slots.
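
The copy behavior described above could be modeled as follows. This is a simplified sketch under stated assumptions: FlattenedHandle is a hypothetical type, raw void* pointers stand in for interface pointers, a null slot stands in for an unresolved thunk, and reference counting is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical handle: one default interface pointer plus a fixed set of slots.
struct FlattenedHandle {
    static constexpr std::size_t slot_count = 8; // "up to 8" in the description

    void* default_interface = nullptr;
    std::array<void*, slot_count> slots{};       // nullptr = not yet resolved

    FlattenedHandle() = default;
    explicit FlattenedHandle(void* def) : default_interface(def) {}

    // Copy construction takes only the default interface; slots start unresolved.
    FlattenedHandle(FlattenedHandle const& other)
        : default_interface(other.default_interface) {}

    // Copy assignment does the same and discards any slots already resolved.
    FlattenedHandle& operator=(FlattenedHandle const& other) {
        if (this != &other) {
            default_interface = other.default_interface;
            slots.fill(nullptr);
        }
        return *this;
    }
};

// The handle is now larger than a single pointer.
static_assert(sizeof(FlattenedHandle) > sizeof(void*), "handle carries slots");
```

Not copying resolved slots keeps the copy cheap in this model, at the cost of one extra query per interface the copy later uses.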

Many C++/WinRT functions that expected this pointer-is-the-default-interface model changed to accommodate

jonwis and others added 27 commits May 4, 2026 10:56
- docs/plan-cached-interface-dispatch.md: design for thunk-based
  interface caching in projected runtimeclasses, hazard audit,
  implementation plan, and agent workflow instructions
- scripts/build_and_test.ps1: parallel msbuild + test runner
- scripts/run_cppwinrt.ps1: run cppwinrt.exe with output under build/
Plan fixes:
- Remove contradictory consume_general guidance; commit to three-way branch
- Clarify operator I() returns by value (AddRef path), hot path is consume_general
- Specify include ordering: base_thunked_runtimeclass.h after base_implements.h
- Fix WINRT_IMPL_SHIM: breaks for thunked types, add consume_general_nothrow
- Add as<T>()/try_as<T>() member docs on thunked_runtimeclass_base
- Use SFINAE (enable_if_t/void_t) not requires - C++17 floor
- Specify .2.h file placement (same as today)
- Document factory constructor wiring
- Add copy_from_abi/copy_to_abi coverage
- Add no-op thunk vtable slots 0/1/2 (QI/AddRef/Release)
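
The "SFINAE, not requires" point can be illustrated with a C++17 detection trait. This is only a sketch: has_cache, thunked_cache_tag, dispatch_kind, Plain, and Flattened are hypothetical names, and the way the PR's own traits detect thunked types is not shown in this conversation.

```cpp
#include <type_traits>

// Hypothetical detection trait: a type opts in by exposing a nested
// `thunked_cache_tag` alias. Primary template: no tag found.
template <typename T, typename = void>
struct has_cache : std::false_type {};

// Chosen when T::thunked_cache_tag names a type (std::void_t detection idiom).
template <typename T>
struct has_cache<T, std::void_t<typename T::thunked_cache_tag>> : std::true_type {};

template <typename T>
inline constexpr bool has_cache_v = has_cache<T>::value;

struct Plain {};
struct Flattened { using thunked_cache_tag = void; };

// enable_if_t selects an overload without C++20 `requires`.
template <typename T, std::enable_if_t<has_cache_v<T>, int> = 0>
constexpr int dispatch_kind(T const&) { return 1; } // cached-slot path

template <typename T, std::enable_if_t<!has_cache_v<T>, int> = 0>
constexpr int dispatch_kind(T const&) { return 0; } // query-per-call path
```

Both overloads are constexpr, so the selection can be checked at compile time with static_assert.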

Naming: PascalCase -> snake_case throughout (thunked_runtimeclass_header,
interface_thunk, cache_and_thunk_tagged, thunked_runtimeclass_base, etc.)

Script: build_and_test.ps1 reworked:
- Default: build only test\test (fast feedback loop)
- -BuildAll: build all 9 test targets
- -Test: run built test executables
- -Clean: git clean -dfx . before building
…sume_general)

- base_thunked_runtimeclass.h: thunked_runtimeclass<IDefault, I...> template, interface_thunk with resolve(), SFINAE ABI overloads (get_abi, put_abi, etc.)

- base_meta.h: has_thunked_cache_v, has_thunked_interface_v, type_index traits

- base_windows.h: three-way if constexpr in consume_general/noexcept + consume_general_nothrow

- code_writers.h: WINRT_IMPL_SHIM -> consume_general_nothrow for IMap/IMapView Lookup/Remove

- ASM thunks: x64, x86, ARM64, ARM64EC stubs (256 slots each)

- winrt_thunk_resolve.cpp: extern C bridge from ASM to C++ resolve()
- write_thunked_class: generates impl::thunked_runtimeclass<IDefault, I...> base

- write_thunked_class_requires: includes ALL interfaces (including default) in require<>

- is_interface: includes thunked types for producer_convert detection

- ActivateInstance<T>: if-constexpr fast path for thunked types

- ABI overloads: moved to base_windows.h, added rvalue detach_abi, exclusions

- Implicit IUnknown/IInspectable conversions on thunked_runtimeclass_base

- test/Directory.Build.targets: MASM + thunk resolve for all test binaries

8 errors remain: agile_ref ctor, LiesAboutInheritance edge cases
…/unbox

Systematic fix: thunked runtimeclasses must be recognized as COM object types everywhere the library uses is_base_of<IUnknown/IInspectable> to distinguish COM types from value types.

- base_meta.h: move thunked traits before empty_value/arg<T> (ordering)

- base_meta.h: arg<T> specialization includes has_thunked_cache_v

- base_meta.h: empty_value<T> returns nullptr for thunked types

- base_windows.h: is_com_interface includes has_thunked_cache_v

- base_windows.h: com_ref<T> includes has_thunked_cache_v

- base_reference_produce.h: box_value/unbox_value/unbox_value_or handle thunked

23/25 tests pass; 2 failures are pre-existing EH funclet issues (custom_error, disconnected)
Exercises IMap<hstring,IInspectable> methods (Insert/Lookup/Size/HasKey/Remove/Clear) through thunked PropertySet runtimeclass. noinline helpers for disassembly.

Disassembly confirms: cache slot load at [rcx+28h] -> vtable call, NO QI.

Agility crash is pre-existing EH funclet issue (C_A_T_C_H_T_E_S_T_0::dtor).
- thunked_copy_move: copy/move ctor/assign, nullptr assign

- thunked_abi_interop: get/put/detach/attach/copy_from/copy_to_abi round-trips

- thunked_as_try_as: as<T>, try_as<T>, implicit IInspectable/IUnknown conversion

- thunked_threading: 8 threads x 100 iterations concurrent thunk resolution

- Fix: reset_thunked() reinitializes thunk pairs after ABI copy/attach
  (copy_from_abi/attach_abi left pairs uninitialized causing null deref)
…) tests

- thunked_generic_default: StringMap with IMap<hstring,hstring> generic default
  Exercises Insert/Lookup/HasKey/Size/Clear + iteration, as<IObservableMap>

- thunked_full_mode: Package with 9 secondaries (>8 = full mode)
  Static asserts verify use_tagged=false, tuple_size>8
…OM identity

- has_async_default_interface: detect IAsyncOperation/Action via TypeSpec parsing

- operator==/!=: three-tier (address, default_cache, QI IUnknown) comparison

- operator==/!= for nullptr_t
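
A three-tier comparison along these lines might be sketched as below. The types are hypothetical, and reading "address" as "the two handles are the same C++ object" is an assumption, not something the commit message states.

```cpp
#include <cassert>

// Hypothetical stand-in for the canonical IUnknown identity of an object.
struct Identity { int object_id; };

// Hypothetical projected handle with a cached default interface pointer.
struct Handle {
    void* default_cache;               // cached default interface pointer
    Identity* identity;                // what a QI for IUnknown would return
    mutable int identity_queries = 0;  // counts the expensive tier

    Identity* query_identity() const {
        assert(identity != nullptr);
        ++identity_queries;
        return identity;
    }
};

inline bool same_object(Handle const& a, Handle const& b) {
    if (&a == &b) {
        return true;                                  // tier 1: same handle
    }
    if (a.default_cache == b.default_cache) {
        return true;                                  // tier 2: same default interface
    }
    return a.query_identity() == b.query_identity();  // tier 3: COM identity
}
```

The ordering puts the cheap pointer checks first, so the identity query only runs when both of them fail.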

e2e build_test_all: all builds pass, 2 test failures:

- test_slow: QI count changed from 4 to 1 (expected: thunking reduces QI calls)

- test_old: delegate.cpp crash (needs investigation)
- bind_in<T>: SFINAE specialization for thunked types (get_abi instead of reinterpret)

- delegate_arg<T>: safe ABI->projected conversion in delegate produce stubs

- Code generator: emit delegate_arg<T>() instead of reinterpret_cast for IN params

- has_async_default_interface: exclude IAsyncOperation/Action via TypeSpec parsing

- test_slow/Simple.cpp: QI count 4->1 (thunking reduces tracked QI calls)

- operator==/!=: three-tier COM identity (address, default_cache, QI IUnknown)

e2e: 222/223 old_tests pass. 1 remaining: event_consume factory revoker crash.
…flinging an exception out of a non-exceptional frame.
Co-authored-by: Copilot <copilot@github.com>
Move winrt_cached_resolve_thunk from strings/cached_thunk_resolve.cpp into
base_thunked_runtimeclass.h as extern C inline with a selectany function-pointer
forcelink to ensure MSVC emits the symbol for ASM stubs.

Remove ClCompile for cached_thunk_resolve.cpp from test/Directory.Build.targets.

Add per-project Directory.Build.targets for TestRuntimeComponentCX and
TestProxyStub to strip MASM thunk stubs from non-C++/WinRT projects that
don't include winrt/base.h.

Update plan doc to reflect the new inline approach.
Add -flatten_classes CLI option to cppwinrt.exe. Thunked/flattened
runtimeclass projections are now only emitted when this flag is passed,
matching the -fastabi opt-in pattern.

- settings.h: add bool flatten_classes
- main.cpp: add CLI option and parse it
- code_writers.h: gate write_thunked_class on settings.flatten_classes
- nuget .targets: wire CppWinRTFlattenClasses to -flatten_classes
- build_projection.cmd, test_component.vcxproj, test/CMakeLists.txt,
  CI scripts: pass -flatten_classes for test builds
- Add status header: implementation complete, gated on -flatten_classes
- Fix header layout diagram: default_cache first, iid_table second
- Fix P0 hazard description: layout is already correct
- Add async default interface exclusion to categories table
- Fix Phase 1 items 3/4: note write_abi_args revert, bind_out removal
- Update phase status markers: all phases complete with commit refs
- Remove stale 'Phase 2 not started' note
- runtimeclass-caching.md: add -flatten_classes requirement, fix criteria
- Remove duplicate comment in test/Directory.Build.targets
Replace ~350 lines of session-by-session development history with a clean
'Development Notes' section: 7 key architectural decisions, async exclusion
rationale, COM identity approach, and a commit reference table.
Create cached_thunks/cached_thunks.vcxproj (StaticLibrary, MASM-only)
that builds cppwinrt_cached_thunks.lib per architecture, mirroring the
fast_fwd pattern.

- Add project to cppwinrt.sln with all 6 platform configs
- build_test_all.cmd: build cached_thunks alongside fast_fwd
- build_nuget.cmd: build all 3 arches and pass lib paths to nuget pack
- nuspec: package libs at build/native/lib/{platform}/
- .targets: link cppwinrt_cached_thunks.lib when CppWinRTFlattenClasses=true
Simple is a fast ABI type, not a thunked type. Restore the original
4-QI expectation that was incorrectly changed during thunked development.
mov rax, [r11 + r10 * 8] ; rax = method at vtable[slot]

; Verify indirect call target (preserves all GPRs except rax/flags)
call [__guard_check_icall_fptr]

I don't think you can issue call instructions on a non-paragraph aligned stack.
