feat: add modelsdk — auto-generated type-safe model SDK#335
feat: add modelsdk — auto-generated type-safe model SDK#335engalar wants to merge 2 commits intomendixlabs:mainfrom
Conversation
65bd533 to
00ac646
Compare
…undtrip Add a new `modelsdk/` package that provides a fundamentally better architecture for reading and writing Mendix MPR files compared to the existing hand-written `sdk/` layer. This is an additive, non-breaking change — no existing code is modified. The new package coexists with `sdk/` to enable gradual migration. Key improvements over current sdk/: - 1,500+ auto-generated types across 53 Mendix domains (vs ~480 hand-written) - Type registry with automatic init() registration (vs manual dispatch) - Dirty tracking via bitmap + container chain propagation - BSON roundtrip preservation — unknown fields survive read/write cycles - Lazy decode via InitFromRaw() — zero cost for unaccessed fields - Three-branch encoder: self-dirty / child-dirty / clean pass-through - Property abstraction: Primitive[T], Part[T], PartList[T], Enum[T], ByNameRef[T] - Per-property version metadata (introduced/deleted/public) - Reference registry for cross-domain relationship tracking Packages added: - modelsdk/codec — BSON encoder/decoder with type registry - modelsdk/element — Element interface with dirty tracking - modelsdk/property — Generic property types with lazy init - modelsdk/gen/ — 53 auto-generated domain packages - modelsdk/mpr — MPR v1/v2 reader/writer - modelsdk/widgets — Widget template handling - modelsdk/meta — System module metadata - modelsdk/version — Version compatibility info - cmd/modelsdk-codegen — Code generator from TS SDK reflection data - internal/codegen/dtsparser — TypeScript SDK parser - internal/codegen/emitter — Go code generation templates
00ac646 to
637565e
Compare
ako
left a comment
There was a problem hiding this comment.
PR #335 Review — feat: add modelsdk — auto-generated type-safe model SDK
Overview
Large architectural addition (~199K lines across 327 files) introducing modelsdk/ as a fundamentally better reading/writing layer for Mendix MPR files. Additive only — coexists with existing sdk/. The core abstractions (element, property, codec, dirty-chain) are solid; most additions are auto-generated domain types.
Architecture
The three-layer design is well thought out:
| Layer | Role | Verdict |
|---|---|---|
element.Base + dirty bitmap |
Identity, raw bytes, child-dirty bubble | Good |
property.Primitive/Part/PartList/Enum/ByNameRef |
Lazy decode, dirty tracking per-bit | Good |
codec.Decoder/Encoder + three-branch encoder |
Clean passthrough / child rebuild / full rebuild | Good |
The bson.A{int32(3)} version sentinel in the encoder's child-list branches is correct for Mendix BSON document arrays. DecodeChildren silently skips the sentinel (non-document first element) as intended.
Issues
1. Production log noise — [perf] log in store.go
log.Printf("[perf] Store.ListUnits: %s (units=%d)", time.Since(listStart), len(units))This fires on every Open / OpenForWriting call and will spam production logs. Remove it or put it behind an environment-gated debug flag.
2. FindByQualifiedName searches on simple Name, not qualified name
The method is named FindByQualifiedName and takes a qualifiedName parameter, but both the index lookup (u.Type+":"+u.Name) and the linear scan fallback match against the BSON Name field, which stores the simple name (e.g., "ACT_GetUser"), not the qualified name ("MyModule.ACT_GetUser"). Callers passing a qualified name will always miss on the linear-scan fallback. Either rename the method or document that qualifiedName must be the module-local simple name.
3. Concurrent-use documentation is overstated
Multiple doc comments say methods are "Safe for concurrent use", but the generation-based cache consistency in AllOfType is not atomic:
currentTypeGen := m.uc.TypeGen(typeName) // lock acquired + released
// ...loop...
cached, cachedGen, hasCached := m.uc.GetWithGen(u.ID) // lock acquired + releasedA concurrent InsertUnit/DeleteUnit between these two lock acquisitions bumps the generation, making the cachedGen >= currentTypeGen check stale. This is fine for the current single-goroutine usage, but the docs should say "not safe for concurrent mutation" to avoid future surprises.
4. Partial failure window in WriteTransaction.Commit()
The v2 write path commits the DB transaction before renaming temp files to final paths. If a rename fails, the DB records the new hash but the file on disk retains the old content, creating an inconsistency that survives restarts. The comment acknowledges this. This is inherited from the existing sdk/mpr writer, but surfacing it here for the record — consider an all-or-nothing two-phase approach (rename all files, then commit DB).
5. Three escape-hatch methods on Model signal incomplete coverage
PatchEncodedField, PatchEncodedNestedField, PatchEncodedVersionedArrayElements are described as "last-resort" escape hatches for cases where the SDK type's setter has a different type than BSON storage. These are pragmatic for a v1, but they should either be marked //nolint / //TODO: remove when X is supported or documented in a TODO file so they don't become permanent load-bearing APIs.
6. DeleteModuleWithCleanup uses strings.ToLower(moduleName) in a filesystem path
themesourceDir := filepath.Join(projectDir, "themesource", strings.ToLower(moduleName))Mendix module names are alphanumeric so path traversal is unlikely in practice, but filepath.Join doesn't prevent it if a malformed name somehow reaches this path. A filepath.Clean + parent-dir check would be belt-and-suspenders here.
7. Missing package-level documentation
modelsdk/ is a major new API surface but has no README or doc.go explaining: (a) how to choose between modelsdk/ and the existing sdk/, (b) the migration path, (c) which domains are covered, (d) the generated-code regeneration workflow. The package comment in model.go covers only basic usage.
Nits
TypeRegistryusessync.RWMutexbut all registrations happen duringinit()(single-threaded). The mutex adds overhead to everyLookup()call after startup for no benefit. Async.Mapor even a plainmapwith a build-time freeze would be more efficient, though this is unlikely to be a bottleneck in practice.- The
cmd/modelsdk-codegen/audit.gois a useful verification tool; consider wiring it intomake testor CI so type coverage regressions are caught automatically.
Summary
The core engine (element, property, codec) is well-designed and the test suite covers the critical dirty-chain and roundtrip invariants. The seven concerns above are prioritised: items 1 and 2 are likely to cause user-visible bugs (log spam and wrong lookups). Items 3–5 are correctness/safety concerns. Items 6–7 are documentation and hygiene. The auto-generated domain types are not hand-reviewable but the codegen templates and supplements.json look structurally sound.
- Remove [perf] log noise from Store.ListUnits (#1) - Fix FindByQualifiedName docs: clarify it matches simple Name, not qualified (#2) - Fix concurrency docs: "not safe for concurrent mutation" (#3) - Add TODO for WriteTransaction.Commit partial-failure window (#4) - Add TODO for PatchEncoded escape-hatch methods (#5) - Add path traversal guard in DeleteModuleWithCleanup (#6) - Add package doc.go with architecture overview and migration guidance (#7)
|
All 7 items addressed in b8499d5:
Re: nit on |
ako
left a comment
There was a problem hiding this comment.
PR #335 — feat: add modelsdk — auto-generated type-safe model SDK
Reviewed against the CLAUDE.md checklist. This is a large architectural PR (198k additions, 328 files) so the review focuses on structural concerns rather than line-by-line coverage.
The PR is confirmed additive only — no files outside modelsdk/, internal/codegen/, and cmd/modelsdk-codegen/ are modified relative to the PR's own base commit. The earlier review bot concern about modified sdk/ and mdl/ files was noise from comparing against a stale main HEAD.
Blockers
None — but there are several issues that should be resolved before or alongside this landing.
Moderate Issues
1. modelsdk/widgets/ is a near-duplicate of sdk/widgets/ and has already drifted
augment.go, loader.go, mpk/, definitions/, and templates/ are duplicated between sdk/widgets/ and modelsdk/widgets/. One template file (combobox.json) already differs between the two directories. The log message format in loader.go also diverged.
Two parallel widget implementations means every sdk/widgets/ bug fix must be manually ported to modelsdk/widgets/, and vice versa. The PR description doesn't address this — it doesn't say whether modelsdk/widgets/ is meant to replace sdk/widgets/ or live alongside it indefinitely.
Suggested fix: Either have modelsdk/widgets/ import the shared parts from sdk/widgets/ (import rather than copy), or explicitly document the fork intent and add a comment warning future editors.
2. PatchEncoded* escape hatches signal incomplete type coverage
model.go exports three methods (PatchEncodedField, PatchEncodedNestedField, PatchEncodedVersionedArrayElements) that bypass the typed property API and write raw BSON fields directly. The TODO: remove when generated setters cover all BSON storage type mismatches comment acknowledges this is a gap.
Public API escape hatches like these tend to persist. If callers depend on them, they become load-bearing. Restricting them to an internal/ package until the typed setters are complete would prevent premature stabilization of a workaround surface.
3. Write path has a partial-failure window in MPR v2
modelsdk/mpr/writer_core.go has:
// TODO: adopt two-phase approach (rename files first, then commit DB) to
// eliminate the partial-failure window where DB is committed but files are stale.
For v2 MPR projects (Mendix ≥ 10.18), a crash between the DB commit and the file rename leaves the project in a state where the database metadata refers to file content that doesn't exist yet. This is a data-integrity risk, not just a cosmetic TODO. It should be tracked as an issue before the package is used for real writes.
4. No proposal document for the architecture
CLAUDE.md says to check docs/11-proposals/ before coding. A change of this scope — introducing a second, architecturally distinct MPR layer — warrants a proposal covering: the migration strategy, the intended coexistence lifecycle with sdk/, which code should use modelsdk/ vs sdk/, and how the two will eventually converge. There is a BSON_SCHEMA_REGISTRY_PROPOSAL.md in proposals that overlaps; its relationship to this PR is unclear.
5. Scope is too broad for a single PR
328 files span five independent sub-systems: (a) codec infrastructure, (b) MPR reader/writer, (c) generated domain types for all 53 domains, (d) widget templates, (e) codegen tool. Per the CLAUDE.md checklist: "Each PR is scoped to a single feature or concern — if the description needs 'and' between unrelated items, split it."
A realistic split: codec + element + property as one PR; generated domain types as a second; MPR reader/writer as a third; widgets as a fourth. This would make each reviewable.
Minor Issues
6. Integration test skips by default — won't run in CI
modelsdk/integration_test.go's TestRoundtrip uses t.Skip if no .mpr file is found. The test data committed under modelsdk/mpr/testdata/ contains only .mxunit files (individual units), not a full .mpr. The roundtrip test — the most important correctness guarantee for the new package — will silently skip in every CI run.
7. CLAUDE.md architecture overview not updated
The project architecture section in CLAUDE.md lists sdk/ as the MPR implementation layer but does not mention modelsdk/. New contributors won't know to consider modelsdk/ when adding SDK functionality. doc.go has good guidance but only helps if you already know the package exists.
8. poc_test.go / poc_all_domains_test.go naming in codegen
Two test files in internal/codegen/dtsparser/ are named poc_test.go and poc_all_domains_test.go. "POC" naming in committed production code signals exploratory code that wasn't cleaned up. These should be renamed to reflect what they actually validate (e.g., parser_test.go, all_domains_test.go).
Positive Notes
- Three-branch encoder (self-dirty full rebuild, child-dirty selective rebuild, clean zero-copy passthrough) is a genuinely better approach than the current
sdk/full-rewrite path. ✓ - Unknown-type safety (
element.Basewith raw bytes preserved) means unrecognized$Typevalues survive read/write without data loss — a real problem in the currentsdk/. ✓ - 96 tests across the modelsdk package including benchmarks. The dirty-chain and roundtrip tests cover the core guarantees well. ✓
- Generated code header with
// Code generated ... DO NOT EDITand codegen instructions is clear and standard. ✓ doc.go"choosing between modelsdk/ and sdk/" section gives concrete guidance. ✓
Summary: No blockers, but five moderate issues — the widget duplication (already diverging), the escape-hatch API, the v2 write partial-failure window, the missing proposal, and the PR scope — should be addressed. The architectural foundation is sound and the core codec is well-designed; the concerns are about the completeness of the landing rather than the correctness of the core.
Architectural ReviewAuto-generating 1,500+ types across 53 domains from a single source beats hand-writing Go code.
The direction is right. The open questions below must be resolved before merging to main. Questions1. Source of truth — why parse generated JS?The codegen extracts metadata from Two structured alternatives already exist in the appdev monorepo: Option A:
An ANTLR4 grammar ( Option B: Declarative TypeScript schemas ( Generated from Either alternative would:
Licensing trade-off: Both sources live in the internal repo. If we cannot open-source them, we could publish a minimal derived artifact (Go package or JSON schema) containing only the structural facts mxcli needs — no proprietary DSL exposed. This decision must happen before merging. The JS regex parser locks in a fragile foundation. Switching later means rewriting the entire codegen pipeline after it ships. 2. Regex parser robustness
3. Versioning and multi-version supportGenerated types carry
4. Testing strategy198K lines of generated code. No tests visible in the PR. The existing sdk/ has test coverage against real MPR files.
5. Functional equivalenceThe PR states modelsdk/ coexists with sdk/ for gradual migration. Equivalence is not verified.
6. Build reproducibilityThe codegen reads from
7. Migration strategyThe PR is additive. Long-term value depends on modelsdk/ replacing sdk/.
8. Generated code size and CI impact198K lines across 328 files (7x the existing sdk/'s ~28K lines).
9. supplements.json maintenanceHand-maintained bridge between SDK types and BSON reality. The
10. Error handlingProperties parse BSON on first access via
11. Unknown type handlingUnrecognized
12. Widget ecosystem
13. Backend integrationThe existing codebase has a
14. ConcurrencyThe lazy decode pattern (
15. API surface and stabilityEvery generated package exports all types and functions. No public vs internal boundary.
ConclusionI like the direction — codegen from a single source is the right call, and the coverage jump from ~10 domains to 53 speaks for itself. This is worth pursuing. That said, I wouldn't merge to main yet. Two things bother me: The source of truth needs to be settled first. Right now the codegen parses generated JavaScript with regex. That works today, but it's a fragile foundation — any upstream codegen change breaks it silently. We have There's no integration point. 198K lines with no I'd suggest merging to a feature branch, sorting out the source question, wiring up a read-only backend, and running it in parallel with sdk/ on real projects. Once the read path checks out, merge to main and start migrating consumers. |
Summary
Add a new
modelsdk/package — a fundamentally better architecture for reading and writing Mendix MPR files. This is additive only (no existing code modified) and coexists withsdk/to enable gradual migration.Why this is a better solution
The current
sdk/layer has five structural problems that can't be fixed incrementally:sdk/modelsdk/init()→DefaultRegistrysupplements.json, re-run codegenArchitecture highlights
Primitive[T],Part[T],PartList[T],Enum[T],ByNameRef[T]unify field access with lazy init and dirty trackingInitFromRaw()parses BSON on first access; unaccessed fields cost nothingintroduced/deleted/publicversion info from Mendix metamodel$Typevalues decode to genericelement.Basepreserving raw bytes, so the document round-trips without data lossWhat's included
modelsdk/codec/modelsdk/element/modelsdk/property/modelsdk/gen/(53 packages)modelsdk/mpr/modelsdk/widgets/modelsdk/meta/cmd/modelsdk-codegen/internal/codegen/dtsparser/.jsfile parserinternal/codegen/emitter/Test plan
go build ./modelsdk/...compiles cleango build ./cmd/modelsdk-codegen/...compiles cleango test ./modelsdk/...— 7 packages pass (codec, element, property, version, widgets, mpk, model)🤖 Generated with Claude Code