From 5fa81220622e915adcc08a3ee5c3652633138ab9 Mon Sep 17 00:00:00 2001
From: Zack Asofsky
Date: Wed, 3 Jun 2026 15:08:22 -0400
Subject: [PATCH] Add schema versioning, analysis views, and in-place upgrade command

Introduce a major.minor schema version stamped in a new `schema_meta` table
and exported by both the DuckDB and SQLite writers, plus a set of analysis
views/macros that make memory-region, connection, and asset-bundle
investigation ergonomic.

Schema versioning
- DatabaseSchemaInfo: single source of truth (SchemaMajor=1, SchemaMinor=3),
  Evaluate(major,minor) -> SchemaAction (None/UpgradeInPlace/ReExport/
  ToolOutdated), version/snapshot-path readers, and re-export command builder.
- Major bump => re-export required; minor bump => safe in-place upgrade.
- DatabaseMaintenance + SchemaGate: CLI gates report/summary/validate on
  version, prompts interactively, and never auto-modifies non-interactively.
- New `upgrade ` CLI command re-applies views/indexes in place.

Analysis views & macros (both backends unless noted)
- v_allocation_enriched, v_system_region_summary, v_region_owner_breakdown,
  v_connection_edges (typed reference graph; join-key fix for hash joins),
  v_assetbundle_utilization, and v_native_object_retained (DuckDB-only).
- DuckDB macros: region_allocations, region_page_density.
- Indexes made re-runnable (CREATE INDEX IF NOT EXISTS).

Other
- Extract page_size into snapshot_info (from SystemMemoryResidentPageSize).
- Surface schema version in summary, report, and multi-report outputs.
- New memory-db-sql skill and docs/database-schema.md; sql-safety/index/nav
  updates.
- Tests for DatabaseSchemaInfo, page_size propagation, and schema-version
  output.
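
Below is a minimal sketch of the version rule above, meant for reviewers.
It assumes the evaluation order shown. In particular, it assumes that a
newer stored minor on the same major maps to ToolOutdated. The real rules
live in Core/Models/DatabaseSchemaInfo.cs. SchemaPolicySketch is a
hypothetical stand-in, and SchemaAction is redeclared locally so the
snippet compiles on its own:

```csharp
using System;

// Local copy of the four actions named in this patch (hypothetical
// redeclaration so the sketch is self-contained).
public enum SchemaAction { None, UpgradeInPlace, ReExport, ToolOutdated }

// Hypothetical stand-in for DatabaseSchemaInfo.Evaluate; the constants
// mirror SchemaMajor=1, SchemaMinor=3 from this commit.
public static class SchemaPolicySketch
{
    public const int SchemaMajor = 1;
    public const int SchemaMinor = 3;

    public static SchemaAction Evaluate(int major, int minor)
    {
        // Stored version is newer than this build (assumed to cover a newer
        // minor on the same major as well as a newer major).
        if (major > SchemaMajor || (major == SchemaMajor && minor > SchemaMinor))
            return SchemaAction.ToolOutdated;

        // Older major, including 0 for exports without schema_meta: the
        // table structure differs, so only a re-export is safe.
        if (major < SchemaMajor)
            return SchemaAction.ReExport;

        // Same major, older minor: only views/indexes changed.
        if (minor < SchemaMinor)
            return SchemaAction.UpgradeInPlace;

        return SchemaAction.None;
    }
}
```

Pre-versioning exports read back as 0.0, so they land in ReExport. That is
the case SchemaGate reports as "a pre-versioning schema".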

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .claude/skills/memory-db-sql/SKILL.md         | 104 +++++
 .../skills/memory-snapshot-report/SKILL.md    |   8 +-
 .github/workflows/validate_catalog.yaml       |   2 +-
 Cli/CliOptions.cs                             |   2 +
 Cli/CommandLineBuilder.cs                     |  31 +-
 Cli/Program.cs                                |  49 ++-
 Cli/SchemaGate.cs                             | 136 +++++++
 Core/ExportDestination/DatabaseMaintenance.cs |  68 ++++
 .../DuckDbExportDestination.cs                | 186 ++++++++-
 .../IExportDestinationWriter.cs               |  10 +
 .../SqliteExportDestination.cs                |   4 +
 Core/ExportDestination/SqliteWriter.cs        | 194 ++++++++-
 Core/Models/DatabaseSchemaInfo.cs             | 169 ++++++++
 Core/Models/SnapshotData.cs                   |   6 +
 Core/Parser/SnapshotBridge.cs                 |   1 +
 .../MultiSnapshotHtmlRenderer.cs              |  18 +-
 .../MultiSnapshotReportBuilder.cs             |   9 +-
 .../MultiSnapshotReportModel.cs               |   6 +
 Core/Report/Queries/ReportSql.cs              |   3 +
 Core/Report/ReportBuilder.cs                  |  18 +
 Core/Report/SummaryReportFormatter.cs         |   3 +
 Core/Report/SummaryReportRunner.cs            |   9 +
 README_UNITY.md                               |   4 +-
 Tests/DatabaseSchemaInfoTests.cs              |  56 +++
 Tests/ResidentMemoryCalculatorTests.cs        |   4 +
 Tests/SummaryReportFormatterTests.cs          |   2 +
 catalog-info.yaml                             |   4 +-
 docs/database-schema.md                       | 379 ++++++++++++++++++
 docs/index.md                                 |   4 +-
 docs/sql-safety.md                            |   2 +-
 mkdocs.yml                                    |   7 +-
 31 files changed, 1456 insertions(+), 42 deletions(-)
 create mode 100644 .claude/skills/memory-db-sql/SKILL.md
 create mode 100644 Cli/SchemaGate.cs
 create mode 100644 Core/ExportDestination/DatabaseMaintenance.cs
 create mode 100644 Core/Models/DatabaseSchemaInfo.cs
 create mode 100644 Tests/DatabaseSchemaInfoTests.cs
 create mode 100644 docs/database-schema.md

diff --git a/.claude/skills/memory-db-sql/SKILL.md b/.claude/skills/memory-db-sql/SKILL.md
new file mode 100644
index 0000000..0dc5d0e
--- /dev/null
+++ b/.claude/skills/memory-db-sql/SKILL.md
@@ -0,0 +1,104 @@
+---
+name: memory-db-sql
+description: Write and manage SQL queries against an exported Unity memory-snapshot database (DuckDB/SQLite), and keep the schema, views, version, and docs consistent. Use when writing/editing queries over native_objects, native_allocations, native_roots, memory_regions, system_memory_regions, etc.; investigating memory regions; adding or changing tables/views/macros; or checking database version compatibility.
+---
+
+# Memory snapshot database SQL
+
+Use this when working with the database the tool exports from a `.snap` (the `export` command), as
+opposed to running the export/report pipeline itself (use `memory-snapshot-report` for that).
+
+**Canonical schema reference:** [`docs/database-schema.md`](../../../docs/database-schema.md). It
+lists every table, column, view, macro, join key, and the version policy. Read it before writing
+non-trivial queries — this skill is the workflow; that doc is the source of truth.
+
+## Before you query: check the schema version
+
+The database stamps a **major.minor** version in `schema_meta`:
+
+```sql
+SELECT schema_version_major, schema_version_minor, msdt_version FROM schema_meta;
+```
+
+- **No `schema_meta` table** → a pre-versioning export (treated as 0.0): re-export the `.snap`.
+- **major < `DatabaseSchemaInfo.SchemaMajor`** → table/column structure changed; **re-export required**
+  (`MemorySnapshotDataTools export "" ""`).
+- **same major, minor < `DatabaseSchemaInfo.SchemaMinor`** → only views/indexes changed; **upgrade in
+  place** with `MemorySnapshotDataTools upgrade ""` (no re-export).
+- In code, classify with `DatabaseSchemaInfo.Evaluate(major, minor)` → `SchemaAction`
+  (`None`/`UpgradeInPlace`/`ReExport`/`ToolOutdated`); the CLI `SchemaGate` already does this before
+  `report`/`summary`/`validate`, and `DatabaseMaintenance.{Inspect,UpgradeInPlace}` are the entry
+  points (see [`Core/Models/DatabaseSchemaInfo.cs`](../../../Core/Models/DatabaseSchemaInfo.cs)).
+
+## Querying: the things that bite
+
+These are spelled out in the schema doc, but the high-frequency traps:
+
+- **Two region tables, no FK between them.** `memory_regions` = Unity allocator buckets
+  (`native_allocations.memory_region_index` points here). `system_memory_regions` = OS/VM regions
+  (the RAM truth). Bridge an allocation to its OS region **by address range**, not a key.
+- **Prefer the views/macros** over rebuilding joins: `v_allocation_enriched` (allocation + Unity
+  region + OS region + root + object), `v_system_region_summary` (committed/resident/Unity-tracked
+  per OS region), `v_region_owner_breakdown`, `v_connection_edges` (reference graph with both
+  endpoints typed — filter it, don't `SELECT *`), `v_assetbundle_utilization` (per-bundle: does it
+  reference other loaded assets, and how much). DuckDB also has `region_allocations(name)` and
+  `region_page_density(name)` (SQLite has the views only).
+- **Connections: don't hand-count "own" edges with magic numbers.** A native object's outbound edges
+  include its self-reference and its managed wrappers (`native_gc_handle_bridge`/`native_connection`
+  to managed). For "references to *other* loaded objects," count `native_object→native_object`
+  `native_connection` edges where `to_index <> from_index` — exactly what `v_assetbundle_utilization`
+  does.
+- **`native_object_address` ≠ `native_allocations.address`** — bridge objects and allocations
+  through a shared `root_reference_id` → `native_roots.root_id`, never by address.
+- **Don't divide by `memory_regions.address_size`** (0 for `ALLOC_DEFAULT` et al.).
+- **`system_memory_regions.type` is uniformly 0 on iOS** — group by `name`.
+- **Resident data and `page_size` require `snap_format_version` ≥ 17** (else NULL).
+- `region_page_density` is for **small-allocation zones** (NANO/TINY/SMALL); `avg_fill_pct` > 100%
+  means the region holds page-spanning allocations and the metric doesn't apply.
+
+## Writing query code safely
+
+This repo's first-class rule is SQL safety — see [`CLAUDE.md`](../../../CLAUDE.md) and
+[`docs/sql-safety.md`](../../../docs/sql-safety.md):
+
+- **Parameterize every value.** DuckDB: positional `?`. SQLite: named `$name`.
+- **Identifiers can't be parameters** — validate against the catalog (`information_schema.columns`
+  for DuckDB, `pragma_table_info($t)` for SQLite); see the `HasColumn` helpers.
+- **Open read-only** for analysis (`ACCESS_MODE=READ_ONLY` / `Mode=ReadOnly`).
+- The report `ExecuteQuery(string)` sink takes only internally-constructed SQL (constants in
+  `ReportSql`); never pass it external input.
+
+Ad-hoc CLI querying (no SQL command exists in the tool): use the `duckdb` CLI on a `.duckdb` file
+(open with `-readonly`; if a lock error mentions DataGrip, ask the user to close that connection).
+
+## Upkeep: when you change the schema
+
+A schema change is not done until **all** of these are consistent:
+
+1. **Both backends** — mirror the change in
+   [`DuckDbExportDestination.cs`](../../../Core/ExportDestination/DuckDbExportDestination.cs) **and**
+   [`SqliteWriter.cs`](../../../Core/ExportDestination/SqliteWriter.cs) (tables, indexes, and the
+   `CreateViewsScript`). Remember: DuckDB has `ASOF` joins and table macros; SQLite does not (use a
+   correlated subquery; macros are DuckDB-only). Keep index/view DDL **re-runnable** (`CREATE INDEX
+   IF NOT EXISTS`, DuckDB `CREATE OR REPLACE VIEW` / SQLite drop-then-create) so `UpgradeSchema` works.
+2. **The doc** — update [`docs/database-schema.md`](../../../docs/database-schema.md): table/column
+   tables, view/macro list, join keys, and the version table. New tables, views, columns, and
+   identifiers must appear here.
+3. **The version** — in
+   [`Core/Models/DatabaseSchemaInfo.cs`](../../../Core/Models/DatabaseSchemaInfo.cs):
+   - **View/index change only** (add/change a view or index): bump **`SchemaMinor`**. Existing
+     databases can be upgraded in place (`MemorySnapshotDataTools upgrade`), so no re-export.
+   - **Table/column change** (add/rename/remove a table or column, or change a column's
+     meaning/units): bump **`SchemaMajor`** and reset `SchemaMinor` to 0. Old databases require a
+     re-export. Make sure both writers' table DDL and `schema_meta` insert stay in sync.
+   - Add a row to the doc's version table either way.
+4. **Readers** — `snapshot_info` is read with `SELECT *` by column name
+   (`SummaryReportRunner.ReadSnapshotInfo`), so new columns are safe; if you add a column some
+   reader needs, wire it there.
+5. **Tests** — extend `DatabaseSchemaInfoTests` and the export round-trip tests so the new
+   schema/view/version is asserted.
+
+## See also
+
+- `memory-snapshot-report` skill — export a `.snap` to a database and generate reports.
+- [`docs/database-schema.md`](../../../docs/database-schema.md) — canonical schema.
diff --git a/.claude/skills/memory-snapshot-report/SKILL.md b/.claude/skills/memory-snapshot-report/SKILL.md
index 6d16019..d09cfc2 100644
--- a/.claude/skills/memory-snapshot-report/SKILL.md
+++ b/.claude/skills/memory-snapshot-report/SKILL.md
@@ -85,9 +85,13 @@ dotnet run --project Cli/MemorySnapshotDataTools.Cli.csproj -- report
@@ -32,6 +33,7 @@ internal sealed class CliOptions
     public string GoldenPath { get; set; } = string.Empty;
     public string? ValidationOutputPath { get; set; }
     public string SummaryInputPath { get; set; } = string.Empty;
+    public string UpgradeDbPath { get; set; } = string.Empty;
     public int BatchSize { get; set; } = 2048;
     public int QueueCapacity { get; set; } = 256;
     public ValidationMode Validate { get; set; } = ValidationMode.Minimal;
diff --git a/Cli/CommandLineBuilder.cs b/Cli/CommandLineBuilder.cs
index 1eb8adc..c439583 100644
--- a/Cli/CommandLineBuilder.cs
+++ b/Cli/CommandLineBuilder.cs
@@ -14,7 +14,8 @@ public static RootCommand Build(
         Func runReport,
         Func runMultiReport,
         Func runValidateGolden,
-        Func runSummary)
+        Func runSummary,
+        Func runUpgrade)
     {
         var root = new RootCommand("Export Unity memory snapshots to DuckDB or SQLite and generate HTML reports.");
 
@@ -342,12 +343,40 @@ public static RootCommand Build(
             return runSummary(options);
         });
 
+        // ---- upgrade ----
+        var upgradeCmd = new Command(
+            "upgrade",
+            "Upgrade an exported database's analysis views/indexes to the current minor schema version (in place; no re-export).");
+        var upgradeDatabaseArg = new Argument("database")
+        {
+            Description = "Path to the exported database (.duckdb or .db) to upgrade.",
+            Arity = ArgumentArity.ExactlyOne,
+        };
+        upgradeCmd.Add(upgradeDatabaseArg);
+
+        upgradeCmd.SetAction((ParseResult parseResult) =>
+        {
+            var dbPath = ExpandPath(parseResult.GetValue(upgradeDatabaseArg)!);
+            if (!File.Exists(dbPath))
+            {
+                Console.Error.WriteLine($"Database file not found: {dbPath}");
+                return 1;
+            }
+            var options = new CliOptions
+            {
+                Command = CommandKind.Upgrade,
+                UpgradeDbPath = dbPath,
+            };
+            return runUpgrade(options);
+        });
+
         root.Add(exportCmd);
         root.Add(batchExportCmd);
         root.Add(reportCmd);
         root.Add(multiReportCmd);
         root.Add(validateCmd);
         root.Add(summaryCmd);
+        root.Add(upgradeCmd);
         return root;
     }
 
diff --git a/Cli/Program.cs b/Cli/Program.cs
index 3ae2ee0..29eac6c 100644
--- a/Cli/Program.cs
+++ b/Cli/Program.cs
@@ -1,5 +1,6 @@
 using MemorySnapshotDataTools;
 using MemorySnapshotDataTools.Export;
+using MemorySnapshotDataTools.ExportDestination;
 using MemorySnapshotDataTools.Report;
 using MemorySnapshotDataTools.Report.MultiSnapshotReport;
 using MemorySnapshotDataTools.Validation;
@@ -10,7 +11,7 @@ internal static class Program
 {
     private static int Main(string[] args)
     {
-        var root = CommandLineBuilder.Build(RunExport, RunBatchExport, RunReport, RunMultiReport, RunValidateGolden, RunSummary);
+        var root = CommandLineBuilder.Build(RunExport, RunBatchExport, RunReport, RunMultiReport, RunValidateGolden, RunSummary, RunUpgrade);
         return root.Parse(args).Invoke();
     }
 
@@ -72,6 +73,7 @@ private static int RunBatchExport(CliOptions options)
 
     private static int RunReport(CliOptions options)
     {
+        SchemaGate.Check(options.ReportDbPath);
         var reportOptions = new ReportRunOptions
         {
             ReportDbPath = options.ReportDbPath,
@@ -97,6 +99,7 @@ private static int RunMultiReport(CliOptions options)
 
     private static int RunValidateGolden(CliOptions options)
     {
+        SchemaGate.Check(options.ReportDbPath);
         try
         {
             return GoldenValidationRunner.ValidateAndWriteResult(
@@ -114,6 +117,8 @@ private static int RunValidateGolden(CliOptions options)
 
     private static int RunSummary(CliOptions options)
     {
+        // Summary accepts either a .snap or an exported database; only databases have a schema to check.
+        SchemaGate.Check(options.SummaryInputPath);
         var progress = new ConsoleProgress(options.Verbose);
         using var cts = CreateCancellationSource();
 
@@ -135,6 +140,48 @@ private static int RunSummary(CliOptions options)
         }
     }
 
+    private static int RunUpgrade(CliOptions options)
+    {
+        try
+        {
+            var before = DatabaseMaintenance.Inspect(options.UpgradeDbPath);
+            var current = $"v{DatabaseSchemaInfo.SchemaMajor}.{DatabaseSchemaInfo.SchemaMinor}";
+
+            switch (before.Action)
+            {
+                case SchemaAction.None:
+                    Console.WriteLine($"Database is already at the current schema {current}. Nothing to do.");
+                    return 0;
+
+                case SchemaAction.ToolOutdated:
+                    Console.Error.WriteLine(
+                        $"Database schema v{before.Major}.{before.Minor} is newer than this build ({current}). " +
+                        $"Update {DatabaseSchemaInfo.ToolName} instead of downgrading.");
+                    return 1;
+
+                case SchemaAction.ReExport:
+                    Console.Error.WriteLine(
+                        $"Database major version (v{before.Major}) is behind v{DatabaseSchemaInfo.SchemaMajor}; an in-place upgrade is not possible. " +
+                        "Re-export from the original snapshot:");
+                    Console.Error.WriteLine($" {before.ReExportCommand ?? $"{DatabaseSchemaInfo.ToolName} export \"{options.UpgradeDbPath}\""}");
+                    return 1;
+
+                case SchemaAction.UpgradeInPlace:
+                    DatabaseMaintenance.UpgradeInPlace(options.UpgradeDbPath);
+                    Console.WriteLine($"Upgraded database schema from v{before.Major}.{before.Minor} to {current}.");
+                    return 0;
+            }
+
+            return 0;
+        }
+        catch (Exception ex)
+        {
+            Console.Error.WriteLine("Schema upgrade failed.");
+            Console.Error.WriteLine(ex.Message);
+            return 1;
+        }
+    }
+
     private static CancellationTokenSource CreateCancellationSource()
     {
         var cts = new CancellationTokenSource();
diff --git a/Cli/SchemaGate.cs b/Cli/SchemaGate.cs
new file mode 100644
index 0000000..a731fe5
--- /dev/null
+++ b/Cli/SchemaGate.cs
@@ -0,0 +1,136 @@
+using MemorySnapshotDataTools.Export;
+using MemorySnapshotDataTools.ExportDestination;
+
+namespace MemorySnapshotDataTools.Cli;
+
+///
+/// Checks an exported database's schema version before a read command (report/summary/validate) and,
+/// when it is behind the current build, informs the user and — interactively — offers to upgrade it.
+///
+///
+///
+/// Minor behind (new views/indexes only): offers an in-place upgrade
+/// (), which is safe and fast.
+/// Major behind / pre-versioning (table structure changed): a re-export from the
+/// original .snap is required. If the snapshot still exists at the recorded path, offers to run it;
+/// otherwise prints the exact export command.
+///
+/// Non-interactive sessions (stdin redirected) never auto-modify the database: the gate prints the
+/// advisory and the command to run, then proceeds with the existing database.
+///
+internal static class SchemaGate
+{
+    private const string DatabaseExtensions = ".duckdb,.db,.sqlite,.sqlite3";
+
+    /// Runs the schema check for a database path. Always proceeds (returns nothing); advisory only.
+    public static void Check(string dbPath)
+    {
+        if (!LooksLikeDatabase(dbPath) || !File.Exists(dbPath))
+            return;
+
+        SchemaStatus status;
+        try
+        {
+            status = DatabaseMaintenance.Inspect(dbPath);
+        }
+        catch
+        {
+            // Never block the requested command because the version probe failed (e.g. locked file).
+            return;
+        }
+
+        var current = $"v{DatabaseSchemaInfo.SchemaMajor}.{DatabaseSchemaInfo.SchemaMinor}";
+        var found = $"v{status.Major}.{status.Minor}";
+
+        switch (status.Action)
+        {
+            case SchemaAction.None:
+                break;
+
+            case SchemaAction.ToolOutdated:
+                Console.Error.WriteLine(
+                    $"Note: database schema {found} is newer than this build ({current}). " +
+                    $"Update {DatabaseSchemaInfo.ToolName} for full support.");
+                break;
+
+            case SchemaAction.UpgradeInPlace:
+                HandleUpgradeInPlace(status, current, found);
+                break;
+
+            case SchemaAction.ReExport:
+                HandleReExport(status, current, found);
+                break;
+        }
+    }
+
+    private static void HandleUpgradeInPlace(SchemaStatus status, string current, string found)
+    {
+        Console.Error.WriteLine(
+            $"Database schema {found} is behind {current} — newer analysis views/indexes are available.");
+
+        if (Confirm("Upgrade this database in place now?", defaultYes: true))
+        {
+            DatabaseMaintenance.UpgradeInPlace(status.DatabasePath);
+            Console.Error.WriteLine($"Upgraded database schema to {current}.");
+        }
+        else
+        {
+            Console.Error.WriteLine($" To upgrade later: {DatabaseSchemaInfo.ToolName} upgrade \"{status.DatabasePath}\"");
+        }
+    }
+
+    private static void HandleReExport(SchemaStatus status, string current, string found)
+    {
+        var versionDesc = status.Major == 0 ? "a pre-versioning schema" : $"schema {found}";
+        Console.Error.WriteLine(
+            $"Database has {versionDesc}; the current major version is v{DatabaseSchemaInfo.SchemaMajor}. " +
+            "Its table structure is outdated and it must be re-exported from the original snapshot.");
+
+        if (status.SnapshotExists)
+        {
+            if (Confirm($"Re-export now from {status.SnapshotPath}?", defaultYes: false))
+            {
+                var code = ExportRunner.Run(
+                    status.SnapshotPath!,
+                    status.DatabasePath,
+                    new ExportRunOptions(),
+                    status.IsSqlite ? DestinationKind.Sqlite : DestinationKind.DuckDb,
+                    new ConsoleProgress(verbose: true),
+                    CancellationToken.None);
+                Console.Error.WriteLine(code == 0
+                    ? "Re-export complete."
+                    : "Re-export failed; continuing with the existing database.");
+            }
+            else
+            {
+                Console.Error.WriteLine($" To re-export later: {status.ReExportCommand}");
+            }
+        }
+        else
+        {
+            var where = string.IsNullOrEmpty(status.SnapshotPath) ? string.Empty : $" at {status.SnapshotPath}";
+            var cmd = status.ReExportCommand
+                ?? $"{DatabaseSchemaInfo.ToolName} export \"{status.DatabasePath}\"";
+            Console.Error.WriteLine($" Original snapshot not found{where}. Re-export with: {cmd}");
+        }
+    }
+
+    /// Prompts y/n on an interactive terminal; in non-interactive sessions returns false (advisory only).
+    private static bool Confirm(string prompt, bool defaultYes)
+    {
+        if (Console.IsInputRedirected)
+            return false;
+
+        Console.Error.Write($"{prompt} [{(defaultYes ? "Y/n" : "y/N")}] ");
+        var line = Console.ReadLine();
+        if (string.IsNullOrWhiteSpace(line))
+            return defaultYes;
+        return line.Trim().StartsWith("y", StringComparison.OrdinalIgnoreCase);
+    }
+
+    private static bool LooksLikeDatabase(string path)
+    {
+        var ext = Path.GetExtension(path).ToLowerInvariant();
+        return DatabaseExtensions.Split(',').Contains(ext);
+    }
+}
diff --git a/Core/ExportDestination/DatabaseMaintenance.cs b/Core/ExportDestination/DatabaseMaintenance.cs
new file mode 100644
index 0000000..af76b73
--- /dev/null
+++ b/Core/ExportDestination/DatabaseMaintenance.cs
@@ -0,0 +1,68 @@
+using System.Data.Common;
+using DuckDB.NET.Data;
+using Microsoft.Data.Sqlite;
+
+namespace MemorySnapshotDataTools.ExportDestination;
+
+///
+/// Schema status of an exported database relative to the current build (see ).
+///
+/// Stored major version (0 = pre-versioning).
+/// Stored minor version.
+/// Recommended action.
+/// Original .snap path from snapshot_info, when available.
+/// True if the database is SQLite (affects the re-export command).
+/// Path to the inspected database (used to build the re-export command).
+public readonly record struct SchemaStatus(
+    int Major, int Minor, SchemaAction Action, string? SnapshotPath, bool IsSqlite, string DatabasePath)
+{
+    /// True when the source snapshot still exists on disk, so a re-export can be offered/run.
+    public bool SnapshotExists => !string.IsNullOrEmpty(SnapshotPath) && File.Exists(SnapshotPath);
+
+    /// The exact CLI export command to re-export this database, or null when the snapshot path is unknown.
+    public string? ReExportCommand => string.IsNullOrEmpty(SnapshotPath)
+        ? null
+        : DatabaseSchemaInfo.BuildReExportCommand(SnapshotPath!, DatabasePath, IsSqlite);
+}
+
+///
+/// Path-based schema inspection and in-place upgrade for exported databases. Keeps all DB-connection
+/// handling in Core so callers (e.g. the CLI) only deal with paths and the resulting .
+///
+public static class DatabaseMaintenance
+{
+    ///
+    /// Opens the database read-only, reads its schema version and source snapshot path, and classifies it.
+    /// Never throws for a readable file; an unreadable/locked file surfaces as the underlying exception.
+    ///
+    /// Path to the exported database (.duckdb / .db).
+    /// The schema status.
+    public static SchemaStatus Inspect(string dbPath)
+    {
+        var isSqlite = IsSqlitePath(dbPath);
+        using var connection = OpenReadOnly(dbPath, isSqlite);
+        connection.Open();
+        var (major, minor) = DatabaseSchemaInfo.ReadVersion(connection);
+        var snapshotPath = DatabaseSchemaInfo.ReadSnapshotPath(connection);
+        return new SchemaStatus(major, minor, DatabaseSchemaInfo.Evaluate(major, minor), snapshotPath, isSqlite, dbPath);
+    }
+
+    ///
+    /// Performs an in-place minor schema upgrade (re-applies views and indexes, bumps the minor version).
+    /// Only valid for a database whose major version matches the current build; callers should check
+    /// is first.
+    ///
+    /// Path to the database to upgrade.
+    public static void UpgradeInPlace(string dbPath)
+    {
+        var kind = IsSqlitePath(dbPath) ? DestinationKind.Sqlite : DestinationKind.DuckDb;
+        ExportDestinationFactory.Create(kind).UpgradeSchema(dbPath);
+    }
+
+    private static bool IsSqlitePath(string dbPath) =>
+        Path.GetExtension(dbPath).ToLowerInvariant() is ".db" or ".sqlite" or ".sqlite3";
+
+    private static DbConnection OpenReadOnly(string dbPath, bool isSqlite) => isSqlite
+        ? new SqliteConnection($"Data Source={dbPath};Mode=ReadOnly")
+        : new DuckDBConnection($"Data Source={dbPath};ACCESS_MODE=READ_ONLY");
+}
diff --git a/Core/ExportDestination/DuckDbExportDestination.cs b/Core/ExportDestination/DuckDbExportDestination.cs
index dd7dccc..9bd1823 100644
--- a/Core/ExportDestination/DuckDbExportDestination.cs
+++ b/Core/ExportDestination/DuckDbExportDestination.cs
@@ -44,14 +44,25 @@ public WriteStats ConsumeAndWrite(
         // Create schema
         Exec(connection, SchemaTablesScript);
 
+        // Record the schema version so consumers can detect when a re-export is needed.
+        using (var cmd = connection.CreateCommand())
+        {
+            cmd.CommandText = "INSERT INTO schema_meta(schema_version_major, schema_version_minor, msdt_version, created_at_utc) VALUES (?, ?, ?, ?);";
+            cmd.Parameters.Add(new DuckDBParameter { Value = DatabaseSchemaInfo.SchemaMajor });
+            cmd.Parameters.Add(new DuckDBParameter { Value = DatabaseSchemaInfo.SchemaMinor });
+            cmd.Parameters.Add(new DuckDBParameter { Value = DatabaseSchemaInfo.ToolVersion });
+            cmd.Parameters.Add(new DuckDBParameter { Value = DateTime.UtcNow.ToString("O") });
+            cmd.ExecuteNonQuery();
+        }
+
         // Insert snapshot_info using positional parameters (DuckDB uses ? placeholders)
         using (var cmd = connection.CreateCommand())
         {
             cmd.CommandText = """
                 INSERT INTO snapshot_info(
                     snapshot_path, exported_at_utc, unity_version,
-                    snap_format_version, session_guid, product_name, platform, record_date_utc)
-                VALUES (?, ?, ?, ?, ?, ?, ?, ?);
+                    snap_format_version, session_guid, product_name, platform, record_date_utc, page_size)
+                VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?);
                 """;
             cmd.Parameters.Add(new DuckDBParameter { Value = snapshotInfo.SnapshotPath });
             cmd.Parameters.Add(new DuckDBParameter { Value = snapshotInfo.ExportedAtUtc });
@@ -64,6 +75,7 @@ INSERT INTO snapshot_info(
             cmd.Parameters.Add(new DuckDBParameter { Value = string.IsNullOrEmpty(snapshotInfo.ProductName) ? (object)DBNull.Value : snapshotInfo.ProductName });
             cmd.Parameters.Add(new DuckDBParameter { Value = string.IsNullOrEmpty(snapshotInfo.Platform) ? (object)DBNull.Value : snapshotInfo.Platform });
             cmd.Parameters.Add(new DuckDBParameter { Value = string.IsNullOrEmpty(snapshotInfo.RecordDateUtc) ? (object)DBNull.Value : snapshotInfo.RecordDateUtc });
+            cmd.Parameters.Add(new DuckDBParameter { Value = snapshotInfo.PageSize == 0 ? (object)DBNull.Value : unchecked((long)snapshotInfo.PageSize) });
             cmd.ExecuteNonQuery();
         }
         state.AddWritten(1);
@@ -254,6 +266,7 @@ INSERT INTO snapshot_info(
 
         var indexSw = Stopwatch.StartNew();
         Exec(connection, CreateIndexesScript);
+        Exec(connection, CreateViewsScript);
         indexSw.Stop();
         stats.IndexBuildMs = indexSw.ElapsedMilliseconds;
 
@@ -394,6 +407,27 @@ AND NOT EXISTS (
 
     #endregion
 
+    #region UpgradeSchema
+
+    ///
+    public void UpgradeSchema(string dbPath)
+    {
+        using var connection = new DuckDBConnection($"Data Source={dbPath}");
+        connection.Open();
+
+        // Indexes use IF NOT EXISTS and views use CREATE OR REPLACE, so both scripts are re-runnable.
+        Exec(connection, CreateIndexesScript);
+        Exec(connection, CreateViewsScript);
+
+        using var cmd = connection.CreateCommand();
+        cmd.CommandText = "UPDATE schema_meta SET schema_version_minor = ?, msdt_version = ?;";
+        cmd.Parameters.Add(new DuckDBParameter { Value = DatabaseSchemaInfo.SchemaMinor });
+        cmd.Parameters.Add(new DuckDBParameter { Value = DatabaseSchemaInfo.ToolVersion });
+        cmd.ExecuteNonQuery();
+    }
+
+    #endregion
+
     #region Helpers
 
     private static void Exec(DuckDBConnection connection, string sql)
@@ -424,6 +458,13 @@ private static long QueryCount(DuckDBConnection connection, string sql)
     // (DuckDB Appender reads raw bytes; passing int to BIGINT column corrupts data).
     // int → INTEGER (32-bit), long/ulong-cast → BIGINT (64-bit).
     private const string SchemaTablesScript = """
+CREATE OR REPLACE TABLE schema_meta (
+    schema_version_major INTEGER NOT NULL,
+    schema_version_minor INTEGER NOT NULL,
+    msdt_version VARCHAR,
+    created_at_utc VARCHAR NOT NULL
+);
+
 CREATE OR REPLACE TABLE snapshot_info (
     snapshot_path VARCHAR NOT NULL,
     exported_at_utc VARCHAR NOT NULL,
@@ -432,7 +473,8 @@ CREATE OR REPLACE TABLE snapshot_info (
     session_guid BIGINT,
     product_name VARCHAR,
     platform VARCHAR,
-    record_date_utc VARCHAR
+    record_date_utc VARCHAR,
+    page_size BIGINT
 );
 
 CREATE OR REPLACE TABLE native_objects (
@@ -512,15 +554,137 @@ resident_available INTEGER NOT NULL
 );
 """;
 
+    // CREATE INDEX IF NOT EXISTS so this script is idempotent and re-runnable by the in-place
+    // schema upgrade path (UpgradeSchema), not just on a fresh export.
     private const string CreateIndexesScript = """
-CREATE INDEX idx_connections_from ON connections(from_kind, from_index);
-CREATE INDEX idx_connections_to ON connections(to_kind, to_index);
-CREATE INDEX idx_native_objects_instance_id ON native_objects(instance_id);
-CREATE INDEX idx_native_objects_is_destroyed ON native_objects(is_destroyed);
-CREATE INDEX idx_managed_objects_address ON managed_objects(address);
-CREATE INDEX idx_memory_regions_address_base ON memory_regions(address_base);
-CREATE INDEX idx_native_allocations_address ON native_allocations(address);
-CREATE INDEX idx_native_allocations_region ON native_allocations(memory_region_index);
+CREATE INDEX IF NOT EXISTS idx_connections_from ON connections(from_kind, from_index);
+CREATE INDEX IF NOT EXISTS idx_connections_to ON connections(to_kind, to_index);
+CREATE INDEX IF NOT EXISTS idx_native_objects_instance_id ON native_objects(instance_id);
+CREATE INDEX IF NOT EXISTS idx_native_objects_is_destroyed ON native_objects(is_destroyed);
+CREATE INDEX IF NOT EXISTS idx_managed_objects_address ON managed_objects(address);
+CREATE INDEX IF NOT EXISTS idx_memory_regions_address_base ON memory_regions(address_base);
+CREATE INDEX IF NOT EXISTS idx_native_allocations_address ON native_allocations(address);
+CREATE INDEX IF NOT EXISTS idx_native_allocations_region ON native_allocations(memory_region_index);
+CREATE INDEX IF NOT EXISTS idx_system_memory_regions_address ON system_memory_regions(address);
+""";
+
+    // Analysis views and table macros. See docs/database-schema.md for the full reference.
+    // The Exec helper splits this on ';', so each statement is terminated with ';' and must contain
+    // no embedded semicolons. DuckDB-only constructs: ASOF JOIN (fast address-range containment) and
+    // table macros (parameterized views). The SQLite equivalents live in SqliteWriter.
+ private const string CreateViewsScript = """ +CREATE OR REPLACE VIEW v_allocation_enriched AS +SELECT + a.allocation_index, + a.address, + a.size_bytes, + a.overhead_size_bytes, + a.padding_size_bytes, + a.memory_region_index, + mr.name AS unity_region_name, + CASE WHEN a.address < s.address + s.size_bytes THEN s.region_index END AS system_region_index, + CASE WHEN a.address < s.address + s.size_bytes THEN s.name END AS system_region_name, + a.root_reference_id, + rt.area_name, + rt.object_name AS root_object_name, + o.native_object_index, + o.native_type_name, + o.name AS object_name +FROM native_allocations a +LEFT JOIN memory_regions mr ON mr.region_index = a.memory_region_index +ASOF LEFT JOIN system_memory_regions s ON a.address >= s.address +LEFT JOIN native_roots rt ON rt.root_id = a.root_reference_id +LEFT JOIN native_objects o ON o.root_reference_id = a.root_reference_id; + +CREATE OR REPLACE VIEW v_system_region_summary AS +SELECT + s.region_index, + s.name, + printf('0x%x', s.address) AS addr_hex, + s.size_bytes AS committed_bytes, + s.resident_bytes, + ROUND(100.0 * s.resident_bytes / NULLIF(s.size_bytes, 0), 1) AS pct_resident, + COUNT(a.allocation_index) AS unity_alloc_count, + COALESCE(SUM(a.size_bytes), 0) AS unity_live_bytes, + ROUND(100.0 * COALESCE(SUM(a.size_bytes), 0) / NULLIF(s.resident_bytes, 0), 1) AS unity_live_pct_of_resident +FROM system_memory_regions s +LEFT JOIN native_allocations a + ON a.address >= s.address AND a.address < s.address + s.size_bytes +GROUP BY s.region_index, s.name, s.address, s.size_bytes, s.resident_bytes; + +CREATE OR REPLACE VIEW v_region_owner_breakdown AS +SELECT + system_region_name, + COALESCE(native_type_name, area_name, '(untracked/no-root)') AS owner, + COUNT(*) AS alloc_count, + SUM(size_bytes) AS live_bytes +FROM v_allocation_enriched +GROUP BY 1, 2; + +CREATE OR REPLACE VIEW v_connection_edges AS +SELECT + c.connection_type, + c.from_kind, + c.from_index, + COALESCE(fno.native_type_name, 
fmo.managed_type_name) AS from_type, + fno.name AS from_name, + c.to_kind, + c.to_index, + COALESCE(tno.native_type_name, tmo.managed_type_name) AS to_type, + tno.name AS to_name +-- The kind check is folded into the join KEY (CASE expr, where NULL never matches) rather than AND'd +-- into the ON clause. A constant predicate inside a LEFT JOIN ON forces DuckDB into a quadratic +-- BLOCKWISE_NL_JOIN over millions of edges, whereas a pure equi-key lets it hash-join. The native and +-- managed object index spaces overlap, so the kind guard is required for correctness. +FROM connections c +LEFT JOIN native_objects fno ON fno.native_object_index = (CASE WHEN c.from_kind = 'native_object' THEN c.from_index END) +LEFT JOIN managed_objects fmo ON fmo.managed_object_index = (CASE WHEN c.from_kind = 'managed_object' THEN c.from_index END) +LEFT JOIN native_objects tno ON tno.native_object_index = (CASE WHEN c.to_kind = 'native_object' THEN c.to_index END) +LEFT JOIN managed_objects tmo ON tmo.managed_object_index = (CASE WHEN c.to_kind = 'managed_object' THEN c.to_index END); + +CREATE OR REPLACE VIEW v_assetbundle_utilization AS +WITH refs AS ( + SELECT DISTINCT c.from_index AS bundle_index, c.to_index AS ref_index + FROM connections c + JOIN native_objects b ON b.native_object_index = c.from_index AND b.native_type_name = 'AssetBundle' + WHERE c.from_kind = 'native_object' AND c.to_kind = 'native_object' + AND c.connection_type = 'native_connection' AND c.to_index <> c.from_index +) +SELECT + b.native_object_index, + b.name, + b.size_bytes AS bundle_size_bytes, + b.resident_size_bytes AS bundle_resident_bytes, + b.is_destroyed, + COUNT(DISTINCT r.ref_index) AS referenced_object_count, + COUNT(DISTINCT o.native_type_name) AS referenced_type_count, + COALESCE(SUM(o.size_bytes), 0) AS referenced_size_bytes, + COALESCE(SUM(o.resident_size_bytes), 0) AS referenced_resident_bytes, + (COUNT(DISTINCT r.ref_index) > 0) AS references_loaded_assets +FROM native_objects b +LEFT JOIN 
refs r ON r.bundle_index = b.native_object_index +LEFT JOIN native_objects o ON o.native_object_index = r.ref_index +WHERE b.native_type_name = 'AssetBundle' +GROUP BY b.native_object_index, b.name, b.size_bytes, b.resident_size_bytes, b.is_destroyed; + +CREATE OR REPLACE MACRO region_allocations(region_name) AS TABLE + SELECT * FROM v_allocation_enriched WHERE system_region_name = region_name; + +CREATE OR REPLACE MACRO region_page_density(region_name) AS TABLE +SELECT COUNT(*) AS touched_pages, + COUNT(*) * MAX(page_bytes) AS touched_bytes, + ROUND(AVG(used), 0) AS avg_live_bytes_per_page, + ROUND(100.0 * AVG(used) / MAX(page_bytes), 1) AS avg_fill_pct, + ROUND(AVG(n), 1) AS avg_allocs_per_page +FROM ( + SELECT a.address // (SELECT COALESCE(NULLIF(page_size, 0), 16384) FROM snapshot_info LIMIT 1) AS page, + (SELECT COALESCE(NULLIF(page_size, 0), 16384) FROM snapshot_info LIMIT 1) AS page_bytes, + SUM(a.size_bytes) AS used, COUNT(*) AS n + FROM native_allocations a + JOIN system_memory_regions s + ON s.name = region_name AND a.address >= s.address AND a.address < s.address + s.size_bytes + GROUP BY 1, 2 +); """; #endregion diff --git a/Core/ExportDestination/IExportDestinationWriter.cs b/Core/ExportDestination/IExportDestinationWriter.cs index 6926c37..b10b48e 100644 --- a/Core/ExportDestination/IExportDestinationWriter.cs +++ b/Core/ExportDestination/IExportDestinationWriter.cs @@ -43,4 +43,14 @@ WriteStats ConsumeAndWrite( /// Original snapshot data used for expected counts. /// Validation level (none, minimal, full). void Validate(string dbPath, RawSnapshotData rawData, ValidationMode mode); + + /// + /// Performs an in-place minor schema upgrade on an existing database: re-applies indexes and + /// analysis views and bumps schema_meta.schema_version_minor to + /// the current minor version. Does not touch table data and never re-parses the + /// snapshot; it is only valid when the database's major version matches the current one. + /// Opens the database read-write.
+ /// + /// Path to the database file to upgrade. + void UpgradeSchema(string dbPath); } diff --git a/Core/ExportDestination/SqliteExportDestination.cs b/Core/ExportDestination/SqliteExportDestination.cs index 8933d6c..16d720d 100644 --- a/Core/ExportDestination/SqliteExportDestination.cs +++ b/Core/ExportDestination/SqliteExportDestination.cs @@ -27,4 +27,8 @@ public void WriteSummaryMetrics(string dbPath, SummaryMetrics metrics) /// public void Validate(string dbPath, RawSnapshotData rawData, ValidationMode mode) => SqliteWriter.Validate(dbPath, rawData, mode); + + /// + public void UpgradeSchema(string dbPath) + => SqliteWriter.UpgradeSchema(dbPath); } diff --git a/Core/ExportDestination/SqliteWriter.cs b/Core/ExportDestination/SqliteWriter.cs index 659e8f4..f9db90e 100644 --- a/Core/ExportDestination/SqliteWriter.cs +++ b/Core/ExportDestination/SqliteWriter.cs @@ -175,6 +175,37 @@ INSERT INTO summary_metrics(metric_group, category, committed_bytes, resident_by #endregion + #region UpgradeSchema + + /// + /// Re-applies indexes and analysis views to an existing SQLite database and bumps the minor schema + /// version. + /// + public static void UpgradeSchema(string dbPath) + { + using var connection = new SqliteConnection($"Data Source={dbPath}"); + connection.Open(); + + using var transaction = connection.BeginTransaction(); + // Views are dropped by DropViewsScript and recreated by CreateViewsScript; indexes use IF NOT EXISTS, so the whole sequence is re-runnable.
+ ExecScript(connection, transaction, DropViewsScript); + ExecScript(connection, transaction, CreateIndexesScript); + ExecScript(connection, transaction, CreateViewsScript); + + using (var cmd = connection.CreateCommand()) + { + cmd.Transaction = transaction; + cmd.CommandText = "UPDATE schema_meta SET schema_version_minor = $min, msdt_version = $m;"; + cmd.Parameters.AddWithValue("$min", DatabaseSchemaInfo.SchemaMinor); + cmd.Parameters.AddWithValue("$m", DatabaseSchemaInfo.ToolVersion); + cmd.ExecuteNonQuery(); + } + + transaction.Commit(); + } + + #endregion + #region ConsumeAndWrite /// @@ -212,13 +243,24 @@ public static WriteStats ConsumeAndWrite( { ExecScript(connection, transaction, SchemaTablesScript); + using (var metaCmd = connection.CreateCommand()) + { + metaCmd.Transaction = transaction; + metaCmd.CommandText = "INSERT INTO schema_meta(schema_version_major, schema_version_minor, msdt_version, created_at_utc) VALUES ($maj, $min, $m, $c);"; + metaCmd.Parameters.AddWithValue("$maj", DatabaseSchemaInfo.SchemaMajor); + metaCmd.Parameters.AddWithValue("$min", DatabaseSchemaInfo.SchemaMinor); + metaCmd.Parameters.AddWithValue("$m", DatabaseSchemaInfo.ToolVersion); + metaCmd.Parameters.AddWithValue("$c", DateTime.UtcNow.ToString("O")); + metaCmd.ExecuteNonQuery(); + } + using var snapshotCmd = connection.CreateCommand(); snapshotCmd.Transaction = transaction; snapshotCmd.CommandText = """ INSERT INTO snapshot_info( snapshot_path, exported_at_utc, unity_version, - snap_format_version, session_guid, product_name, platform, record_date_utc) - VALUES ($p, $e, $u, $sf, $sg, $pn, $pl, $rd); + snap_format_version, session_guid, product_name, platform, record_date_utc, page_size) + VALUES ($p, $e, $u, $sf, $sg, $pn, $pl, $rd, $ps); """; snapshotCmd.Parameters.AddWithValue("$p", snapshotInfo.SnapshotPath); snapshotCmd.Parameters.AddWithValue("$e", snapshotInfo.ExportedAtUtc); @@ -228,6 +270,7 @@ INSERT INTO snapshot_info( snapshotCmd.Parameters.AddWithValue("$pn", 
string.IsNullOrEmpty(snapshotInfo.ProductName) ? DBNull.Value : snapshotInfo.ProductName); snapshotCmd.Parameters.AddWithValue("$pl", string.IsNullOrEmpty(snapshotInfo.Platform) ? DBNull.Value : snapshotInfo.Platform); snapshotCmd.Parameters.AddWithValue("$rd", string.IsNullOrEmpty(snapshotInfo.RecordDateUtc) ? DBNull.Value : snapshotInfo.RecordDateUtc); + snapshotCmd.Parameters.AddWithValue("$ps", snapshotInfo.PageSize == 0 ? DBNull.Value : unchecked((long)snapshotInfo.PageSize)); snapshotCmd.ExecuteNonQuery(); state.AddWritten(1); var insertSw = Stopwatch.StartNew(); @@ -393,6 +436,7 @@ INSERT INTO snapshot_info( using (var indexTransaction = connection.BeginTransaction()) { ExecScript(connection, indexTransaction, CreateIndexesScript); + ExecScript(connection, indexTransaction, CreateViewsScript); indexTransaction.Commit(); } indexSw.Stop(); @@ -764,6 +808,12 @@ private static long QueryCount(SqliteConnection connection, string sql) #endregion private const string SchemaTablesScript = """ +DROP VIEW IF EXISTS v_assetbundle_utilization; +DROP VIEW IF EXISTS v_connection_edges; +DROP VIEW IF EXISTS v_region_owner_breakdown; +DROP VIEW IF EXISTS v_system_region_summary; +DROP VIEW IF EXISTS v_allocation_enriched; +DROP TABLE IF EXISTS schema_meta; DROP TABLE IF EXISTS snapshot_info; DROP TABLE IF EXISTS native_objects; DROP TABLE IF EXISTS managed_objects; @@ -773,6 +823,13 @@ private static long QueryCount(SqliteConnection connection, string sql) DROP TABLE IF EXISTS native_allocations; DROP TABLE IF EXISTS system_memory_regions; +CREATE TABLE schema_meta ( + schema_version_major INTEGER NOT NULL, + schema_version_minor INTEGER NOT NULL, + msdt_version TEXT, + created_at_utc TEXT NOT NULL +); + CREATE TABLE snapshot_info ( snapshot_path TEXT NOT NULL, exported_at_utc TEXT NOT NULL, @@ -781,7 +838,8 @@ CREATE TABLE snapshot_info ( session_guid INTEGER, product_name TEXT, platform TEXT, - record_date_utc TEXT + record_date_utc TEXT, + page_size INTEGER ); CREATE 
TABLE native_objects ( @@ -861,15 +919,129 @@ resident_available INTEGER NOT NULL ); """; + // CREATE INDEX IF NOT EXISTS so this script is idempotent and re-runnable by the in-place + // schema upgrade path (UpgradeSchema), not just on a fresh export. private const string CreateIndexesScript = """ -CREATE INDEX idx_connections_from ON connections(from_kind, from_index); -CREATE INDEX idx_connections_to ON connections(to_kind, to_index); -CREATE INDEX idx_native_objects_instance_id ON native_objects(instance_id); -CREATE INDEX idx_native_objects_is_destroyed ON native_objects(is_destroyed); -CREATE INDEX idx_managed_objects_address ON managed_objects(address); -CREATE INDEX idx_memory_regions_address_base ON memory_regions(address_base); -CREATE INDEX idx_native_allocations_address ON native_allocations(address); -CREATE INDEX idx_native_allocations_region ON native_allocations(memory_region_index); +CREATE INDEX IF NOT EXISTS idx_connections_from ON connections(from_kind, from_index); +CREATE INDEX IF NOT EXISTS idx_connections_to ON connections(to_kind, to_index); +CREATE INDEX IF NOT EXISTS idx_native_objects_instance_id ON native_objects(instance_id); +CREATE INDEX IF NOT EXISTS idx_native_objects_is_destroyed ON native_objects(is_destroyed); +CREATE INDEX IF NOT EXISTS idx_managed_objects_address ON managed_objects(address); +CREATE INDEX IF NOT EXISTS idx_memory_regions_address_base ON memory_regions(address_base); +CREATE INDEX IF NOT EXISTS idx_native_allocations_address ON native_allocations(address); +CREATE INDEX IF NOT EXISTS idx_native_allocations_region ON native_allocations(memory_region_index); +CREATE INDEX IF NOT EXISTS idx_system_memory_regions_address ON system_memory_regions(address); +"""; + + // Drops the analysis views so CreateViewsScript (which uses CREATE VIEW, not CREATE OR REPLACE, + // for SQLite) is re-runnable by the in-place upgrade path. Order does not matter with IF EXISTS. 
+ private const string DropViewsScript = """ +DROP VIEW IF EXISTS v_assetbundle_utilization; +DROP VIEW IF EXISTS v_connection_edges; +DROP VIEW IF EXISTS v_region_owner_breakdown; +DROP VIEW IF EXISTS v_system_region_summary; +DROP VIEW IF EXISTS v_allocation_enriched; +"""; + + // Analysis views mirroring the DuckDB ones (docs/database-schema.md). SQLite has no ASOF JOIN + // or table macros, so v_allocation_enriched resolves the containing system region via a + // correlated subquery (nearest region whose range covers the address), and both table macros + // (region_allocations, region_page_density) are DuckDB-only; see the doc for the manual page-density query. + private const string CreateViewsScript = """ +CREATE VIEW v_allocation_enriched AS +SELECT + a.allocation_index, + a.address, + a.size_bytes, + a.overhead_size_bytes, + a.padding_size_bytes, + a.memory_region_index, + mr.name AS unity_region_name, + (SELECT s.region_index FROM system_memory_regions s + WHERE s.address <= a.address AND a.address < s.address + s.size_bytes + ORDER BY s.address DESC LIMIT 1) AS system_region_index, + (SELECT s.name FROM system_memory_regions s + WHERE s.address <= a.address AND a.address < s.address + s.size_bytes + ORDER BY s.address DESC LIMIT 1) AS system_region_name, + a.root_reference_id, + rt.area_name, + rt.object_name AS root_object_name, + o.native_object_index, + o.native_type_name, + o.name AS object_name +FROM native_allocations a +LEFT JOIN memory_regions mr ON mr.region_index = a.memory_region_index +LEFT JOIN native_roots rt ON rt.root_id = a.root_reference_id +LEFT JOIN native_objects o ON o.root_reference_id = a.root_reference_id; + +CREATE VIEW v_system_region_summary AS +SELECT + s.region_index, + s.name, + printf('0x%x', s.address) AS addr_hex, + s.size_bytes AS committed_bytes, + s.resident_bytes, + ROUND(100.0 * s.resident_bytes / NULLIF(s.size_bytes, 0), 1) AS pct_resident, + COUNT(a.allocation_index) AS unity_alloc_count, + COALESCE(SUM(a.size_bytes), 0) AS 
unity_live_bytes, + ROUND(100.0 * COALESCE(SUM(a.size_bytes), 0) / NULLIF(s.resident_bytes, 0), 1) AS unity_live_pct_of_resident +FROM system_memory_regions s +LEFT JOIN native_allocations a + ON a.address >= s.address AND a.address < s.address + s.size_bytes +GROUP BY s.region_index, s.name, s.address, s.size_bytes, s.resident_bytes; + +CREATE VIEW v_region_owner_breakdown AS +SELECT + system_region_name, + COALESCE(native_type_name, area_name, '(untracked/no-root)') AS owner, + COUNT(*) AS alloc_count, + SUM(size_bytes) AS live_bytes +FROM v_allocation_enriched +GROUP BY 1, 2; + +CREATE VIEW v_connection_edges AS +SELECT + c.connection_type, + c.from_kind, + c.from_index, + COALESCE(fno.native_type_name, fmo.managed_type_name) AS from_type, + fno.name AS from_name, + c.to_kind, + c.to_index, + COALESCE(tno.native_type_name, tmo.managed_type_name) AS to_type, + tno.name AS to_name +-- Kind check folded into the join KEY (see the DuckDB copy) so the equi-join stays index-friendly. +-- The native and managed object index spaces overlap, so the kind guard is required for correctness. 
+FROM connections c +LEFT JOIN native_objects fno ON fno.native_object_index = (CASE WHEN c.from_kind = 'native_object' THEN c.from_index END) +LEFT JOIN managed_objects fmo ON fmo.managed_object_index = (CASE WHEN c.from_kind = 'managed_object' THEN c.from_index END) +LEFT JOIN native_objects tno ON tno.native_object_index = (CASE WHEN c.to_kind = 'native_object' THEN c.to_index END) +LEFT JOIN managed_objects tmo ON tmo.managed_object_index = (CASE WHEN c.to_kind = 'managed_object' THEN c.to_index END); + +CREATE VIEW v_assetbundle_utilization AS +WITH refs AS ( + SELECT DISTINCT c.from_index AS bundle_index, c.to_index AS ref_index + FROM connections c + JOIN native_objects b ON b.native_object_index = c.from_index AND b.native_type_name = 'AssetBundle' + WHERE c.from_kind = 'native_object' AND c.to_kind = 'native_object' + AND c.connection_type = 'native_connection' AND c.to_index <> c.from_index +) +SELECT + b.native_object_index, + b.name, + b.size_bytes AS bundle_size_bytes, + b.resident_size_bytes AS bundle_resident_bytes, + b.is_destroyed, + COUNT(DISTINCT r.ref_index) AS referenced_object_count, + COUNT(DISTINCT o.native_type_name) AS referenced_type_count, + COALESCE(SUM(o.size_bytes), 0) AS referenced_size_bytes, + COALESCE(SUM(o.resident_size_bytes), 0) AS referenced_resident_bytes, + (COUNT(DISTINCT r.ref_index) > 0) AS references_loaded_assets +FROM native_objects b +LEFT JOIN refs r ON r.bundle_index = b.native_object_index +LEFT JOIN native_objects o ON o.native_object_index = r.ref_index +WHERE b.native_type_name = 'AssetBundle' +GROUP BY b.native_object_index, b.name, b.size_bytes, b.resident_size_bytes, b.is_destroyed; """; } diff --git a/Core/Models/DatabaseSchemaInfo.cs b/Core/Models/DatabaseSchemaInfo.cs new file mode 100644 index 0000000..e0194ae --- /dev/null +++ b/Core/Models/DatabaseSchemaInfo.cs @@ -0,0 +1,169 @@ +using System.Data.Common; +using System.Reflection; + +namespace MemorySnapshotDataTools; + +/// +/// What a consumer should 
do about an exported database whose schema version differs from this build. +/// +public enum SchemaAction +{ + /// Schema matches the current major and minor version; nothing to do. + None, + + /// + /// Same major version, older minor: only views/indexes changed. The tool can upgrade the database + /// in place (re-apply views and indexes) without re-parsing the snapshot. + /// + UpgradeInPlace, + + /// + /// Older (or pre-versioning) major version: tables/columns changed. The database must be re-exported + /// from the original .snap; it cannot be upgraded in place. + /// + ReExport, + + /// Database was written by a newer build of the tool than this one; upgrade the tool. + ToolOutdated, +} + +/// +/// Single source of truth for the exported-database schema version, stored in the schema_meta +/// table (schema_version_major, schema_version_minor) by both writers and checked by the CLI. +/// +/// +/// +/// The version has two parts: +/// +/// +/// +/// Major: the table/column structure. Bump it for any change that +/// alters tables or columns (add/rename/remove a column or table, or change a column's meaning/units). +/// A database with a lower major requires a re-export from the original snapshot; +/// it cannot be upgraded in place. +/// +/// +/// Minor: the set of derived objects (analysis views and +/// indexes) layered on top of the tables. Bump it when you add/change a view or index. A database +/// with the current major but a lower minor can be upgraded in place +/// by re-running the view/index DDL, with no re-export needed. +/// +/// +/// +/// Reset minor to 0 whenever you bump major. Mirror every change in docs/database-schema.md +/// (see the memory-db-sql Claude skill for the checklist). +/// +/// +public static class DatabaseSchemaInfo +{ + /// Current major schema version (table/column structure). A lower major requires re-export. + public const int SchemaMajor = 1; + + /// Current minor schema version (views/indexes).
A lower minor can be upgraded in place. + public const int SchemaMinor = 3; + + /// Name used in advisories to refer to the CLI tool. + public const string ToolName = "MemorySnapshotDataTools"; + + /// Version of the MemorySnapshotDataTools build, recorded in schema_meta.msdt_version. + public static string ToolVersion { get; } = + typeof(DatabaseSchemaInfo).Assembly.GetCustomAttribute()?.InformationalVersion + ?? typeof(DatabaseSchemaInfo).Assembly.GetName().Version?.ToString() + ?? "unknown"; + + /// Classifies a database's stored (major, minor) version against this build. + /// Value from schema_meta.schema_version_major, or 0 if the table is absent. + /// Value from schema_meta.schema_version_minor, or 0 if absent. + /// The recommended action. + public static SchemaAction Evaluate(int major, int minor) + { + if (major > SchemaMajor || (major == SchemaMajor && minor > SchemaMinor)) + return SchemaAction.ToolOutdated; + if (major < SchemaMajor) + return SchemaAction.ReExport; // includes major == 0 (pre-versioning databases) + if (minor < SchemaMinor) + return SchemaAction.UpgradeInPlace; // major == SchemaMajor, behind on views/indexes + return SchemaAction.None; + } + + /// + /// Formats a stored (major, minor) version for display, appending a short advisory when it differs + /// from this build. Used by the summary, report, and multi-report outputs. + /// + /// Stored major version (0 = pre-versioning). + /// Stored minor version. + /// A display string such as "1.1" or "1.0 (upgrade available → 1.1)".
+ public static string DescribeVersion(int major, int minor) + { + if (major == 0) + return "unversioned (re-export recommended)"; + + var current = $"{SchemaMajor}.{SchemaMinor}"; + var stored = $"{major}.{minor}"; + return Evaluate(major, minor) switch + { + SchemaAction.None => stored, + SchemaAction.UpgradeInPlace => $"{stored} (upgrade available → {current})", + SchemaAction.ReExport => $"{stored} (re-export recommended → {current})", + SchemaAction.ToolOutdated => $"{stored} (newer than tool {current})", + _ => stored, + }; + } + + /// + /// Reads schema_meta from an open connection. Returns (0, 0) for pre-versioning databases that + /// lack the table (or the major/minor columns); such databases are classified as + /// requiring a re-export. + /// + /// An open DuckDB or SQLite connection. + /// The stored (major, minor) version. + public static (int Major, int Minor) ReadVersion(DbConnection connection) + { + try + { + using var cmd = connection.CreateCommand(); + cmd.CommandText = "SELECT schema_version_major, schema_version_minor FROM schema_meta LIMIT 1"; + using var reader = cmd.ExecuteReader(); + if (reader.Read()) + { + var major = reader.IsDBNull(0) ? 0 : Convert.ToInt32(reader.GetValue(0)); + var minor = reader.IsDBNull(1) ? 0 : Convert.ToInt32(reader.GetValue(1)); + return (major, minor); + } + } + catch (DbException) + { + // Pre-versioning database: no schema_meta table or no major/minor columns. + } + + return (0, 0); + } + + /// Reads snapshot_info.snapshot_path (the original .snap path), or null when unavailable. + /// An open DuckDB or SQLite connection. + /// The source snapshot path, or null. + public static string? 
ReadSnapshotPath(DbConnection connection) + { + try + { + using var cmd = connection.CreateCommand(); + cmd.CommandText = "SELECT snapshot_path FROM snapshot_info LIMIT 1"; + return cmd.ExecuteScalar() as string; + } + catch (DbException) + { + return null; + } + } + + /// Builds the exact CLI export command a user should run to re-export a database.
+ /// Path to the source .snap (from snapshot_info.snapshot_path). + /// Destination database path to overwrite. + /// True if the destination is SQLite (adds --destination sqlite). + /// A copy-pasteable command string. + public static string BuildReExportCommand(string snapshotPath, string databasePath, bool sqlite) + { + var dest = sqlite ? " --destination sqlite" : string.Empty; + return $"{ToolName} export \"{snapshotPath}\" \"{databasePath}\"{dest}"; + } +} diff --git a/Core/Models/SnapshotData.cs b/Core/Models/SnapshotData.cs index a5b66a0..a9944e1 100644 --- a/Core/Models/SnapshotData.cs +++ b/Core/Models/SnapshotData.cs @@ -67,4 +67,10 @@ public sealed class SnapshotInfo /// Capture timestamp (UTC ISO-8601), when known. public string RecordDateUtc { get; set; } = string.Empty; + + /// + /// OS memory page size in bytes for the captured device (from SystemMemoryResidentPages_PageSize; + /// e.g. 16384 on iOS arm64, typically 4096 on most other platforms). Zero when unknown (format < 17 / no resident page data). + /// + public ulong PageSize { get; set; } } diff --git a/Core/Parser/SnapshotBridge.cs b/Core/Parser/SnapshotBridge.cs index 0b81a4d..7f6fae3 100644 --- a/Core/Parser/SnapshotBridge.cs +++ b/Core/Parser/SnapshotBridge.cs @@ -56,6 +56,7 @@ public static RawSnapshotData ExtractFromDecoded(DecodedSnapshot decoded, string RecordDateUtc = decoded.RecordDateTicksUtc > 0 ? 
new DateTime(decoded.RecordDateTicksUtc, DateTimeKind.Utc).ToString("O", CultureInfo.InvariantCulture) : string.Empty, + PageSize = decoded.SystemMemoryResidentPageSize, } }; diff --git a/Core/Report/MultiSnapshotReport/MultiSnapshotHtmlRenderer.cs b/Core/Report/MultiSnapshotReport/MultiSnapshotHtmlRenderer.cs index 3e21f49..f052daf 100644 --- a/Core/Report/MultiSnapshotReport/MultiSnapshotHtmlRenderer.cs +++ b/Core/Report/MultiSnapshotReport/MultiSnapshotHtmlRenderer.cs @@ -149,9 +149,23 @@ private static void AppendSnapshotCell(StringBuilder sb, SnapshotMetricsRow snap sb.Append(EscapeAttr(snap.SnapshotName)); sb.Append("\">"); sb.Append(PlatformIconHtml.Render(snap.PlatformKind, snap.Platform)); - sb.Append(""); + sb.Append("'); sb.Append(Escape(snap.SnapshotName)); - sb.Append(""); + sb.Append(""); + if (!snap.SchemaUpToDate && !string.IsNullOrWhiteSpace(snap.SchemaVersion)) + { + sb.Append(""); + } + sb.Append(""); } private static void AppendCountCell(StringBuilder sb, int col, int count) diff --git a/Core/Report/MultiSnapshotReport/MultiSnapshotReportBuilder.cs b/Core/Report/MultiSnapshotReport/MultiSnapshotReportBuilder.cs index dbad14a..bff1aa5 100644 --- a/Core/Report/MultiSnapshotReport/MultiSnapshotReportBuilder.cs +++ b/Core/Report/MultiSnapshotReport/MultiSnapshotReportBuilder.cs @@ -77,7 +77,7 @@ private static SnapshotMetricsRow QueryDuckDb(string dbPath) var snapshotMeta = QuerySnapshotMetadata(connection, isDuckDb: true); var nativeTypes = QueryNativeTypes(connection, isDuckDb: true); var remapperRoots = QueryRemapperRoots(connection, isDuckDb: true); - return BuildRow(dbPath, nativeTypes, remapperRoots, snapshotMeta); + return BuildRow(dbPath, nativeTypes, remapperRoots, snapshotMeta, DatabaseSchemaInfo.ReadVersion(connection)); } private static SnapshotMetricsRow QuerySqlite(string dbPath) @@ -89,7 +89,7 @@ private static SnapshotMetricsRow QuerySqlite(string dbPath) var snapshotMeta = QuerySnapshotMetadata(connection, isDuckDb: false); var 
nativeTypes = QueryNativeTypes(connection, isDuckDb: false); var remapperRoots = QueryRemapperRoots(connection, isDuckDb: false); - return BuildRow(dbPath, nativeTypes, remapperRoots, snapshotMeta); + return BuildRow(dbPath, nativeTypes, remapperRoots, snapshotMeta, DatabaseSchemaInfo.ReadVersion(connection)); } private static Dictionary QueryNativeTypes(object connection, bool isDuckDb) @@ -262,7 +262,8 @@ private static SnapshotMetricsRow BuildRow( string dbPath, Dictionary nativeTypes, List remapperRoots, - DbSnapshotMetadata dbMeta) + DbSnapshotMetadata dbMeta, + (int Major, int Minor) schemaVersion) { var fileName = Path.GetFileNameWithoutExtension(dbPath); var meta = EnrichMetadata(dbPath, fileName, dbMeta); @@ -283,6 +284,8 @@ private static SnapshotMetricsRow BuildRow( SortTimestamp = meta.SortTimestamp, NativeTypes = nativeTypes, RemapperRoots = remapperRoots, + SchemaVersion = DatabaseSchemaInfo.DescribeVersion(schemaVersion.Major, schemaVersion.Minor), + SchemaUpToDate = DatabaseSchemaInfo.Evaluate(schemaVersion.Major, schemaVersion.Minor) == SchemaAction.None, }; row = row with { SessionKey = MultiSnapshotSessionGrouper.BuildClusterKey(row) }; diff --git a/Core/Report/MultiSnapshotReport/MultiSnapshotReportModel.cs b/Core/Report/MultiSnapshotReport/MultiSnapshotReportModel.cs index 3a83826..d456a8d 100644 --- a/Core/Report/MultiSnapshotReport/MultiSnapshotReportModel.cs +++ b/Core/Report/MultiSnapshotReport/MultiSnapshotReportModel.cs @@ -59,6 +59,12 @@ public sealed record SnapshotMetricsRow /// Snap file format version when known. public uint SnapFormatVersion { get; init; } + /// Database schema version display (e.g. "1.1", or with an advisory when behind the current build). + public string SchemaVersion { get; init; } = string.Empty; + + /// True when the database schema matches the current build (no upgrade/re-export needed). + public bool SchemaUpToDate { get; init; } = true; + /// Profiler session GUID from snapshot metadata. 
public uint SessionGuid { get; init; } diff --git a/Core/Report/Queries/ReportSql.cs b/Core/Report/Queries/ReportSql.cs index cac5145..2f448eb 100644 --- a/Core/Report/Queries/ReportSql.cs +++ b/Core/Report/Queries/ReportSql.cs @@ -8,6 +8,9 @@ internal static class ReportSql /// Query for snapshot_info row (path, exported_at_utc, unity_version). public const string SnapshotInfo = "SELECT snapshot_path, exported_at_utc, unity_version FROM snapshot_info;"; + /// Query for the stored schema version. Only run when the schema_meta table exists (use HasColumn first). + public const string SchemaMeta = "SELECT schema_version_major, schema_version_minor FROM schema_meta LIMIT 1;"; + public const string TableCounts = """ SELECT 'native_objects' AS table_name, COUNT(*) AS row_count FROM native_objects UNION ALL SELECT 'managed_objects', COUNT(*) FROM managed_objects diff --git a/Core/Report/ReportBuilder.cs b/Core/Report/ReportBuilder.cs index 9997c2d..f2c64a9 100644 --- a/Core/Report/ReportBuilder.cs +++ b/Core/Report/ReportBuilder.cs @@ -38,6 +38,7 @@ public static ReportModel Build(IReportQueryBackend backend, string title, strin kv["Exported At (UTC)"] = r.Length > 1 ? r[1] : null; kv["Unity Version"] = r.Length > 2 ? r[2] : null; } + kv["Schema Version"] = ReadSchemaVersion(backend); kv["Report Generated"] = generatedAtUtc; var totalRows = countRows.Sum(row => row.Length > 1 && row[1] != null ? Convert.ToInt64(row[1]) : 0); @@ -352,6 +353,23 @@ private static void AddNav(ReportModel model, ReportGroup group) model.NavGroups.Add(navGroup); } + /// + /// Reads the schema version via the backend for display in the Snapshot Info section, returning a + /// re-export advisory for pre-versioning databases that lack schema_meta. Uses the constant + /// query (no external input). 
+ /// + private static string ReadSchemaVersion(IReportQueryBackend backend) + { + if (!backend.HasColumn("schema_meta", "schema_version_major")) + return DatabaseSchemaInfo.DescribeVersion(0, 0); + + var (_, rows) = backend.ExecuteQuery(ReportSql.SchemaMeta); + if (rows.Count == 0 || rows[0].Length < 2 || rows[0][0] is null || rows[0][1] is null) + return DatabaseSchemaInfo.DescribeVersion(0, 0); + + return DatabaseSchemaInfo.DescribeVersion(Convert.ToInt32(rows[0][0]), Convert.ToInt32(rows[0][1])); + } + private static double ToDouble(object? o) { if (o == null) return 0.0; diff --git a/Core/Report/SummaryReportFormatter.cs b/Core/Report/SummaryReportFormatter.cs index 2ade1b4..75a20d9 100644 --- a/Core/Report/SummaryReportFormatter.cs +++ b/Core/Report/SummaryReportFormatter.cs @@ -76,6 +76,9 @@ private static void AppendMetadata(StringBuilder sb, SummaryReport report) if (info.SnapFormatVersion > 0) AppendField(sb, "Snap format", $"v{info.SnapFormatVersion}"); + if (!string.IsNullOrWhiteSpace(report.SchemaVersion)) + AppendField(sb, "Schema", report.SchemaVersion); + if (info.SessionGuid != 0) AppendField(sb, "Session", info.SessionGuid.ToString(CultureInfo.InvariantCulture)); } diff --git a/Core/Report/SummaryReportRunner.cs b/Core/Report/SummaryReportRunner.cs index c3b8d54..6c743c8 100644 --- a/Core/Report/SummaryReportRunner.cs +++ b/Core/Report/SummaryReportRunner.cs @@ -54,6 +54,12 @@ public sealed class SummaryReport /// False when a database lacked a summary_metrics table (export with the current tool). public bool SummaryAvailable { get; init; } = true; + + /// + /// Schema version display (e.g. "1.1", or with a re-export/upgrade advisory when behind). For a + /// snapshot source there is no exported database yet, so this notes the version a fresh export would write. 
+ /// + public string SchemaVersion { get; init; } = string.Empty; } /// @@ -127,6 +133,7 @@ private static SummaryReport FromSnapshot(string snapshotPath, IProgressReporter Metrics = data.SummaryMetrics, UnityObjectCategories = Report.UnityObjectCategories.FromNativeObjects(data.NativeObjects), SummaryAvailable = true, + SchemaVersion = $"{DatabaseSchemaInfo.SchemaMajor}.{DatabaseSchemaInfo.SchemaMinor} (on export)", }; } @@ -144,6 +151,7 @@ private static SummaryReport FromDatabase(string databasePath, string extension) var info = ReadSnapshotInfo(connection); var (metrics, available) = ReadSummaryMetrics(connection); var categories = ReadUnityObjectCategories(connection); + var (major, minor) = DatabaseSchemaInfo.ReadVersion(connection); return new SummaryReport { @@ -153,6 +161,7 @@ private static SummaryReport FromDatabase(string databasePath, string extension) Metrics = metrics, UnityObjectCategories = categories, SummaryAvailable = available, + SchemaVersion = DatabaseSchemaInfo.DescribeVersion(major, minor), }; } diff --git a/README_UNITY.md b/README_UNITY.md index ee75514..5999515 100644 --- a/README_UNITY.md +++ b/README_UNITY.md @@ -1,4 +1,4 @@ -# cse-memory-snapshot-data-tool -[View this project in Unity Internal Developer Portal](https://developer.portal.internal.unity.com/catalog/default/component/cse-memory-snapshot-data-tool)
+# MemorySnapshotDataTools +[View this project in Unity Internal Developer Portal](https://developer.portal.internal.unity.com/catalog/default/component/MemorySnapshotDataTools)
# Converting to public repository Any and all Unity software of any description (including components) (1) whose source is to be made available other than under a Unity source code license or (2) in respect of which a public announcement is to be made concerning its inner workings, may be licensed and released only upon the prior approval of Legal. diff --git a/Tests/DatabaseSchemaInfoTests.cs b/Tests/DatabaseSchemaInfoTests.cs new file mode 100644 index 0000000..dcc9dd6 --- /dev/null +++ b/Tests/DatabaseSchemaInfoTests.cs @@ -0,0 +1,56 @@ +using MemorySnapshotDataTools; +using Xunit; + +namespace MemorySnapshotDataTools.Tests; + +/// +/// Tests for major/minor classification and the re-export command builder. +/// +public sealed class DatabaseSchemaInfoTests +{ + [Fact] + public void Evaluate_CurrentVersion_IsNone() + { + Assert.Equal(SchemaAction.None, + DatabaseSchemaInfo.Evaluate(DatabaseSchemaInfo.SchemaMajor, DatabaseSchemaInfo.SchemaMinor)); + } + + [Fact] + public void Evaluate_SameMajorLowerMinor_IsUpgradeInPlace() + { + // A database one minor behind only needs views/indexes re-applied. + var action = DatabaseSchemaInfo.Evaluate(DatabaseSchemaInfo.SchemaMajor, DatabaseSchemaInfo.SchemaMinor - 1); + Assert.Equal(SchemaAction.UpgradeInPlace, action); + } + + [Fact] + public void Evaluate_LowerMajor_IsReExport() + { + Assert.Equal(SchemaAction.ReExport, + DatabaseSchemaInfo.Evaluate(DatabaseSchemaInfo.SchemaMajor - 1, 99)); + } + + [Fact] + public void Evaluate_PreVersioningDatabase_IsReExport() + { + // Version (0, 0) = no schema_meta table; structure unknown → must re-export. 
+        Assert.Equal(SchemaAction.ReExport, DatabaseSchemaInfo.Evaluate(0, 0));
+    }
+
+    [Fact]
+    public void Evaluate_NewerThanTool_IsToolOutdated()
+    {
+        Assert.Equal(SchemaAction.ToolOutdated,
+            DatabaseSchemaInfo.Evaluate(DatabaseSchemaInfo.SchemaMajor + 1, 0));
+        Assert.Equal(SchemaAction.ToolOutdated,
+            DatabaseSchemaInfo.Evaluate(DatabaseSchemaInfo.SchemaMajor, DatabaseSchemaInfo.SchemaMinor + 1));
+    }
+
+    [Theory]
+    [InlineData(false, "MemorySnapshotDataTools export \"/snaps/a.snap\" \"/dbs/a.duckdb\"")]
+    [InlineData(true, "MemorySnapshotDataTools export \"/snaps/a.snap\" \"/dbs/a.duckdb\" --destination sqlite")]
+    public void BuildReExportCommand_FormatsCommand(bool sqlite, string expected)
+    {
+        Assert.Equal(expected, DatabaseSchemaInfo.BuildReExportCommand("/snaps/a.snap", "/dbs/a.duckdb", sqlite));
+    }
+}
diff --git a/Tests/ResidentMemoryCalculatorTests.cs b/Tests/ResidentMemoryCalculatorTests.cs
index 9ef1913..3f5c40e 100644
--- a/Tests/ResidentMemoryCalculatorTests.cs
+++ b/Tests/ResidentMemoryCalculatorTests.cs
@@ -39,6 +39,8 @@ public void ExtractFromDecoded_Format17_ExportsResidentSizeBytes()
         var data = SnapshotBridge.ExtractFromDecoded(decoded, "/test.snap");
         var row = Assert.Single(data.NativeObjects);
         Assert.Equal(pageSize, row.ResidentSizeBytes);
+        // page_size is carried from the decoded snapshot into snapshot_info.
+        Assert.Equal(pageSize, data.SnapshotInfo.PageSize);
     }

     /// <summary>
@@ -58,6 +60,8 @@ public void ExtractFromDecoded_Format16_ResidentSizeIsNull()
         var row = Assert.Single(data.NativeObjects);
         Assert.Null(row.ResidentSizeBytes);
         Assert.Single(data.SystemMemoryRegions);
+        // No resident page data (format < 17) → page_size unknown (0).
+        Assert.Equal(0UL, data.SnapshotInfo.PageSize);
     }

     private static DecodedSnapshot CreateMinimalDecoded(uint formatVersion)
diff --git a/Tests/SummaryReportFormatterTests.cs b/Tests/SummaryReportFormatterTests.cs
index 42021f8..6fdb28f 100644
--- a/Tests/SummaryReportFormatterTests.cs
+++ b/Tests/SummaryReportFormatterTests.cs
@@ -57,6 +57,7 @@ private static SummaryReport BuildReport()
                 new UnityObjectCategory { TypeName = "Mesh", Count = 567, AllocatedBytes = 120_000_000 },
             ],
             SummaryAvailable = true,
+            SchemaVersion = "1.1",
         };
     }

@@ -69,6 +70,7 @@ public void Format_RendersMetadataTotalsAndBreakdowns()
         Assert.Contains("Game_IOS.snap (snapshot)", text, StringComparison.Ordinal);
         Assert.Contains("iOS (IPhonePlayer)", text, StringComparison.Ordinal);
         Assert.Contains("v17", text, StringComparison.Ordinal);
+        Assert.Contains("Schema : 1.1", text, StringComparison.Ordinal);
         Assert.Contains("Allocated Memory Distribution", text, StringComparison.Ordinal);
         Assert.Contains("Managed Heap Utilization", text, StringComparison.Ordinal);
         Assert.Contains("Top Unity Object Categories", text, StringComparison.Ordinal);
diff --git a/catalog-info.yaml b/catalog-info.yaml
index 9a81476..c78a9e4 100644
--- a/catalog-info.yaml
+++ b/catalog-info.yaml
@@ -3,9 +3,9 @@ apiVersion: backstage.io/v1alpha1
 kind: Component
 metadata:
   annotations:
-    github.com/project-slug: Unity-Technologies/cse-memory-snapshot-data-tool
+    github.com/project-slug: Unity-Technologies/MemorySnapshotDataTools
     backstage.io/techdocs-ref: dir:.
-  name: cse-memory-snapshot-data-tool
+  name: MemorySnapshotDataTools
   description: "A command line tool, developed by the CSE Consulting team, offering tools to export Unity Memory Snapshot files to database files and generate HTML reports."
   labels:
     costcenter: "1061"
diff --git a/docs/database-schema.md b/docs/database-schema.md
new file mode 100644
index 0000000..868d5fd
--- /dev/null
+++ b/docs/database-schema.md
@@ -0,0 +1,379 @@
+# Exported database schema
+
+This is the **canonical reference** for the database that the tool produces from a Unity
+memory snapshot (`.snap`). It covers the schema version, every table and column, the analysis
+views and macros, and the join keys you need to query native memory correctly.
+
+The same logical schema is written to both backends:
+
+- **DuckDB** (`.duckdb`, recommended) — created by [`DuckDbExportDestination`](https://github.com/Unity-Technologies/MemorySnapshotDataTools/blob/main/Core/ExportDestination/DuckDbExportDestination.cs).
+- **SQLite** (`.db`) — created by [`SqliteWriter`](https://github.com/Unity-Technologies/MemorySnapshotDataTools/blob/main/Core/ExportDestination/SqliteWriter.cs).
+
+> **Keep this doc in sync.** Any change to a table, column, view, or macro must be reflected
+> here in the same change, and breaking changes must bump the schema version. See
+> [Schema version](#schema-version) and the `memory-db-sql` Claude skill for the checklist.
+
+---
+
+## Schema version
+
+Every exported database records a **two-part version** in the **`schema_meta`** table so tools can
+tell whether a database needs a full re-export or just an in-place refresh.
+
+```sql
+SELECT schema_version_major, schema_version_minor, msdt_version, created_at_utc FROM schema_meta;
+```
+
+The versions are defined once in code by
+[`DatabaseSchemaInfo`](https://github.com/Unity-Technologies/MemorySnapshotDataTools/blob/main/Core/Models/DatabaseSchemaInfo.cs)
+(`SchemaMajor`, `SchemaMinor`), which both writers stamp and the CLI checks.
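The following is a minimal C# sketch of the major/minor classification this page describes. It is not the actual `DatabaseSchemaInfo.Evaluate` body. It assumes the tool is at version 1.3 (`SchemaMajor = 1`, `SchemaMinor = 3`, as stated in the commit message) and, for brevity, returns the `SchemaAction` name as a string instead of the enum. `Classify`, `ToolMajor`, and `ToolMinor` are hypothetical names.

```csharp
using System;

// Assumption: tool schema version 1.3, taken from the commit message.
// The real values are DatabaseSchemaInfo.SchemaMajor / SchemaMinor.
const int ToolMajor = 1;
const int ToolMinor = 3;

// Hypothetical helper: classifies a database's stored (major, minor) stamp.
// Returns the SchemaAction name as a string rather than the real enum.
string Classify(int dbMajor, int dbMinor)
{
    // A newer major, or a newer minor on the same major, was written by a newer tool.
    if (dbMajor > ToolMajor || (dbMajor == ToolMajor && dbMinor > ToolMinor))
        return "ToolOutdated";

    // An older major (including 0.0, a pre-versioning database) has a different table
    // structure, so the data must be re-extracted from the .snap.
    if (dbMajor < ToolMajor)
        return "ReExport";

    // Same major: only views/indexes can differ, so an older minor is upgradeable in place.
    return dbMinor < ToolMinor ? "UpgradeInPlace" : "None";
}

Console.WriteLine(Classify(1, 3)); // None
Console.WriteLine(Classify(1, 1)); // UpgradeInPlace
Console.WriteLine(Classify(0, 0)); // ReExport
Console.WriteLine(Classify(2, 0)); // ToolOutdated
```

Checking the newer-than-tool case first keeps the rest simple. Whatever remains is either an older major (re-export) or the same major (upgrade in place, or nothing to do).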
+
+| Part | Meaning | Bump when… | A lower value means |
+|------|---------|------------|---------------------|
+| **major** (`SchemaMajor`) | Table/column **structure** | You add/rename/remove a table or column, or change a column's meaning/units | **Re-export required** — the data itself must be re-extracted from the `.snap` |
+| **minor** (`SchemaMinor`) | Derived **views and indexes** | You add/change a view or index | **Upgradeable in place** — re-run the view/index DDL, no re-export |
+
+Reset minor to 0 whenever you bump major.
+
+| Version | Changes |
+|---------|---------|
+| 1.0 | First versioned schema: `schema_meta`, `snapshot_info.page_size`, region analysis views/macros. |
+| 1.1 | Added `v_connection_edges` and `v_assetbundle_utilization` views (minor — upgradeable in place). |
+| 1.2 | Reformulated `v_connection_edges` joins (kind check folded into the join key) so DuckDB hash-joins instead of nested-loop — `SELECT … WHERE from_type=…` drops from minutes to sub-second (minor). |
+
+**What the CLI does.** Before a read command (`report`, `summary`, `validate`),
+`DatabaseSchemaInfo.Evaluate(major, minor)` classifies the database and the CLI acts:
+
+| Classification | Meaning | CLI behavior |
+|----------------|---------|--------------|
+| `None` | Current | Proceeds silently. |
+| `UpgradeInPlace` | Same major, older minor | Offers to upgrade in place (interactive prompt; non-interactive prints `… upgrade ""`). |
+| `ReExport` | Older/pre-versioning major | Prints the exact `export` command; if the source `.snap` still exists at `snapshot_info.snapshot_path`, offers to re-export it now. |
+| `ToolOutdated` | DB newer than the tool | Warns to update the tool. |
+
+Run an in-place minor upgrade explicitly with:
+
+```bash
+MemorySnapshotDataTools upgrade
+```
+
+This re-applies indexes and views (`DatabaseMaintenance.UpgradeInPlace`) and bumps the stored minor
+version. It refuses major-version gaps and tells you to re-export instead. Non-interactive sessions
+(stdin redirected) never auto-modify a database — they only print the advisory and command.
+
+The stored version is also **displayed in output**: `summary` prints a `Schema` field, the HTML
+`report` shows a *Schema Version* row in Snapshot Info, and `multi-report` shows it per database
+(as a tooltip, with a ⚠ marker when behind) — each via `DatabaseSchemaInfo.DescribeVersion`.
+
+---
+
+## Tables
+
+Column types shown are DuckDB; SQLite uses `INTEGER`/`TEXT` equivalents (DuckDB `BIGINT` → SQLite
+`INTEGER`, `VARCHAR` → `TEXT`, `BOOLEAN` → `INTEGER` 0/1). Sizes are **bytes** unless the column
+name says otherwise.
+
+### `schema_meta`
+One row. The schema version stamp.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `schema_version_major` | INTEGER | `DatabaseSchemaInfo.SchemaMajor` at export time. Lower → re-export. |
+| `schema_version_minor` | INTEGER | `DatabaseSchemaInfo.SchemaMinor`. Lower (same major) → upgrade in place. |
+| `msdt_version` | VARCHAR | MemorySnapshotDataTools build version. |
+| `created_at_utc` | VARCHAR | Export timestamp (ISO-8601 UTC). |
+
+### `snapshot_info`
+One row. Provenance of the capture.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `snapshot_path` | VARCHAR | Source `.snap` path. |
+| `exported_at_utc` | VARCHAR | When the export ran (ISO-8601 UTC). |
+| `unity_version` | VARCHAR | Unity version, or `format:` fallback. |
+| `snap_format_version` | INTEGER | Snapshot format version (resident data requires ≥ 17). |
+| `session_guid` | BIGINT | Profiler session GUID, or NULL. |
+| `product_name` | VARCHAR | Project/product name, or NULL. |
+| `platform` | VARCHAR | Runtime platform (e.g. `IPhonePlayer`, `OSXPlayer`), or NULL. |
+| `record_date_utc` | VARCHAR | Capture timestamp, when known. |
+| `page_size` | BIGINT | OS page size of the captured device (e.g. 16384 on iOS arm64, typically 4096 on x86-64); NULL when unknown (format < 17). Used by `region_page_density`. |
+
+### `native_objects`
+High-level Unity objects (textures, meshes, GameObjects…). One row per native object.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `native_object_index` | INTEGER PK | Zero-based index; target of `connections` with `kind='native_object'`. |
+| `instance_id` | VARCHAR | Unity instance id (string). |
+| `name` | VARCHAR | Object name. |
+| `size_bytes` | BIGINT | Object's own native size. |
+| `native_object_address` | BIGINT | Object pointer. **Not** an allocation address — see [gotchas](#gotchas). |
+| `root_reference_id` | BIGINT | → `native_roots.root_id` (−1 when unknown). |
+| `type_index` | INTEGER | Index into native type names. |
+| `native_type_name` | VARCHAR | Resolved type (e.g. `Texture2D`). |
+| `is_destroyed` | BOOLEAN | Marked destroyed but still resident. |
+| `resident_size_bytes` | BIGINT | Resident bytes for the object's root (format ≥ 17), else NULL. |
+
+### `managed_objects`
+Managed (C#) heap objects. One row per managed object.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `managed_object_index` | INTEGER PK | Target of `connections` with `kind='managed_object'`. |
+| `address` | BIGINT | Managed heap address. |
+| `size_bytes` | BIGINT | Size. |
+| `type_index` | INTEGER | Index into managed type descriptions. |
+| `managed_type_name` | VARCHAR | Resolved managed type. |
+| `native_object_index` | BIGINT | → `native_objects.native_object_index`, or NULL (orphaned wrapper). |
+
+### `connections`
+Directed edges of the object reference graph.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `from_kind` | VARCHAR | `native_object` or `managed_object`. |
+| `from_index` | BIGINT | Index into the corresponding object table. |
+| `to_kind` | VARCHAR | `native_object` or `managed_object`. |
+| `to_index` | BIGINT | Index into the corresponding object table. |
+| `connection_type` | VARCHAR | e.g. `native_connection`, `GCHandle`. |
+
+### `native_roots`
+Unity memory areas. Backbone for attribution. One row per root; `root_id` is **unique**.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `root_index` | INTEGER PK | Zero-based index. |
+| `root_id` | BIGINT | Join key for `native_objects.root_reference_id` and `native_allocations.root_reference_id`. |
+| `area_name` | VARCHAR | Subsystem grouping (`System`, `Managers`, `Objects`, `SerializedFile`, `Rendering`…). |
+| `object_name` | VARCHAR | Root's object name. |
+| `accumulated_size_bytes` | BIGINT | Committed bytes attributed to this root. |
+| `resident_size_bytes` | BIGINT | Resident bytes (format ≥ 17), else NULL. |
+
+### `memory_regions`
+Unity's **internal allocator** buckets (`ALLOC_DEFAULT`, `ALLOC_GFX`, TLSF blocks, temp/stack
+allocators). **Not** OS regions — see [the two region tables](#the-two-region-tables).
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `region_index` | INTEGER PK | Target of `native_allocations.memory_region_index`. |
+| `address_base` | BIGINT | Allocator block base. |
+| `address_size` | BIGINT | Allocator block reserve. **Often 0** for grouping allocators (e.g. `ALLOC_DEFAULT`) — do not use as a container bound. |
+| `name` | VARCHAR | Allocator name. |
+| `parent_region_index` | INTEGER | → `memory_regions.region_index` (hierarchy), or NULL. |
+| `first_allocation_index` | INTEGER | → `native_allocations.allocation_index`, or NULL. |
+| `num_allocations` | INTEGER | Allocation count in this bucket. |
+
+### `native_allocations`
+Low-level allocations Unity's allocators requested. One row per allocation.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `allocation_index` | INTEGER PK | Zero-based index. |
+| `address` | BIGINT | Allocation address. Falls inside a `system_memory_regions` range. |
+| `size_bytes` | BIGINT | Payload size (live bytes). |
+| `overhead_size_bytes` | BIGINT | Allocator overhead. |
+| `padding_size_bytes` | BIGINT | Alignment padding. |
+| `memory_region_index` | INTEGER | → `memory_regions.region_index` (Unity allocator), or NULL. |
+| `root_reference_id` | BIGINT | → `native_roots.root_id`, or NULL. |
+
+### `system_memory_regions`
+OS / virtual-memory regions — what `vmmap` reports (`MALLOC_NANO`, `MALLOC_LARGE`, dyld shared
+cache, `IOACCELERATOR`, framework/dylib mappings…). The ground truth for process RAM. **No foreign
+keys** — bridge to allocations by address range only.
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `region_index` | INTEGER PK | Zero-based index. |
+| `address` | BIGINT | Region base. |
+| `size_bytes` | BIGINT | Committed / virtual size. |
+| `resident_bytes` | BIGINT | Physical RAM resident. |
+| `type` | INTEGER | Region type code (frequently `0` for all rows on iOS — use `name`). |
+| `name` | VARCHAR | Region name. |
+
+### `summary_metrics`
+MemoryProfiler "Summary" page breakdown (Allocated Memory Distribution + Managed Heap Utilization).
+
+| Column | Type | Notes |
+|--------|------|-------|
+| `metric_group` | VARCHAR | Group label. |
+| `category` | VARCHAR | Category label. |
+| `committed_bytes` | BIGINT | Committed bytes. |
+| `resident_bytes` | BIGINT | Resident bytes. |
+| `resident_available` | INTEGER | 1 if resident data is available, else 0. |
+
+---
+
+## Views and macros
+
+These remove the repetitive joins needed to analyze native memory. **Views exist on both
+backends; macros are DuckDB-only** (SQLite has no table macros — query the view directly, see
+[SQLite differences](#sqlite-differences)).
+
+### `v_allocation_enriched` (view)
+One row per allocation, joined to its Unity allocator bucket, the **OS region containing its
+address**, its root, and its owning object.
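Matching an allocation to "the OS region containing its address" is a range lookup over non-overlapping regions. The C# sketch below shows the idea: a binary search for the nearest region that starts at or below the address, followed by a bounds check. This is the same shape as an as-of match, but it is not the view's SQL. The region bases and sizes and the `ContainingRegion` helper are invented for illustration, and the sketch assumes the regions are sorted by base address and do not overlap.

```csharp
using System;
#nullable enable

// Hypothetical OS regions (base, size, name), sorted by base address and non-overlapping.
(ulong Base, ulong Size, string Name)[] regions =
{
    (0x1000_0000UL, 0x0010_0000UL, "MALLOC_NANO"),
    (0x2000_0000UL, 0x0040_0000UL, "MALLOC_SMALL"),
    (0x3000_0000UL, 0x1000_0000UL, "MALLOC_LARGE"),
};

// Hypothetical helper: returns the name of the region whose [Base, Base + Size) range
// covers the address, or null when the address falls in a gap between regions.
string? ContainingRegion(ulong address)
{
    int lo = 0, hi = regions.Length - 1, candidate = -1;
    // Binary search for the last region whose base is <= address (the "as-of" match).
    while (lo <= hi)
    {
        int mid = lo + (hi - lo) / 2;
        if (regions[mid].Base <= address) { candidate = mid; lo = mid + 1; }
        else hi = mid - 1;
    }
    if (candidate < 0) return null; // Address is below every region.
    var r = regions[candidate];
    // The nearest preceding region only contains the address if it is inside its bounds.
    return address < r.Base + r.Size ? r.Name : null;
}

Console.WriteLine(ContainingRegion(0x1000_0040UL) ?? "(gap)"); // MALLOC_NANO
Console.WriteLine(ContainingRegion(0x1800_0000UL) ?? "(gap)"); // (gap)
```

The bounds check is what produces the NULL case for an allocation that lands between two regions. The nearest preceding region always exists for such an address, but it does not cover it.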
+
+Columns: `allocation_index`, `address`, `size_bytes`, `overhead_size_bytes`, `padding_size_bytes`,
+`memory_region_index`, `unity_region_name`, `system_region_index`, `system_region_name`,
+`root_reference_id`, `area_name`, `root_object_name`, `native_object_index`, `native_type_name`,
+`object_name`.
+
+`system_region_*` is NULL for the rare allocation that falls in a gap between OS regions. DuckDB
+resolves the containing region with an `ASOF` join; SQLite with an equivalent correlated subquery
+(nearest region whose range covers the address).
+
+### `v_system_region_summary` (view)
+One row per OS region: committed vs resident vs Unity-tracked live, plus how much of the region's
+resident RAM Unity explains. The region overview.
+
+Columns: `region_index`, `name`, `addr_hex`, `committed_bytes`, `resident_bytes`, `pct_resident`,
+`unity_alloc_count`, `unity_live_bytes`, `unity_live_pct_of_resident`.
+
+```sql
+-- Where is resident RAM, and how much does Unity account for?
+SELECT name, committed_bytes, resident_bytes, pct_resident, unity_live_pct_of_resident
+FROM v_system_region_summary ORDER BY resident_bytes DESC;
+```
+
+### `v_region_owner_breakdown` (view)
+Within each OS region, who owns the allocations — by native type when the allocation's root has an
+object, otherwise by area name.
+
+Columns: `system_region_name`, `owner`, `alloc_count`, `live_bytes`.
+
+```sql
+SELECT * FROM v_region_owner_breakdown
+WHERE system_region_name = 'MALLOC_NANO' ORDER BY alloc_count DESC;
+```
+
+### `v_connection_edges` (view)
+The object reference graph with **both endpoints resolved** to type (and native name) — so you don't
+re-join `connections` to `native_objects`/`managed_objects` every time. One row per edge; meant to be
+**filtered** (the table is large), not selected wholesale.
+
+Columns: `connection_type`, `from_kind`, `from_index`, `from_type`, `from_name`, `to_kind`,
+`to_index`, `to_type`, `to_name`. (`*_name` is native-only; managed objects have no name.)
+
+```sql
+-- What does a specific object reference, by target type?
+SELECT to_type, COUNT(*) FROM v_connection_edges
+WHERE from_type = 'AssetBundle' AND to_kind = 'native_object' GROUP BY 1 ORDER BY 2 DESC;
+```
+
+### `v_assetbundle_utilization` (view)
+One row per `AssetBundle` native object measuring whether it actually keeps loaded assets resident.
+"References" counts outbound `native_connection` edges to **other** native objects (excluding the
+bundle's self-reference and its own managed wrappers — no magic numbers). An *empty* bundle
+(`references_loaded_assets = false`) is loaded but holds nothing live — usually reclaimable overhead.
+
+Columns: `native_object_index`, `name`, `bundle_size_bytes`, `bundle_resident_bytes`, `is_destroyed`,
+`referenced_object_count`, `referenced_type_count`, `referenced_size_bytes`,
+`referenced_resident_bytes`, `references_loaded_assets`.
+
+```sql
+-- Utilization at a glance: empty vs. asset-holding bundles, and empty-bundle overhead.
+SELECT references_loaded_assets,
+       COUNT(*) AS bundles,
+       ROUND(SUM(bundle_size_bytes) / 1048576.0, 1) AS bundle_mb
+FROM v_assetbundle_utilization GROUP BY 1;
+
+-- Which bundles reference the most other loaded objects (and how much do they pull in)?
+SELECT name, referenced_object_count, referenced_type_count,
+       ROUND(referenced_size_bytes / 1048576.0, 2) AS referenced_mb
+FROM v_assetbundle_utilization
+WHERE references_loaded_assets ORDER BY referenced_object_count DESC;
+```
+
+> `referenced_size_bytes` is the **own size of directly-referenced** native objects. Unity records
+> flattened bundle→contained-object edges, so this is comprehensive for bundles; it is not transitive
+> retained size, and the same shared asset may be counted under more than one bundle.
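To make the counting rules concrete, here is a hedged C# sketch over an in-memory edge list shaped like `connections`. It keeps only `native_connection` edges from the bundle to a `native_object` target other than the bundle itself. This filter drops both the self-reference and the managed wrapper. The sample edges and sizes are invented, the `Utilization` helper is hypothetical, and counting *distinct* targets is an assumption of this sketch rather than a documented property of `v_assetbundle_utilization`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical edges mirroring connections(from_kind, from_index, to_kind, to_index, connection_type).
var edges = new List<(string FromKind, long FromIndex, string ToKind, long ToIndex, string Type)>
{
    ("native_object", 0, "native_object", 0, "native_connection"),  // bundle 0 self-reference
    ("native_object", 0, "managed_object", 7, "native_connection"), // bundle 0 -> its managed wrapper
    ("native_object", 0, "native_object", 1, "native_connection"),  // bundle 0 -> texture
    ("native_object", 0, "native_object", 2, "native_connection"),  // bundle 0 -> mesh
    ("native_object", 3, "native_object", 3, "native_connection"),  // bundle 3: self-reference only
};

// Hypothetical own sizes of native objects, keyed by native_object_index.
var sizeBytes = new Dictionary<long, long> { [0] = 1_000, [1] = 4_194_304, [2] = 65_536, [3] = 900 };

// Summarizes one bundle: distinct other native objects it references and their summed own size.
(int Count, long Bytes, bool ReferencesLoadedAssets) Utilization(long bundleIndex)
{
    var targets = edges
        .Where(e => e.FromKind == "native_object" && e.FromIndex == bundleIndex
                    && e.Type == "native_connection"
                    && e.ToKind == "native_object"   // excludes managed wrappers
                    && e.ToIndex != bundleIndex)     // excludes the self-reference
        .Select(e => e.ToIndex)
        .Distinct()
        .ToList();
    long bytes = targets.Sum(i => sizeBytes[i]);
    return (targets.Count, bytes, targets.Count > 0);
}

Console.WriteLine(Utilization(0)); // (2, 4259840, True)
Console.WriteLine(Utilization(3)); // (0, 0, False)
```

A bundle whose only edges are its self-reference and its wrapper comes out as `(0, 0, False)`. This matches the "empty bundle" case described above.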
+
+### `region_allocations(region_name)` (DuckDB macro)
+All `v_allocation_enriched` rows for one OS region: `SELECT * FROM region_allocations('MALLOC_NANO');`
+
+### `region_page_density(region_name)` (DuckDB macro)
+Page-touch / fill analysis for a region, using `snapshot_info.page_size` (fallback 16384).
+
+Columns: `touched_pages`, `touched_bytes`, `avg_live_bytes_per_page`, `avg_fill_pct`,
+`avg_allocs_per_page`.
+
+```sql
+SELECT * FROM region_page_density('MALLOC_NANO');
+```
+
+> **Scope.** This metric is designed for **small-allocation zones** (`MALLOC_NANO`, `MALLOC_TINY`,
+> `MALLOC_SMALL`) where each allocation fits within a page. It attributes every allocation to its
+> *starting* page, so on a region containing allocations larger than a page (e.g. a dylib mapping or
+> `MALLOC_LARGE`) `avg_fill_pct` can exceed 100% — a signal the model does not apply there, not a bug.
+
+For a custom page size or page-spanning regions, query directly instead:
+
+```sql
+SELECT a.address >> 14 AS page_16k, COUNT(*), SUM(a.size_bytes)
+FROM native_allocations a
+JOIN system_memory_regions s
+  ON s.name = 'MALLOC_NANO' AND a.address >= s.address AND a.address < s.address + s.size_bytes
+GROUP BY 1;
+```
+
+---
+
+## Relationships and join keys
+
+```
+native_objects.root_reference_id ────┐
+                                     ├──► native_roots.root_id   (root_id UNIQUE; root↔object is 1:1 for area 'Objects')
+native_allocations.root_reference_id ┘
+
+native_allocations.memory_region_index ──► memory_regions.region_index   (Unity allocator bucket)
+memory_regions.parent_region_index     ──► memory_regions.region_index   (hierarchy)
+memory_regions.first_allocation_index  ──► native_allocations.allocation_index
+
+managed_objects.native_object_index ──► native_objects.native_object_index   (C# wrapper ↔ native object)
+connections.(from_kind,from_index) / (to_kind,to_index)   — object reference graph
+
+system_memory_regions — NO foreign key. Bridge by ADDRESS RANGE only
+    (a.address >= s.address AND a.address < s.address + s.size_bytes).
+```
+
+### The two region tables
+
+| Table | What | Size field |
+|-------|------|-----------|
+| `memory_regions` | Unity's **internal allocator** buckets. `native_allocations.memory_region_index` points here. | `address_size` (often 0 — not a bound) |
+| `system_memory_regions` | **OS virtual-memory** regions (vmmap). The RAM ground truth. No FK; bridge by address range. | `size_bytes` (committed), `resident_bytes` |
+
+These overlap the same address space but are not linked by a key. Joining an allocation to its OS
+region is exactly what `v_allocation_enriched` does for you.
+
+### Gotchas
+
+- **`native_object_address` ≠ `native_allocations.address`.** Objects and allocations are different
+  layers; an object address never matches an allocation address. Bridge them through a shared
+  **root** (`root_reference_id` → `root_id`), not by address.
+- **Don't use `memory_regions.address_size` as a denominator.** Grouping allocators like
+  `ALLOC_DEFAULT` report size 0 while holding most allocations. Use the allocation payload sum.
+- **`system_memory_regions.type` is uniformly 0 on iOS.** Group/filter by `name`.
+- **Resident data needs format ≥ 17.** Below that, `resident_size_bytes` and `page_size` are NULL.
+- **The four "sizes" don't reconcile.** `system_memory_regions` (whole process VM), `native_roots`
+  (Unity subsystem attribution), `native_allocations` (Unity allocator requests), and
+  `native_objects` (high-level assets) are overlapping lenses, not a partition — never sum them.
+
+---
+
+## SQLite differences
+
+- **Views**: identical names/columns; `v_allocation_enriched` resolves the OS region with a
+  correlated subquery instead of `ASOF`.
+- **Macros**: `region_allocations` and `region_page_density` are **not** available (SQLite has no
+  table macros). Use the underlying views, or the direct query shown above.
+- **Open read-only** for analysis: `Data Source=;Mode=ReadOnly` (SQLite),
+  `Data Source=;ACCESS_MODE=READ_ONLY` (DuckDB). See [SQL safety](sql-safety.md).
+
+---
+
+## See also
+
+- [SQL safety](sql-safety.md) — never build SQL from external data; parameterize.
+- [Snap file format](snap-file-format.md) — where these tables come from in the `.snap` binary.
+- [Architecture and design](design.md) — the export pipeline.
diff --git a/docs/index.md b/docs/index.md
index 82d314c..7cfcfca 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,4 +1,4 @@
-# TechDocs space for cse-memory-snapshot-data-tool
+# TechDocs space for MemorySnapshotDataTools

 !!! note
     This repo has been pre-populated by RepoDB with a skeleton to get your documentation started. More information on TechDocs used at Unity can be found [here](http://go/docs-techdocs)
@@ -9,4 +9,4 @@ To update this file, see `docs/index.md` in your repo.

 ### If you are viewing this page in GitHub

-View this documentation rendered in Unity Internal Developer Portal [here](https://developer.portal.internal.unity.com/catalog/default/component/cse-memory-snapshot-data-tool/docs).
\ No newline at end of file
+View this documentation rendered in Unity Internal Developer Portal [here](https://developer.portal.internal.unity.com/catalog/default/component/MemorySnapshotDataTools/docs).
\ No newline at end of file
diff --git a/docs/sql-safety.md b/docs/sql-safety.md
index ff54677..67aef82 100644
--- a/docs/sql-safety.md
+++ b/docs/sql-safety.md
@@ -3,7 +3,7 @@ This tool is built around composing and executing SQL against DuckDB and SQLite
 databases. This page is the canonical reference for writing query code safely.
 It is written for both human contributors and for Claude (the project rule lives in
-[`CLAUDE.md`](https://github.com/Unity-Technologies/cse-memory-snapshot-data-tool/blob/main/CLAUDE.md)
+[`CLAUDE.md`](https://github.com/Unity-Technologies/MemorySnapshotDataTools/blob/main/CLAUDE.md)
 and points here).

 ## The one rule
diff --git a/mkdocs.yml b/mkdocs.yml
index 3b99230..3950aaf 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,7 +1,7 @@
 # This file was added for TechDocs in Unity Internal Developer Portal http://go/docs-techdocs
 # For more information on the configuration below see https://www.mkdocs.org/user-guide/configuration/
-site_name: cse-memory-snapshot-data-tool
-repo_url: https://github.com/Unity-Technologies/cse-memory-snapshot-data-tool
+site_name: MemorySnapshotDataTools
+repo_url: https://github.com/Unity-Technologies/MemorySnapshotDataTools
 repo_name: GitHub
 docs_dir: docs
 edit_uri: "edit/main/docs/"
@@ -10,9 +10,10 @@ plugins:
   - techdocs-core
 nav:
   - About TechDocs: "index.md"
-  - Introduction to cse-memory-snapshot-data-tool: "intro.md"
+  - Introduction to MemorySnapshotDataTools: "intro.md"
   - Architecture and design: "design.md"
   - Snap File Format: "snap-file-format.md"
+  - Database schema: "database-schema.md"
   - SQL safety: "sql-safety.md"
   - Installing for local development: "installation.md"
   - Troubleshooting: "troubleshooting.md"