spec(manipulation/memory): object memory tracker on memory2, propose …#2067

Draft
jhengyilin wants to merge 1 commit into dimensionalOS:main from jhengyilin:feature/mainpulation_with_memory_jhengyi

@jhengyilin jhengyilin commented May 13, 2026

Spec for the object memory tracker on memory2 (Refs #1893)

flowchart TB
    camera([camera])
    perception["ObjectSceneRegistrationModule<br/><i>detection only — no ObjectDB</i>"]
    tracker["ObjectMemoryTracker<br/><i>(MemoryModule)</i>"]
    manipulation["PickAndPlaceModule<br/><i>manipulation — no API change</i>"]
    skills["@skill recall(name)<br/><i>cross-session memory</i>"]

    camera --> perception
    perception -->|"raw detections<br/>list[DetObject]"| tracker
    tracker -->|"tracked_objects (port)<br/>list[DetObject]"| manipulation

    subgraph m2 ["memory2 — source of truth"]
        direction TB
        obs[("object_observations<br/><i>dense — log</i>")]
        events[("object_events<br/><i>sparse — lifecycle</i>")]
    end

    tracker == "append + inline cache update" ==> obs
    tracker == "append + inline cache update" ==> events

    obs -. "sync .to_list() replay on start()" .-> tracker
    events -. "sync .to_list() replay on start()" .-> tracker

    events --> skills

    classDef stream fill:#fef3c7,stroke:#d97706,stroke-width:2px
    classDef module fill:#dbeafe,stroke:#2563eb,stroke-width:2px
    classDef external fill:#f3f4f6,stroke:#6b7280,stroke-width:1px
    class obs,events stream
    class perception,tracker,manipulation module
    class camera,skills external

How it works: proposed architecture workflow

| t (s) | Event | Tracker's response |
| --- | --- | --- |
| 0 | First scan sees "cup" at (0.4, 0.1, 0.9) | No match → APPEARED event + observation. confidence = 1.0 |
| 2–10 | More scans of the same cup | Tight-spatial match → observation each time. After 6 detections → PROMOTED. Cup is in `tracked_objects`. |
| 14 | Hand covers camera, no detection | confidence ≈ 0.77, still confident |
| 20 | Hand still there | confidence ≈ 0.51, borderline |
| 24 | Hand still there | confidence ≈ 0.41 → tentative. Out of snapshot, still match-eligible. |
| 25 | Hand moves, scan sees cup again | Tight-spatial match → observation. Confidence resets to 1.0. No duplicate identity. |
| 60 | User moves cup to (1.0, 0.5, 0.9), about 70 cm away | Tight match fails (> 0.2 m). Wider-radius voted-name match (drift) → MOVED event. No phantom at old position. |
| 120 | User takes the cup away | After ~45 s of decay, confidence < 0.1 → LOST event. Cup moves to recently-lost bucket. |
| 200 | Process crashes and restarts | Sync replay (`stream.to_list()` over both streams) rebuilds the cache from memory2 before the tracker accepts new detections. No bespoke load code. |
| 205 | Agent calls `recall("cup")` | Query: `events.tags(name="cup").last()` → returns LOST event. The process answers about a cup it never saw in its own lifetime. |
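The matching order in the walkthrough above (tight spatial match first, then a wider-radius name match, otherwise a new identity) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the tracker's actual code: the `Track` class, the `match` function, and the 1.0 m wide radius are hypothetical, and whether the tight match also checks the label is not stated in the spec. Only the 0.2 m tight radius and the 6-detection promotion threshold come from the table.

```python
import math
from dataclasses import dataclass

TIGHT_RADIUS_M = 0.2  # from the table: tight match fails beyond 0.2 m
WIDE_RADIUS_M = 1.0   # hypothetical: the spec does not state the wider radius
PROMOTE_AFTER = 6     # from the table: PROMOTED after 6 detections


@dataclass
class Track:
    """Hypothetical in-memory record for one tracked object."""
    name: str
    position: tuple
    detections: int = 1
    promoted: bool = False


def match(tracks, name, pos):
    """Fold one detection into `tracks` and return the lifecycle events it causes."""
    # 1. Tight spatial match: same place. Assumed here to ignore the label.
    for t in tracks:
        if math.dist(t.position, pos) <= TIGHT_RADIUS_M:
            t.position = pos
            t.detections += 1
            if not t.promoted and t.detections >= PROMOTE_AFTER:
                t.promoted = True
                return ["PROMOTED"]
            return []
    # 2. Wider-radius match on the name: treat it as the same object, moved.
    for t in tracks:
        if t.name == name and math.dist(t.position, pos) <= WIDE_RADIUS_M:
            t.position = pos
            t.detections += 1
            return ["MOVED"]
    # 3. Nothing matched: create a new identity.
    tracks.append(Track(name, pos))
    return ["APPEARED"]
```

With these assumed radii, the cup moved from (0.4, 0.1, 0.9) to (1.0, 0.5, 0.9) is about 0.72 m away, so it fails the tight match, passes the wide one, and keeps a single identity.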

Why this design:

  • Two streams in memory2: object_observations (every matched detection, the evidence) and object_events (the lifecycle transitions APPEARED / PROMOTED / LABEL_CHANGED / MOVED / LOST, the story).

  • Continuous belief instead of binary present/absent: one tunable (time_constant_s = 15) controls how forgiving the tracker is of occlusion. The tentative band (0.2 – 0.5) keeps mid-confidence objects match-eligible, so a single missed scan can't create a duplicate identity.

  • memory2 holds the persistent record: object history lives in the streams across sessions. The tracker reads from memory2 on startup, so cross-session memory comes for free.

  • No change to manipulation's API: tracked_objects publishes list[Object] (the same type used today), so PickAndPlaceModule works without modification.
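As a rough illustration of the continuous-belief idea, the sketch below assumes the confidence decays exponentially with `time_constant_s` since the last matched detection. The spec does not state the decay formula, so this is a guess: it reproduces the 0.77 and 0.51 values from the walkthrough (4 s and 10 s after the last detection at t = 10) but gives about 0.39 rather than 0.41 at 14 s, so the real curve may differ. The function names and the "fading" label for the 0.1 to 0.2 range are hypothetical.

```python
import math

TIME_CONSTANT_S = 15.0  # from the spec: time_constant_s = 15


def confidence(seconds_since_seen, tau=TIME_CONSTANT_S):
    """Assumed exponential decay of belief since the last matched detection."""
    return math.exp(-max(seconds_since_seen, 0.0) / tau)


def band(c):
    """Map a confidence value onto the bands named in the spec."""
    if c >= 0.5:
        return "confident"
    if c >= 0.2:
        return "tentative"  # out of the snapshot, still match-eligible
    if c >= 0.1:
        return "fading"     # hypothetical label: the spec does not name this range
    return "lost"           # below 0.1 the spec emits a LOST event


if __name__ == "__main__":
    for dt in (0, 4, 10, 14, 45):
        c = confidence(dt)
        print(f"{dt:>3} s since last seen: confidence={c:.2f} band={band(c)}")
```

Under this assumption a larger `time_constant_s` makes the tracker more tolerant of long occlusions, at the cost of keeping stale objects around longer.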

Solves the two issues we discussed:

  1. Stable labels despite flickering YOLO labels (vote across detections instead of letting the latest frame's detection win)
  2. Memory between actions (soft persistence + re-acquire + survives restart)
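The label-voting idea from item 1 could look something like the sketch below: keep a sliding window of recent labels per object and report the most frequent one. The `LabelVoter` class and the window size of 10 are hypothetical, and the spec does not say whether votes are windowed or weighted.

```python
from collections import Counter, deque
from typing import Optional


class LabelVoter:
    """Hypothetical per-object label vote over the most recent detections."""

    def __init__(self, window: int = 10):  # window size is an assumption
        self._labels = deque(maxlen=window)

    def add(self, label: str) -> str:
        """Record one detection's label and return the current voted label."""
        self._labels.append(label)
        return self.voted()

    def voted(self) -> Optional[str]:
        """Most frequent label in the window, or None before any detection."""
        if not self._labels:
            return None
        # On a tie, Counter.most_common keeps the label that was counted first.
        return Counter(self._labels).most_common(1)[0][0]


if __name__ == "__main__":
    voter = LabelVoter()
    for label in ["cup", "cup", "bowl", "cup", "mug", "cup"]:
        print(label, "->", voter.add(label))
```

A single frame that reports "bowl" or "mug" does not change the published name, which is the behaviour the latest-frame-wins approach lacks.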

…architecture design for integrating memory2 with current manipulation stack