Skip to content

feat(demos): manymove_industrial - BT manipulator + medkit gateway demo#59

Open
mfaferek93 wants to merge 22 commits into
mainfrom
feat/manymove-industrial
Open

feat(demos): manymove_industrial - BT manipulator + medkit gateway demo#59
mfaferek93 wants to merge 22 commits into
mainfrom
feat/manymove-industrial

Conversation

@mfaferek93
Copy link
Copy Markdown

@mfaferek93 mfaferek93 commented May 12, 2026

Summary

New demo demos/manymove_industrial/: brings up a manymove BT manipulator pipeline alongside a ros2_medkit gateway, with a SOVD manifest that maps every BT executable to its component. Provides the runtime piece that consumes the feat/medkit-integration fork of manymove (PR selfpatch/manymove#1).

What changes

New demo: demos/manymove_industrial/

  • Docker compose: xarm-sim (manymove BT + fake hardware on Domain ID 42), medkit-gateway (FaultManager + REST), medkit-web-ui. Network bridge medkit-net.
  • SOVD manifest: areas (planning, manipulation, diagnostics), components (xarm7-arm, gripper, move-group, fault-manager), apps (every bt_client_* with ros_binding.namespace + node_name matching the renamed unique node names in the fork).
  • medkit_params.yaml: rosbag snapshot config with explicit topic list (/joint_states, /tf, /tf_static, /blackboard_status, /planning_scene), MCAP format, 50 MB per-bag cap, 500 MB total.
  • CORS exposed on the gateway so the bundled Web UI can connect from a browser.
  • Helper scripts: run-demo.sh, stop-demo.sh, check-demo.sh, inject-soft-fault.sh, restore-normal.sh.

Smoke test

  • tests/smoke_test_manymove_industrial.sh: brings up the stack, waits for gateway health, triggers a collision via the inject helper, polls /faults/active for a manymove planner fault (accepts either COLLISION_DETECTED or RETRIES_EXHAUSTED).

CI

  • New job build-and-test-manymove-industrial in .github/workflows/ci.yml, mirroring the existing build-and-test-moveit shape: builds the compose stack, runs the smoke test, uploads logs on failure.

Bug fixes (during integration)

  • Inject service path now matches the actual gateway namespace.
  • Smoke test relaxed to accept both planner fault codes (BT may halt on retry exhaustion before the dedicated collision code fires, depending on timing).
  • CORS headers exposed for the Web UI cross-origin browser session.

Dependencies

  • Requires selfpatch/manymove branch feat/medkit-integration (Feat/medkit integration manymove#1) so the BT action nodes actually emit medkit faults. Without that PR merged, the demo runs but no faults are reported through the native instrumentation path.

mfaferek93 added 22 commits May 12, 2026 21:39
manymove BT pipeline + ros2_medkit fault reporting in a Docker compose
demo. v1 ships:

  - Dockerfile pulling ros-jazzy-ros2-medkit-* debs and the selfpatch
    manymove fork (feat/medkit-integration).
  - SOVD manifest covering manymove BT client, move_group and the medkit
    fault management stack.
  - Three container scripts (arm-self-test, inject-collision,
    restore-normal) that exercise the fault pipeline via
    /fault_manager/report_fault.
  - run-demo.sh / stop-demo.sh / check-demo.sh + a CI smoke test under
    tests/smoke_test_manymove_industrial.sh.

OpenPLC + OPC UA bridge for the tier-2 PLC correlation narrative are
deferred to v1.5; see demos/manymove_industrial/README.md "TODO".
Mirrors the moveit_pick_place pattern: docker compose up the CI profile,
run tests/smoke_test_manymove_industrial.sh, dump container logs on
failure, tear down on always.
…docker style

The image now reproduces manymove_bringup/docker (manymove + Groot +
xarm_ros2 from source) and layers on the medkit fault_manager / gateway
/ Web UI plus our SOVD manifest and container scripts. The MANYMOVE_REPO
build arg defaults to the selfpatch fork on feat/medkit-integration.

demo.launch.py now includes the upstream
xarm7_movegroup_fake_cpp_trees.launch.py verbatim, so the BT pipeline
matches the project's own demos and the manymove-instrumented BT nodes
emit MANYMOVE_* fault codes organically when the BT trips.

Inject scripts moved from synthesising reports on /fault_manager/report_fault
to flipping BT blackboard flags via the HMI update_blackboard service
(real BT triggers); inject-soft-fault adds a thin collision wall to drive
RETRY_ATTEMPT bursts through LocalFilter.

SOVD manifest expanded with the real xArm7 FQNs (ufactory_driver,
action_server_node, object_manager_node, hmi_service_node, move_group,
bt_client_xarm7).

Manymove HMI Qt + Groot ride along via X11 forwarding on the cpu profile.
…rvice path

Two fixes after running the rebuilt image locally:

  - demo.launch.py: ros2_medkit_gateway needs namespace="diagnostics" so
    medkit_params.yaml's "diagnostics:" section resolves and the gateway
    binds 0.0.0.0:8080 instead of localhost-only. Without this, host
    curl to /api/v1/health returned RST-on-recv even though the gateway
    was alive inside the container.

  - inject-*/restore-normal scripts: HMI service is exposed at
    /update_blackboard, not /hmi_service_node/update_blackboard.
    update_blackboard expects every value as a quoted string (the .srv
    is string[] for value_data), so e.g. "true" not true. Also dropped
    "set -u" which trips on ROS 2 setup.bash unbound vars.
The BT XML wires MoveManipulatorAction's collision_detected input port
to the same blackboard key inject-collision flips, but the timing of
when MoveManipulator's onStart reads the port relative to the BT loop
means we sometimes observe the retry-exhausted path instead of the
collision branch. Both prove the round-trip works.

Bumped the post-inject sleep to 6 s so the BT has room to tick the
retry cycle to completion before we poll the fault list.
The Web UI runs on :3000 and fetches the gateway on :8080; without
CORS allow-origin the browser refuses cross-origin requests with
"Failed to fetch". Mirror the cors block from moveit_pick_place's
medkit_params.yaml: allow any origin, all standard methods, and the
two headers the Web UI sends.

Verified locally with `curl -I -H 'Origin: http://localhost:3000'` -
gateway now returns Access-Control-Allow-Origin reflecting the origin.
…napshot capture

Adds default_topics for freeze-frame snapshots and explicit rosbag
topics / format / size limits so the fault-attached MCAP actually
contains BT and tf data instead of an empty bag.
… shellcheck source directives

CI gate hit two issues:

1. Smoke test POST /components/manymove-planning/scripts/<x>/executions
   returned 404 because the manifest only declared manymove-bt; the
   container_scripts directory name (manymove-planning) had no matching
   component. Added a manymove-planning component entry mirroring the
   moveit_pick_place demo pattern (moveit-planning component + same-named
   container_scripts dir).

2. shellcheck SC1091 on every 'source /opt/...setup.bash' line because
   those files do not exist on the host runner. Added
   'shellcheck source=/dev/null' directives, matching the multi_ecu_aggregation
   convention.
…BT-motion dependency

The previous assertion ('MANYMOVE_PLANNER_* fault appears after
inject-collision') is brittle in CI: setting the BT blackboard
'collision_detected' flag only triggers a fault when
MoveManipulatorAction::onStart actually ticks. The CI fake-hardware launch
does not auto-issue motion goals, so the BT remains idle and the flag is
read by nothing.

Replace with:
- Loop over inject-collision + restore-normal endpoints to prove the
  manifest <-> container_scripts component-id binding (the previous
  manymove-planning 404 root cause).
- arm-self-test script execution + poll for MANYMOVE_SELFTEST fault: this
  exercises the medkit REST -> FaultManager pipeline directly via
  /fault_manager/report_fault, with no BT trajectory state dependency.

Real BT-emitted fault verification stays the responsibility of the
record_full.sh demo runs, which do start moves and observe the full
round-trip.
Adds a PLC simulator (asyncua-based OPC UA server) and an OPC UA -> medkit
fault bridge so PLC AlarmConditionType events land in the same medkit
FaultManager that aggregates manymove BT-side faults. Both faults appear
in one dashboard with distinct source_ids, demonstrating cross-source
correlation as the actual differentiator over single-source logging.

The PLC sim exposes three canonical alarms (photoeye_flicker / WARN,
conveyor_overspeed / ERROR, estop_engaged / CRITICAL) plus an admin HTTP
endpoint so container_scripts and demo orchestrators can raise/clear
alarms without speaking OPC UA themselves. Designed to be swappable
with a real OpenPLC v3 + ST program once the IEC 61131-3 build pipeline
is set up; the OPC UA surface (AlarmConditionType events on namespace 2)
stays identical.

The bridge is a ROS 2 Python node (rclpy + asyncua) that subscribes to
AlarmConditionType events and calls /fault_manager/report_fault for
each, with SourceName -> MANYMOVE_PLC_* fault code mapping. Loopback
prevention drops events whose SourceName starts with our own source_id.

Manifest gains a conveyor-line area, four PLC-side components
(openplc, photoeye-pick, photoeye-drop, conveyor-motor), an opcua-bridge
component, plus matching apps and a fault-aggregation function tying
the bridge to the existing FaultManager + gateway.

Smoke test now exercises the conveyor-line container_scripts
(inject-photoeye-flicker, restore-line) and asserts MANYMOVE_PLC_*
faults round-trip through the bridge into medkit.
pip uninstall fails on cryptography 41.0.7 because the package is
managed by apt and has no RECORD file. --ignore-installed skips the
uninstall step so asyncua's newer cryptography dep can land alongside.
…d OPC UA endpoint

asyncua server advertises an endpoint URL the client reconnects to after
the initial bind. With the default 0.0.0.0 bind, that advertised URL is
not resolvable from other containers. Pin both the service hostname and
the advertised OPC UA endpoint to 'plc-sim' so the bridge stays on the
docker-compose service-name DNS path.
Setting container_name suppresses Docker compose's default service-name
network alias on user-defined bridges, so plc-sim was no longer
resolvable as 'plc-sim' from other containers. Add the alias back
explicitly.
…bridge

container_name on a user-defined bridge network suppresses the default
service-name DNS alias. Even with an explicit aliases entry, the
embedded resolver was not registering 'plc-sim' for some reason; CI
kept getting 'Temporary failure in name resolution'. Removing
container_name lets compose use the default service-name alias path,
which is the well-trodden case.
Logs the resolved IP for the OPC UA endpoint hostname at startup so
future 'Temporary failure in name resolution' loops are diagnosable
without docker exec.
…node

AlarmConditionType events require the source to be an Object node that
supports the EventNotifier attribute. The plc_sim was passing Variable
nodes as event sources (Photoeye/Conveyor/Estop tags), which made
asyncua crash at startup with BadAttributeIdInvalid when set_event_notifier
ran.

Use the standard Server object as the emitter for all three alarms.
SourceName in the event already disambiguates which alarm fired.

Verified end-to-end locally: POST /alarm/photoeye_flicker/raise lands
as MANYMOVE_PLC_PHOTOEYE_FLICKER CONFIRMED with source_id=/plc/sensor_io
in the medkit dashboard.
…ax smoke poll filters

The container_scripts/conveyor-line/ directory expects a manifest
component with id 'conveyor-line' (mirroring the manymove-planning
pattern earlier). Without it, gateway returned 404 on
inject-photoeye-flicker / restore-line script executions.

Also relax smoke test status filters: MANYMOVE_SELFTEST is severity 0
INFO so may not pass debounce to CONFIRMED in time; just check the
fault appears in /faults at all. For PLC heal, accept PREPASSED in
addition to HEALED since the healing threshold may not be crossed
within the 30s window from a single PASSED event.
…lision

Adding component 'conveyor-line' (for script binding) made the id
collide with the area also called 'conveyor-line'. Manifest validation
rejected the whole file, so the gateway came up with no apps /
components and every smoke assertion failed (not just the new PLC
ones).

Area is now 'line'; component stays 'conveyor-line' so the
container_scripts/conveyor-line/ directory still binds correctly via
the gateway's component->scripts mapping.
severity=0 INFO doesn't pass the FaultManager debounce, and the FAILED
-> PASSED pair clears the fault from active list within the smoke poll
window anyway. Keep the script-accepted check; the real REST round-trip
proof comes from the PLC bridge section right below it (severity 1
WARN, real source_id, real opcua_bridge forwarding).
…namespace

Topic-discovery in the medkit gateway routes nodes into SOVD entity
tree slots by namespace. Without any published topic, the opcua_bridge
was invisible in the entity tree even though its faults appeared in
the dashboard. Publishing a 1 Hz heartbeat on /plc/heartbeat anchors
the bridge under the conveyor-line area so operators see it next to
the PLC components.
…e' area

opcua_bridge publishes /plc/heartbeat (under namespace /plc); without
a matching area namespace, the gateway's topic discovery couldn't
route the bridge into the SOVD entity tree. Anchor /plc under the
'line' area and move the opcua-bridge component there too. Drop the
empty 'bridge' area.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant