diff --git a/README.md b/README.md
index c319e41a..ae95d0f2 100644
--- a/README.md
+++ b/README.md
@@ -174,6 +174,10 @@ For a full list of extensions and detailed documentation, visit **[robotcontrols

For full documentation, including advanced installation, modular usage, and API references, please visit: 👉 **[robotcontrolstack.org](https://robotcontrolstack.org)**

Useful quick-reference pages:

- **[RCS Conventions](https://robotcontrolstack.org/user_guide/conventions)** for quaternion order, frames, Euler angles, and gripper semantics
- **[Sim Scene Configuration](https://robotcontrolstack.org/user_guide/scene_configuration)** for `SimEnvCreatorConfig`, scene frames, and example setup patterns

## 🤝 Contribution

We welcome contributions from the robotics and ML community! For contribution guidelines, please check out **[robotcontrolstack.org/contributing](https://robotcontrolstack.org/contributing)**.

diff --git a/docs/index.md b/docs/index.md
index 3c655216..64950871 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -18,6 +18,11 @@

## Documentation

If you are looking for the most common frame, pose, and scene-setup details, start here:

- [RCS Conventions](user_guide/conventions.md)
- [Sim Scene Configuration](user_guide/scene_configuration.md)

```{toctree}
:maxdepth: 2
:caption: User Guide
```

diff --git a/docs/user_guide/conventions.md b/docs/user_guide/conventions.md
new file mode 100644
index 00000000..86fcf4e0
--- /dev/null
+++ b/docs/user_guide/conventions.md
@@ -0,0 +1,221 @@

# RCS Conventions

This page is the quick reference for coordinate frames, pose encodings, gripper semantics, and units used across RCS.

```text
Simple 3D sketch of the right-handed base frame (robot base is at the origin)

      +z (up)
       ^
       |
       o---> +y (left)
      /
     /
    +x (forward)
```

## At a glance

- **Quaternion order in RCS is `xyzw`** (`[qx, qy, qz, qw]`), matching Eigen's coefficient order.
- **MuJoCo free-joint `qpos` uses `wxyz`**, so direct MuJoCo state access needs an explicit reorder.
- **Robot/base frames are right-handed** with `x` forward, `y` left, and `z` up.
- **Gripper commands are normalized**: `0` means closed and `1` means open.
- **Euler angles are `roll`, `pitch`, `yaw`** around the `x`, `y`, and `z` axes.
- **Translations are in meters and angles are in radians** unless explicitly documented otherwise.

If you are working with multi-robot scene composition, also see the [Sim scene configuration guide](scene_configuration.md).

## Frames

### World frame

In simulation, scenes and free objects live in the global **world frame**.

Use world coordinates when you want to place something in the room itself, for example a cube on a table or a fixed camera in the lab.

The core robot API exposes explicit conversions between world and robot coordinates:

- `Robot.get_base_pose_in_world_coordinates()`
- `Robot.to_pose_in_world_coordinates(...)`
- `Robot.to_pose_in_robot_coordinates(...)`

### Robot base frame

Kinematics and low-level robot poses are expressed in the robot's **base frame**.

This is the frame assumed by the kinematics backends: for example, `forward(...)` in `src/rcs/Kinematics.cpp` documents that its input pose is assumed to be in the robot's coordinate frame.

When you interpret or command robot poses, the expected base-frame axis orientation is:

```text
x = forward
y = left
z = up
```

This is also the convention documented for Franka Quest teleoperation in `examples/teleop/README.md`.

### End-effector frame: `attachment_site`

Each robot config defines an `attachment_site`. This is the end-effector frame used by the kinematics stack.
Common examples in the repository are:

- `attachment_site_0` for FR3 / Panda
- `attachment_site` for UR5e / XArm7
- `gripper` for SO101

If you are unsure which frame a robot uses, check its config or the relevant example scene in `python/rcs/envs/configs.py`.

### Tool frame: `tcp_offset`

`tcp_offset` is applied on top of the attachment site to define the actual tool center point (TCP) used by motion commands and IK.

In practice:

- `attachment_site` = frame coming from the robot model
- `tcp_offset` = extra transform from that frame to the tool you actually want to control

A typical example is a wrist-mounted gripper where the tool center point is slightly in front of the model's attachment site.

## Pose representations

RCS uses several pose encodings. The important ones are:

### `Pose`

`rcs.common.Pose` is the main transform type. It supports construction from:

- translation only
- quaternion + translation
- `RPY` + translation
- rotation matrix + translation

### Quaternion order

Within RCS, quaternions are stored in **xyzw** order.

This is visible in two places:

- `src/rcs/Pose.cpp`: `rotation_q()` returns `this->m_rotation.coeffs()`
- `python/tests/test_common.py` checks that the identity quaternion is `[0, 0, 0, 1]`

So the convention is:

```text
[qx, qy, qz, qw]
```

This matches Eigen's quaternion coefficient layout.

### `tquat`

`tquat` means translation plus quaternion and is used by the environment API.

The value layout is:

```text
[x, y, z, qx, qy, qz, qw]
```

This comes directly from `python/rcs/envs/base.py`, where `tquat` is built from `pose.translation()` followed by `pose.rotation_q()`.

### `xyzrpy`

`xyzrpy` is the translation plus roll-pitch-yaw representation used by the environment API.
The value layout is:

```text
[x, y, z, roll, pitch, yaw]
```

The `RPY` type in `python/rcs/_core/common.pyi` exposes the fields in exactly that order:

- `roll`
- `pitch`
- `yaw`

These angles are rotations around the `x`, `y`, and `z` axes, respectively.

### Rotation vector / `rotvec`

Some hardware integrations, notably UR, also use a 6D rotation-vector pose:

```text
[x, y, z, rx, ry, rz]
```

You can see this in `extensions/rcs_ur5e/src/rcs_ur5e/hw.py`, where `common.RotVec(...).as_quaternion_vector()` is converted into an RCS `Pose`, and `Pose.rotvec()` is sent back to the robot.

## Gripper convention

Gripper actions and widths are normalized to the range `[0, 1]`.

The convention used across RCS is:

```text
0 = closed
1 = open
```

This is documented directly in multiple places:

- `python/rcs/envs/base.py`: `# 0 for closed, 1 for open (>=0.5 for open)`
- `src/sim/SimGripper.h`: `// normalized width of the gripper, 0 is closed, 1 is open`
- `extensions/rcs_fr3/src/hw/FrankaHand.h`: `// normalized width of the gripper, 0 is closed, 1 is open`

For binary grippers, the environment wrapper rounds and clips the command, and values `>= 0.5` are treated as open.
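That rounding-and-clipping rule can be sketched in a few lines of plain Python. This is an illustration of the convention only, not the actual wrapper code, and the function name is made up:

```python
def binarize_gripper_command(cmd: float) -> float:
    """Illustrative sketch: clip a normalized gripper command to [0, 1],
    then treat values >= 0.5 as open (1.0) and everything else as closed (0.0)."""
    cmd = min(max(cmd, 0.0), 1.0)      # clip to the normalized range first
    return 1.0 if cmd >= 0.5 else 0.0  # 0 = closed, 1 = open
```

Clipping before thresholding means even out-of-range commands such as `2.0` or `-0.3` resolve to a valid binary state.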
## Units

Unless a specific API says otherwise:

- translations, distances, and Cartesian offsets are in **meters**
- joint angles and Euler angles are in **radians**
- camera depth is stored as **metric depth scaled by 1000** in `uint16`, so values are effectively in **millimeters**

In practice, that means:

- moving `0.01` in `tquat[0]` means moving **1 cm** in `x`
- `np.pi / 2` means **90 degrees**
- a depth value of `723` means about **0.723 m**

The relevant camera path is:

- `python/rcs/camera/sim.py` converts the MuJoCo depth buffer to **meters** when `physical_units=True`
- `python/rcs/camera/interface.py` defines `BaseCameraSet.DEPTH_SCALE = 1000`
- the resulting metric depth is multiplied by that scale factor and stored as `uint16`

The same scaled-metric convention is also used by camera integrations such as RealSense and ZED.

## MuJoCo caveat: free-joint quaternions use `wxyz`

A common source of confusion is that **RCS uses `xyzw`**, but MuJoCo free-joint `qpos` stores the quaternion as **`wxyz`**.

RCS exposes both forms explicitly:

- `Pose.rotation_q()` returns `xyzw`
- `Pose.rotation_q_wxyz()` returns `wxyz`

You can see the `wxyz` form used when writing MuJoCo-facing state in files such as `python/rcs/sim/sim.py` and `python/rcs/envs/tasks.py`.

So:

- **RCS `Pose` / env API**: `xyzw`
- **MuJoCo free-joint state**: `wxyz`

If you manipulate MuJoCo state directly, convert between the two explicitly.

## Practical checklist

When something looks wrong, check these first:

1. Are you working in **world frame** or **robot/base frame**?
2. Does your robot/base frame follow **x forward, y left, z up**?
3. Is your end-effector frame the correct `attachment_site`?
4. Did you apply the right `tcp_offset`?
5. Are your quaternions **xyzw** in RCS?
6. Are you accidentally feeding **MuJoCo `wxyz`** into an RCS API?
7. Are your Euler angles ordered as **roll, pitch, yaw** around **x, y, z**?
8. Are your gripper commands using **0 = closed, 1 = open**?
9. Are your translations in **meters** and your angles in **radians**?

diff --git a/docs/user_guide/index.md b/docs/user_guide/index.md
index 739caecc..f3e5081f 100644
--- a/docs/user_guide/index.md
+++ b/docs/user_guide/index.md
@@ -2,10 +2,17 @@

The User Guide provides in-depth information about the core concepts and components of the Robot Control Stack.

If you want the quickest orientation first, start with:

- [RCS Conventions](conventions.md)
- [Sim Scene Configuration](scene_configuration.md)

```{toctree}
:maxdepth: 2

architecture
gym_interface
low_level_api
conventions
scene_configuration
```

diff --git a/docs/user_guide/scene_configuration.md b/docs/user_guide/scene_configuration.md
new file mode 100644
index 00000000..aeddac88
--- /dev/null
+++ b/docs/user_guide/scene_configuration.md
@@ -0,0 +1,367 @@

# Sim Scene Configuration

This page explains the simulation scene API in `python/rcs/envs/scenes.py` and shows how the example scene configs in `python/rcs/envs/configs.py` fit together.

If you are new to the frame conventions, read [RCS Conventions](conventions.md) first.

```text
world frame
└── root frame
    ├── all composed scene objects live here
    └── shared base frame
        (same kinematic node as root frame)
        (common robot-coordinate frame)
        ├── i-th robot base
        │   └── i-th robot attachment_site
        ├── j-th robot base
        │   └── j-th robot attachment_site
        └── ...
```

## The simple mental model

When building an RCS sim scene, it helps to think about five frames:

1. **World frame**
   - The global MuJoCo frame.
   - This is the outermost reference for the whole scene.

2. **Root frame**
   - The scene-local frame for the composed robot setup.
   - In the usual composed setup, all robot-scene objects are placed relative to it.

3. **Shared base frame**
   - The common coordinate frame used for all robot actions and observations.
   - It is attached to the same kinematic node as the root frame, but represents the common robot coordinate convention.

4. **i-th robot base frame**
   - The base frame of one specific robot.
   - Low-level kinematics and Cartesian commands are expressed here.

5. **i-th robot `attachment_site`**
   - The end-effector mounting frame for that robot, before any optional `tcp_offset`.
   - Use this for wrist-mounted objects, cameras, and tools.

A good rule of thumb is:

- want the outer global reference -> **world frame**
- want the composed scene placement frame -> **root frame**
- want one common coordinate frame for all robots -> **shared base frame**
- want per-robot kinematics -> **i-th robot base frame**
- want wrist or tool mounting -> **i-th robot `attachment_site`**

If needed, extra `world_frame_objects` can still be placed directly in world coordinates.

## How the main placement frames work

For composed robot scenes, `SimEnvCreator.create_model()` combines three scene-placement transforms in this order:

```python
robot2world = root_frame_to_world * shared_base_frame_to_root_frame * robot_to_shared_base_frame[robot_name]
```

In simple terms:

- `root_frame_to_world` places the whole rig into the MuJoCo world
- `shared_base_frame_to_root_frame` defines the shared robot coordinate frame relative to the root frame
- `robot_to_shared_base_frame` places each robot relative to that shared robot frame

### Single-robot intuition

For a simple single-arm scene, all three can often be identity transforms:

```python
root_frame_to_world = rcs.common.Pose()
shared_base_frame_to_root_frame = rcs.common.Pose()
robot_to_shared_base_frame = {"robot": rcs.common.Pose()}
```

That means the robot base, shared base, root frame, and world origin all coincide.
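To build intuition for that chain, here is a translation-only sketch in plain Python. The numeric offsets are hypothetical, and real `rcs.common.Pose` transforms also carry rotations (where composition order matters, unlike for pure translations):

```python
def compose(*translations):
    """Compose translation-only transforms by summing their (x, y, z) offsets."""
    x, y, z = 0.0, 0.0, 0.0
    for tx, ty, tz in translations:
        x, y, z = x + tx, y + ty, z + tz
    return (x, y, z)

root_frame_to_world = (0.0, 0.0, 0.0)              # rig aligned with the world
shared_base_frame_to_root_frame = (0.0, 0.0, 0.4)  # e.g. a shared mount height
robot_to_shared_base_frame = {
    "left": (0.0, 0.3, 0.0),    # hypothetical lateral offsets
    "right": (0.0, -0.3, 0.0),  # from the shared center
}

# robot2world = root_frame_to_world * shared_base_frame_to_root_frame * per-robot offset
robot2world = {
    name: compose(root_frame_to_world, shared_base_frame_to_root_frame, offset)
    for name, offset in robot_to_shared_base_frame.items()
}
```

Changing `root_frame_to_world` shifts every entry of `robot2world` by the same amount, which is why the whole rig can be repositioned with a single transform.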
### Dual-arm intuition

The FR3 duo example in `python/rcs/envs/configs.py` uses these frames more meaningfully:

- `root_frame_to_world` keeps the whole duo rig aligned with the world
- `shared_base_frame_to_root_frame` lifts the shared base to the duo mount height
- `robot_to_shared_base_frame` offsets the left and right robots from the center

That is why the dual-arm example can expose one common action frame while still placing each robot correctly.

## `SimEnvCreatorConfig` keys

This is the main top-level config for the scene API.

| Key | What it controls | Typical use |
| --- | --- | --- |
| `robot_cfgs` | Maps robot names to `SimRobotConfig` | Define one robot or multiple named robots such as `"left"` and `"right"` |
| `sim_cfg` | MuJoCo runtime settings (`SimConfig`) | Realtime, async control, frequency, convergence behavior |
| `control_mode` | Action representation | For example `ControlMode.CARTESIAN_TQuat` |
| `task_cfg` | Optional task-specific config | Add pick/place or other task logic |
| `scene` | Base scene XML path or scene key | Usually from `SCENE_PATHS[...]` |
| `gripper_cfgs` | Optional gripper config per robot | Add one gripper per robot |
| `camera_cfgs` | Optional camera config dictionary | Define resolution, type, and frame rate for named cameras |
| `max_relative_movement` | Relative action limit | Limit per-step Cartesian delta |
| `relative_to` | Relative action reference | Usually `RelativeTo.LAST_STEP` or `RelativeTo.NONE` |
| `robot_to_shared_base_frame` | Per-robot offset relative to the shared base frame | Multi-robot layouts |
| `add_gravcomp` | Add gravity compensation to the composed scene | Often useful for manipulation scenes |
| `wrapper_cfg` | Wrapper behavior flags | Binary gripper, home-on-reset, depth output |
| `headless` | GUI toggle | `True` for no GUI |
| `shared_base_frame_to_root_frame` | Offset from shared base frame to root frame | Move shared command origin inside the rig |
| `root_frame_to_world` | Offset from root frame to MuJoCo world | Place the whole setup in the room |
| `alternative_combined_robot_mjcf` | Use a pre-combined robot MJCF instead of composing robots one by one | Advanced custom scenes |
| `world_frame_objects` | Objects placed directly in world coordinates | Loose props, room-fixed assets |
| `root_frame_objects` | Objects placed in root-frame coordinates | Tables, mounts, fixtures that should move with the rig |
| `robot_frame_objects` | Objects attached in a robot attachment-site frame | Wrist mounts, end-effector payloads |
| `camera_adds` | Cameras to add to the scene | Fixed overhead cameras or wrist cameras |
| `gripper_offsets` | Pose offsets for mounted grippers | Align visual or tool frames |
| `_original_cfg` | Internal helper used after prefixing | Usually ignore this in user code |

## What usually lives inside the nested configs

The scene config mostly wires together three lower-level config types.

### `robot_cfgs: dict[str, SimRobotConfig]`

Each robot entry usually defines things such as:

- robot type
- kinematic model path
- `attachment_site`
- `tcp_offset`
- joint names and actuator names
- base link name
- degrees of freedom, joint limits, and `q_home`

The single-arm and dual-arm examples in `python/rcs/envs/configs.py` are good templates for this.
### `gripper_cfgs: dict[str, SimGripperConfig]`

Each gripper entry usually defines:

- gripper type
- gripper joint names and actuator name
- min/max width or actuator range
- collision geometry settings
- callback timing

### `camera_cfgs: dict[str, SimCameraConfig]`

Each camera entry usually defines:

- camera identifier
- camera type
- resolution
- frame rate

A useful pattern is:

- `camera_cfgs` defines the camera runtime properties
- `camera_adds` defines where that camera is placed in the scene

## The most important nested configs

### `WrapperConfig`

`WrapperConfig` controls behavior of the environment wrappers around the raw simulation.

| Key | Meaning |
| --- | --- |
| `binary_gripper` | If `True`, gripper commands are treated as open/close instead of continuous width |
| `home_on_reset` | If `True`, the robot returns home during reset |
| `include_depth` | If `True`, camera wrappers include depth images. These are metric depth values scaled by `BaseCameraSet.DEPTH_SCALE = 1000` and stored as `uint16`, so they are effectively in millimeters |

### `CameraAdderConfig`

`CameraAdderConfig` describes how a camera is added to the scene.
| Key | Meaning |
| --- | --- |
| `xml_path` | Optional camera XML asset to insert directly |
| `fovy` | Camera field of view, used when creating a camera directly |
| `offset` | Camera pose offset |
| `attachment_site` | Attachment site to use if mounted on a robot |
| `robot_name` | If set, mount the camera on that robot; otherwise add it as a scene camera |

The important frame detail is:

- if `robot_name` is **not** set, `offset` is interpreted in the **root frame** and then moved into world by `root_frame_to_world`
- if `robot_name` **is** set, `offset` is interpreted relative to that robot's **attachment site**

## Easy examples

### Minimal single-robot scene

This is the basic shape used by `EmptyWorldFR3` in `python/rcs/envs/configs.py`:

```python
cfg = SimEnvCreatorConfig(
    robot_cfgs={"robot": robot_cfg},
    sim_cfg=SimConfig(async_control=False, realtime=True, frequency=1),
    control_mode=ControlMode.CARTESIAN_TQuat,
    scene=SCENE_PATHS["empty_world"],
    gripper_cfgs={"robot": gripper_cfg},
    camera_cfgs={"bird_eye": bird_eye_cfg, "wrist": wrist_cfg},
    robot_to_shared_base_frame={"robot": rcs.common.Pose()},
    shared_base_frame_to_root_frame=rcs.common.Pose(),
    root_frame_to_world=rcs.common.Pose(),
)
```

What this means in plain language:

- there is one robot named `robot`
- it uses Cartesian `tquat` actions
- the base scene is the empty world
- there is one gripper on the robot
- there are two cameras
- all high-level frames start at the same origin

### Dual-arm scene

This is the important part of the `EmptyWorldFR3Duo` example:

```python
robot_cfgs = {"left": robot_cfg_left, "right": robot_cfg_right}

robot_to_shared_base_frame = {
    "left": DEFAULT_TRANSFORMS["FR3_DUOMOUNT_LEFT_ROBOT"],
    "right": DEFAULT_TRANSFORMS["FR3_DUOMOUNT_RIGHT_ROBOT"],
}

shared_base_frame_to_root_frame = DEFAULT_TRANSFORMS["FR3_DUOMOUNT_HEIGHT_OFFSET"]
root_frame_to_world = rcs.common.Pose()
```

In plain language:
- the shared base frame sits at the logical center of the duo setup
- the left and right robot bases are offset from that center
- the whole setup can still be moved together by changing `root_frame_to_world`

### Object placement: which dictionary should I use?

#### `world_frame_objects`

Use this when the object should stay fixed in the room.

```python
world_frame_objects = {
    "cube": (OBJECT_PATHS["green_cube"], rcs.common.Pose(translation=np.array([0.5, 0.0, 0.2]))),
}
```

Example meaning: place a cube at a fixed world position.

#### `root_frame_objects`

Use this when the object belongs to the rig and should move together with it.

```python
root_frame_objects = {
    "duo_mount": (OBJECT_PATHS["fr3_duo_mount"], DEFAULT_TRANSFORMS["FR3_DUOMOUNT_BASE"]),
}
```

Example meaning: the duo mount is part of the setup, not a free world object.

#### `robot_frame_objects`

Use this when the object should be attached to one robot's tool frame.

```python
robot_frame_objects = {
    "left": {
        "left_d405_mount": (
            OBJECT_PATHS["robotiq_d405_mount"],
            DEFAULT_TRANSFORMS["FR3_ROBOTIQ_WRIST_D405_MOUNT"],
        )
    }
}
```

Example meaning: attach a wrist mount to the left robot only.
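The difference between the object dictionaries can be made concrete with a translation-only sketch (hypothetical positions, rotations omitted, plain Python rather than RCS code):

```python
def add(a, b):
    """Compose two translation-only transforms."""
    return tuple(x + y for x, y in zip(a, b))

root_frame_to_world = (1.0, 0.0, 0.0)  # move the whole rig 1 m along world x

world_frame_objects = {"cube": (0.5, 0.0, 0.2)}       # fixed in the room
root_frame_objects = {"duo_mount": (0.25, 0.0, 0.0)}  # belongs to the rig

cube_in_world = world_frame_objects["cube"]  # unaffected by the rig placement
mount_in_world = add(root_frame_to_world, root_frame_objects["duo_mount"])
# the mount follows the rig to (1.25, 0.0, 0.0); the cube stays at (0.5, 0.0, 0.2)
```

If you later move the rig by changing `root_frame_to_world`, only the root-frame object's world position changes.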
## Camera depth units

When depth is enabled, the camera wrapper exposes depth images as scaled metric depth:

- sim depth is first converted to **meters** in `python/rcs/camera/sim.py`
- camera frames use `BaseCameraSet.DEPTH_SCALE = 1000`
- depth is then stored as `uint16`

So in practice:

- divide by `1000` to get **meters**
- or read the values directly as **millimeters**

Example:

- `depth[y, x] == 1500` means the point is about **1.5 m** away from the camera

### Camera placement

#### Fixed scene camera

This pattern from `EmptyWorldFR3` adds an overhead camera:

```python
camera_adds = {
    "bird_eye": CameraAdderConfig(
        fovy=60.0,
        offset=rcs.common.Pose(
            translation=np.array([0.271, 0.0, 2.080]),
            quaternion=np.array([0.0060, -0.0060, -0.7067, 0.7074]),
        ),
    )
}
```

Because `robot_name` is not set, this pose is interpreted in the **root frame**.

#### Wrist camera

This pattern mounts a camera to a robot:

```python
camera_adds = {
    "wrist": CameraAdderConfig(
        fovy=60.0,
        offset=some_pose,
        robot_name="robot",
    )
}
```

Because `robot_name` is set, `offset` is interpreted relative to that robot's **attachment site**.

## Common mistakes

1. **Mixing up world and root frame**
   - If the whole rig should move together, use `root_frame_to_world` or `root_frame_objects`, not `world_frame_objects`.

2. **Using the wrong frame for camera offsets**
   - Scene cameras use root-frame offsets.
   - Robot-mounted cameras use attachment-site offsets.

3. **Forgetting matching camera names**
   - If a camera is added without `xml_path`, its name must also exist in `camera_cfgs`.

4. **Putting wrist-mounted assets into world objects**
   - Use `robot_frame_objects` for things that should follow the robot wrist.

5. **Using `alternative_combined_robot_mjcf` without the expected prefixes**
   - The docstring in `scenes.py` requires names like `robot{robot_name}`.
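As a quick sanity check on the scaled-depth convention from the camera-depth section above, the `DEPTH_SCALE = 1000` round trip can be sketched as follows (the helper names here are made up, not RCS API):

```python
DEPTH_SCALE = 1000  # scaled metric depth: uint16 values are effectively millimeters

def depth_to_meters(raw: int) -> float:
    """Convert a stored uint16 depth value back to meters."""
    return raw / DEPTH_SCALE

def meters_to_depth(meters: float) -> int:
    """Scale metric depth to the stored uint16 convention."""
    return int(round(meters * DEPTH_SCALE))
```

So a stored value of `1500` corresponds to 1.5 m, matching the example earlier on this page.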
## A practical workflow

When building a new scene, this usually works well:

1. Start with one robot and identity transforms.
2. Add `root_frame_objects` for mounts or fixtures.
3. Add `world_frame_objects` only for room-fixed props.
4. Add `robot_frame_objects` for wrist payloads.
5. Add cameras with `camera_adds`.
6. Only then introduce non-trivial `shared_base_frame_to_root_frame` and `robot_to_shared_base_frame` offsets.

That order keeps the frame reasoning much easier.