New file: `configs/language/README.md` (275 additions)
# Language Support for VLA Training

This directory contains configuration and examples for the hierarchical language support feature in EmbodiChain, enabling Vision-Language-Action (VLA) model training with Online Data Streaming (ODS).

## Overview

The language support feature adds hierarchical language descriptions to the rollout buffer, organized at three abstraction levels:

1. **Task Level**: High-level goal or overall task description
2. **Subtask Level**: Intermediate step descriptions
3. **Primitive Level**: Low-level action descriptions

This hierarchical structure enables VLA models to learn from multi-scale language representations, similar to human task understanding.
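
As a concrete sketch of the three levels, the container below is a minimal stand-in for `HierarchicalLanguageData` (the real class lives in `embodichain.lab.gym.envs.managers` and may differ in detail):

```python
from dataclasses import dataclass, field
from typing import List

# Local stand-in for HierarchicalLanguageData; illustrative only,
# not the EmbodiChain implementation.
@dataclass
class HierarchicalLanguageData:
    task_level: List[str] = field(default_factory=list)      # high-level goal
    subtask_level: List[str] = field(default_factory=list)   # intermediate steps
    primitive_level: List[str] = field(default_factory=list) # low-level actions

lang = HierarchicalLanguageData(
    task_level=["Pick up the red block and place it in the blue basket."],
    subtask_level=["Move the gripper to the red block.", "Grasp the red block."],
    primitive_level=["Close gripper.", "Move up."],
)
print(len(lang.task_level), len(lang.subtask_level), len(lang.primitive_level))
```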

## Features

- **Multiple Language Sources**: Support for file-based, environment-based, template-based, and LLM-generated language
- **Hierarchical Structure**: Organize instructions at multiple abstraction levels
- **Flexible Storage**: Support for tokens, embeddings, or hybrid storage modes
- **Dynamic Chunk Sizes**: Works with variable-length trajectory chunks
- **Curriculum Learning**: Gradually increase language complexity during training
- **Tokenizer Agnostic**: Works with various tokenizers (GPT-2, BERT, etc.)

## Quick Start

### 1. Prepare Language Configuration

Create a YAML file with task descriptions:

```yaml
# tasks.yaml
pick_and_place:
  task:
    - "Pick up the red block and place it in the blue basket."

  subtask:
    - "Move the gripper to the red block."
    - "Grasp the red block."
    - "Lift the block and move to the blue basket."
    - "Release the block into the basket."

  primitive:
    - "Close gripper."
    - "Move up."
    - "Move right."
    - "Open gripper."
```

### 2. Configure ODS Engine

```python
from embodichain.agents.engine.data import OnlineDataEngine, OnlineDataEngineCfg

language_cfg = {
    "mode": "tokens",
    "hierarchy_levels": ["task", "subtask", "primitive"],
    "max_tokens": 512,
    "tokenizer": "gpt2",
    "language_source": "file",
    "language_config_path": "configs/language/tasks.yaml",
    "max_instructions_per_level": 5,
}

engine_cfg = OnlineDataEngineCfg(
    buffer_size=16,
    max_episode_steps=300,
    state_dim=14,
    gym_config={...},
    language_cfg=language_cfg,
)

engine = OnlineDataEngine(engine_cfg)
engine.start()
```

### 3. Use Language Data in Training

```python
from embodichain.agents.datasets.online_data import OnlineDataset
from torch.utils.data import DataLoader

dataset = OnlineDataset(engine, chunk_size=64, batch_size=8)
loader = DataLoader(dataset, batch_size=None)

for batch in loader:
    obs = batch["obs"]
    actions = batch["actions"]
    language = batch["language"]

    # Access language at different hierarchy levels
    # (field names follow the `{level}_tokens` convention
    # described in the Buffer Structure section)
    task_tokens = language["task_tokens"]
    subtask_tokens = language["subtask_tokens"]
    primitive_tokens = language["primitive_tokens"]

    # Train your VLA model
    # loss = vla_model(obs, language, actions)
```

## Configuration Options

### Language Configuration

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `mode` | str | "tokens" | Storage mode: 'tokens', 'embeddings', or 'hybrid' |
| `hierarchy_levels` | list | ["task", "subtask", "primitive"] | Hierarchy levels to store |
| `max_tokens` | int | 512 | Maximum sequence length per instruction |
| `tokenizer` | str | "gpt2" | Tokenizer identifier |
| `pad_token_id` | int | 0 | Token ID used for padding |
| `max_instructions_per_level` | int | 3 | Maximum number of instructions per level |
| `embedding_dim` | int | 768 | Dimension for embeddings (if mode='embeddings') |
| `language_source` | str | "env" | Source of language: 'env', 'file', 'llm', 'template' |
| `language_config_path` | str | None | Path to language config file (if source='file') |
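
The defaults in the table can be written out as a plain dict and merged with user overrides. The helper below is a sketch for illustration, not part of the EmbodiChain API:

```python
# Defaults transcribed from the configuration table above.
DEFAULT_LANGUAGE_CFG = {
    "mode": "tokens",
    "hierarchy_levels": ["task", "subtask", "primitive"],
    "max_tokens": 512,
    "tokenizer": "gpt2",
    "pad_token_id": 0,
    "max_instructions_per_level": 3,
    "embedding_dim": 768,
    "language_source": "env",
    "language_config_path": None,
}

def make_language_cfg(**overrides):
    # Reject unknown keys so typos fail loudly instead of being silently ignored.
    unknown = set(overrides) - set(DEFAULT_LANGUAGE_CFG)
    if unknown:
        raise KeyError(f"Unknown language config keys: {sorted(unknown)}")
    return {**DEFAULT_LANGUAGE_CFG, **overrides}

cfg = make_language_cfg(
    language_source="file",
    language_config_path="configs/language/tasks.yaml",
)
```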

### Language Sources

#### File-Based (`language_source: "file"`)
Load language descriptions from YAML or JSON files. Best for static task descriptions.

```python
language_cfg = {
    "language_source": "file",
    "language_config_path": "configs/language/tasks.yaml",
}
```

#### Environment-Based (`language_source: "env"`)
Generate language descriptions from the environment. The environment should implement:
- `get_task_language(task_id, context) -> HierarchicalLanguageData`
- Or have a `task_description` attribute

```python
language_cfg = {
    "language_source": "env",
}
```
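
A minimal sketch of an environment exposing both hooks described above — the method name `get_task_language` comes from this README, but the surrounding env class and its return value (a plain dict standing in for `HierarchicalLanguageData`) are hypothetical:

```python
class MyTaskEnv:
    """Hypothetical environment implementing the env-based language hooks."""

    # Fallback attribute, used when get_task_language is not implemented.
    task_description = "Pick up the red block and place it in the blue basket."

    def get_task_language(self, task_id, context=None):
        # The real return type is HierarchicalLanguageData; a plain dict
        # is used here to keep the sketch self-contained.
        return {
            "task": [self.task_description],
            "subtask": ["Move the gripper to the red block."],
            "primitive": ["Close gripper."],
        }

env = MyTaskEnv()
lang = env.get_task_language("pick_and_place")
```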

#### Template-Based (`language_source: "template"`)
Use templates with variable substitution for structured tasks.

```python
language_cfg = {
    "language_source": "template",
    "templates": {
        "pick_and_place": {
            "task": "Pick up the {color} {object} and place it {location}.",
            "subtasks": [...],
        }
    },
    "variables": {"color": "red", "object": "block", "location": "in basket"},
}
```
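
Substitution of this kind can be done with Python's built-in `str.format`; the snippet below is a sketch of the mechanism, not the EmbodiChain implementation:

```python
# Template and variables taken from the config example above.
template = "Pick up the {color} {object} and place it {location}."
variables = {"color": "red", "object": "block", "location": "in basket"}

# str.format(**mapping) replaces each {name} placeholder with its value.
instruction = template.format(**variables)
print(instruction)  # Pick up the red block and place it in basket.
```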

#### LLM-Based (`language_source: "llm"`)
Generate descriptions using an LLM (e.g., GPT-4, Claude).

```python
language_cfg = {
    "language_source": "llm",
    "model": "gpt-4",
    "api_key": "your-api-key",
}
```

## Buffer Structure

When language support is enabled, the rollout buffer includes the following fields:

### Per-Hierarchy-Level Fields

For each level in `hierarchy_levels` (e.g., "task", "subtask", "primitive"):

- `{level}_tokens`: `[batch_size, max_episode_steps, max_instructions, max_tokens]`
- `{level}_attention_mask`: `[batch_size, max_episode_steps, max_instructions, max_tokens]`
- `{level}_count`: `[batch_size, max_episode_steps]`

### Global Fields

- `instruction_counts`: `[batch_size, max_episode_steps, 3]` - Counts per hierarchy level
- `change_points`: `[batch_size, max_episode_steps, max_instructions]` - Timesteps where language changes
- `hierarchy_depth`: `[batch_size, max_episode_steps]` - Current depth of hierarchy (1-3)
- `instruction_types`: `[batch_size, max_episode_steps, max_instructions]` - Instruction type IDs
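
The per-level shapes above can be sanity-checked with NumPy; the batch and episode sizes below are arbitrary, and the dict is only a stand-in for the real shared-memory buffer:

```python
import numpy as np

batch_size, max_episode_steps = 4, 32
max_instructions, max_tokens = 3, 512
levels = ["task", "subtask", "primitive"]

buffer = {}
for level in levels:
    # {level}_tokens and {level}_attention_mask share the same 4-D shape.
    buffer[f"{level}_tokens"] = np.zeros(
        (batch_size, max_episode_steps, max_instructions, max_tokens),
        dtype=np.int64,
    )
    buffer[f"{level}_attention_mask"] = np.zeros_like(buffer[f"{level}_tokens"])
    # {level}_count is per (batch, timestep).
    buffer[f"{level}_count"] = np.zeros(
        (batch_size, max_episode_steps), dtype=np.int64
    )

# Global field: one count per hierarchy level.
buffer["instruction_counts"] = np.zeros(
    (batch_size, max_episode_steps, len(levels)), dtype=np.int64
)

print(buffer["task_tokens"].shape)  # (4, 32, 3, 512)
```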

## Advanced Usage

### Custom Language Provider

```python
from embodichain.lab.gym.envs.managers import LanguageProvider, HierarchicalLanguageData

class MyLanguageProvider(LanguageProvider):
    def get_language(self, task_id, context=None):
        # Generate custom language data
        return HierarchicalLanguageData(
            task_level=[...],
            subtask_level=[...],
            primitive_level=[...],
        )

    def get_available_tasks(self):
        return ["task1", "task2"]
```

### Language Augmentation

```python
from embodichain.lab.gym.envs.managers import LanguageAugmentationCfg

augmentation_cfg = LanguageAugmentationCfg(
    synonym_replacement=0.1,
    template_variation=True,
    augmentation_prob=0.5,
)
```

### Curriculum Learning

```python
from embodichain.lab.gym.envs.managers import LanguageCurriculumCfg

curriculum_cfg = LanguageCurriculumCfg(
    enabled=True,
    stage_duration=1000,
    stages=[
        # Simple language first
        LanguageCurriculumCfg.CurriculumStage(
            max_words=10,
            max_sentences=1,
            max_hierarchy_depth=1,
        ),
        # Then more complex
        LanguageCurriculumCfg.CurriculumStage(
            max_words=50,
            max_sentences=3,
            max_hierarchy_depth=3,
        ),
    ],
)
```

## Examples

See `usage_example.py` for complete examples of:
- File-based language loading
- Environment-based language generation
- Template-based language
- Dynamic chunk sizes with language
- Custom environments with language

## Files

- `tasks_example.yaml` - Example task descriptions in YAML format
- `usage_example.py` - Complete usage examples
- `README.md` - This file

## API Reference

### Core Classes

- `LanguageCfg` - Configuration for language data
- `LanguageManager` - Manages tokenization and language data
- `LanguageData` - Single-level language data
- `HierarchicalLanguageData` - Multi-level hierarchical language data
- `LanguageProvider` - Abstract base for language sources
- `FileBasedLanguageProvider` - Load from YAML/JSON files
- `LLMBasedLanguageProvider` - Generate with LLM
- `EnvBasedLanguageProvider` - Generate from environment
- `TemplateBasedLanguageProvider` - Template-based generation

## Notes

- Language data is broadcast across all timesteps in an episode
- Tokenization happens in the simulation subprocess for efficiency
- Shared memory ensures zero-copy data transfer to training process
- Compatible with all existing ODS features (dynamic chunks, etc.)
New file: `configs/language/tasks_example.yaml` (108 additions)
# Example language configuration file for VLA training
# This file demonstrates the hierarchical language structure for task descriptions

pick_and_place:
  task:
    - "Pick up the red block and place it in the blue basket."

  subtask:
    - "Move the gripper to the red block."
    - "Grasp the red block."
    - "Lift the block and move to the blue basket."
    - "Release the block into the basket."

  primitive:
    - "Close gripper."
    - "Move up."
    - "Move right."
    - "Open gripper."

  change_points: [0, 10, 20, 30]

stack_blocks:
  task:
    - "Stack the red block on top of the green block."

  subtask:
    - "Locate the red block and green block."
    - "Move the gripper to the red block."
    - "Grasp the red block."
    - "Lift the red block."
    - "Position the red block above the green block."
    - "Lower the red block onto the green block."
    - "Release the red block."

  primitive:
    - "Close gripper."
    - "Move up."
    - "Move forward."
    - "Move left."
    - "Open gripper."

  change_points: [0, 5, 10, 15, 20, 25, 30]

pour_liquid:
  task:
    - "Pour water from the cup into the bowl."

  subtask:
    - "Approach the cup with the gripper."
    - "Grasp the cup securely."
    - "Lift the cup."
    - "Tilt the cup over the bowl."
    - "Wait for liquid to pour."
    - "Return the cup to upright position."

  primitive:
    - "Close gripper."
    - "Move up."
    - "Rotate wrist."
    - "Wait."
    - "Rotate wrist back."

  change_points: [0, 5, 10, 15, 25, 30]

button_press:
  task:
    - "Press the red button to activate the mechanism."

  subtask:
    - "Locate the red button."
    - "Move the end-effector to the button."
    - "Apply downward force to press the button."
    - "Release and retract."

  primitive:
    - "Move forward."
    - "Move down."
    - "Apply force."
    - "Move up."
    - "Move backward."

  change_points: [0, 5, 10, 15, 20]

door_open:
  task:
    - "Open the cabinet door and place the object inside."

  subtask:
    - "Approach the cabinet handle."
    - "Grasp the cabinet handle."
    - "Pull the door open."
    - "Pick up the object."
    - "Move the object into the cabinet."
    - "Release the object."
    - "Close the cabinet door."

  primitive:
    - "Move forward."
    - "Close gripper."
    - "Move backward."
    - "Move down."
    - "Close gripper."
    - "Move forward."
    - "Open gripper."
    - "Move backward."
    - "Push forward."

  change_points: [0, 5, 10, 15, 20, 25, 30, 35]