
feat(agents): Add hierarchical language support for VLA training#264

Draft
yuecideng wants to merge 1 commit into main from feat/vla-language-support

Conversation

@yuecideng
Contributor

Description

This PR adds comprehensive language support to Online Data Streaming (ODS) for Vision-Language-Action (VLA) model training. The implementation enables VLA models to learn from multi-scale language representations, mirroring how humans decompose tasks at several levels of abstraction.

Key Features

  • Hierarchical Language Structure: Organizes instructions at three abstraction levels:

    • Task level: High-level goal or overall task description
    • Subtask level: Intermediate step descriptions
    • Primitive level: Low-level action descriptions
  • Multiple Language Sources:

    • File-based: Load task descriptions from YAML/JSON files
    • Environment-based: Generate language from the environment
    • Template-based: Use templates with variable substitution
    • LLM-based: Generate descriptions using GPT/Claude (optional)
  • Flexible Storage: Supports tokens, embeddings, or hybrid storage modes

  • LanguageManager: Handles tokenization and language data management with support for:

    • Curriculum learning (progressive complexity)
    • Data augmentation (optional)
    • Multiple tokenizer backends (HuggingFace, OpenAI/tiktoken)
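The three-level hierarchy above can be sketched as a small data structure. This is an illustrative sketch only; the class name `HierarchicalInstruction` and its fields are assumptions for exposition, not the PR's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the task/subtask/primitive hierarchy described above.
@dataclass
class HierarchicalInstruction:
    task: str                                        # high-level goal
    subtasks: list = field(default_factory=list)     # intermediate steps
    primitives: list = field(default_factory=list)   # low-level actions

    def hierarchy_depth(self) -> int:
        # Depth 1-3, depending on which levels are populated.
        return 1 + int(bool(self.subtasks)) + int(bool(self.primitives))

instr = HierarchicalInstruction(
    task="make a cup of tea",
    subtasks=["boil water", "steep the tea bag"],
    primitives=["grasp kettle", "pour water", "place tea bag in cup"],
)
print(instr.hierarchy_depth())  # 3
```

A curriculum schedule could then start training with depth-1 instructions (task only) and progressively expose deeper levels.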

Changes

New files:

  • embodichain/lab/gym/envs/managers/language.py - LanguageManager, configs, and data structures
  • embodichain/lab/gym/envs/managers/language_provider.py - Language providers for different sources
  • configs/language/ - Example configurations, documentation, and usage examples
  • tests/agents/test_language_support.py - Test suite (7 passed, 4 skipped due to optional dependencies)

Modified files:

  • embodichain/agents/engine/data.py - Added language_cfg to OnlineDataEngineCfg and buffer creation
  • embodichain/lab/gym/envs/embodied_env.py - Integrated LanguageManager and language data writing
  • embodichain/lab/gym/utils/gym_utils.py - Extended init_rollout_buffer_from_config to allocate language fields
  • embodichain/lab/gym/envs/managers/__init__.py - Exported new language classes

Buffer Structure

When language support is enabled, the rollout buffer includes:

  • {level}_tokens: Token IDs for each hierarchy level
  • {level}_attention_mask: Attention masks for padding
  • {level}_count: Number of instructions per level
  • instruction_counts: Counts across all levels
  • change_points: Timesteps where language changes
  • hierarchy_depth: Current depth of hierarchy (1-3)
  • instruction_types: Instruction type IDs
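A minimal sketch of how the fields listed above might be allocated, assuming NumPy-backed storage; the function name, shapes, and dtypes here are assumptions for illustration, not the PR's actual buffer code.

```python
import numpy as np

# Hypothetical allocation of per-level language fields in a rollout buffer.
def alloc_language_fields(buffer_size, max_steps, max_tokens, levels):
    buf = {}
    for level in levels:
        # Token IDs and padding masks for each hierarchy level.
        buf[f"{level}_tokens"] = np.zeros(
            (buffer_size, max_steps, max_tokens), dtype=np.int64)
        buf[f"{level}_attention_mask"] = np.zeros(
            (buffer_size, max_steps, max_tokens), dtype=np.int8)
        # Number of instructions active at this level per timestep.
        buf[f"{level}_count"] = np.zeros((buffer_size, max_steps), dtype=np.int32)
    # Aggregate metadata across levels.
    buf["instruction_counts"] = np.zeros(
        (buffer_size, max_steps, len(levels)), dtype=np.int32)
    buf["change_points"] = np.zeros((buffer_size, max_steps), dtype=bool)
    buf["hierarchy_depth"] = np.ones((buffer_size,), dtype=np.int8)
    buf["instruction_types"] = np.zeros((buffer_size, max_steps), dtype=np.int32)
    return buf

buf = alloc_language_fields(16, 300, 512, ["task", "subtask", "primitive"])
print(buf["task_tokens"].shape)  # (16, 300, 512)
```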

Usage Example

```python
language_cfg = {
    "mode": "tokens",
    "hierarchy_levels": ["task", "subtask", "primitive"],
    "max_tokens": 512,
    "tokenizer": "gpt2",
    "language_source": "file",
    "language_config_path": "configs/language/tasks_example.yaml",
}

engine_cfg = OnlineDataEngineCfg(
    buffer_size=16,
    max_episode_steps=300,
    state_dim=14,
    gym_config={...},
    language_cfg=language_cfg,
)

engine = OnlineDataEngine(engine_cfg)
engine.start()

# Access language data
for batch in dataset:
    language = batch["language"]
    task_tokens = language["task_level_tokens"]
    subtask_tokens = language["subtask_level_tokens"]
    primitive_tokens = language["primitive_level_tokens"]
```
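The template-based language source (templates with variable substitution) can be sketched with the standard library alone. The template string and variable names below are made up for illustration and are not taken from the PR.

```python
from string import Template

# Hypothetical templates keyed by hierarchy level; `$object` and `$target`
# are illustrative substitution variables.
TEMPLATES = {
    "task": Template("pick up the $object and place it on the $target"),
}

def render(level, **variables):
    # Substitute scene-specific variables into the level's template.
    return TEMPLATES[level].substitute(**variables)

print(render("task", object="red cube", target="tray"))
# pick up the red cube and place it on the tray
```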

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (non-breaking change which improves an existing functionality)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Screenshots

N/A

Checklist

  • I have run the black . command to format the code base.
  • I have made corresponding changes to the documentation (added README.md and usage examples)
  • I have added tests that prove my fix is effective or that my feature works
  • Dependencies have been updated (optional - transformers/tiktoken are optional for full functionality)

🤖 Generated with Claude Code

Commit message

Add comprehensive language support to Online Data Streaming (ODS) for
Vision-Language-Action (VLA) model training. The implementation provides:

- Hierarchical language structure (task/subtask/primitive levels)
- Multiple language sources (file, env, template, LLM)
- Flexible storage modes (tokens, embeddings, hybrid)
- LanguageManager for tokenization and data management
- Integration with ODS shared memory buffer

New files:
- embodichain/lab/gym/envs/managers/language.py: LanguageManager, configs
- embodichain/lab/gym/envs/managers/language_provider.py: Language providers
- configs/language/: Example configurations and documentation
- tests/agents/test_language_support.py: Test suite

Modified files:
- embodichain/agents/engine/data.py: Add language_cfg to OnlineDataEngine
- embodichain/lab/gym/envs/embodied_env.py: Integrate LanguageManager
- embodichain/lab/gym/utils/gym_utils.py: Extend buffer initialization
- embodichain/lab/gym/envs/managers/__init__.py: Export language classes

This enables VLA models to learn from multi-scale language representations
similar to human task understanding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yuecideng added the enhancement, agent, data, and gym labels on May 12, 2026.
yuecideng marked this pull request as draft on May 12, 2026 at 08:47.