
feat(agents): Add hierarchical language support for VLA training#264

Draft
yuecideng wants to merge 1 commit into main from feat/vla-language-support

Conversation

@yuecideng
Contributor

Description

This PR adds comprehensive language support to Online Data Streaming (ODS) for Vision-Language-Action (VLA) model training. The implementation enables VLA models to learn from multi-scale language representations, mirroring how humans decompose tasks at several levels of abstraction.

Key Features

  • Hierarchical Language Structure: Organizes instructions at three abstraction levels:

    • Task level: High-level goal or overall task description
    • Subtask level: Intermediate step descriptions
    • Primitive level: Low-level action descriptions
  • Multiple Language Sources:

    • File-based: Load task descriptions from YAML/JSON files
    • Environment-based: Generate language from the environment
    • Template-based: Use templates with variable substitution
    • LLM-based: Generate descriptions using GPT/Claude (optional)
  • Flexible Storage: Supports tokens, embeddings, or hybrid storage modes

  • LanguageManager: Handles tokenization and language data management with support for:

    • Curriculum learning (progressive complexity)
    • Data augmentation (optional)
    • Multiple tokenizer backends (HuggingFace, OpenAI/tiktoken)
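The three-level hierarchy above can be sketched as a small data structure. This is an illustrative sketch only; the class name `HierarchicalInstruction` and its fields are assumptions for exposition, not the PR's actual API.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the task/subtask/primitive hierarchy described above.
@dataclass
class HierarchicalInstruction:
    task: str                                        # high-level goal
    subtasks: list = field(default_factory=list)     # intermediate steps
    primitives: list = field(default_factory=list)   # low-level actions

    def hierarchy_depth(self) -> int:
        # Depth 1-3, depending on which levels are populated.
        return 1 + int(bool(self.subtasks)) + int(bool(self.primitives))

instr = HierarchicalInstruction(
    task="make a cup of tea",
    subtasks=["boil water", "steep the tea bag"],
    primitives=["grasp kettle", "pour water", "place tea bag in cup"],
)
print(instr.hierarchy_depth())  # 3
```

A curriculum schedule could then start training with depth-1 instructions (task only) and progressively expose deeper levels.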

Changes

New files:

  • embodichain/lab/gym/envs/managers/language.py - LanguageManager, configs, and data structures
  • embodichain/lab/gym/envs/managers/language_provider.py - Language providers for different sources
  • configs/language/ - Example configurations, documentation, and usage examples
  • tests/agents/test_language_support.py - Test suite (7 passed, 4 skipped due to optional dependencies)

Modified files:

  • embodichain/agents/engine/data.py - Added language_cfg to OnlineDataEngineCfg and buffer creation
  • embodichain/lab/gym/envs/embodied_env.py - Integrated LanguageManager and language data writing
  • embodichain/lab/gym/utils/gym_utils.py - Extended init_rollout_buffer_from_config to allocate language fields
  • embodichain/lab/gym/envs/managers/__init__.py - Exported new language classes

Buffer Structure

When language support is enabled, the rollout buffer includes:

  • {level}_tokens: Token IDs for each hierarchy level
  • {level}_attention_mask: Attention masks for padding
  • {level}_count: Number of instructions per level
  • instruction_counts: Counts across all levels
  • change_points: Timesteps where language changes
  • hierarchy_depth: Current depth of hierarchy (1-3)
  • instruction_types: Instruction type IDs
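A minimal sketch of how the fields listed above might be allocated, assuming NumPy-backed storage; the function name, shapes, and dtypes here are assumptions for illustration, not the PR's actual buffer code.

```python
import numpy as np

# Hypothetical allocation of per-level language fields in a rollout buffer.
def alloc_language_fields(buffer_size, max_steps, max_tokens, levels):
    buf = {}
    for level in levels:
        # Token IDs and padding masks for each hierarchy level.
        buf[f"{level}_tokens"] = np.zeros(
            (buffer_size, max_steps, max_tokens), dtype=np.int64)
        buf[f"{level}_attention_mask"] = np.zeros(
            (buffer_size, max_steps, max_tokens), dtype=np.int8)
        # Number of instructions active at this level per timestep.
        buf[f"{level}_count"] = np.zeros((buffer_size, max_steps), dtype=np.int32)
    # Aggregate metadata across levels.
    buf["instruction_counts"] = np.zeros(
        (buffer_size, max_steps, len(levels)), dtype=np.int32)
    buf["change_points"] = np.zeros((buffer_size, max_steps), dtype=bool)
    buf["hierarchy_depth"] = np.ones((buffer_size,), dtype=np.int8)
    buf["instruction_types"] = np.zeros((buffer_size, max_steps), dtype=np.int32)
    return buf

buf = alloc_language_fields(16, 300, 512, ["task", "subtask", "primitive"])
print(buf["task_tokens"].shape)  # (16, 300, 512)
```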

Usage Example

```python
language_cfg = {
    "mode": "tokens",
    "hierarchy_levels": ["task", "subtask", "primitive"],
    "max_tokens": 512,
    "tokenizer": "gpt2",
    "language_source": "file",
    "language_config_path": "configs/language/tasks_example.yaml",
}

engine_cfg = OnlineDataEngineCfg(
    buffer_size=16,
    max_episode_steps=300,
    state_dim=14,
    gym_config={...},
    language_cfg=language_cfg,
)

engine = OnlineDataEngine(engine_cfg)
engine.start()

# Access language data
for batch in dataset:
    language = batch["language"]
    task_tokens = language["task_level_tokens"]
    subtask_tokens = language["subtask_level_tokens"]
    primitive_tokens = language["primitive_level_tokens"]
```
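The template-based language source (templates with variable substitution) can be sketched with the standard library alone. The template string and variable names below are made up for illustration and are not taken from the PR.

```python
from string import Template

# Hypothetical templates keyed by hierarchy level; `$object` and `$target`
# are illustrative substitution variables.
TEMPLATES = {
    "task": Template("pick up the $object and place it on the $target"),
}

def render(level, **variables):
    # Substitute scene-specific variables into the level's template.
    return TEMPLATES[level].substitute(**variables)

print(render("task", object="red cube", target="tray"))
# pick up the red cube and place it on the tray
```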

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (non-breaking change which improves an existing functionality)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Screenshots

N/A

Checklist

  • I have run the black . command to format the code base.
  • I have made corresponding changes to the documentation (added README.md and usage examples)
  • I have added tests that prove my fix is effective or that my feature works
  • Dependencies have been updated (optional - transformers/tiktoken are optional for full functionality)

🤖 Generated with Claude Code

Commit message

Add comprehensive language support to Online Data Streaming (ODS) for
Vision-Language-Action (VLA) model training. The implementation provides:

- Hierarchical language structure (task/subtask/primitive levels)
- Multiple language sources (file, env, template, LLM)
- Flexible storage modes (tokens, embeddings, hybrid)
- LanguageManager for tokenization and data management
- Integration with ODS shared memory buffer

New files:
- embodichain/lab/gym/envs/managers/language.py: LanguageManager, configs
- embodichain/lab/gym/envs/managers/language_provider.py: Language providers
- configs/language/: Example configurations and documentation
- tests/agents/test_language_support.py: Test suite

Modified files:
- embodichain/agents/engine/data.py: Add language_cfg to OnlineDataEngine
- embodichain/lab/gym/envs/embodied_env.py: Integrate LanguageManager
- embodichain/lab/gym/utils/gym_utils.py: Extend buffer initialization
- embodichain/lab/gym/envs/managers/__init__.py: Export language classes

This enables VLA models to learn from multi-scale language representations
similar to human task understanding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
yuecideng added the enhancement, agent, data, and gym labels on May 12, 2026.
yuecideng marked this pull request as draft on May 12, 2026 at 08:47.