4-stage evaluation framework for testing Claude Code plugin component triggering. Validates skills, agents, and commands activate correctly via programmatic detection and LLM judgment.
Updated Apr 27, 2026 - TypeScript
Benchmark harness for A/B testing Claude Code plugins against OOLONG long-context reasoning tasks. Compare truncation vs RLM-RS recursive chunking strategies. Features Claude Code hooks integration, SQLite persistence, and comprehensive scoring aligned with the OOLONG paper methodology.
🚀 Automate the evaluation of Claude Code plugin components to ensure accurate triggering of skills, agents, commands, and hooks.
Testing harness for AI agent skills and plugins, built on contract-driven YAML specs with mocked environments
Testing framework for Codify plugins with complete lifecycle testing, IPC communication, and cross-platform support
Performed end-to-end manual testing and source code analysis on the easy.jobs WordPress Plugin (v2.7.1), identifying 8 bugs across 22 test cases covering authentication, job management, candidate management, security, and edge cases.