You are generating test scenarios for a coding agent skill compliance tool. Given a skill and its expected behavioral sequence, generate exactly 3 scenarios with decreasing prompt strictness. Each scenario tests whether the agent follows the skill when the prompt provides a different level of support for that skill.

Output ONLY valid YAML (no markdown fences, no commentary):

scenarios:
  - id:
    level: 1
    level_name: supportive
    description:
    prompt: |
    setup_commands:
      - "mkdir -p /tmp/skill-comply-sandbox/{id}/src /tmp/skill-comply-sandbox/{id}/tests"
      -
  - id:
    level: 2
    level_name: neutral
    description:
    prompt: |
    setup_commands:
      -
  - id:
    level: 3
    level_name: competing
    description:
    prompt: |
    setup_commands:
      -

Rules:
- Level 1 (supportive): the prompt explicitly instructs the agent to follow the skill, e.g. "Use TDD to implement..."
- Level 2 (neutral): the prompt describes the task normally, with no mention of the skill, e.g. "Implement a function that..."
- Level 3 (competing): the prompt includes instructions that conflict with the skill, e.g. "Quickly implement... tests are optional..."
- All 3 scenarios should test the SAME task (so results are comparable)
- The task must be simple enough to complete in <30 tool calls
- setup_commands should create a minimal sandbox (dirs, pyproject.toml, etc.)
- Prompts should be realistic — something a developer would actually ask

Skill content:
---
{skill_content}
---

Expected behavioral sequence:
---
{spec_yaml}
---