mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-04-14 13:53:29 +08:00
feat(skills): add skill-comply — automated behavioral compliance measurement (#724)
* feat(skills): add skill-comply — automated behavioral compliance measurement Automated compliance measurement for skills, rules, and agent definitions. Generates behavioral specs, runs scenarios at 3 strictness levels, classifies tool calls via LLM, and produces self-contained reports. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(skill-comply): address bot review feedback - AGENTS.md: fix stale skill count (115 → 117) in project structure - run.py: replace remaining print() with logger, add zero-division guard, create parent dirs for --output path - runner.py: add returncode check for claude subprocess, clarify relative_to path traversal validation - parser.py: use is_file() instead of exists(), catch KeyError for missing trace fields, add file check in parse_spec - classifier.py: log warnings on malformed classification output, guard against non-dict JSON responses - grader.py: filter negative indices from LLM classification Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
24
skills/skill-comply/prompts/classifier.md
Normal file
24
skills/skill-comply/prompts/classifier.md
Normal file
@@ -0,0 +1,24 @@
|
||||
You are classifying tool calls from a coding agent session against expected behavioral steps.
|
||||
|
||||
For each tool call, determine which step (if any) it belongs to. A tool call can match at most one step.
|
||||
|
||||
Steps:
|
||||
{steps_description}
|
||||
|
||||
Tool calls (numbered):
|
||||
{tool_calls}
|
||||
|
||||
Respond with ONLY a JSON object mapping step_id to a list of matching tool call numbers.
|
||||
Include only steps that have at least one match. If no tool calls match a step, omit it.
|
||||
|
||||
Example response:
|
||||
{"write_test": [0, 1], "run_test_red": [2], "write_impl": [3, 4]}
|
||||
|
||||
Rules:
|
||||
- Match based on the MEANING of the tool call, not just keywords
|
||||
- A Write to "test_calculator.py" is a test file write, even if the content is implementation-like
|
||||
- A Write to "calculator.py" is an implementation write, even if it contains test helpers
|
||||
- A Bash running "pytest" that outputs "FAILED" is a RED phase test run
|
||||
- A Bash running "pytest" that outputs "passed" is a GREEN phase test run
|
||||
- Each tool call should match at most one step (pick the best match)
|
||||
- If a tool call doesn't match any step, don't include it
|
||||
Reference in New Issue
Block a user