everything-claude-code/skills/skill-comply/prompts/classifier.md at 142600a2fca60caf39d7a4f12ff0ebfbfdaea529

mirror of https://github.com/affaan-m/everything-claude-code.git synced 2026-04-01 14:43:28 +08:00

Files

Shimo a2e465c74d feat(skills): add skill-comply — automated behavioral compliance measurement (#724 )

* feat(skills): add skill-comply — automated behavioral compliance measurement

Automated compliance measurement for skills, rules, and agent definitions.
Generates behavioral specs, runs scenarios at 3 strictness levels,
classifies tool calls via LLM, and produces self-contained reports.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(skill-comply): address bot review feedback

- AGENTS.md: fix stale skill count (115 → 117) in project structure
- run.py: replace remaining print() with logger, add zero-division guard,
  create parent dirs for --output path
- runner.py: add returncode check for claude subprocess, clarify
  relative_to path traversal validation
- parser.py: use is_file() instead of exists(), catch KeyError for
  missing trace fields, add file check in parse_spec
- classifier.py: log warnings on malformed classification output,
  guard against non-dict JSON responses
- grader.py: filter negative indices from LLM classification

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-22 21:51:49 -07:00

1.0 KiB

Raw Blame History

You are classifying tool calls from a coding agent session against expected behavioral steps.

For each tool call, determine which step (if any) it belongs to. A tool call can match at most one step.

Steps: {steps_description}

Tool calls (numbered): {tool_calls}

Respond with ONLY a JSON object mapping step_id to a list of matching tool call numbers. Include only steps that have at least one match. If no tool calls match a step, omit it.

Example response: {"write_test": [0, 1], "run_test_red": [2], "write_impl": [3, 4]}

Rules:

Match based on the MEANING of the tool call, not just keywords
A Write to "test_calculator.py" is a test file write, even if the content is implementation-like
A Write to "calculator.py" is an implementation write, even if it contains test helpers
A Bash running "pytest" that outputs "FAILED" is a RED phase test run
A Bash running "pytest" that outputs "passed" is a GREEN phase test run
Each tool call should match at most one step (pick the best match)
If a tool call doesn't match any step, don't include it

1.0 KiB Raw Blame History

1.0 KiB

Raw Blame History