docs: record AgentShield fleet routing evidence

This commit is contained in:
Affaan Mustafa
2026-05-16 01:24:20 -04:00
parent 1c5c5d2389
commit cc83a85eb8
7 changed files with 38 additions and 19 deletions

View File

@@ -37,8 +37,9 @@ As of 2026-05-16:
CI install hardening, GitHub Actions cache purge, AgentShield #85
registry-signature verification, AgentShield #86 evidence-pack CI provenance,
AgentShield #87 plugin-cache runtime-confidence classification, AgentShield
#88 evidence-pack inspect/readback, ECC-Tools #75 billing-gate tightening,
PR #1947 supply-chain protection, and May 16 release-evidence refresh.
#88 evidence-pack inspect/readback, AgentShield #89 evidence-pack fleet
routing, ECC-Tools #75 billing-gate tightening, PR #1947 supply-chain
protection, and May 16 release-evidence refresh.
- `npm run harness:audit -- --format json` reports 70/70 on current `main`.
- `npm run observability:ready` reports 21/21 readiness on current `main`,
including the GitHub/Linear/handoff/roadmap progress-sync contract.
@@ -100,6 +101,12 @@ As of 2026-05-16:
finding counts, runtime confidence, policy, baseline, supply-chain, CI
context, remediation phases, and malformed artifact errors without manually
opening every bundle file.
- AgentShield PR #89 merged as `521ada9091bb6d818511ab8589ae675b920c106a`
and added `agentshield evidence-pack fleet <dirs...> [--json]` for
downstream fleet routing. Multiple verified evidence packs now aggregate into
ready, security-blocker, policy-review, baseline-regression,
supply-chain-review, and invalid routes with finding, policy, baseline,
supply-chain, and remediation totals.
- JARVIS PR #13 merged as `127efabbfb5033ae53d7a53e1546aa3c33d6f962`
and hardened CI/deploy workflows with npm registry signature verification,
disabled persisted checkout credentials in write-permission jobs, and pinned
@@ -520,11 +527,11 @@ is not complete unless the evidence column exists and has been freshly verified.
| Naming and rename readiness | Naming matrix across package/plugin/docs/social surfaces | `docs/releases/2.0.0-rc.1/naming-and-publication-matrix.md` records current package, repo, Claude plugin, Codex plugin, OpenCode, and npm availability evidence | Complete for rc.1; post-rc rename remains future work |
| Claude and Codex plugin publication | Contact/submission path with required artifacts and status | Publication readiness, naming matrix, and May 12 dry-run evidence document plugin validation, clean-checkout Claude tag/install smoke, and Codex marketplace CLI shape | Needs explicit approval for real tag/push and marketplace submission |
| Articles, tweets, and announcements | X thread, LinkedIn copy, GitHub release copy, push checklist | Draft launch collateral exists under rc.1 release docs | Needs URL-backed refresh |
| AgentShield enterprise iteration | Policy gates, SARIF, packs, provenance, corpus, HTML reports, exception lifecycle audit, baseline drift Action/CLI surfaces, evidence-pack redaction, harness adapter registry, enterprise research roadmap, supply-chain hardened release path, CI-safe baseline fingerprints, corpus accuracy recommendations, remediation workflow phases, env proxy hijack corpus coverage, Mini Shai-Hulud full-campaign package IOCs, CI-provenance evidence packs, plugin-cache runtime-confidence triage, and evidence-pack consumer readback | PRs #53, #55-#64, #67-#69, and #78-#88 landed with test evidence; native PDF export deferred in favor of self-contained HTML plus print-to-PDF until explicit enterprise demand appears; `docs/architecture/agentshield-enterprise-research-roadmap.md` now has baseline drift, evidence-pack bundle, redaction, adapter-registry, supply-chain hardening, hashed baseline fingerprints, corpus accuracy recommendation, remediation workflow, env proxy hijack corpus, Mini Shai-Hulud full-campaign package-table, `ci-context.json` provenance, `plugin-cache` confidence, and `evidence-pack inspect` readback slices landed | Next cross-harness depth and fleet routing |
| AgentShield enterprise iteration | Policy gates, SARIF, packs, provenance, corpus, HTML reports, exception lifecycle audit, baseline drift Action/CLI surfaces, evidence-pack redaction, harness adapter registry, enterprise research roadmap, supply-chain hardened release path, CI-safe baseline fingerprints, corpus accuracy recommendations, remediation workflow phases, env proxy hijack corpus coverage, Mini Shai-Hulud full-campaign package IOCs, CI-provenance evidence packs, plugin-cache runtime-confidence triage, evidence-pack consumer readback, and fleet-level evidence-pack routing | PRs #53, #55-#64, #67-#69, and #78-#89 landed with test evidence; native PDF export deferred in favor of self-contained HTML plus print-to-PDF until explicit enterprise demand appears; `docs/architecture/agentshield-enterprise-research-roadmap.md` now has baseline drift, evidence-pack bundle, redaction, adapter-registry, supply-chain hardening, hashed baseline fingerprints, corpus accuracy recommendation, remediation workflow, env proxy hijack corpus, Mini Shai-Hulud full-campaign package-table, `ci-context.json` provenance, `plugin-cache` confidence, `evidence-pack inspect` readback, and `evidence-pack fleet` routing slices landed | Next ECC-Tools fleet-summary consumption and cross-harness policy integration |
| ECC Tools next-level app | Billing audit, PR checks, deep analyzer, sync backlog, evaluator/RAG corpus, analysis-depth readiness, hosted execution planning, hosted CI diagnostics, hosted security evidence review, hosted harness compatibility audit, hosted reference-set evaluation, hosted AI routing/cost review, hosted team backlog routing, hosted depth-plan check-run, PR-comment hosted job dispatch, hosted job result history/check-runs, hosted result status command, status-aware depth-plan recommendations, hosted promotion readiness, hosted promotion output scoring, hosted promotion retrieval planning, hosted promotion judge contract, gated hosted promotion judge execution, payment-announcement readiness | PRs #26-#43 plus #53-#74 landed with test evidence, including AgentShield evidence-pack gap routing, canonical bundle recognition, supply-chain signature gates, PR draft follow-up Linear tracking, evidence-backed/deep-ready repository classification, the `/api/analysis/depth-plan` hosted job plan, `/api/analysis/jobs/ci-diagnostics`, `/api/analysis/jobs/security-evidence-review`, `/api/analysis/jobs/harness-compatibility-audit`, `/api/analysis/jobs/reference-set-evaluation`, `/api/analysis/jobs/ai-routing-cost-review`, `/api/analysis/jobs/team-backlog-routing`, the `ECC Tools / Hosted Depth Plan` check-run, `/ecc-tools analyze --job ...` PR-comment dispatch, non-blocking per-hosted-job result check-runs backed by 30-day result cache records, `/ecc-tools analyze --job status` cache lookup, cache-aware next-job recommendations in the depth-plan check-run, the `ECC Tools / Hosted Promotion Readiness` corpus-backed PR check-run, deterministic hosted-output scoring against cached completed job artifacts/findings, ranked retrieval/model-prompt planning, the fail-closed `hosted-promotion-judge.v1` request contract, opt-in live model-judge execution behind hosted evidence, entitlement, budget, provider, executor, strict JSON, and citation gates, a fail-closed `/api/billing/readiness` `announcementGate` for native GitHub payments claims, and `npm run billing:announcement-gate` as the non-secret operator verifier | Next work is hosted promotion telemetry, operator review UX, and live Marketplace test-account readback |
| GitGuardian/Dependabot/CodeRabbit-style checks | Non-blocking taxonomy, deterministic follow-up checks, and local supply-chain gates | ECC-Tools risk taxonomy check plus follow-up signals landed, including Skill Quality, Deep Analyzer Evidence, Analyzer Corpus Evidence, RAG/Evaluator Evidence, PR Review/Salvage Evidence, and AgentShield evidence-pack evidence; #1846 added npm registry signature gates; #1848 added the supply-chain incident-response playbook and `pull_request_target` cache-poisoning validator guard; #1851 added the privileged checkout credential-persistence guard; AgentShield #78, JARVIS #13, and ECC-Tools #53 applied the same hardening outside trunk | Current supply-chain gate complete; deeper hosted review features remain future |
| Harness-agnostic learning system | Audit, adapter matrix, observability, traces, promotion loop | Audit/adapters/observability gates plus `docs/architecture/evaluator-rag-prototype.md`, `examples/evaluator-rag-prototype/`, and ECC-Tools PR #40 define read-only stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, deep-analyzer evidence, and RAG/evaluator comparison scenarios with trace, report, playbook, verifier, and predictive-check artifacts; ECC-Tools PRs #68-#72 now turn that corpus into a deterministic PR check-run gate with cached hosted-output scoring, ranked retrieval candidates, a model prompt seed, a fail-closed hosted model-judge request contract, and opt-in live model execution behind strict hosted-evidence gates | Deterministic hosted PR check, cached output scoring, retrieval planning, judge contract, and gated model execution integrated |
| Linear roadmap is detailed | Linear project status plus repo mirror | Repo mirror exists; issue creation was retried on 2026-05-12 and remains blocked by the workspace free issue limit; this May 16 sync adds ECC #1860, AgentShield #78-#88, JARVIS #13, ECC-Tools #53-#74, resolved queue/discussion counts, and a generated `operator:dashboard` prompt-to-artifact audit for recurring status updates | Needs recurring status updates after each significant merge batch |
| Linear roadmap is detailed | Linear project status plus repo mirror | Repo mirror exists; issue creation was retried on 2026-05-12 and remains blocked by the workspace free issue limit; this May 16 sync adds ECC #1860, AgentShield #78-#89, JARVIS #13, ECC-Tools #53-#74, resolved queue/discussion counts, and a generated `operator:dashboard` prompt-to-artifact audit for recurring status updates | Needs recurring status updates after each significant merge batch |
| Flow separation and progress tracking | Flow lanes with owner artifacts and update cadence | This roadmap defines lanes below and `docs/architecture/progress-sync-contract.md` makes GitHub/Linear/handoff/roadmap sync part of the readiness gate | Active |
| Realtime Linear sync | Project updates while issue limit is blocked; issues later | ECC-Tools #39 implements opt-in Linear API sync for deferred follow-up backlog items, and ECC-Tools #54 adds copy-ready PR drafts to that backlog when draft PR shells are not opened; `docs/architecture/progress-sync-contract.md` defines the local file-backed realtime boundary while issue capacity is blocked | Needs workspace capacity/config rollout |
| Observability for self-use | Local readiness gate, traces, status snapshots, HUD/status contract, risk ledger, progress-sync contract | `npm run observability:ready` reports 21/21 | Complete for local gate |
@@ -544,7 +551,7 @@ repo evidence and merge commits.
| Release and publication | rc.1 release docs, publication readiness doc | Naming matrix and plugin submission/contact checklist | Before any tag |
| Harness OS core | Audit, adapter matrix, observability docs, `ecc2/` | HUD/session-control acceptance spec | Weekly until GA |
| Evaluation and RAG | Reference-set validation, harness audit, traces, ECC-Tools corpus | Read-only evaluator/RAG prototype plus stale-salvage, billing-readiness, CI-failure-diagnosis, harness-config-quality, AgentShield policy-exception, skill-quality evidence, deep-analyzer evidence, and RAG/evaluator comparison fixtures; ECC-Tools #68 publishes the corpus as a hosted promotion readiness check-run, #69 scores cached hosted job outputs against the same corpus, #70 emits ranked retrieval candidates plus a model prompt seed, #71 adds a fail-closed hosted model-judge request contract, and #72 executes that judge only when explicitly enabled and backed by hosted retrieval citations | Hosted promotion telemetry and operator review UX |
| AgentShield enterprise | AgentShield PR evidence and roadmap notes | Cross-harness depth and fleet routing after evidence-pack inspect/readback shipped in #88 | Next implementation batch |
| AgentShield enterprise | AgentShield PR evidence and roadmap notes | Fleet routing landed in #89 after evidence-pack inspect/readback shipped in #88 | ECC-Tools fleet-summary consumption and cross-harness policy integration |
| ECC Tools app | ECC-Tools PR evidence, billing audit, risk taxonomy, evaluator/RAG corpus | ECC-Tools #53 published the supply-chain workflow hardening branch, #54 tracks copy-ready PR drafts in the Linear/project backlog, #55 classifies analysis-depth readiness, #56 exposes the hosted execution plan, #57 executes the first hosted CI diagnostics job, #58 executes the hosted security evidence review job, #59 executes the hosted harness compatibility audit, #60 executes the hosted reference-set evaluation, #61 executes the hosted AI routing/cost review, #62 executes hosted team backlog routing, #63 publishes the hosted depth-plan check-run, #64 dispatches hosted jobs from PR comments, #65 persists hosted result history/check-runs, #66 exposes hosted job status from PR comments, #67 makes depth-plan recommendations cache-aware, #68 publishes hosted promotion readiness from the evaluator/RAG corpus, #69 scores cached hosted job outputs against that corpus, #70 emits ranked retrieval candidates plus a model prompt seed, #71 emits the gated `hosted-promotion-judge.v1` contract without live model calls, #72 adds opt-in live model-judge execution behind hosted-evidence and strict JSON/citation gates, #73 adds a fail-closed native-payments `announcementGate` to billing readiness, and #74 adds `npm run billing:announcement-gate` for operator verification | Live Marketplace test-account readback and hosted promotion telemetry |
| Linear progress | Linear project status updates, `docs/architecture/progress-sync-contract.md`, generated `operator:dashboard` output, and this mirror | Status update with queue/evidence/missing gates | Every significant merge batch |
@@ -766,8 +773,10 @@ Acceptance:
packs; PR #87 classified installed Claude plugin caches separately from
active top-level runtime config, including cached hook implementations; PR
#88 added `agentshield evidence-pack inspect` JSON/text readback for
downstream consumers; and ECC-Tools PRs #42/#43 now route and recognize
evidence packs. The next slice is cross-harness depth and fleet routing.
downstream consumers; PR #89 added `agentshield evidence-pack fleet`
summary/routing across multiple inspected bundles; and ECC-Tools PRs #42/#43
now route and recognize evidence packs. The next slice is ECC-Tools
fleet-summary consumption and cross-harness policy integration.
2. Run ECC-Tools `/api/billing/readiness` against a Marketplace-managed test
account and require `announcementGate.ready === true` before any native
GitHub payments announcement.

View File

@@ -1,6 +1,6 @@
# AgentShield Enterprise Research Roadmap
Generated: 2026-05-12; refreshed with May 16 AgentShield PR #87 and #88 evidence.
Generated: 2026-05-12; refreshed with May 16 AgentShield PR #87, #88, and #89 evidence.
This is a planning artifact for the next AgentShield enterprise iteration. It
does not modify AgentShield code. The goal is to turn the current scanner,
@@ -91,6 +91,10 @@ AgentShield is already more than a static lint tool:
JSON/text summaries for report score, finding counts, runtime confidence,
policy, baseline, supply-chain, CI context, remediation, and malformed
artifact errors.
- Fleet-level evidence-pack consumption now has a local routing primitive:
`agentshield evidence-pack fleet <dirs...> [--json]` aggregates multiple
inspected bundles into ready, security-blocker, policy-review,
baseline-regression, supply-chain-review, and invalid routes.
May 16 update: AgentShield PR #87 merged as
`26bb44650663816d07180e0d20c1895e431a326c`. It classifies installed Claude
@@ -103,10 +107,15 @@ AgentShield PR #88 merged as
`agentshield evidence-pack inspect <dir> [--json]`, validates the bundle before
readback, summarizes every consumer-facing evidence artifact, and keeps
malformed-but-valid JSON artifacts from crashing inspection.
AgentShield PR #89 merged as
`521ada9091bb6d818511ab8589ae675b920c106a`. It adds
`agentshield evidence-pack fleet <dirs...> [--json]`, verifies each pack through
the inspect path, aggregates finding, policy, baseline, supply-chain, and
remediation totals, and assigns each pack to a deterministic fleet route.
The next iteration should not be "add more regex rules" by default. The higher
leverage move is to make AgentShield remember, compare, route, and enforce
security posture across time, repos, teams, and harnesses.
The next iteration after fleet routing should not be "add more regex rules" by
default. The higher leverage move is to wire the new fleet summaries into
ECC-Tools follow-up routing and cross-harness policy integration.
## Enterprise Gaps

View File

@@ -21,7 +21,7 @@ surfaces, or posting announcements.
| `docs/releases/2.0.0-rc.1/launch-checklist.md` | Operator launch checklist | Must remain approval-gated for release, package, plugin, and announcement actions |
| `docs/releases/2.0.0-rc.1/publication-readiness.md` | Release gate | Requires fresh evidence from the exact release commit |
| `docs/releases/2.0.0-rc.1/publication-evidence-2026-05-15.md` | Current May 15 queue, roadmap, security, supply-chain watch, no-lifecycle CI install hardening, AgentShield #86 evidence-pack provenance, ECC Tools billing-gate, Actions cache purge, and `ecc2` test evidence through PR #1941 | Must be superseded by a final clean-checkout evidence file before real publication |
| `docs/releases/2.0.0-rc.1/publication-evidence-2026-05-16.md` | Current May 16 queue cleanup, recsys skill merge, GateGuard triage, PR #1947 supply-chain protection, AgentShield #87 plugin-cache confidence evidence, AgentShield #88 evidence-pack inspect/readback, dashboard refresh, and combined Node/Rust/release-surface gate evidence through `6bced468` | Must still be repeated from a strict clean checkout before real publication |
| `docs/releases/2.0.0-rc.1/publication-evidence-2026-05-16.md` | Current May 16 queue cleanup, recsys skill merge, GateGuard triage, PR #1947 supply-chain protection, AgentShield #87 plugin-cache confidence evidence, AgentShield #88 evidence-pack inspect/readback, AgentShield #89 evidence-pack fleet routing, dashboard refresh, and combined Node/Rust/release-surface gate evidence through `6bced468` | Must still be repeated from a strict clean checkout before real publication |
| `docs/releases/2.0.0-rc.1/naming-and-publication-matrix.md` | Naming, slug, and publication-path decision record | Keeps `Everything Claude Code / ECC`, npm `ecc-universal`, and plugin slug `ecc` for rc.1 |
| `docs/releases/2.0.0-rc.1/x-thread.md` | X launch draft | Must replace placeholders with live URLs after release/package/plugin publication |
| `docs/releases/2.0.0-rc.1/linkedin-post.md` | LinkedIn launch draft | Must replace placeholders with live URLs after release/package/plugin publication |

View File

@@ -9,7 +9,7 @@ npm publication, plugin tag, marketplace submission, or announcement post.
| --- | --- |
| Upstream main | `6bced468d76b269243a6f0bd28472853aa78e0e4` |
| Git remote | `https://github.com/affaan-m/everything-claude-code.git` |
| Evidence scope | Current `main` after PR #1944, PR #1945, issue #1946 triage, PR #1947 supply-chain protection, AgentShield PR #87, AgentShield PR #88, ITO-57 sync, and operator dashboard refresh |
| Evidence scope | Current `main` after PR #1944, PR #1945, issue #1946 triage, PR #1947 supply-chain protection, AgentShield PR #87, AgentShield PR #88, AgentShield PR #89, ITO-57 sync, and operator dashboard refresh |
| Local status caveat | `git status --short --branch` showed `## main...origin/main` plus unrelated untracked `docs/drafts/` |
The actual release operator should repeat all publish-facing checks from the
@@ -34,8 +34,9 @@ final release commit with a strictly clean checkout before publishing.
| PR #1947 | Merged scheduled supply-chain watch/advisory-source evidence as `4093d1bb7a14db1b4d4ea5bd00f2073baf94bfb0`; trunk now has the TanStack/Mini Shai-Hulud/node-ipc IOC scan plus advisory-source report surfaces wired into scheduled watch evidence |
| AgentShield PR #87 | Merged plugin-cache runtime-confidence classification as `26bb44650663816d07180e0d20c1895e431a326c`; installed Claude plugin cache findings now emit `runtimeConfidence: plugin-cache`, `plugins/cache` only maps to Claude cache under `.claude`, and cached hook implementations are no longer mislabeled as active `hook-code` |
| AgentShield PR #88 | Merged evidence-pack inspect/readback as `65ed6e2a87545dc99d962b58413f49096a4d70ec`; `agentshield evidence-pack inspect` now emits verified JSON/text summaries for report, policy, baseline, supply-chain, CI context, remediation, and malformed artifact errors |
| AgentShield PR #89 | Merged evidence-pack fleet routing as `521ada9091bb6d818511ab8589ae675b920c106a`; `agentshield evidence-pack fleet <dirs...> [--json]` now aggregates multiple verified bundles into ready, security-blocker, policy-review, baseline-regression, supply-chain-review, and invalid routes with finding, policy, baseline, supply-chain, and remediation totals |
| ITO-57 | Updated with PR #1947 advisory-source evidence, post-merge source refresh, IOC scan, npm audit/signature checks, and OpenAI app update caveat |
| ITO-49 | Updated with AgentShield PR #87 and #88 merge evidence, local test evidence, CI status, live `~/.claude` scan classification counts, and local Mini Shai-Hulud protection scan results |
| ITO-49 | Updated with AgentShield PR #87, #88, and #89 merge evidence, local test evidence, CI status, live `~/.claude` scan classification counts, and local Mini Shai-Hulud protection scan results |
| ITO-44 | Updated with queue cleanup, dashboard refresh, and remaining macro gaps |
## Release Gate Commands

View File

@@ -22,8 +22,8 @@ refresh through PR #1941, see
[`publication-evidence-2026-05-15.md`](publication-evidence-2026-05-15.md).
For the May 16 queue cleanup, recsys skill merge, GateGuard issue triage,
AgentShield #87 plugin-cache runtime-confidence evidence, AgentShield #88
evidence-pack inspect/readback, operator dashboard refresh, and combined
final-gate rerun on current `main`, see
evidence-pack inspect/readback, AgentShield #89 evidence-pack fleet routing,
operator dashboard refresh, and combined final-gate rerun on current `main`, see
[`publication-evidence-2026-05-16.md`](publication-evidence-2026-05-16.md).
For the operator-facing prompt-to-artifact readiness dashboard from the same
May 16 pass, see