feat: deliver v1.8.0 harness reliability and parity updates

This commit is contained in:
Affaan Mustafa
2026-03-04 14:48:06 -08:00
parent 32e9c293f0
commit 48b883d741
84 changed files with 2990 additions and 725 deletions

View File

@@ -0,0 +1,50 @@
---
name: enterprise-agent-ops
description: Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.
origin: ECC
---
# Enterprise Agent Ops
Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.
## Operational Domains
1. runtime lifecycle (start, pause, stop, restart)
2. observability (logs, metrics, traces)
3. safety controls (scopes, permissions, kill switches)
4. change management (rollout, rollback, audit)
## Baseline Controls
- immutable deployment artifacts
- least-privilege credentials
- environment-level secret injection
- hard timeout and retry budgets
- audit log for high-risk actions
## Metrics to Track
- success rate
- mean retries per task
- time to recovery
- cost per successful task
- failure class distribution
## Incident Pattern
When failure spikes:
1. freeze new rollout
2. capture representative traces
3. isolate failing route
4. patch with smallest safe change
5. run regression + security checks
6. resume gradually
## Deployment Integrations
This skill pairs with:
- PM2 workflows
- systemd services
- container orchestrators
- CI/CD gates