mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-04-16 23:23:29 +08:00
feat: deliver v1.8.0 harness reliability and parity updates
50
skills/enterprise-agent-ops/SKILL.md
Normal file
@@ -0,0 +1,50 @@
---
name: enterprise-agent-ops
description: Operate long-lived agent workloads with observability, security boundaries, and lifecycle management.
origin: ECC
---

# Enterprise Agent Ops

Use this skill for cloud-hosted or continuously running agent systems that need operational controls beyond single CLI sessions.

## Operational Domains

1. runtime lifecycle (start, pause, stop, restart)
2. observability (logs, metrics, traces)
3. safety controls (scopes, permissions, kill switches)
4. change management (rollout, rollback, audit)
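
The lifecycle and kill-switch domains can be sketched as a small state machine. This is a minimal illustration, not part of the skill itself: `AgentRuntime`, `State`, and the `TRANSITIONS` table are hypothetical names, and the transition policy (for example, that `start` also resumes a paused agent) is an assumption to adapt to your runtime.

```python
from enum import Enum


class State(Enum):
    STOPPED = "stopped"
    RUNNING = "running"
    PAUSED = "paused"


# Hypothetical transition policy: (action, current state) -> next state.
TRANSITIONS = {
    ("start", State.STOPPED): State.RUNNING,
    ("start", State.PAUSED): State.RUNNING,
    ("pause", State.RUNNING): State.PAUSED,
    ("stop", State.RUNNING): State.STOPPED,
    ("stop", State.PAUSED): State.STOPPED,
    ("restart", State.RUNNING): State.RUNNING,
    ("restart", State.PAUSED): State.RUNNING,
}


class AgentRuntime:
    """Tracks lifecycle state and records every transition for audit."""

    def __init__(self):
        self.state = State.STOPPED
        self.kill_switch = False
        self.audit = []  # list of (action, from_state, to_state)

    def apply(self, action):
        # The kill switch blocks anything that would bring the agent back up.
        if self.kill_switch and action in ("start", "restart"):
            raise PermissionError("kill switch engaged")
        key = (action, self.state)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} from {self.state.value}")
        new_state = TRANSITIONS[key]
        self.audit.append((action, self.state.value, new_state.value))
        self.state = new_state
        return new_state

    def engage_kill_switch(self):
        # Force the agent down regardless of its current state.
        self.audit.append(("kill", self.state.value, State.STOPPED.value))
        self.kill_switch = True
        self.state = State.STOPPED
```

Rejecting invalid transitions with an exception, rather than ignoring them, keeps operator mistakes visible in logs and in the audit trail.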

## Baseline Controls

- immutable deployment artifacts
- least-privilege credentials
- environment-level secret injection
- hard timeout and retry budgets
- audit log for high-risk actions
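
One way to express a timeout and retry budget is a wrapper that stops retrying once either the attempt count or a wall-clock deadline is spent. The sketch below is an assumption-laden illustration: `run_with_budget`, `BudgetExceeded`, and the default values are hypothetical. Note that the deadline is only checked between attempts; interrupting a call that hangs mid-attempt needs a timeout at the process or transport level.

```python
import time


class BudgetExceeded(Exception):
    """Raised when attempts or the deadline run out."""


def run_with_budget(task, max_attempts=3, deadline_s=30.0, backoff_s=0.5,
                    clock=time.monotonic, sleep=time.sleep):
    """Call task() until it succeeds or the budget is spent.

    Returns (result, attempts_used). The clock and sleep parameters are
    injectable so the function can be tested without real waiting.
    """
    start = clock()
    last_error = None
    for attempt in range(1, max_attempts + 1):
        if deadline_s - (clock() - start) <= 0:
            break  # deadline spent before this attempt could start
        try:
            return task(), attempt
        except Exception as exc:  # sketch only: narrow this in real code
            last_error = exc
        if attempt < max_attempts:
            remaining = deadline_s - (clock() - start)
            # Exponential backoff, never sleeping past the deadline.
            sleep(max(0.0, min(backoff_s * 2 ** (attempt - 1), remaining)))
    raise BudgetExceeded(f"budget spent, last error: {last_error!r}")
```

For example, `run_with_budget(call_model, max_attempts=5, deadline_s=60)` (with `call_model` standing in for your own function) retries up to five times but gives up once a minute has elapsed.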

## Metrics to Track

- success rate
- mean retries per task
- time to recovery
- cost per successful task
- failure class distribution
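
Most of these metrics can be derived from per-task records. The sketch below assumes a hypothetical `TaskRecord` shape and a `summarize` helper. It also assumes that cost per successful task means total spend (failed tasks included) divided by the number of successes. Time to recovery is left out because it depends on incident timestamps rather than task records.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRecord:
    succeeded: bool
    retries: int
    cost_usd: float
    failure_class: Optional[str] = None  # e.g. "timeout"; None on success


def summarize(records):
    """Aggregate task records into the tracked metrics."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to summarize")
    successes = sum(1 for r in records if r.succeeded)
    total_cost = sum(r.cost_usd for r in records)
    failures = Counter(
        r.failure_class or "unknown" for r in records if not r.succeeded
    )
    return {
        "success_rate": successes / total,
        "mean_retries_per_task": sum(r.retries for r in records) / total,
        # Undefined when nothing succeeded, so report None instead of dividing.
        "cost_per_successful_task": total_cost / successes if successes else None,
        "failure_class_distribution": dict(failures),
    }
```

Charging failed attempts to the successful ones makes wasted spend visible; dividing only the cost of successful tasks would hide it.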

## Incident Pattern

When failures spike:
1. freeze new rollouts
2. capture representative traces
3. isolate the failing route
4. patch with the smallest safe change
5. run regression + security checks
6. resume gradually
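
The freeze and gradual-resume steps can be sketched as a gate around a staged rollout. Everything here is illustrative: `RolloutGate`, the window size, the 20% failure threshold, and the ramp percentages are assumptions, not values prescribed by this skill.

```python
from collections import deque


class RolloutGate:
    """Freezes a staged rollout when the recent failure rate spikes."""

    def __init__(self, window=20, min_samples=10, max_failure_rate=0.2,
                 ramp=(5, 25, 50, 100)):
        self.outcomes = deque(maxlen=window)  # recent task results
        self.min_samples = min_samples
        self.max_failure_rate = max_failure_rate
        self.ramp = ramp  # rollout percentages, in order
        self.step = 0
        self.frozen = False

    @property
    def percent(self):
        return self.ramp[self.step]

    def record(self, ok):
        """Record one task outcome and freeze on a failure spike."""
        self.outcomes.append(bool(ok))
        failures = self.outcomes.count(False)
        if (len(self.outcomes) >= self.min_samples
                and failures / len(self.outcomes) > self.max_failure_rate):
            self.frozen = True

    def advance(self):
        """Move to the next ramp step unless the rollout is frozen."""
        if not self.frozen and self.step < len(self.ramp) - 1:
            self.step += 1
        return self.percent

    def resume(self, checks_passed):
        """Unfreeze only after regression and security checks pass."""
        if self.frozen and checks_passed:
            self.frozen = False
            self.outcomes.clear()
            self.step = 0  # restart from the smallest ramp step
        return self.percent
```

Restarting the ramp from its first step after a freeze is a conservative choice: the patched build has to earn its traffic again instead of inheriting the previous percentage.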

## Deployment Integrations

This skill pairs with:
- PM2 workflows
- systemd services
- container orchestrators
- CI/CD gates
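
Process supervisors typically stop a workload by sending it a signal: SIGINT under PM2's defaults, SIGTERM under systemd's and most container runtimes' defaults, followed by a forced kill after a grace period. An agent loop that finishes its current task and then exits works with all of them. The sketch below uses hypothetical names (`GracefulShutdown`, `run_agent_loop`) and assumes tasks are plain callables.

```python
import signal


class GracefulShutdown:
    """Records stop requests so the agent loop can exit between tasks."""

    def __init__(self):
        self.requested = False
        self.signal_name = None

    def install(self):
        # signal.signal must be called from the main thread.
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # Only set a flag here; the loop decides when it is safe to stop.
        self.requested = True
        self.signal_name = signal.Signals(signum).name


def run_agent_loop(next_task, shutdown):
    """Run tasks until the queue is empty or a stop was requested.

    next_task() returns a callable, or None when no work is left.
    Returns the number of tasks completed.
    """
    done = 0
    while not shutdown.requested:
        task = next_task()
        if task is None:
            break
        task()  # the in-flight task always runs to completion
        done += 1
    return done
```

In a real service, call `install()` once at startup and keep each task shorter than the supervisor's grace period, since the forced kill that follows cannot be handled.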