docs(zh-CN): sync Chinese docs with latest upstream changes (#341)

* docs(zh-CN): sync Chinese docs with latest upstream changes
* docs(zh-CN): update link

---------

Co-authored-by: neo <neo.dowithless@gmail.com>
2026-04-17 15:43:30 +08:00 · 2026-03-08 06:48:02 +08:00
parent da17d33ac3
commit abcf38b085
53 changed files with 2977 additions and 610 deletions
--- a/docs/zh-CN/skills/eval-harness/SKILL.md
+++ b/docs/zh-CN/skills/eval-harness/SKILL.md
@@ -267,3 +267,38 @@ npm test -- --testPathPattern="existing"
 Status: ready to ship

 ```
+
+## Product Evals (v1.8)
+
+Use product evals when unit tests alone cannot capture behavior quality.
+
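+### Grader Types
+
+1. Code grader (deterministic assertions)
+2. Rule grader (regex/pattern constraints)
+3. Model grader (LLM-as-judge rubric)
+4. Human grader (manual adjudication of ambiguous outputs)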
+
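+### pass@k Guidance
+
+* `pass@1`: first-attempt reliability
+* `pass@3`: practical reliability under controlled retries
+* `pass^3`: stability test (all 3 runs must pass)
+
+Recommended thresholds:
+
+* Capability evals: pass@3 >= 0.90
+* Regression evals: pass^3 = 1.00 for release-critical paths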
+
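+### Eval Anti-Patterns
+
+* Overfitting prompts to known eval examples
+* Measuring only happy-path outputs
+* Ignoring cost and latency drift while chasing pass rates
+* Allowing flaky graders in release gates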
+
+### Minimal Eval Artifact Layout
+
+* `.claude/evals/<feature>.md` definition
+* `.claude/evals/<feature>.log` run history
+* `docs/releases/<version>/eval-summary.md` release snapshot
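
Reviewer note (not part of the commit): the pass@k guidance added in this hunk could be checked with something like the following Python sketch. It assumes an empirical reading of the metrics, where `pass@k` is the fraction of tasks with at least one passing run among the first k and `pass^k` is the fraction where all k runs pass. The function names, the `results` data, and the `release_gate` helper are hypothetical and do not come from the skill itself.

```python
from typing import Sequence


def _check(runs: Sequence[Sequence[bool]], k: int) -> None:
    # Every task needs at least k recorded runs for the metric to be defined.
    if not runs or k < 1 or any(len(task) < k for task in runs):
        raise ValueError("need at least one task and k runs per task")


def pass_at_k(runs: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of tasks with at least one pass in the first k runs."""
    _check(runs, k)
    return sum(any(task[:k]) for task in runs) / len(runs)


def pass_pow_k(runs: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of tasks where all of the first k runs pass (pass^k)."""
    _check(runs, k)
    return sum(all(task[:k]) for task in runs) / len(runs)


def release_gate(capability: Sequence[Sequence[bool]],
                 regression: Sequence[Sequence[bool]]) -> bool:
    """Apply the recommended thresholds: pass@3 >= 0.90 and pass^3 == 1.00."""
    return pass_at_k(capability, 3) >= 0.90 and pass_pow_k(regression, 3) == 1.0


# Hypothetical data: one inner list per task, one bool per run.
results = [
    [True, True, True],
    [False, True, True],
    [False, False, False],
    [True, True, True],
]
```

With this sample data, `pass@1` is 0.5, `pass@3` is 0.75, and `pass^3` is 0.5, so the same set would fail both recommended thresholds. A flaky grader would show up here as a gap between `pass@3` and `pass^3`, which is why the anti-pattern list warns against using one in a release gate.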