Files
everything-claude-code/docs/zh-CN/skills/nutrient-document-processing/SKILL.md
zdoc.app 95f63c3cb0 docs(zh-CN): sync Chinese docs with latest upstream changes (#202)
* docs(zh-CN): sync Chinese docs with latest upstream changes

* docs: improve Chinese translation consistency in go-test.md

* docs(zh-CN): update image paths to use shared assets directory

- Update image references from ./assets/ to ../../assets/
- Remove zh-CN/assets directory to use shared assets

---------

Co-authored-by: neo <neo.dowithless@gmail.com>
2026-02-13 01:04:58 -08:00

166 lines
5.7 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: nutrient-document-processing
description: 使用Nutrient DWS API处理、转换、OCR、提取、编辑、签署和填写文档。支持PDF、DOCX、XLSX、PPTX、HTML和图像文件。
---
# 文档处理
使用 [Nutrient DWS Processor API](https://www.nutrient.io/api/) 处理文档。转换格式、提取文本和表格、对扫描文档进行 OCR、编辑 PII、添加水印、数字签名以及填写 PDF 表单。
## 设置
**[nutrient.io](https://dashboard.nutrient.io/sign_up/?product=processor)** 获取一个免费的 API 密钥
```bash
export NUTRIENT_API_KEY="pdf_live_..."
```
所有请求都以 multipart POST 形式发送到 `https://api.nutrient.io/build`,并附带一个 `instructions` JSON 字段。
## 操作
### 转换文档
```bash
# DOCX to PDF
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.docx=@document.docx" \
-F 'instructions={"parts":[{"file":"document.docx"}]}' \
-o output.pdf
# PDF to DOCX
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
-o output.docx
# HTML to PDF
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "index.html=@index.html" \
-F 'instructions={"parts":[{"html":"index.html"}]}' \
-o output.pdf
```
支持的输入格式PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS。
### 提取文本和数据
```bash
# Extract plain text
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
-o output.txt
# Extract tables as Excel
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
-o tables.xlsx
```
### OCR 扫描文档
```bash
# OCR to searchable PDF (supports 100+ languages)
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "scanned.pdf=@scanned.pdf" \
-F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
-o searchable.pdf
```
支持语言:通过 ISO 639-2 代码支持 100 多种语言(例如,`eng`, `deu`, `fra`, `spa`, `jpn`, `kor`, `chi_sim`, `chi_tra`, `ara`, `hin`, `rus`)。完整的语言名称如 `english``german` 也适用。查看 [完整的 OCR 语言表](https://www.nutrient.io/guides/document-engine/ocr/language-support/) 以获取所有支持的代码。
### 编辑敏感信息
```bash
# Pattern-based (SSN, email)
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \
-o redacted.pdf
# Regex-based
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \
-o redacted.pdf
```
预设:`social-security-number`, `email-address`, `credit-card-number`, `international-phone-number`, `north-american-phone-number`, `date`, `time`, `url`, `ipv4`, `ipv6`, `mac-address`, `us-zip-code`, `vin`
### 添加水印
```bash
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \
-o watermarked.pdf
```
### 数字签名
```bash
# Self-signed CMS signature
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "document.pdf=@document.pdf" \
-F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \
-o signed.pdf
```
### 填写 PDF 表单
```bash
curl -X POST https://api.nutrient.io/build \
-H "Authorization: Bearer $NUTRIENT_API_KEY" \
-F "form.pdf=@form.pdf" \
-F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"jane@example.com","date":"2026-02-06"}}]}' \
-o filled.pdf
```
## MCP 服务器(替代方案)
对于原生工具集成,请使用 MCP 服务器代替 curl
```json
{
"mcpServers": {
"nutrient-dws": {
"command": "npx",
"args": ["-y", "@nutrient-sdk/dws-mcp-server"],
"env": {
"NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
"SANDBOX_PATH": "/path/to/working/directory"
}
}
}
}
```
## 使用场景
* 在格式之间转换文档PDF, DOCX, XLSX, PPTX, HTML, 图像)
* 从 PDF 中提取文本、表格或键值对
* 对扫描文档或图像进行 OCR
* 在共享文档前编辑 PII
* 为草稿或机密文档添加水印
* 数字签署合同或协议
* 以编程方式填写 PDF 表单
## 链接
* [API 演练场](https://dashboard.nutrient.io/processor-api/playground/)
* [完整 API 文档](https://www.nutrient.io/guides/dws-processor/)
* [代理技能仓库](https://github.com/PSPDFKit-labs/nutrient-agent-skill)
* [npm MCP 服务器](https://www.npmjs.com/package/@nutrient-sdk/dws-mcp-server)