---
name: nutrient-document-processing
description: Process, convert, OCR, extract, redact, sign, and fill documents using the Nutrient DWS API. Supports PDF, DOCX, XLSX, PPTX, HTML, and image files.
---

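# Document Processing

Use the [Nutrient DWS Processor API](https://www.nutrient.io/api/) to process documents: convert between formats, extract text and tables, run OCR on scanned documents, redact PII, add watermarks, apply digital signatures, and fill PDF forms.
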
## Setup

Get a free API key at **[nutrient.io](https://dashboard.nutrient.io/sign_up/?product=processor)**.

```bash
export NUTRIENT_API_KEY="pdf_live_..."
```

All requests are sent as multipart POSTs to `https://api.nutrient.io/build` with an `instructions` JSON field.

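
Because every operation in this document uses the same endpoint, header, and `instructions` field, the request can be wrapped in a small shell function. The sketch below is an illustration, not part of the Nutrient tooling: `nutrient_build` is a hypothetical name, and it assumes a single input file whose name contains no characters that curl's `-F` parser treats specially (such as `;` or `,`). The JSON is passed with `--form-string` so curl does not try to interpret its contents.

```bash
# Hypothetical helper (not part of the Nutrient docs): POST one local file
# to the /build endpoint together with an instructions JSON string.
# $1 = input file, $2 = instructions JSON, $3 = output path
nutrient_build() {
  if [ -z "${NUTRIENT_API_KEY:-}" ]; then
    echo "NUTRIENT_API_KEY is not set" >&2
    return 1
  fi
  # As in this document's examples, the multipart field name is the file's
  # base name, which the instructions reference in "parts".
  curl -sS -X POST https://api.nutrient.io/build \
    -H "Authorization: Bearer $NUTRIENT_API_KEY" \
    -F "$(basename "$1")=@$1" \
    --form-string "instructions=$2" \
    -o "$3"
}
```

A call would then look like `nutrient_build document.docx '{"parts":[{"file":"document.docx"}]}' output.pdf`.
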
## Operations

### Convert Documents

```bash
# DOCX to PDF
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.docx=@document.docx" \
  -F 'instructions={"parts":[{"file":"document.docx"}]}' \
  -o output.pdf

# PDF to DOCX
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \
  -o output.docx

# HTML to PDF
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "index.html=@index.html" \
  -F 'instructions={"parts":[{"html":"index.html"}]}' \
  -o output.pdf
```

Supported input formats: PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS.

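
The three conversion examples differ in only two places: HTML inputs use the `html` part key instead of `file`, and non-PDF targets add an `output` object. A minimal sketch of that pattern follows. `conversion_instructions` is a hypothetical helper name, and it assumes file names contain no double quotes or backslashes, since it does no JSON escaping.

```bash
# Hypothetical helper: print the instructions JSON for a conversion.
# $1 = uploaded file name, $2 = target type ("pdf", "docx", ...)
conversion_instructions() {
  local part_key
  case "$1" in
    *.html|*.htm) part_key="html" ;;  # HTML inputs use the "html" part key
    *)            part_key="file" ;;
  esac
  if [ "$2" = "pdf" ]; then
    # Mirrors the examples in this document, which omit "output" for PDF results.
    printf '{"parts":[{"%s":"%s"}]}\n' "$part_key" "$1"
  else
    printf '{"parts":[{"%s":"%s"}],"output":{"type":"%s"}}\n' "$part_key" "$1" "$2"
  fi
}
```

The printed string can be passed to curl, for example `--form-string "instructions=$(conversion_instructions document.pdf docx)"`.
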
### Extract Text and Data

```bash
# Extract plain text
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \
  -o output.txt

# Extract tables as Excel
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \
  -o tables.xlsx
```

### OCR Scanned Documents

```bash
# OCR to searchable PDF (supports 100+ languages)
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "scanned.pdf=@scanned.pdf" \
  -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \
  -o searchable.pdf
```

Supported languages: more than 100 languages are available via ISO 639-2 codes (for example, `eng`, `deu`, `fra`, `spa`, `jpn`, `kor`, `chi_sim`, `chi_tra`, `ara`, `hin`, `rus`). Full language names such as `english` or `german` also work. See the [complete OCR language table](https://www.nutrient.io/guides/document-engine/ocr/language-support/) for all supported codes.

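
When the OCR language comes from a variable rather than a literal, it is worth rejecting unexpected values before they are spliced into the JSON. The sketch below uses a hypothetical helper, `ocr_instructions`, and assumes that valid language values consist only of lowercase letters and underscores, which holds for the codes and names listed above but is not confirmed for the full language table.

```bash
# Hypothetical helper: print OCR instructions for one uploaded file.
# $1 = uploaded file name, $2 = language (defaults to "english")
ocr_instructions() {
  local lang="${2:-english}"
  case "$lang" in
    *[!a-z_]*)
      # Anything other than lowercase letters and underscores is refused.
      echo "unexpected language value: $lang" >&2
      return 1 ;;
  esac
  printf '{"parts":[{"file":"%s"}],"actions":[{"type":"ocr","language":"%s"}]}\n' "$1" "$lang"
}
```

For example, `ocr_instructions scanned.pdf chi_sim` prints the instructions for Simplified Chinese.
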
### Redact Sensitive Information

```bash
# Pattern-based (SSN, email)
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \
  -o redacted.pdf

# Regex-based
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \
  -o redacted.pdf
```

Presets: `social-security-number`, `email-address`, `credit-card-number`, `international-phone-number`, `north-american-phone-number`, `date`, `time`, `url`, `ipv4`, `ipv6`, `mac-address`, `us-zip-code`, `vin`.

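
Before sending a regex redaction, it can help to preview what a pattern would catch by running it locally against extracted text, such as the `output.txt` produced by the plain-text extraction. The sketch below is only an approximation: `preview_matches` is a hypothetical helper, and `grep -E` uses a different regex dialect from the one the API applies, so `\d` is written as `[0-9]` and the `\b` word boundaries are replaced by grep's `-w` flag.

```bash
# Hypothetical local preview: list the distinct whole-word strings that a
# pattern matches in a text file.
# $1 = extended regex (grep -E syntax), $2 = text file
preview_matches() {
  grep -Ewo -- "$1" "$2" | sort -u
}

# Local equivalent of the regex example in this section:
#   preview_matches '[A-Z]{2}[0-9]{6}' output.txt
```

Matches found locally are a hint, not a guarantee, about what the redaction action will remove.
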
### Add Watermarks

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \
  -o watermarked.pdf
```

### Digital Signatures

```bash
# Self-signed CMS signature
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "document.pdf=@document.pdf" \
  -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \
  -o signed.pdf
```

### Fill PDF Forms

```bash
curl -X POST https://api.nutrient.io/build \
  -H "Authorization: Bearer $NUTRIENT_API_KEY" \
  -F "form.pdf=@form.pdf" \
  -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"jane@example.com","date":"2026-02-06"}}]}' \
  -o filled.pdf
```

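
The curl commands in this document write whatever the server returns to the `-o` path, so a failed request can leave an error body saved under a `.pdf` name. Passing `--fail` to curl makes it exit non-zero on HTTP errors. As an extra check, a PDF file starts with the bytes `%PDF-`, which is cheap to test. The sketch below assumes an error response would not begin with those bytes, and `is_pdf` is a hypothetical helper name.

```bash
# Hypothetical sanity check: succeed only if the file starts with the
# PDF signature "%PDF-". Missing or unreadable files fail the check.
is_pdf() {
  [ "$(head -c 5 "$1" 2>/dev/null)" = "%PDF-" ]
}

# Example use after one of the requests in this document:
#   is_pdf filled.pdf || echo "filled.pdf is not a PDF; inspect its contents" >&2
```

This only confirms the file type, not that the requested action was applied.
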
## MCP Server (Alternative)

For native tool integration, use the MCP server instead of curl:

```json
{
  "mcpServers": {
    "nutrient-dws": {
      "command": "npx",
      "args": ["-y", "@nutrient-sdk/dws-mcp-server"],
      "env": {
        "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY",
        "SANDBOX_PATH": "/path/to/working/directory"
      }
    }
  }
}
```

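## When to Use

* Converting documents between formats (PDF, DOCX, XLSX, PPTX, HTML, images)
* Extracting text, tables, or key-value pairs from PDFs
* Running OCR on scanned documents or images
* Redacting PII before sharing documents
* Adding watermarks to drafts or confidential documents
* Digitally signing contracts or agreements
* Filling PDF forms programmatically
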
## Links

* [API Playground](https://dashboard.nutrient.io/processor-api/playground/)
* [Full API documentation](https://www.nutrient.io/guides/dws-processor/)
* [Agent skill repository](https://github.com/PSPDFKit-labs/nutrient-agent-skill)
* [MCP server on npm](https://www.npmjs.com/package/@nutrient-sdk/dws-mcp-server)