From 11ad2a875f07de7604d5b5d9e5d1a47e8250a605 Mon Sep 17 00:00:00 2001 From: Jonathan Rhyne Date: Sun, 8 Feb 2026 19:12:08 -0500 Subject: [PATCH] feat: add nutrient-document-processing skill (#166) Add nutrient-document-processing skill for PDF conversion, OCR, redaction, signing, and form filling via Nutrient DWS API. --- skills/nutrient-document-processing/SKILL.md | 165 +++++++++++++++++++ 1 file changed, 165 insertions(+) create mode 100644 skills/nutrient-document-processing/SKILL.md diff --git a/skills/nutrient-document-processing/SKILL.md b/skills/nutrient-document-processing/SKILL.md new file mode 100644 index 00000000..eeb7a34c --- /dev/null +++ b/skills/nutrient-document-processing/SKILL.md @@ -0,0 +1,165 @@ +--- +name: nutrient-document-processing +description: Process, convert, OCR, extract, redact, sign, and fill documents using the Nutrient DWS API. Works with PDFs, DOCX, XLSX, PPTX, HTML, and images. +--- + +# Nutrient Document Processing + +Process documents with the [Nutrient DWS Processor API](https://www.nutrient.io/api/). Convert formats, extract text and tables, OCR scanned documents, redact PII, add watermarks, digitally sign, and fill PDF forms. + +## Setup + +Get a free API key at **https://dashboard.nutrient.io/sign_up/?product=processor** + +```bash +export NUTRIENT_API_KEY="pdf_live_..." +``` + +All requests go to `https://api.nutrient.io/build` as multipart POST with an `instructions` JSON field. + +## Operations + +### Convert Documents + +```bash +# DOCX to PDF +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "document.docx=@document.docx" \ + -F 'instructions={"parts":[{"file":"document.docx"}]}' \ + -o output.pdf + +# PDF to DOCX +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "document.pdf=@document.pdf" \ + -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"docx"}}' \ + -o output.docx + +# HTML to PDF +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "index.html=@index.html" \ + -F 'instructions={"parts":[{"html":"index.html"}]}' \ + -o output.pdf +``` + +Supported inputs: PDF, DOCX, XLSX, PPTX, DOC, XLS, PPT, PPS, PPSX, ODT, RTF, HTML, JPG, PNG, TIFF, HEIC, GIF, WebP, SVG, TGA, EPS. + +### Extract Text and Data + +```bash +# Extract plain text +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "document.pdf=@document.pdf" \ + -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"text"}}' \ + -o output.txt + +# Extract tables as Excel +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "document.pdf=@document.pdf" \ + -F 'instructions={"parts":[{"file":"document.pdf"}],"output":{"type":"xlsx"}}' \ + -o tables.xlsx +``` + +### OCR Scanned Documents + +```bash +# OCR to searchable PDF (supports 100+ languages) +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "scanned.pdf=@scanned.pdf" \ + -F 'instructions={"parts":[{"file":"scanned.pdf"}],"actions":[{"type":"ocr","language":"english"}]}' \ + -o searchable.pdf +``` + +Languages: Supports 100+ languages via ISO 639-2 codes (e.g., `eng`, `deu`, `fra`, `spa`, `jpn`, `kor`, `chi_sim`, `chi_tra`, `ara`, `hin`, `rus`). Full language names like `english` or `german` also work. See the [complete OCR language table](https://www.nutrient.io/guides/document-engine/ocr/language-support/) for all supported codes. + +### Redact Sensitive Information + +```bash +# Pattern-based (SSN, email) +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "document.pdf=@document.pdf" \ + -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"social-security-number"}},{"type":"redaction","strategy":"preset","strategyOptions":{"preset":"email-address"}}]}' \ + -o redacted.pdf + +# Regex-based +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "document.pdf=@document.pdf" \ + -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"redaction","strategy":"regex","strategyOptions":{"regex":"\\b[A-Z]{2}\\d{6}\\b"}}]}' \ + -o redacted.pdf +``` + +Presets: `social-security-number`, `email-address`, `credit-card-number`, `international-phone-number`, `north-american-phone-number`, `date`, `time`, `url`, `ipv4`, `ipv6`, `mac-address`, `us-zip-code`, `vin`. + +### Add Watermarks + +```bash +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "document.pdf=@document.pdf" \ + -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"watermark","text":"CONFIDENTIAL","fontSize":72,"opacity":0.3,"rotation":-45}]}' \ + -o watermarked.pdf +``` + +### Digital Signatures + +```bash +# Self-signed CMS signature +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "document.pdf=@document.pdf" \ + -F 'instructions={"parts":[{"file":"document.pdf"}],"actions":[{"type":"sign","signatureType":"cms"}]}' \ + -o signed.pdf +``` + +### Fill PDF Forms + +```bash +curl -X POST https://api.nutrient.io/build \ + -H "Authorization: Bearer $NUTRIENT_API_KEY" \ + -F "form.pdf=@form.pdf" \ + -F 'instructions={"parts":[{"file":"form.pdf"}],"actions":[{"type":"fillForm","formFields":{"name":"Jane Smith","email":"jane@example.com","date":"2026-02-06"}}]}' \ + -o filled.pdf +``` + +## MCP Server (Alternative) + +For native tool integration, use the MCP server instead of curl: + +```json +{ + "mcpServers": { + "nutrient-dws": { + "command": "npx", + "args": ["-y", "@nutrient-sdk/dws-mcp-server"], + "env": { + "NUTRIENT_DWS_API_KEY": "YOUR_API_KEY", + "SANDBOX_PATH": "/path/to/working/directory" + } + } + } +} +``` + +## When to Use + +- Converting documents between formats (PDF, DOCX, XLSX, PPTX, HTML, images) +- Extracting text, tables, or key-value pairs from PDFs +- OCR on scanned documents or images +- Redacting PII before sharing documents +- Adding watermarks to drafts or confidential documents +- Digitally signing contracts or agreements +- Filling PDF forms programmatically + +## Links + +- [API Playground](https://dashboard.nutrient.io/processor-api/playground/) +- [Full API Docs](https://www.nutrient.io/guides/dws-processor/) +- [Agent Skill Repo](https://github.com/PSPDFKit-labs/nutrient-agent-skill) +- [npm MCP Server](https://www.npmjs.com/package/@nutrient-sdk/dws-mcp-server)