Nine new test cases pin down the two previous commits' denylist
extensions. Each verifies detection (the validator exits non-zero and
prints the expected `dangerous-invisible U+<HEX>` line on stderr) and,
where applicable, `--write` sanitization.
Coverage:
Tag block (commit 1):
- U+E0041 TAG LATIN CAPITAL LETTER A: from the part of the range that
mirrors printable ASCII; these are the characters used in published
ASCII smuggling proofs of concept.
- U+E007F CANCEL TAG: the last code point in the range.
Other invisibles (commit 2):
- U+180E MONGOLIAN VOWEL SEPARATOR
- U+115F HANGUL CHOSEONG FILLER
- U+1160 HANGUL JUNGSEONG FILLER
- U+2061 FUNCTION APPLICATION (range start)
- U+2064 INVISIBLE PLUS (range end)
- U+3164 HANGUL FILLER
The detection table is data-driven (one loop, one assertion per row),
so the next invisible added to the denylist gets a paired regression
test by appending one row to NEWLY_COVERED_RANGES.
Plus a `--write` integration test:
- writes a markdown file containing both Tag block (5 chars) and
U+180E, runs `--write`, asserts both removed and surrounding text
preserved character-for-character ('# Title\n\nBenigntext.\n').
- re-runs the validator without `--write` and asserts exit 0,
confirming the sanitizer's output is idempotent under the
extended denylist.
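The two steps of this integration test can be sketched in memory as follows. The `sanitize` function and the `DENYLIST` regex are hypothetical stand-ins for `--write` mode, covering only the code points added in the two previous commits, and the position of the hidden characters inside the input is an assumption.

```javascript
// Hypothetical stand-in for the sanitizer: removes the newly denied code points.
const DENYLIST = /[\u180E\u115F\u1160\u2061-\u2064\u3164\u{E0000}-\u{E007F}]/gu;

function sanitize(text) {
  return text.replace(DENYLIST, '');
}

// Five Tag block characters (U+E0041..U+E0045) plus one U+180E.
const hidden = String.fromCodePoint(0xe0041, 0xe0042, 0xe0043, 0xe0044, 0xe0045);
const dirty = `# Title\n\nBenign${hidden}text\u180E.\n`;

// Step 1: sanitize and compare against the expected visible text.
const clean = sanitize(dirty);
console.log(clean === '# Title\n\nBenigntext.\n');

// Step 2: a second pass finds nothing left to remove.
console.log(sanitize(clean) === clean);
```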
Test count: 5 → 14 in this file; full `yarn test` green; `yarn lint`
clean.
Extend `isDangerousInvisibleCodePoint` with five additional code
points / ranges that are routinely cited in invisible-character
smuggling references but were not in the previous denylist:
- **U+180E** MONGOLIAN VOWEL SEPARATOR. Classified as a space
separator (Zs) until Unicode 6.3 reclassified it as Cf (format
control). Renders as zero-width; widely abused for homograph
attacks and prompt smuggling.
- **U+115F** HANGUL CHOSEONG FILLER and **U+1160** HANGUL JUNGSEONG
FILLER. Zero-width fillers used in Korean text shaping. Both are
cited as common LLM-injection vectors in Korean / multilingual
threat models.
- **U+2061–U+2064** invisible math operators (FUNCTION APPLICATION,
INVISIBLE TIMES, INVISIBLE SEPARATOR, INVISIBLE PLUS). Zero-width
and only meaningful inside math typesetting. No legitimate
Markdown or source code uses them.
- **U+3164** HANGUL FILLER. Reported in real-world Discord and
Twitter smuggling incidents; not used in legitimate Korean text.
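As a rough sketch, the five additions listed above could be expressed like this. The function names are hypothetical, and the existing ranges of `isDangerousInvisibleCodePoint` are deliberately left out because this message does not list them.

```javascript
// Hypothetical helper covering only the five additions from this commit.
function isNewlyDeniedCodePoint(codePoint) {
  return (
    codePoint === 0x180e || // MONGOLIAN VOWEL SEPARATOR
    codePoint === 0x115f || // HANGUL CHOSEONG FILLER
    codePoint === 0x1160 || // HANGUL JUNGSEONG FILLER
    (codePoint >= 0x2061 && codePoint <= 0x2064) || // invisible math operators
    codePoint === 0x3164 // HANGUL FILLER
  );
}

// Collects every newly denied code point found in a string.
// for...of walks the string by code point rather than by UTF-16 unit.
function findNewlyDenied(text) {
  const hits = [];
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (isNewlyDeniedCodePoint(codePoint)) hits.push(codePoint);
  }
  return hits;
}

console.log(findNewlyDenied('f\u2061(x) \u3164'));
```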
Reproduced before this commit: a file containing any one of these
code points passed `check-unicode-safety.js` silently.
After this commit each one is reported as
`dangerous-invisible U+<HEX>` and `--write` mode strips it.
Verified by writing 8 single-character probe files
(`probe-0x180E.md`, `probe-0x115F.md`, …) and confirming exit=1 with
each violation listed.
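The probe files can be generated with a few lines of Node. This is a sketch only: the file-name pattern follows the names quoted above, while the temporary directory, the helper name, and the file contents are assumptions.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// The eight code points covered by this commit (U+2061..U+2064 expanded).
const PROBE_CODE_POINTS = [0x180e, 0x115f, 0x1160, 0x2061, 0x2062, 0x2063, 0x2064, 0x3164];

// Writes one probe file per code point and returns the paths.
function writeProbeFiles(dir) {
  return PROBE_CODE_POINTS.map((codePoint) => {
    const hex = codePoint.toString(16).toUpperCase().padStart(4, '0');
    const file = path.join(dir, `probe-0x${hex}.md`);
    // Assumed content: visible text around the single invisible character.
    fs.writeFileSync(file, `probe${String.fromCodePoint(codePoint)}end\n`);
    return file;
  });
}

const probeDir = fs.mkdtempSync(path.join(os.tmpdir(), 'uni-probe-'));
const probeFiles = writeProbeFiles(probeDir);
console.log(probeFiles.map((file) => path.basename(file)).join('\n'));
```

Pointing `ECC_UNICODE_SCAN_ROOT` at the printed directory and running `node scripts/ci/check-unicode-safety.js` should then list one violation per file.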
ECC repo self-scan reports only the pre-existing `U+2605` BLACK
STAR warnings (unchanged) and exits with the same status (no new
in-repo violations introduced). Existing 5 unicode-safety tests
still pass; `yarn lint` clean.
Regression coverage for both the previous commit's Tag block fix
and this commit's additions lands in the next commit.
`isDangerousInvisibleCodePoint` enumerated seven ranges of invisible/
bidi/variation-selector code points but omitted the Unicode Tag block
(U+E0000–U+E007F). Tag characters were introduced for language tagging
in Unicode 3.1, and that use has been deprecated since Unicode 5.1.
Outside emoji tag sequences (for example subdivision flags), no
legitimate text uses them. They are the canonical vector for
"ASCII Smuggling" / "Tag Smuggling" LLM prompt injection: an attacker
hides instructions inside an ASCII-looking string, the model reads
the tag characters, the human reviewer sees nothing. Demonstrated
against multiple LLM assistants during 2024–2025.
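The mechanism is easy to see in code: each printable ASCII character has a Tag block counterpart at U+E0000 plus its ASCII value (so `A`, 0x41, maps to U+E0041). The helper names and sample strings below are hypothetical; the sketch only illustrates the mapping a scanner has to account for.

```javascript
const TAG_BASE = 0xe0000;

// Maps printable ASCII onto the Tag block (input assumed to be 0x20..0x7E).
function toTagChars(ascii) {
  return Array.from(ascii, (ch) =>
    String.fromCodePoint(TAG_BASE + ch.codePointAt(0))
  ).join('');
}

// Recovers the ASCII text carried by tag characters, ignoring everything else.
function revealTagChars(text) {
  let out = '';
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint >= 0xe0020 && codePoint <= 0xe007e) {
      out += String.fromCodePoint(codePoint - TAG_BASE);
    }
  }
  return out;
}

// Most renderers show only "Benign text"; the suffix stays in the string.
const payload = 'Benign text' + toTagChars('hidden note');
console.log(revealTagChars(payload));
```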
`check-unicode-safety.js` is the repo's last line of defence before
contributor content reaches agent context; the same script also runs
in `--write` auto-sanitize mode on `.md` / `.mdx` / `.txt`. Today it
silently passes tag-block characters through unchanged in both
detection mode and `--write` mode.
Reproduced before this commit:
    $ mkdir -p /tmp/uni-test && node -e "
      const fs = require('fs');
      const hidden = [...Array(5)].map((_,i) =>
        String.fromCodePoint(0xE0041 + i)).join('');
      fs.writeFileSync('/tmp/uni-test/innocent.md',
        '# Title\\n\\nBenign text' + hidden + ' more.\\n');"
    $ ECC_UNICODE_SCAN_ROOT=/tmp/uni-test \
        node scripts/ci/check-unicode-safety.js
    Unicode safety check passed.
    $ echo $?
    0
Expected: tag-block characters reported as `dangerous-invisible`
violations (exit 1) and stripped under `--write`.
Actual: validator passes, `--write` leaves the bytes intact.
Fix: extend the denylist with one new range
`(codePoint >= 0xE0000 && codePoint <= 0xE007F)`. The change is
purely additive; the existing seven ranges are untouched.
After this commit the same reproduction returns:
    $ ECC_UNICODE_SCAN_ROOT=/tmp/uni-test \
        node scripts/ci/check-unicode-safety.js
    Unicode safety violations detected:
      innocent.md:3:12 dangerous-invisible U+E0041
      innocent.md:3:14 dangerous-invisible U+E0042
      innocent.md:3:16 dangerous-invisible U+E0043
      innocent.md:3:18 dangerous-invisible U+E0044
      innocent.md:3:20 dangerous-invisible U+E0045
    exit=1
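The reported columns advance by two because each Tag block character occupies two UTF-16 code units (a surrogate pair). The sketch below reproduces the positions under the assumption that columns are 1-based and counted in UTF-16 units. This is inferred from the output above, not taken from the script, and `findTagBlockViolations` is a hypothetical name.

```javascript
// Reports "line:column dangerous-invisible U+HEX" for Tag block characters.
function findTagBlockViolations(text) {
  const violations = [];
  text.split('\n').forEach((line, lineIndex) => {
    let column = 1; // assumed: 1-based, in UTF-16 code units
    for (const ch of line) {
      const codePoint = ch.codePointAt(0);
      if (codePoint >= 0xe0000 && codePoint <= 0xe007f) {
        const hex = codePoint.toString(16).toUpperCase();
        violations.push(`${lineIndex + 1}:${column} dangerous-invisible U+${hex}`);
      }
      column += ch.length; // 2 for characters outside the BMP
    }
  });
  return violations;
}

// Same content as the reproduction file above.
const hidden = Array.from({ length: 5 }, (_, i) =>
  String.fromCodePoint(0xe0041 + i)
).join('');
const report = findTagBlockViolations('# Title\n\nBenign text' + hidden + ' more.\n');
console.log(report.join('\n'));
```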
`--write` mode also strips the bytes (verified: file length 47 → 27
bytes after sanitize, since each of the five tag characters is 4 bytes
in UTF-8; regex `/[\u{E0000}-\u{E007F}]/u` no longer matches).
Existing 5 unicode-safety tests still pass; `yarn lint` clean. The
ECC repo's own self-scan (`node scripts/ci/check-unicode-safety.js`
with no `ECC_UNICODE_SCAN_ROOT`) reports the same warnings as before
this commit and exits with the same status (no regressions on
in-repo content).
A handful of other widely-cited invisible code points are missing
from the denylist (`U+180E`, `U+115F`, `U+1160`, `U+2061–U+2064`,
`U+3164`); those are addressed in the next commit so each fix
remains independently reviewable. Regression coverage for both
fixes lands two commits later.
Adds the Blender motion state inspection skill with maintainer refinements for tools metadata, usage guidance, meter-scale threshold assumptions, and Blender interpreter notes.
The OpenAI-compatible API can return HTTP 200 with an empty choices list
or choices[0].message = None (content-filtered responses on Gemini,
overwhelmed Ollama instances). Without a guard, OpenAIProvider.generate()
and AstraFlowProvider.generate() each raise an unhandled IndexError or
AttributeError, crashing the provider.
Add a guard at both call sites.