Files
everything-claude-code/skills/token-budget-advisor/SKILL.md
Xabilimon 4da1fb388c Update skills/token-budget-advisor/SKILL.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-03-25 22:02:20 +01:00

122 lines
5.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
---
name: token-budget-advisor
description: >-
Intercepts the response flow to offer the user an informed choice about
how much depth/tokens to consume — BEFORE responding. Use this skill
when the user wants to control token consumption, adjust response depth,
choose between short/long answers, or optimize their prompt.
TRIGGER when: "token budget", "token count", "token usage", "token limit",
"respuesta corta vs larga", "cuántos tokens", "ahorrar tokens",
"responde al 50%", "dame la versión corta", "quiero controlar cuánto usas",
"75%", "100%", "at 25%", "at 50%", "at 75%", "at 100%", "exhaustive", or any variant where the user wants
to control length, depth, or token usage — even without mentioning tokens.
DO NOT TRIGGER when: user has already specified a level in the current
session (maintain it) or the request is clearly a one-word answer.
origin: community
---
# Token Budget Advisor (TBA)
Intercept the response flow to offer the user a choice about response depth **before** Claude answers.
## When to Use
- User wants to control how long or detailed a response is
- User mentions tokens, budget, depth, or response length
- User says "short version", "tldr", "brief", "al 25%", "exhaustive", etc.
- Any time the user wants to choose depth/detail level upfront
**Do not trigger** when: user already set a level this session (maintain it silently), or the answer is trivially one line.
## Workflow
### Step 1 — Estimate input tokens
Use the calibration tables below to estimate the prompt's token count mentally.
**Chars-per-token by content type:**
| Content type | Chars / Token |
|-------------------|---------------|
| English natural | ~4.0 |
| Spanish natural | ~3.5 |
| Code | ~3.0 |
| JSON | ~2.8 |
| Markdown | ~3.3 |
Formula: `input_tokens ≈ char_count / chars_per_token`
For mixed content, use the dominant type's ratio.
### Step 2 — Estimate response size by complexity
Classify the prompt, then apply the multiplier range to get the full response window:
| Complexity | Multiplier range | Example prompts |
|--------------|------------------|------------------------------------------------------|
| Simple | 3× 8× | "What is X?", yes/no, single fact |
| Medium | 8× 20× | "How does X work?" |
| Medium-High | 10× 25× | Code request with context |
| Complex | 15× 40× | Multi-part analysis, comparisons, architecture |
| Creative | 10× 30× | Stories, essays, narrative writing |
Response window = `input_tokens × mult_min` to `input_tokens × mult_max` (but dont exceed your models configured output-token limit).
### Step 3 — Present depth options
Present this block **before** answering, using the actual estimated numbers:
```
Analyzing your prompt...
Input: ~[N] tokens | Type: [type] | Complexity: [level] | Language: [lang]
Choose your depth level:
[1] Essential (25%) -> ~[tokens] Direct answer only, no preamble
[2] Moderate (50%) -> ~[tokens] Answer + context + 1 example
[3] Detailed (75%) -> ~[tokens] Full answer with alternatives
[4] Exhaustive (100%) -> ~[tokens] Everything, no limits
Which level? (1-4 or say "25%", "50%", "75%", "100%")
Precision: heuristic estimate ~85-90% accuracy (±15%).
```
Level token estimates (within the response window):
- 25% → `min + (max - min) × 0.25`
- 50% → `min + (max - min) × 0.50`
- 75% → `min + (max - min) × 0.75`
- 100% → `max`
### Step 4 — Respond at the chosen level
| Level | Target length | Include | Omit |
|------------------|---------------------|-----------------------------------------------------|---------------------------------------------------|
| 25% Essential | 2-4 sentences max | Direct answer, key conclusion | Context, examples, nuance, alternatives |
| 50% Moderate | 1-3 paragraphs | Answer + necessary context + 1 example | Deep analysis, edge cases, references |
| 75% Detailed | Structured response | Multiple examples, pros/cons, alternatives | Extreme edge cases, exhaustive references |
| 100% Exhaustive | No restriction | Everything — full analysis, all code, all perspectives | Nothing |
## Shortcuts — skip the question
If the user already signals a level, respond at that level immediately without asking:
| What they say | Level |
|----------------------------------------------------|-------|
| "25%" / "short" / "brief" / "tldr" / "one-liner" | 25% |
| "50%" / "moderate" / "normal" | 50% |
| "75%" / "detailed" / "thorough" / "complete" | 75% |
| "100%" / "exhaustive" / "everything" / "full" | 100% |
If the user set a level earlier in the session, **maintain it silently** for subsequent responses unless they change it.
## Precision note
This skill uses heuristic estimation — no real tokenizer. Accuracy ~85-90%, variance ±15%. Always show the disclaimer.
## Source
Standalone skill from [TBA — Token Budget Advisor for Claude Code](https://github.com/Xabilimon1/Token-Budget-Advisor-Claude-Code-).
Full version includes a Python estimator script for exact counts: `npx token-budget-advisor`.