Revert "feat(ecc): prune plugin 43→12 items, promote 7 rules to .claude/rules/ (#245)"

This reverts commit 1bd68ff534.
This commit is contained in:
Affaan Mustafa
2026-02-20 01:11:30 -08:00
parent 1bd68ff534
commit 0e9f613fd1
536 changed files with 111479 additions and 253 deletions

View File

@@ -0,0 +1,438 @@
---
name: clickhouse-io
description: ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
---
# ClickHouse Analytics Patterns
ClickHouse-specific patterns for high-performance analytics and data engineering.
## When to Activate
- Designing ClickHouse table schemas (MergeTree engine selection)
- Writing analytical queries (aggregations, window functions, joins)
- Optimizing query performance (partition pruning, projections, materialized views)
- Ingesting large volumes of data (batch inserts, Kafka integration)
- Migrating from PostgreSQL/MySQL to ClickHouse for analytics
- Implementing real-time dashboards or time-series analytics
## Overview
ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP). It's optimized for fast analytical queries on large datasets.
**Key Features:**
- Column-oriented storage
- Data compression
- Parallel query execution
- Distributed queries
- Real-time analytics
## Table Design Patterns
### MergeTree Engine (Most Common)
```sql
CREATE TABLE markets_analytics (
date Date,
market_id String,
market_name String,
volume UInt64,
trades UInt32,
unique_traders UInt32,
avg_trade_size Float64,
created_at DateTime
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(date)
ORDER BY (date, market_id)
SETTINGS index_granularity = 8192;
```
### ReplacingMergeTree (Deduplication)
```sql
-- For data that may have duplicates (e.g., from multiple sources)
CREATE TABLE user_events (
event_id String,
user_id String,
event_type String,
timestamp DateTime,
properties String
) ENGINE = ReplacingMergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (user_id, event_id, timestamp)
PRIMARY KEY (user_id, event_id);
```
### AggregatingMergeTree (Pre-aggregation)
```sql
-- For maintaining aggregated metrics
CREATE TABLE market_stats_hourly (
hour DateTime,
market_id String,
total_volume AggregateFunction(sum, UInt64),
total_trades AggregateFunction(count, UInt32),
unique_users AggregateFunction(uniq, String)
) ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, market_id);
-- Query aggregated data
SELECT
hour,
market_id,
sumMerge(total_volume) AS volume,
countMerge(total_trades) AS trades,
uniqMerge(unique_users) AS users
FROM market_stats_hourly
WHERE hour >= toStartOfHour(now() - INTERVAL 24 HOUR)
GROUP BY hour, market_id
ORDER BY hour DESC;
```
## Query Optimization Patterns
### Efficient Filtering
```sql
-- ✅ GOOD: Use indexed columns first
SELECT *
FROM markets_analytics
WHERE date >= '2025-01-01'
AND market_id = 'market-123'
AND volume > 1000
ORDER BY date DESC
LIMIT 100;
-- ❌ BAD: Filter on non-indexed columns first
SELECT *
FROM markets_analytics
WHERE volume > 1000
AND market_name LIKE '%election%'
AND date >= '2025-01-01';
```
### Aggregations
```sql
-- ✅ GOOD: Use ClickHouse-specific aggregation functions
SELECT
toStartOfDay(created_at) AS day,
market_id,
sum(volume) AS total_volume,
count() AS total_trades,
uniq(trader_id) AS unique_traders,
avg(trade_size) AS avg_size
FROM trades
WHERE created_at >= today() - INTERVAL 7 DAY
GROUP BY day, market_id
ORDER BY day DESC, total_volume DESC;
-- ✅ Use quantile for percentiles (more efficient than percentile)
SELECT
quantile(0.50)(trade_size) AS median,
quantile(0.95)(trade_size) AS p95,
quantile(0.99)(trade_size) AS p99
FROM trades
WHERE created_at >= now() - INTERVAL 1 HOUR;
```
### Window Functions
```sql
-- Calculate running totals
SELECT
date,
market_id,
volume,
sum(volume) OVER (
PARTITION BY market_id
ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_volume
FROM markets_analytics
WHERE date >= today() - INTERVAL 30 DAY
ORDER BY market_id, date;
```
## Data Insertion Patterns
### Bulk Insert (Recommended)
```typescript
import { ClickHouse } from 'clickhouse'
const clickhouse = new ClickHouse({
url: process.env.CLICKHOUSE_URL,
port: 8123,
basicAuth: {
username: process.env.CLICKHOUSE_USER,
password: process.env.CLICKHOUSE_PASSWORD
}
})
// ✅ Batch insert (efficient)
async function bulkInsertTrades(trades: Trade[]) {
const values = trades.map(trade => `(
'${trade.id}',
'${trade.market_id}',
'${trade.user_id}',
${trade.amount},
'${trade.timestamp.toISOString()}'
)`).join(',')
await clickhouse.query(`
INSERT INTO trades (id, market_id, user_id, amount, timestamp)
VALUES ${values}
`).toPromise()
}
// ❌ Individual inserts (slow)
async function insertTrade(trade: Trade) {
// Don't do this in a loop!
await clickhouse.query(`
INSERT INTO trades VALUES ('${trade.id}', ...)
`).toPromise()
}
```
### Streaming Insert
```typescript
// For continuous data ingestion
import { createWriteStream } from 'fs'
import { pipeline } from 'stream/promises'
async function streamInserts() {
const stream = clickhouse.insert('trades').stream()
for await (const batch of dataSource) {
stream.write(batch)
}
await stream.end()
}
```
## Materialized Views
### Real-time Aggregations
```sql
-- Create materialized view for hourly stats
CREATE MATERIALIZED VIEW market_stats_hourly_mv
TO market_stats_hourly
AS SELECT
toStartOfHour(timestamp) AS hour,
market_id,
sumState(amount) AS total_volume,
countState() AS total_trades,
uniqState(user_id) AS unique_users
FROM trades
GROUP BY hour, market_id;
-- Query the materialized view
SELECT
hour,
market_id,
sumMerge(total_volume) AS volume,
countMerge(total_trades) AS trades,
uniqMerge(unique_users) AS users
FROM market_stats_hourly
WHERE hour >= now() - INTERVAL 24 HOUR
GROUP BY hour, market_id;
```
## Performance Monitoring
### Query Performance
```sql
-- Check slow queries
SELECT
query_id,
user,
query,
query_duration_ms,
read_rows,
read_bytes,
memory_usage
FROM system.query_log
WHERE type = 'QueryFinish'
AND query_duration_ms > 1000
AND event_time >= now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10;
```
### Table Statistics
```sql
-- Check table sizes
SELECT
database,
table,
formatReadableSize(sum(bytes)) AS size,
sum(rows) AS rows,
max(modification_time) AS latest_modification
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes) DESC;
```
## Common Analytics Queries
### Time Series Analysis
```sql
-- Daily active users
SELECT
toDate(timestamp) AS date,
uniq(user_id) AS daily_active_users
FROM events
WHERE timestamp >= today() - INTERVAL 30 DAY
GROUP BY date
ORDER BY date;
-- Retention analysis
SELECT
signup_date,
countIf(days_since_signup = 0) AS day_0,
countIf(days_since_signup = 1) AS day_1,
countIf(days_since_signup = 7) AS day_7,
countIf(days_since_signup = 30) AS day_30
FROM (
SELECT
user_id,
min(toDate(timestamp)) AS signup_date,
toDate(timestamp) AS activity_date,
dateDiff('day', signup_date, activity_date) AS days_since_signup
FROM events
GROUP BY user_id, activity_date
)
GROUP BY signup_date
ORDER BY signup_date DESC;
```
### Funnel Analysis
```sql
-- Conversion funnel
SELECT
countIf(step = 'viewed_market') AS viewed,
countIf(step = 'clicked_trade') AS clicked,
countIf(step = 'completed_trade') AS completed,
round(clicked / viewed * 100, 2) AS view_to_click_rate,
round(completed / clicked * 100, 2) AS click_to_completion_rate
FROM (
SELECT
user_id,
session_id,
event_type AS step
FROM events
WHERE event_date = today()
)
GROUP BY session_id;
```
### Cohort Analysis
```sql
-- User cohorts by signup month
SELECT
toStartOfMonth(signup_date) AS cohort,
toStartOfMonth(activity_date) AS month,
dateDiff('month', cohort, month) AS months_since_signup,
count(DISTINCT user_id) AS active_users
FROM (
SELECT
user_id,
min(toDate(timestamp)) OVER (PARTITION BY user_id) AS signup_date,
toDate(timestamp) AS activity_date
FROM events
)
GROUP BY cohort, month, months_since_signup
ORDER BY cohort, months_since_signup;
```
## Data Pipeline Patterns
### ETL Pattern
```typescript
// Extract, Transform, Load
async function etlPipeline() {
// 1. Extract from source
const rawData = await extractFromPostgres()
// 2. Transform
const transformed = rawData.map(row => ({
date: new Date(row.created_at).toISOString().split('T')[0],
market_id: row.market_slug,
volume: parseFloat(row.total_volume),
trades: parseInt(row.trade_count)
}))
// 3. Load to ClickHouse
await bulkInsertToClickHouse(transformed)
}
// Run periodically
setInterval(etlPipeline, 60 * 60 * 1000) // Every hour
```
### Change Data Capture (CDC)
```typescript
// Listen to PostgreSQL changes and sync to ClickHouse
import { Client } from 'pg'
const pgClient = new Client({ connectionString: process.env.DATABASE_URL })
pgClient.query('LISTEN market_updates')
pgClient.on('notification', async (msg) => {
const update = JSON.parse(msg.payload)
await clickhouse.insert('market_updates', [
{
market_id: update.id,
event_type: update.operation, // INSERT, UPDATE, DELETE
timestamp: new Date(),
data: JSON.stringify(update.new_data)
}
])
})
```
## Best Practices
### 1. Partitioning Strategy
- Partition by time (usually month or day)
- Avoid too many partitions (performance impact)
- Use DATE type for partition key
### 2. Ordering Key
- Put most frequently filtered columns first
- Consider cardinality (high cardinality first)
- Order impacts compression
### 3. Data Types
- Use smallest appropriate type (UInt32 vs UInt64)
- Use LowCardinality for repeated strings
- Use Enum for categorical data
### 4. Avoid
- SELECT * (specify columns)
- FINAL (merge data before query instead)
- Too many JOINs (denormalize for analytics)
- Small frequent inserts (batch instead)
### 5. Monitoring
- Track query performance
- Monitor disk usage
- Check merge operations
- Review slow query log
**Remember**: ClickHouse excels at analytical workloads. Design tables for your query patterns, batch inserts, and leverage materialized views for real-time aggregations.

View File

@@ -0,0 +1,160 @@
---
name: content-hash-cache-pattern
description: Cache expensive file processing results using SHA-256 content hashes — path-independent, auto-invalidating, with service layer separation.
---
# Content-Hash File Cache Pattern
Cache expensive file processing results (PDF parsing, text extraction, image analysis) using SHA-256 content hashes as cache keys. Unlike path-based caching, this approach survives file moves/renames and auto-invalidates when content changes.
## When to Activate
- Building file processing pipelines (PDF, images, text extraction)
- Processing cost is high and same files are processed repeatedly
- Need a `--cache/--no-cache` CLI option
- Want to add caching to existing pure functions without modifying them
## Core Pattern
### 1. Content-Hash Based Cache Key
Use file content (not path) as the cache key:
```python
import hashlib
from pathlib import Path
_HASH_CHUNK_SIZE = 65536 # 64KB chunks for large files
def compute_file_hash(path: Path) -> str:
"""SHA-256 of file contents (chunked for large files)."""
if not path.is_file():
raise FileNotFoundError(f"File not found: {path}")
sha256 = hashlib.sha256()
with open(path, "rb") as f:
while True:
chunk = f.read(_HASH_CHUNK_SIZE)
if not chunk:
break
sha256.update(chunk)
return sha256.hexdigest()
```
**Why content hash?** File rename/move = cache hit. Content change = automatic invalidation. No index file needed.
### 2. Frozen Dataclass for Cache Entry
```python
from dataclasses import dataclass
@dataclass(frozen=True, slots=True)
class CacheEntry:
file_hash: str
source_path: str
document: ExtractedDocument # The cached result
```
### 3. File-Based Cache Storage
Each cache entry is stored as `{hash}.json` — O(1) lookup by hash, no index file required.
```python
import json
from typing import Any
def write_cache(cache_dir: Path, entry: CacheEntry) -> None:
cache_dir.mkdir(parents=True, exist_ok=True)
cache_file = cache_dir / f"{entry.file_hash}.json"
data = serialize_entry(entry)
cache_file.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
def read_cache(cache_dir: Path, file_hash: str) -> CacheEntry | None:
cache_file = cache_dir / f"{file_hash}.json"
if not cache_file.is_file():
return None
try:
raw = cache_file.read_text(encoding="utf-8")
data = json.loads(raw)
return deserialize_entry(data)
except (json.JSONDecodeError, ValueError, KeyError):
return None # Treat corruption as cache miss
```
### 4. Service Layer Wrapper (SRP)
Keep the processing function pure. Add caching as a separate service layer.
```python
def extract_with_cache(
file_path: Path,
*,
cache_enabled: bool = True,
cache_dir: Path = Path(".cache"),
) -> ExtractedDocument:
"""Service layer: cache check -> extraction -> cache write."""
if not cache_enabled:
return extract_text(file_path) # Pure function, no cache knowledge
file_hash = compute_file_hash(file_path)
# Check cache
cached = read_cache(cache_dir, file_hash)
if cached is not None:
logger.info("Cache hit: %s (hash=%s)", file_path.name, file_hash[:12])
return cached.document
# Cache miss -> extract -> store
logger.info("Cache miss: %s (hash=%s)", file_path.name, file_hash[:12])
doc = extract_text(file_path)
entry = CacheEntry(file_hash=file_hash, source_path=str(file_path), document=doc)
write_cache(cache_dir, entry)
return doc
```
## Key Design Decisions
| Decision | Rationale |
|----------|-----------|
| SHA-256 content hash | Path-independent, auto-invalidates on content change |
| `{hash}.json` file naming | O(1) lookup, no index file needed |
| Service layer wrapper | SRP: extraction stays pure, cache is a separate concern |
| Manual JSON serialization | Full control over frozen dataclass serialization |
| Corruption returns `None` | Graceful degradation, re-processes on next run |
| `cache_dir.mkdir(parents=True)` | Lazy directory creation on first write |
## Best Practices
- **Hash content, not paths** — paths change, content identity doesn't
- **Chunk large files** when hashing — avoid loading entire files into memory
- **Keep processing functions pure** — they should know nothing about caching
- **Log cache hit/miss** with truncated hashes for debugging
- **Handle corruption gracefully** — treat invalid cache entries as misses, never crash
## Anti-Patterns to Avoid
```python
# BAD: Path-based caching (breaks on file move/rename)
cache = {"/path/to/file.pdf": result}
# BAD: Adding cache logic inside the processing function (SRP violation)
def extract_text(path, *, cache_enabled=False, cache_dir=None):
if cache_enabled: # Now this function has two responsibilities
...
# BAD: Using dataclasses.asdict() with nested frozen dataclasses
# (can cause issues with complex nested types)
data = dataclasses.asdict(entry) # Use manual serialization instead
```
## When to Use
- File processing pipelines (PDF parsing, OCR, text extraction, image analysis)
- CLI tools that benefit from `--cache/--no-cache` options
- Batch processing where the same files appear across runs
- Adding caching to existing pure functions without modifying them
## When NOT to Use
- Data that must always be fresh (real-time feeds)
- Cache entries that would be extremely large (consider streaming instead)
- Results that depend on parameters beyond file content (e.g., different extraction configs)

View File

@@ -0,0 +1,118 @@
---
name: continuous-learning
description: Automatically extract reusable patterns from Claude Code sessions and save them as learned skills for future use.
---
# Continuous Learning Skill
Automatically evaluates Claude Code sessions on end to extract reusable patterns that can be saved as learned skills.
## When to Activate
- Setting up automatic pattern extraction from Claude Code sessions
- Configuring the Stop hook for session evaluation
- Reviewing or curating learned skills in `~/.claude/skills/learned/`
- Adjusting extraction thresholds or pattern categories
- Comparing v1 (this) vs v2 (instinct-based) approaches
## How It Works
This skill runs as a **Stop hook** at the end of each session:
1. **Session Evaluation**: Checks if session has enough messages (default: 10+)
2. **Pattern Detection**: Identifies extractable patterns from the session
3. **Skill Extraction**: Saves useful patterns to `~/.claude/skills/learned/`
## Configuration
Edit `config.json` to customize:
```json
{
"min_session_length": 10,
"extraction_threshold": "medium",
"auto_approve": false,
"learned_skills_path": "~/.claude/skills/learned/",
"patterns_to_detect": [
"error_resolution",
"user_corrections",
"workarounds",
"debugging_techniques",
"project_specific"
],
"ignore_patterns": [
"simple_typos",
"one_time_fixes",
"external_api_issues"
]
}
```
## Pattern Types
| Pattern | Description |
|---------|-------------|
| `error_resolution` | How specific errors were resolved |
| `user_corrections` | Patterns from user corrections |
| `workarounds` | Solutions to framework/library quirks |
| `debugging_techniques` | Effective debugging approaches |
| `project_specific` | Project-specific conventions |
## Hook Setup
Add to your `~/.claude/settings.json`:
```json
{
"hooks": {
"Stop": [{
"matcher": "*",
"hooks": [{
"type": "command",
"command": "~/.claude/skills/continuous-learning/evaluate-session.sh"
}]
}]
}
}
```
## Why Stop Hook?
- **Lightweight**: Runs once at session end
- **Non-blocking**: Doesn't add latency to every message
- **Complete context**: Has access to full session transcript
## Related
- [The Longform Guide](https://x.com/affaanmustafa/status/2014040193557471352) - Section on continuous learning
- `/learn` command - Manual pattern extraction mid-session
---
## Comparison Notes (Research: Jan 2025)
### vs Homunculus (github.com/humanplane/homunculus)
Homunculus v2 takes a more sophisticated approach:
| Feature | Our Approach | Homunculus v2 |
|---------|--------------|---------------|
| Observation | Stop hook (end of session) | PreToolUse/PostToolUse hooks (100% reliable) |
| Analysis | Main context | Background agent (Haiku) |
| Granularity | Full skills | Atomic "instincts" |
| Confidence | None | 0.3-0.9 weighted |
| Evolution | Direct to skill | Instincts → cluster → skill/command/agent |
| Sharing | None | Export/import instincts |
**Key insight from homunculus:**
> "v1 relied on skills to observe. Skills are probabilistic—they fire ~50-80% of the time. v2 uses hooks for observation (100% reliable) and instincts as the atomic unit of learned behavior."
### Potential v2 Enhancements
1. **Instinct-based learning** - Smaller, atomic behaviors with confidence scoring
2. **Background observer** - Haiku agent analyzing in parallel
3. **Confidence decay** - Instincts lose confidence if contradicted
4. **Domain tagging** - code-style, testing, git, debugging, etc.
5. **Evolution path** - Cluster related instincts into skills/commands
See: `/Users/affoon/Documents/tasks/12-continuous-learning-v2.md` for full spec.

View File

@@ -0,0 +1,18 @@
{
"min_session_length": 10,
"extraction_threshold": "medium",
"auto_approve": false,
"learned_skills_path": "~/.claude/skills/learned/",
"patterns_to_detect": [
"error_resolution",
"user_corrections",
"workarounds",
"debugging_techniques",
"project_specific"
],
"ignore_patterns": [
"simple_typos",
"one_time_fixes",
"external_api_issues"
]
}

View File

@@ -0,0 +1,69 @@
#!/bin/bash
# Continuous Learning - Session Evaluator
# Runs on Stop hook to extract reusable patterns from Claude Code sessions
#
# Why Stop hook instead of UserPromptSubmit:
# - Stop runs once at session end (lightweight)
# - UserPromptSubmit runs every message (heavy, adds latency)
#
# Hook config (in ~/.claude/settings.json):
# {
# "hooks": {
# "Stop": [{
# "matcher": "*",
# "hooks": [{
# "type": "command",
# "command": "~/.claude/skills/continuous-learning/evaluate-session.sh"
# }]
# }]
# }
# }
#
# Patterns to detect: error_resolution, debugging_techniques, workarounds, project_specific
# Patterns to ignore: simple_typos, one_time_fixes, external_api_issues
# Extracted skills saved to: ~/.claude/skills/learned/
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
CONFIG_FILE="$SCRIPT_DIR/config.json"
LEARNED_SKILLS_PATH="${HOME}/.claude/skills/learned"
MIN_SESSION_LENGTH=10
# Load config if exists
if [ -f "$CONFIG_FILE" ]; then
if ! command -v jq &>/dev/null; then
echo "[ContinuousLearning] jq is required to parse config.json but not installed, using defaults" >&2
else
MIN_SESSION_LENGTH=$(jq -r '.min_session_length // 10' "$CONFIG_FILE")
LEARNED_SKILLS_PATH=$(jq -r '.learned_skills_path // "~/.claude/skills/learned/"' "$CONFIG_FILE" | sed "s|~|$HOME|")
fi
fi
# Ensure learned skills directory exists
mkdir -p "$LEARNED_SKILLS_PATH"
# Get transcript path from stdin JSON (Claude Code hook input)
# Falls back to env var for backwards compatibility
stdin_data=$(cat)
transcript_path=$(echo "$stdin_data" | grep -o '"transcript_path":"[^"]*"' | head -1 | cut -d'"' -f4)
if [ -z "$transcript_path" ]; then
transcript_path="${CLAUDE_TRANSCRIPT_PATH:-}"
fi
if [ -z "$transcript_path" ] || [ ! -f "$transcript_path" ]; then
exit 0
fi
# Count messages in session
message_count=$(grep -c '"type":"user"' "$transcript_path" 2>/dev/null || echo "0")
# Skip short sessions
if [ "$message_count" -lt "$MIN_SESSION_LENGTH" ]; then
echo "[ContinuousLearning] Session too short ($message_count messages), skipping" >&2
exit 0
fi
# Signal to Claude that session should be evaluated for extractable patterns
echo "[ContinuousLearning] Session has $message_count messages - evaluate for extractable patterns" >&2
echo "[ContinuousLearning] Save learned skills to: $LEARNED_SKILLS_PATH" >&2

View File

@@ -0,0 +1,722 @@
---
name: cpp-coding-standards
description: C++ coding standards based on the C++ Core Guidelines (isocpp.github.io). Use when writing, reviewing, or refactoring C++ code to enforce modern, safe, and idiomatic practices.
---
# C++ Coding Standards (C++ Core Guidelines)
Comprehensive coding standards for modern C++ (C++17/20/23) derived from the [C++ Core Guidelines](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines). Enforces type safety, resource safety, immutability, and clarity.
## When to Use
- Writing new C++ code (classes, functions, templates)
- Reviewing or refactoring existing C++ code
- Making architectural decisions in C++ projects
- Enforcing consistent style across a C++ codebase
- Choosing between language features (e.g., `enum` vs `enum class`, raw pointer vs smart pointer)
### When NOT to Use
- Non-C++ projects
- Legacy C codebases that cannot adopt modern C++ features
- Embedded/bare-metal contexts where specific guidelines conflict with hardware constraints (adapt selectively)
## Cross-Cutting Principles
These themes recur across the entire guidelines and form the foundation:
1. **RAII everywhere** (P.8, R.1, E.6, CP.20): Bind resource lifetime to object lifetime
2. **Immutability by default** (P.10, Con.1-5, ES.25): Start with `const`/`constexpr`; mutability is the exception
3. **Type safety** (P.4, I.4, ES.46-49, Enum.3): Use the type system to prevent errors at compile time
4. **Express intent** (P.3, F.1, NL.1-2, T.10): Names, types, and concepts should communicate purpose
5. **Minimize complexity** (F.2-3, ES.5, Per.4-5): Simple code is correct code
6. **Value semantics over pointer semantics** (C.10, R.3-5, F.20, CP.31): Prefer returning by value and scoped objects
## Philosophy & Interfaces (P.*, I.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **P.1** | Express ideas directly in code |
| **P.3** | Express intent |
| **P.4** | Ideally, a program should be statically type safe |
| **P.5** | Prefer compile-time checking to run-time checking |
| **P.8** | Don't leak any resources |
| **P.10** | Prefer immutable data to mutable data |
| **I.1** | Make interfaces explicit |
| **I.2** | Avoid non-const global variables |
| **I.4** | Make interfaces precisely and strongly typed |
| **I.11** | Never transfer ownership by a raw pointer or reference |
| **I.23** | Keep the number of function arguments low |
### DO
```cpp
// P.10 + I.4: Immutable, strongly typed interface
struct Temperature {
double kelvin;
};
Temperature boil(const Temperature& water);
```
### DON'T
```cpp
// Weak interface: unclear ownership, unclear units
double boil(double* temp);
// Non-const global variable
int g_counter = 0; // I.2 violation
```
## Functions (F.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **F.1** | Package meaningful operations as carefully named functions |
| **F.2** | A function should perform a single logical operation |
| **F.3** | Keep functions short and simple |
| **F.4** | If a function might be evaluated at compile time, declare it `constexpr` |
| **F.6** | If your function must not throw, declare it `noexcept` |
| **F.8** | Prefer pure functions |
| **F.16** | For "in" parameters, pass cheaply-copied types by value and others by `const&` |
| **F.20** | For "out" values, prefer return values to output parameters |
| **F.21** | To return multiple "out" values, prefer returning a struct |
| **F.43** | Never return a pointer or reference to a local object |
### Parameter Passing
```cpp
// F.16: Cheap types by value, others by const&
void print(int x); // cheap: by value
void analyze(const std::string& data); // expensive: by const&
void transform(std::string s); // sink: by value (will move)
// F.20 + F.21: Return values, not output parameters
struct ParseResult {
std::string token;
int position;
};
ParseResult parse(std::string_view input); // GOOD: return struct
// BAD: output parameters
void parse(std::string_view input,
std::string& token, int& pos); // avoid this
```
### Pure Functions and constexpr
```cpp
// F.4 + F.8: Pure, constexpr where possible
constexpr int factorial(int n) noexcept {
return (n <= 1) ? 1 : n * factorial(n - 1);
}
static_assert(factorial(5) == 120);
```
### Anti-Patterns
- Returning `T&&` from functions (F.45)
- Using `va_arg` / C-style variadics (F.55)
- Capturing by reference in lambdas passed to other threads (F.53)
- Returning `const T` which inhibits move semantics (F.49)
## Classes & Class Hierarchies (C.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **C.2** | Use `class` if invariant exists; `struct` if data members vary independently |
| **C.9** | Minimize exposure of members |
| **C.20** | If you can avoid defining default operations, do (Rule of Zero) |
| **C.21** | If you define or `=delete` any copy/move/destructor, handle them all (Rule of Five) |
| **C.35** | Base class destructor: public virtual or protected non-virtual |
| **C.41** | A constructor should create a fully initialized object |
| **C.46** | Declare single-argument constructors `explicit` |
| **C.67** | A polymorphic class should suppress public copy/move |
| **C.128** | Virtual functions: specify exactly one of `virtual`, `override`, or `final` |
### Rule of Zero
```cpp
// C.20: Let the compiler generate special members
struct Employee {
std::string name;
std::string department;
int id;
// No destructor, copy/move constructors, or assignment operators needed
};
```
### Rule of Five
```cpp
// C.21: If you must manage a resource, define all five
class Buffer {
public:
explicit Buffer(std::size_t size)
: data_(std::make_unique<char[]>(size)), size_(size) {}
~Buffer() = default;
Buffer(const Buffer& other)
: data_(std::make_unique<char[]>(other.size_)), size_(other.size_) {
std::copy_n(other.data_.get(), size_, data_.get());
}
Buffer& operator=(const Buffer& other) {
if (this != &other) {
auto new_data = std::make_unique<char[]>(other.size_);
std::copy_n(other.data_.get(), other.size_, new_data.get());
data_ = std::move(new_data);
size_ = other.size_;
}
return *this;
}
Buffer(Buffer&&) noexcept = default;
Buffer& operator=(Buffer&&) noexcept = default;
private:
std::unique_ptr<char[]> data_;
std::size_t size_;
};
```
### Class Hierarchy
```cpp
// C.35 + C.128: Virtual destructor, use override
class Shape {
public:
virtual ~Shape() = default;
virtual double area() const = 0; // C.121: pure interface
};
class Circle : public Shape {
public:
explicit Circle(double r) : radius_(r) {}
double area() const override { return 3.14159 * radius_ * radius_; }
private:
double radius_;
};
```
### Anti-Patterns
- Calling virtual functions in constructors/destructors (C.82)
- Using `memset`/`memcpy` on non-trivial types (C.90)
- Providing different default arguments for virtual function and overrider (C.140)
- Making data members `const` or references, which suppresses move/copy (C.12)
## Resource Management (R.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **R.1** | Manage resources automatically using RAII |
| **R.3** | A raw pointer (`T*`) is non-owning |
| **R.5** | Prefer scoped objects; don't heap-allocate unnecessarily |
| **R.10** | Avoid `malloc()`/`free()` |
| **R.11** | Avoid calling `new` and `delete` explicitly |
| **R.20** | Use `unique_ptr` or `shared_ptr` to represent ownership |
| **R.21** | Prefer `unique_ptr` over `shared_ptr` unless sharing ownership |
| **R.22** | Use `make_shared()` to make `shared_ptr`s |
### Smart Pointer Usage
```cpp
// R.11 + R.20 + R.21: RAII with smart pointers
auto widget = std::make_unique<Widget>("config"); // unique ownership
auto cache = std::make_shared<Cache>(1024); // shared ownership
// R.3: Raw pointer = non-owning observer
void render(const Widget* w) { // does NOT own w
if (w) w->draw();
}
render(widget.get());
```
### RAII Pattern
```cpp
// R.1: Resource acquisition is initialization
class FileHandle {
public:
explicit FileHandle(const std::string& path)
: handle_(std::fopen(path.c_str(), "r")) {
if (!handle_) throw std::runtime_error("Failed to open: " + path);
}
~FileHandle() {
if (handle_) std::fclose(handle_);
}
FileHandle(const FileHandle&) = delete;
FileHandle& operator=(const FileHandle&) = delete;
FileHandle(FileHandle&& other) noexcept
: handle_(std::exchange(other.handle_, nullptr)) {}
FileHandle& operator=(FileHandle&& other) noexcept {
if (this != &other) {
if (handle_) std::fclose(handle_);
handle_ = std::exchange(other.handle_, nullptr);
}
return *this;
}
private:
std::FILE* handle_;
};
```
### Anti-Patterns
- Naked `new`/`delete` (R.11)
- `malloc()`/`free()` in C++ code (R.10)
- Multiple resource allocations in a single expression (R.13 -- exception safety hazard)
- `shared_ptr` where `unique_ptr` suffices (R.21)
## Expressions & Statements (ES.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **ES.5** | Keep scopes small |
| **ES.20** | Always initialize an object |
| **ES.23** | Prefer `{}` initializer syntax |
| **ES.25** | Declare objects `const` or `constexpr` unless modification is intended |
| **ES.28** | Use lambdas for complex initialization of `const` variables |
| **ES.45** | Avoid magic constants; use symbolic constants |
| **ES.46** | Avoid narrowing/lossy arithmetic conversions |
| **ES.47** | Use `nullptr` rather than `0` or `NULL` |
| **ES.48** | Avoid casts |
| **ES.50** | Don't cast away `const` |
### Initialization
```cpp
// ES.20 + ES.23 + ES.25: Always initialize, prefer {}, default to const
const int max_retries{3};
const std::string name{"widget"};
const std::vector<int> primes{2, 3, 5, 7, 11};
// ES.28: Lambda for complex const initialization
const auto config = [&] {
Config c;
c.timeout = std::chrono::seconds{30};
c.retries = max_retries;
c.verbose = debug_mode;
return c;
}();
```
### Anti-Patterns
- Uninitialized variables (ES.20)
- Using `0` or `NULL` as pointer (ES.47 -- use `nullptr`)
- C-style casts (ES.48 -- use `static_cast`, `const_cast`, etc.)
- Casting away `const` (ES.50)
- Magic numbers without named constants (ES.45)
- Mixing signed and unsigned arithmetic (ES.100)
- Reusing names in nested scopes (ES.12)
## Error Handling (E.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **E.1** | Develop an error-handling strategy early in a design |
| **E.2** | Throw an exception to signal that a function can't perform its assigned task |
| **E.6** | Use RAII to prevent leaks |
| **E.12** | Use `noexcept` when throwing is impossible or unacceptable |
| **E.14** | Use purpose-designed user-defined types as exceptions |
| **E.15** | Throw by value, catch by reference |
| **E.16** | Destructors, deallocation, and swap must never fail |
| **E.17** | Don't try to catch every exception in every function |
### Exception Hierarchy
```cpp
// E.14 + E.15: Custom exception types, throw by value, catch by reference
class AppError : public std::runtime_error {
public:
using std::runtime_error::runtime_error;
};
class NetworkError : public AppError {
public:
NetworkError(const std::string& msg, int code)
: AppError(msg), status_code(code) {}
int status_code;
};
void fetch_data(const std::string& url) {
// E.2: Throw to signal failure
throw NetworkError("connection refused", 503);
}
void run() {
try {
fetch_data("https://api.example.com");
} catch (const NetworkError& e) {
log_error(e.what(), e.status_code);
} catch (const AppError& e) {
log_error(e.what());
}
// E.17: Don't catch everything here -- let unexpected errors propagate
}
```
### Anti-Patterns
- Throwing built-in types like `int` or string literals (E.14)
- Catching by value (slicing risk) (E.15)
- Empty catch blocks that silently swallow errors
- Using exceptions for flow control (E.3)
- Error handling based on global state like `errno` (E.28)
## Constants & Immutability (Con.*)
### All Rules
| Rule | Summary |
|------|---------|
| **Con.1** | By default, make objects immutable |
| **Con.2** | By default, make member functions `const` |
| **Con.3** | By default, pass pointers and references to `const` |
| **Con.4** | Use `const` for values that don't change after construction |
| **Con.5** | Use `constexpr` for values computable at compile time |
```cpp
// Con.1 through Con.5: Immutability by default
class Sensor {
public:
explicit Sensor(std::string id) : id_(std::move(id)) {}
// Con.2: const member functions by default
const std::string& id() const { return id_; }
double last_reading() const { return reading_; }
// Only non-const when mutation is required
void record(double value) { reading_ = value; }
private:
const std::string id_; // Con.4: never changes after construction
double reading_{0.0};
};
// Con.3: Pass by const reference
void display(const Sensor& s) {
std::cout << s.id() << ": " << s.last_reading() << '\n';
}
// Con.5: Compile-time constants
constexpr double PI = 3.14159265358979;
constexpr int MAX_SENSORS = 256;
```
## Concurrency & Parallelism (CP.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **CP.2** | Avoid data races |
| **CP.3** | Minimize explicit sharing of writable data |
| **CP.4** | Think in terms of tasks, rather than threads |
| **CP.8** | Don't use `volatile` for synchronization |
| **CP.20** | Use RAII, never plain `lock()`/`unlock()` |
| **CP.21** | Use `std::scoped_lock` to acquire multiple mutexes |
| **CP.22** | Never call unknown code while holding a lock |
| **CP.42** | Don't wait without a condition |
| **CP.44** | Remember to name your `lock_guard`s and `unique_lock`s |
| **CP.100** | Don't use lock-free programming unless you absolutely have to |
### Safe Locking
```cpp
// CP.20 + CP.44: RAII locks, always named
class ThreadSafeQueue {
public:
void push(int value) {
std::lock_guard<std::mutex> lock(mutex_); // CP.44: named!
queue_.push(value);
cv_.notify_one();
}
int pop() {
std::unique_lock<std::mutex> lock(mutex_);
// CP.42: Always wait with a condition
cv_.wait(lock, [this] { return !queue_.empty(); });
const int value = queue_.front();
queue_.pop();
return value;
}
private:
std::mutex mutex_; // CP.50: mutex with its data
std::condition_variable cv_;
std::queue<int> queue_;
};
```
### Multiple Mutexes
```cpp
// CP.21: std::scoped_lock for multiple mutexes (deadlock-free)
void transfer(Account& from, Account& to, double amount) {
std::scoped_lock lock(from.mutex_, to.mutex_);
from.balance_ -= amount;
to.balance_ += amount;
}
```
### Anti-Patterns
- `volatile` for synchronization (CP.8 -- it's for hardware I/O only)
- Detaching threads (CP.26 -- lifetime management becomes nearly impossible)
- Unnamed lock guards: `std::lock_guard<std::mutex>(m);` destroys immediately (CP.44)
- Holding locks while calling callbacks (CP.22 -- deadlock risk)
- Lock-free programming without deep expertise (CP.100)
## Templates & Generic Programming (T.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **T.1** | Use templates to raise the level of abstraction |
| **T.2** | Use templates to express algorithms for many argument types |
| **T.10** | Specify concepts for all template arguments |
| **T.11** | Use standard concepts whenever possible |
| **T.13** | Prefer shorthand notation for simple concepts |
| **T.43** | Prefer `using` over `typedef` |
| **T.120** | Use template metaprogramming only when you really need to |
| **T.144** | Don't specialize function templates (overload instead) |
### Concepts (C++20)
```cpp
#include <concepts>
// T.10 + T.11: Constrain templates with standard concepts
template<std::integral T>
T gcd(T a, T b) {
while (b != 0) {
a = std::exchange(b, a % b);
}
return a;
}
// T.13: Shorthand concept syntax
void sort(std::ranges::random_access_range auto& range) {
std::ranges::sort(range);
}
// Custom concept for domain-specific constraints
template<typename T>
concept Serializable = requires(const T& t) {
{ t.serialize() } -> std::convertible_to<std::string>;
};
template<Serializable T>
void save(const T& obj, const std::string& path);
```
### Anti-Patterns
- Unconstrained templates in visible namespaces (T.47)
- Specializing function templates instead of overloading (T.144)
- Template metaprogramming where `constexpr` suffices (T.120)
- `typedef` instead of `using` (T.43)
## Standard Library (SL.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **SL.1** | Use libraries wherever possible |
| **SL.2** | Prefer the standard library to other libraries |
| **SL.con.1** | Prefer `std::array` or `std::vector` over C arrays |
| **SL.con.2** | Prefer `std::vector` by default |
| **SL.str.1** | Use `std::string` to own character sequences |
| **SL.str.2** | Use `std::string_view` to refer to character sequences |
| **SL.io.50** | Avoid `endl` (use `'\n'` -- `endl` forces a flush) |
```cpp
// SL.con.1 + SL.con.2: Prefer vector/array over C arrays
const std::array<int, 4> fixed_data{1, 2, 3, 4};
std::vector<std::string> dynamic_data;
// SL.str.1 + SL.str.2: string owns, string_view observes
std::string build_greeting(std::string_view name) {
return "Hello, " + std::string(name) + "!";
}
// SL.io.50: Use '\n' not endl
std::cout << "result: " << value << '\n';
```
## Enumerations (Enum.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **Enum.1** | Prefer enumerations over macros |
| **Enum.3** | Prefer `enum class` over plain `enum` |
| **Enum.5** | Don't use ALL_CAPS for enumerators |
| **Enum.6** | Avoid unnamed enumerations |
```cpp
// Enum.3 + Enum.5: Scoped enum, no ALL_CAPS
enum class Color { red, green, blue };
enum class LogLevel { debug, info, warning, error };
// BAD: plain enum leaks names, ALL_CAPS clashes with macros
enum { RED, GREEN, BLUE }; // Enum.3 + Enum.5 + Enum.6 violation
#define MAX_SIZE 100 // Enum.1 violation -- use constexpr
```
## Source Files & Naming (SF.*, NL.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **SF.1** | Use `.cpp` for code files and `.h` for interface files |
| **SF.7** | Don't write `using namespace` at global scope in a header |
| **SF.8** | Use `#include` guards for all `.h` files |
| **SF.11** | Header files should be self-contained |
| **NL.5** | Avoid encoding type information in names (no Hungarian notation) |
| **NL.8** | Use a consistent naming style |
| **NL.9** | Use ALL_CAPS for macro names only |
| **NL.10** | Prefer `underscore_style` names |
### Header Guard
```cpp
// SF.8: Include guard (or #pragma once)
#ifndef PROJECT_MODULE_WIDGET_H
#define PROJECT_MODULE_WIDGET_H
// SF.11: Self-contained -- include everything this header needs
#include <string>
#include <vector>
namespace project::module {
class Widget {
public:
explicit Widget(std::string name);
const std::string& name() const;
private:
std::string name_;
};
} // namespace project::module
#endif // PROJECT_MODULE_WIDGET_H
```
### Naming Conventions
```cpp
// NL.8 + NL.10: Consistent underscore_style
namespace my_project {
constexpr int max_buffer_size = 4096; // NL.9: not ALL_CAPS (it's not a macro)
class tcp_connection { // underscore_style class
public:
void send_message(std::string_view msg);
bool is_connected() const;
private:
std::string host_; // trailing underscore for members
int port_;
};
} // namespace my_project
```
### Anti-Patterns
- `using namespace std;` in a header at global scope (SF.7)
- Headers that depend on inclusion order (SF.10, SF.11)
- Hungarian notation like `strName`, `iCount` (NL.5)
- ALL_CAPS for anything other than macros (NL.9)
## Performance (Per.*)
### Key Rules
| Rule | Summary |
|------|---------|
| **Per.1** | Don't optimize without reason |
| **Per.2** | Don't optimize prematurely |
| **Per.6** | Don't make claims about performance without measurements |
| **Per.7** | Design to enable optimization |
| **Per.10** | Rely on the static type system |
| **Per.11** | Move computation from run time to compile time |
| **Per.19** | Access memory predictably |
### Guidelines
```cpp
// Per.11: Compile-time computation where possible
constexpr auto lookup_table = [] {
std::array<int, 256> table{};
for (int i = 0; i < 256; ++i) {
table[i] = i * i;
}
return table;
}();
// Per.19: Prefer contiguous data for cache-friendliness
std::vector<Point> points; // GOOD: contiguous
std::vector<std::unique_ptr<Point>> indirect_points; // BAD: pointer chasing
```
### Anti-Patterns
- Optimizing without profiling data (Per.1, Per.6)
- Choosing "clever" low-level code over clear abstractions (Per.4, Per.5)
- Ignoring data layout and cache behavior (Per.19)
## Quick Reference Checklist
Before marking C++ work complete:
- [ ] No raw `new`/`delete` -- use smart pointers or RAII (R.11)
- [ ] Objects initialized at declaration (ES.20)
- [ ] Variables are `const`/`constexpr` by default (Con.1, ES.25)
- [ ] Member functions are `const` where possible (Con.2)
- [ ] `enum class` instead of plain `enum` (Enum.3)
- [ ] `nullptr` instead of `0`/`NULL` (ES.47)
- [ ] No narrowing conversions (ES.46)
- [ ] No C-style casts (ES.48)
- [ ] Single-argument constructors are `explicit` (C.46)
- [ ] Rule of Zero or Rule of Five applied (C.20, C.21)
- [ ] Base class destructors are public virtual or protected non-virtual (C.35)
- [ ] Templates are constrained with concepts (T.10)
- [ ] No `using namespace` in headers at global scope (SF.7)
- [ ] Headers have include guards and are self-contained (SF.8, SF.11)
- [ ] Locks use RAII (`scoped_lock`/`lock_guard`) (CP.20)
- [ ] Exceptions are custom types, thrown by value, caught by reference (E.14, E.15)
- [ ] `'\n'` instead of `std::endl` (SL.io.50)
- [ ] No magic numbers (ES.45)

322
skills/cpp-testing/SKILL.md Normal file
View File

@@ -0,0 +1,322 @@
---
name: cpp-testing
description: Use only when writing/updating/fixing C++ tests, configuring GoogleTest/CTest, diagnosing failing or flaky tests, or adding coverage/sanitizers.
---
# C++ Testing (Agent Skill)
Agent-focused testing workflow for modern C++ (C++17/20) using GoogleTest/GoogleMock with CMake/CTest.
## When to Use
- Writing new C++ tests or fixing existing tests
- Designing unit/integration test coverage for C++ components
- Adding test coverage, CI gating, or regression protection
- Configuring CMake/CTest workflows for consistent execution
- Investigating test failures or flaky behavior
- Enabling sanitizers for memory/race diagnostics
### When NOT to Use
- Implementing new product features without test changes
- Large-scale refactors unrelated to test coverage or failures
- Performance tuning without test regressions to validate
- Non-C++ projects or non-test tasks
## Core Concepts
- **TDD loop**: red → green → refactor (tests first, minimal fix, then cleanups).
- **Isolation**: prefer dependency injection and fakes over global state.
- **Test layout**: `tests/unit`, `tests/integration`, `tests/testdata`.
- **Mocks vs fakes**: mock for interactions, fake for stateful behavior.
- **CTest discovery**: use `gtest_discover_tests()` for stable test discovery.
- **CI signal**: run subset first, then full suite with `--output-on-failure`.
## TDD Workflow
Follow the RED → GREEN → REFACTOR loop:
1. **RED**: write a failing test that captures the new behavior
2. **GREEN**: implement the smallest change to pass
3. **REFACTOR**: clean up while tests stay green
```cpp
// tests/add_test.cpp
#include <gtest/gtest.h>
int Add(int a, int b); // Provided by production code.
TEST(AddTest, AddsTwoNumbers) { // RED
EXPECT_EQ(Add(2, 3), 5);
}
// src/add.cpp
int Add(int a, int b) { // GREEN
return a + b;
}
// REFACTOR: simplify/rename once tests pass
```
## Code Examples
### Basic Unit Test (gtest)
```cpp
// tests/calculator_test.cpp
#include <gtest/gtest.h>
int Add(int a, int b); // Provided by production code.
TEST(CalculatorTest, AddsTwoNumbers) {
EXPECT_EQ(Add(2, 3), 5);
}
```
### Fixture (gtest)
```cpp
// tests/user_store_test.cpp
// Pseudocode stub: replace UserStore/User with project types.
#include <gtest/gtest.h>
#include <memory>
#include <optional>
#include <string>
struct User { std::string name; };
class UserStore {
public:
explicit UserStore(std::string /*path*/) {}
void Seed(std::initializer_list<User> /*users*/) {}
std::optional<User> Find(const std::string &/*name*/) { return User{"alice"}; }
};
class UserStoreTest : public ::testing::Test {
protected:
void SetUp() override {
store = std::make_unique<UserStore>(":memory:");
store->Seed({{"alice"}, {"bob"}});
}
std::unique_ptr<UserStore> store;
};
TEST_F(UserStoreTest, FindsExistingUser) {
auto user = store->Find("alice");
ASSERT_TRUE(user.has_value());
EXPECT_EQ(user->name, "alice");
}
```
### Mock (gmock)
```cpp
// tests/notifier_test.cpp
#include <gmock/gmock.h>
#include <gtest/gtest.h>
#include <string>
class Notifier {
public:
virtual ~Notifier() = default;
virtual void Send(const std::string &message) = 0;
};
class MockNotifier : public Notifier {
public:
MOCK_METHOD(void, Send, (const std::string &message), (override));
};
class Service {
public:
explicit Service(Notifier &notifier) : notifier_(notifier) {}
void Publish(const std::string &message) { notifier_.Send(message); }
private:
Notifier &notifier_;
};
TEST(ServiceTest, SendsNotifications) {
MockNotifier notifier;
Service service(notifier);
EXPECT_CALL(notifier, Send("hello")).Times(1);
service.Publish("hello");
}
```
### CMake/CTest Quickstart
```cmake
# CMakeLists.txt (excerpt)
cmake_minimum_required(VERSION 3.20)
project(example LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
include(FetchContent)
# Prefer project-locked versions. If using a tag, use a pinned version per project policy.
set(GTEST_VERSION v1.17.0) # Adjust to project policy.
FetchContent_Declare(
googletest
URL https://github.com/google/googletest/archive/refs/tags/${GTEST_VERSION}.zip
)
FetchContent_MakeAvailable(googletest)
add_executable(example_tests
tests/calculator_test.cpp
src/calculator.cpp
)
target_link_libraries(example_tests GTest::gtest GTest::gmock GTest::gtest_main)
enable_testing()
include(GoogleTest)
gtest_discover_tests(example_tests)
```
```bash
cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
cmake --build build -j
ctest --test-dir build --output-on-failure
```
## Running Tests
```bash
ctest --test-dir build --output-on-failure
ctest --test-dir build -R ClampTest
ctest --test-dir build -R "UserStoreTest.*" --output-on-failure
```
```bash
./build/example_tests --gtest_filter=ClampTest.*
./build/example_tests --gtest_filter=UserStoreTest.FindsExistingUser
```
## Debugging Failures
1. Re-run the single failing test with gtest filter.
2. Add scoped logging around the failing assertion.
3. Re-run with sanitizers enabled.
4. Expand to full suite once the root cause is fixed.
## Coverage
Prefer target-level settings instead of global flags.
```cmake
option(ENABLE_COVERAGE "Enable coverage flags" OFF)
if(ENABLE_COVERAGE)
if(CMAKE_CXX_COMPILER_ID MATCHES "GNU")
target_compile_options(example_tests PRIVATE --coverage)
target_link_options(example_tests PRIVATE --coverage)
elseif(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
target_compile_options(example_tests PRIVATE -fprofile-instr-generate -fcoverage-mapping)
target_link_options(example_tests PRIVATE -fprofile-instr-generate)
endif()
endif()
```
GCC + gcov + lcov:
```bash
cmake -S . -B build-cov -DENABLE_COVERAGE=ON
cmake --build build-cov -j
ctest --test-dir build-cov
lcov --capture --directory build-cov --output-file coverage.info
lcov --remove coverage.info '/usr/*' --output-file coverage.info
genhtml coverage.info --output-directory coverage
```
Clang + llvm-cov:
```bash
cmake -S . -B build-llvm -DENABLE_COVERAGE=ON -DCMAKE_CXX_COMPILER=clang++
cmake --build build-llvm -j
LLVM_PROFILE_FILE="build-llvm/default.profraw" ctest --test-dir build-llvm
llvm-profdata merge -sparse build-llvm/default.profraw -o build-llvm/default.profdata
llvm-cov report build-llvm/example_tests -instr-profile=build-llvm/default.profdata
```
## Sanitizers
```cmake
option(ENABLE_ASAN "Enable AddressSanitizer" OFF)
option(ENABLE_UBSAN "Enable UndefinedBehaviorSanitizer" OFF)
option(ENABLE_TSAN "Enable ThreadSanitizer" OFF)
if(ENABLE_ASAN)
add_compile_options(-fsanitize=address -fno-omit-frame-pointer)
add_link_options(-fsanitize=address)
endif()
if(ENABLE_UBSAN)
add_compile_options(-fsanitize=undefined -fno-omit-frame-pointer)
add_link_options(-fsanitize=undefined)
endif()
if(ENABLE_TSAN)
add_compile_options(-fsanitize=thread)
add_link_options(-fsanitize=thread)
endif()
```
## Flaky Tests Guardrails
- Never use `sleep` for synchronization; use condition variables or latches.
- Make temp directories unique per test and always clean them.
- Avoid real time, network, or filesystem dependencies in unit tests.
- Use deterministic seeds for randomized inputs.
## Best Practices
### DO
- Keep tests deterministic and isolated
- Prefer dependency injection over globals
- Use `ASSERT_*` for preconditions, `EXPECT_*` for multiple checks
- Separate unit vs integration tests in CTest labels or directories
- Run sanitizers in CI for memory and race detection
### DON'T
- Don't depend on real time or network in unit tests
- Don't use sleeps as synchronization when a condition variable can be used
- Don't over-mock simple value objects
- Don't use brittle string matching for non-critical logs
### Common Pitfalls
- **Using fixed temp paths** → Generate unique temp directories per test and clean them.
- **Relying on wall clock time** → Inject a clock or use fake time sources.
- **Flaky concurrency tests** → Use condition variables/latches and bounded waits.
- **Hidden global state** → Reset global state in fixtures or remove globals.
- **Over-mocking** → Prefer fakes for stateful behavior and only mock interactions.
- **Missing sanitizer runs** → Add ASan/UBSan/TSan builds in CI.
- **Coverage on debug-only builds** → Ensure coverage targets use consistent flags.
## Optional Appendix: Fuzzing / Property Testing
Only use if the project already supports LLVM/libFuzzer or a property-testing library.
- **libFuzzer**: best for pure functions with minimal I/O.
- **RapidCheck**: property-based tests to validate invariants.
Minimal libFuzzer harness (pseudocode: replace ParseConfig):
```cpp
#include <cstddef>
#include <cstdint>
#include <string>
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
std::string input(reinterpret_cast<const char *>(data), size);
// ParseConfig(input); // project function
return 0;
}
```
## Alternatives to GoogleTest
- **Catch2**: header-only, expressive matchers
- **doctest**: lightweight, minimal compile overhead

View File

@@ -0,0 +1,334 @@
---
name: database-migrations
description: Database migration best practices for schema changes, data migrations, rollbacks, and zero-downtime deployments across PostgreSQL, MySQL, and common ORMs (Prisma, Drizzle, Django, TypeORM, golang-migrate).
---
# Database Migration Patterns
Safe, reversible database schema changes for production systems.
## When to Activate
- Creating or altering database tables
- Adding/removing columns or indexes
- Running data migrations (backfill, transform)
- Planning zero-downtime schema changes
- Setting up migration tooling for a new project
## Core Principles
1. **Every change is a migration** — never alter production databases manually
2. **Migrations are forward-only in production** — rollbacks use new forward migrations
3. **Schema and data migrations are separate** — never mix DDL and DML in one migration
4. **Test migrations against production-sized data** — a migration that works on 100 rows may lock on 10M
5. **Migrations are immutable once deployed** — never edit a migration that has run in production
## Migration Safety Checklist
Before applying any migration:
- [ ] Migration has both UP and DOWN (or is explicitly marked irreversible)
- [ ] No full table locks on large tables (use concurrent operations)
- [ ] New columns have defaults or are nullable (never add NOT NULL without default)
- [ ] Indexes created concurrently (not inline with CREATE TABLE for existing tables)
- [ ] Data backfill is a separate migration from schema change
- [ ] Tested against a copy of production data
- [ ] Rollback plan documented
## PostgreSQL Patterns
### Adding a Column Safely
```sql
-- GOOD: Nullable column, no lock
ALTER TABLE users ADD COLUMN avatar_url TEXT;
-- GOOD: Column with default (Postgres 11+ is instant, no rewrite)
ALTER TABLE users ADD COLUMN is_active BOOLEAN NOT NULL DEFAULT true;
-- BAD: NOT NULL without default on existing table (requires full rewrite)
ALTER TABLE users ADD COLUMN role TEXT NOT NULL;
-- This locks the table and rewrites every row
```
### Adding an Index Without Downtime
```sql
-- BAD: Blocks writes on large tables
CREATE INDEX idx_users_email ON users (email);
-- GOOD: Non-blocking, allows concurrent writes
CREATE INDEX CONCURRENTLY idx_users_email ON users (email);
-- Note: CONCURRENTLY cannot run inside a transaction block
-- Most migration tools need special handling for this
```
### Renaming a Column (Zero-Downtime)
Never rename directly in production. Use the expand-contract pattern:
```sql
-- Step 1: Add new column (migration 001)
ALTER TABLE users ADD COLUMN display_name TEXT;
-- Step 2: Backfill data (migration 002, data migration)
UPDATE users SET display_name = username WHERE display_name IS NULL;
-- Step 3: Update application code to read/write both columns
-- Deploy application changes
-- Step 4: Stop writing to old column, drop it (migration 003)
ALTER TABLE users DROP COLUMN username;
```
### Removing a Column Safely
```sql
-- Step 1: Remove all application references to the column
-- Step 2: Deploy application without the column reference
-- Step 3: Drop column in next migration
ALTER TABLE orders DROP COLUMN legacy_status;
-- For Django: use SeparateDatabaseAndState to remove from model
-- without generating DROP COLUMN (then drop in next migration)
```
### Large Data Migrations
```sql
-- BAD: Updates all rows in one transaction (locks table)
UPDATE users SET normalized_email = LOWER(email);
-- GOOD: Batch update with progress
DO $$
DECLARE
batch_size INT := 10000;
rows_updated INT;
BEGIN
LOOP
UPDATE users
SET normalized_email = LOWER(email)
WHERE id IN (
SELECT id FROM users
WHERE normalized_email IS NULL
LIMIT batch_size
FOR UPDATE SKIP LOCKED
);
GET DIAGNOSTICS rows_updated = ROW_COUNT;
RAISE NOTICE 'Updated % rows', rows_updated;
EXIT WHEN rows_updated = 0;
COMMIT;
END LOOP;
END $$;
```
## Prisma (TypeScript/Node.js)
### Workflow
```bash
# Create migration from schema changes
npx prisma migrate dev --name add_user_avatar
# Apply pending migrations in production
npx prisma migrate deploy
# Reset database (dev only)
npx prisma migrate reset
# Generate client after schema changes
npx prisma generate
```
### Schema Example
```prisma
model User {
id String @id @default(cuid())
email String @unique
name String?
avatarUrl String? @map("avatar_url")
createdAt DateTime @default(now()) @map("created_at")
updatedAt DateTime @updatedAt @map("updated_at")
orders Order[]
@@map("users")
@@index([email])
}
```
### Custom SQL Migration
For operations Prisma cannot express (concurrent indexes, data backfills):
```bash
# Create empty migration, then edit the SQL manually
npx prisma migrate dev --create-only --name add_email_index
```
```sql
-- migrations/20240115_add_email_index/migration.sql
-- Prisma cannot generate CONCURRENTLY, so we write it manually
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_users_email ON users (email);
```
## Drizzle (TypeScript/Node.js)
### Workflow
```bash
# Generate migration from schema changes
npx drizzle-kit generate
# Apply migrations
npx drizzle-kit migrate
# Push schema directly (dev only, no migration file)
npx drizzle-kit push
```
### Schema Example
```typescript
import { pgTable, text, timestamp, uuid, boolean } from "drizzle-orm/pg-core";
export const users = pgTable("users", {
id: uuid("id").primaryKey().defaultRandom(),
email: text("email").notNull().unique(),
name: text("name"),
isActive: boolean("is_active").notNull().default(true),
createdAt: timestamp("created_at").notNull().defaultNow(),
updatedAt: timestamp("updated_at").notNull().defaultNow(),
});
```
## Django (Python)
### Workflow
```bash
# Generate migration from model changes
python manage.py makemigrations
# Apply migrations
python manage.py migrate
# Show migration status
python manage.py showmigrations
# Generate empty migration for custom SQL
python manage.py makemigrations --empty app_name -n description
```
### Data Migration
```python
from django.db import migrations
def backfill_display_names(apps, schema_editor):
User = apps.get_model("accounts", "User")
batch_size = 5000
users = User.objects.filter(display_name="")
while users.exists():
batch = list(users[:batch_size])
for user in batch:
user.display_name = user.username
User.objects.bulk_update(batch, ["display_name"], batch_size=batch_size)
def reverse_backfill(apps, schema_editor):
pass # Data migration, no reverse needed
class Migration(migrations.Migration):
dependencies = [("accounts", "0015_add_display_name")]
operations = [
migrations.RunPython(backfill_display_names, reverse_backfill),
]
```
### SeparateDatabaseAndState
Remove a column from the Django model without dropping it from the database immediately:
```python
class Migration(migrations.Migration):
operations = [
migrations.SeparateDatabaseAndState(
state_operations=[
migrations.RemoveField(model_name="user", name="legacy_field"),
],
database_operations=[], # Don't touch the DB yet
),
]
```
## golang-migrate (Go)
### Workflow
```bash
# Create migration pair
migrate create -ext sql -dir migrations -seq add_user_avatar
# Apply all pending migrations
migrate -path migrations -database "$DATABASE_URL" up
# Rollback last migration
migrate -path migrations -database "$DATABASE_URL" down 1
# Force version (fix dirty state)
migrate -path migrations -database "$DATABASE_URL" force VERSION
```
### Migration Files
```sql
-- migrations/000003_add_user_avatar.up.sql
ALTER TABLE users ADD COLUMN avatar_url TEXT;
CREATE INDEX CONCURRENTLY idx_users_avatar ON users (avatar_url) WHERE avatar_url IS NOT NULL;
-- migrations/000003_add_user_avatar.down.sql
DROP INDEX IF EXISTS idx_users_avatar;
ALTER TABLE users DROP COLUMN IF EXISTS avatar_url;
```
## Zero-Downtime Migration Strategy
For critical production changes, follow the expand-contract pattern:
```
Phase 1: EXPAND
- Add new column/table (nullable or with default)
- Deploy: app writes to BOTH old and new
- Backfill existing data
Phase 2: MIGRATE
- Deploy: app reads from NEW, writes to BOTH
- Verify data consistency
Phase 3: CONTRACT
- Deploy: app only uses NEW
- Drop old column/table in separate migration
```
### Timeline Example
```
Day 1: Migration adds new_status column (nullable)
Day 1: Deploy app v2 — writes to both status and new_status
Day 2: Run backfill migration for existing rows
Day 3: Deploy app v3 — reads from new_status only
Day 7: Migration drops old status column
```
## Anti-Patterns
| Anti-Pattern | Why It Fails | Better Approach |
|-------------|-------------|-----------------|
| Manual SQL in production | No audit trail, unrepeatable | Always use migration files |
| Editing deployed migrations | Causes drift between environments | Create new migration instead |
| NOT NULL without default | Locks table, rewrites all rows | Add nullable, backfill, then add constraint |
| Inline index on large table | Blocks writes during build | CREATE INDEX CONCURRENTLY |
| Schema + data in one migration | Hard to rollback, long transactions | Separate migrations |
| Dropping column before removing code | Application errors on missing column | Remove code first, drop column next deploy |

View File

@@ -0,0 +1,426 @@
---
name: deployment-patterns
description: Deployment workflows, CI/CD pipeline patterns, Docker containerization, health checks, rollback strategies, and production readiness checklists for web applications.
---
# Deployment Patterns
Production deployment workflows and CI/CD best practices.
## When to Activate
- Setting up CI/CD pipelines
- Dockerizing an application
- Planning deployment strategy (blue-green, canary, rolling)
- Implementing health checks and readiness probes
- Preparing for a production release
- Configuring environment-specific settings
## Deployment Strategies
### Rolling Deployment (Default)
Replace instances gradually — old and new versions run simultaneously during rollout.
```
Instance 1: v1 → v2 (update first)
Instance 2: v1 (still running v1)
Instance 3: v1 (still running v1)
Instance 1: v2
Instance 2: v1 → v2 (update second)
Instance 3: v1
Instance 1: v2
Instance 2: v2
Instance 3: v1 → v2 (update last)
```
**Pros:** Zero downtime, gradual rollout
**Cons:** Two versions run simultaneously — requires backward-compatible changes
**Use when:** Standard deployments, backward-compatible changes
### Blue-Green Deployment
Run two identical environments. Switch traffic atomically.
```
Blue (v1) ← traffic
Green (v2) idle, running new version
# After verification:
Blue (v1) idle (becomes standby)
Green (v2) ← traffic
```
**Pros:** Instant rollback (switch back to blue), clean cutover
**Cons:** Requires 2x infrastructure during deployment
**Use when:** Critical services, zero-tolerance for issues
### Canary Deployment
Route a small percentage of traffic to the new version first.
```
v1: 95% of traffic
v2: 5% of traffic (canary)
# If metrics look good:
v1: 50% of traffic
v2: 50% of traffic
# Final:
v2: 100% of traffic
```
**Pros:** Catches issues with real traffic before full rollout
**Cons:** Requires traffic splitting infrastructure, monitoring
**Use when:** High-traffic services, risky changes, feature flags
## Docker
### Multi-Stage Dockerfile (Node.js)
```dockerfile
# Stage 1: Install dependencies
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --production=false
# Stage 2: Build
FROM node:22-alpine AS builder
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build
RUN npm prune --production
# Stage 3: Production image
FROM node:22-alpine AS runner
WORKDIR /app
RUN addgroup -g 1001 -S appgroup && adduser -S appuser -u 1001
USER appuser
COPY --from=builder --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=builder --chown=appuser:appgroup /app/dist ./dist
COPY --from=builder --chown=appuser:appgroup /app/package.json ./
ENV NODE_ENV=production
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]
```
### Multi-Stage Dockerfile (Go)
```dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags="-s -w" -o /server ./cmd/server
FROM alpine:3.19 AS runner
RUN apk --no-cache add ca-certificates
RUN adduser -D -u 1001 appuser
USER appuser
COPY --from=builder /server /server
EXPOSE 8080
HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- http://localhost:8080/health || exit 1
CMD ["/server"]
```
### Multi-Stage Dockerfile (Python/Django)
```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
RUN pip install --no-cache-dir uv
COPY requirements.txt .
RUN uv pip install --system --no-cache -r requirements.txt
FROM python:3.12-slim AS runner
WORKDIR /app
RUN useradd -r -u 1001 appuser
USER appuser
COPY --from=builder /usr/local/lib/python3.12/site-packages /usr/local/lib/python3.12/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY . .
ENV PYTHONUNBUFFERED=1
EXPOSE 8000
HEALTHCHECK --interval=30s --timeout=3s CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health/')" || exit 1
CMD ["gunicorn", "config.wsgi:application", "--bind", "0.0.0.0:8000", "--workers", "4"]
```
### Docker Best Practices
```
# GOOD practices
- Use specific version tags (node:22-alpine, not node:latest)
- Multi-stage builds to minimize image size
- Run as non-root user
- Copy dependency files first (layer caching)
- Use .dockerignore to exclude node_modules, .git, tests
- Add HEALTHCHECK instruction
- Set resource limits in docker-compose or k8s
# BAD practices
- Running as root
- Using :latest tags
- Copying entire repo in one COPY layer
- Installing dev dependencies in production image
- Storing secrets in image (use env vars or secrets manager)
```
## CI/CD Pipeline
### GitHub Actions (Standard Pipeline)
```yaml
name: CI/CD
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 22
cache: npm
- run: npm ci
- run: npm run lint
- run: npm run typecheck
- run: npm test -- --coverage
- uses: actions/upload-artifact@v4
if: always()
with:
name: coverage
path: coverage/
build:
needs: test
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
steps:
- uses: actions/checkout@v4
- uses: docker/setup-buildx-action@v3
- uses: docker/login-action@v3
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- uses: docker/build-push-action@v5
with:
push: true
tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
cache-from: type=gha
cache-to: type=gha,mode=max
deploy:
needs: build
runs-on: ubuntu-latest
if: github.ref == 'refs/heads/main'
environment: production
steps:
- name: Deploy to production
run: |
# Platform-specific deployment command
# Railway: railway up
# Vercel: vercel --prod
# K8s: kubectl set image deployment/app app=ghcr.io/${{ github.repository }}:${{ github.sha }}
echo "Deploying ${{ github.sha }}"
```
### Pipeline Stages
```
PR opened:
lint → typecheck → unit tests → integration tests → preview deploy
Merged to main:
lint → typecheck → unit tests → integration tests → build image → deploy staging → smoke tests → deploy production
```
## Health Checks
### Health Check Endpoint
```typescript
// Simple health check
app.get("/health", (req, res) => {
res.status(200).json({ status: "ok" });
});
// Detailed health check (for internal monitoring)
app.get("/health/detailed", async (req, res) => {
const checks = {
database: await checkDatabase(),
redis: await checkRedis(),
externalApi: await checkExternalApi(),
};
const allHealthy = Object.values(checks).every(c => c.status === "ok");
res.status(allHealthy ? 200 : 503).json({
status: allHealthy ? "ok" : "degraded",
timestamp: new Date().toISOString(),
version: process.env.APP_VERSION || "unknown",
uptime: process.uptime(),
checks,
});
});
async function checkDatabase(): Promise<HealthCheck> {
try {
await db.query("SELECT 1");
return { status: "ok", latency_ms: 2 };
} catch (err) {
return { status: "error", message: "Database unreachable" };
}
}
```
### Kubernetes Probes
```yaml
livenessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 10
periodSeconds: 30
failureThreshold: 3
readinessProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 2
startupProbe:
httpGet:
path: /health
port: 3000
initialDelaySeconds: 0
periodSeconds: 5
failureThreshold: 30 # 30 * 5s = 150s max startup time
```
## Environment Configuration
### Twelve-Factor App Pattern
```bash
# All config via environment variables — never in code
DATABASE_URL=postgres://user:pass@host:5432/db
REDIS_URL=redis://host:6379/0
API_KEY=${API_KEY} # injected by secrets manager
LOG_LEVEL=info
PORT=3000
# Environment-specific behavior
NODE_ENV=production # or staging, development
APP_ENV=production # explicit app environment
```
### Configuration Validation
```typescript
import { z } from "zod";
const envSchema = z.object({
NODE_ENV: z.enum(["development", "staging", "production"]),
PORT: z.coerce.number().default(3000),
DATABASE_URL: z.string().url(),
REDIS_URL: z.string().url(),
JWT_SECRET: z.string().min(32),
LOG_LEVEL: z.enum(["debug", "info", "warn", "error"]).default("info"),
});
// Validate at startup — fail fast if config is wrong
export const env = envSchema.parse(process.env);
```
## Rollback Strategy
### Instant Rollback
```bash
# Docker/Kubernetes: point to previous image
kubectl rollout undo deployment/app
# Vercel: promote previous deployment
vercel rollback
# Railway: redeploy previous commit
railway up --commit <previous-sha>
# Database: rollback migration (if reversible)
npx prisma migrate resolve --rolled-back <migration-name>
```
### Rollback Checklist
- [ ] Previous image/artifact is available and tagged
- [ ] Database migrations are backward-compatible (no destructive changes)
- [ ] Feature flags can disable new features without deploy
- [ ] Monitoring alerts configured for error rate spikes
- [ ] Rollback tested in staging before production release
## Production Readiness Checklist
Before any production deployment:
### Application
- [ ] All tests pass (unit, integration, E2E)
- [ ] No hardcoded secrets in code or config files
- [ ] Error handling covers all edge cases
- [ ] Logging is structured (JSON) and does not contain PII
- [ ] Health check endpoint returns meaningful status
### Infrastructure
- [ ] Docker image builds reproducibly (pinned versions)
- [ ] Environment variables documented and validated at startup
- [ ] Resource limits set (CPU, memory)
- [ ] Horizontal scaling configured (min/max instances)
- [ ] SSL/TLS enabled on all endpoints
### Monitoring
- [ ] Application metrics exported (request rate, latency, errors)
- [ ] Alerts configured for error rate > threshold
- [ ] Log aggregation set up (structured logs, searchable)
- [ ] Uptime monitoring on health endpoint
### Security
- [ ] Dependencies scanned for CVEs
- [ ] CORS configured for allowed origins only
- [ ] Rate limiting enabled on public endpoints
- [ ] Authentication and authorization verified
- [ ] Security headers set (CSP, HSTS, X-Frame-Options)
### Operations
- [ ] Rollback plan documented and tested
- [ ] Database migration tested against production-sized data
- [ ] Runbook for common failure scenarios
- [ ] On-call rotation and escalation path defined

View File

@@ -0,0 +1,733 @@
---
name: django-patterns
description: Django architecture patterns, REST API design with DRF, ORM best practices, caching, signals, middleware, and production-grade Django apps.
---
# Django Development Patterns
Production-grade Django architecture patterns for scalable, maintainable applications.
## When to Activate
- Building Django web applications
- Designing Django REST Framework APIs
- Working with Django ORM and models
- Setting up Django project structure
- Implementing caching, signals, middleware
## Project Structure
### Recommended Layout
```
myproject/
├── config/
│ ├── __init__.py
│ ├── settings/
│ │ ├── __init__.py
│ │ ├── base.py # Base settings
│ │ ├── development.py # Dev settings
│ │ ├── production.py # Production settings
│ │ └── test.py # Test settings
│ ├── urls.py
│ ├── wsgi.py
│ └── asgi.py
├── manage.py
└── apps/
├── __init__.py
├── users/
│ ├── __init__.py
│ ├── models.py
│ ├── views.py
│ ├── serializers.py
│ ├── urls.py
│ ├── permissions.py
│ ├── filters.py
│ ├── services.py
│ └── tests/
└── products/
└── ...
```
### Split Settings Pattern
```python
# config/settings/base.py
from pathlib import Path
BASE_DIR = Path(__file__).resolve().parent.parent.parent
SECRET_KEY = env('DJANGO_SECRET_KEY')
DEBUG = False
ALLOWED_HOSTS = []
INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'rest_framework',
'rest_framework.authtoken',
'corsheaders',
# Local apps
'apps.users',
'apps.products',
]
MIDDLEWARE = [
'django.middleware.security.SecurityMiddleware',
'whitenoise.middleware.WhiteNoiseMiddleware',
'django.contrib.sessions.middleware.SessionMiddleware',
'corsheaders.middleware.CorsMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
ROOT_URLCONF = 'config.urls'
WSGI_APPLICATION = 'config.wsgi.application'
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.postgresql',
'NAME': env('DB_NAME'),
'USER': env('DB_USER'),
'PASSWORD': env('DB_PASSWORD'),
'HOST': env('DB_HOST'),
'PORT': env('DB_PORT', default='5432'),
}
}
# config/settings/development.py
from .base import *
DEBUG = True
ALLOWED_HOSTS = ['localhost', '127.0.0.1']
DATABASES['default']['NAME'] = 'myproject_dev'
INSTALLED_APPS += ['debug_toolbar']
MIDDLEWARE += ['debug_toolbar.middleware.DebugToolbarMiddleware']
EMAIL_BACKEND = 'django.core.mail.backends.console.EmailBackend'
# config/settings/production.py
from .base import *
DEBUG = False
ALLOWED_HOSTS = env.list('ALLOWED_HOSTS')
SECURE_SSL_REDIRECT = True
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
SECURE_HSTS_SECONDS = 31536000
SECURE_HSTS_INCLUDE_SUBDOMAINS = True
SECURE_HSTS_PRELOAD = True
# Logging
LOGGING = {
'version': 1,
'disable_existing_loggers': False,
'handlers': {
'file': {
'level': 'WARNING',
'class': 'logging.FileHandler',
'filename': '/var/log/django/django.log',
},
},
'loggers': {
'django': {
'handlers': ['file'],
'level': 'WARNING',
'propagate': True,
},
},
}
```
## Model Design Patterns
### Model Best Practices
```python
from django.db import models
from django.contrib.auth.models import AbstractUser
from django.core.validators import MinValueValidator, MaxValueValidator
class User(AbstractUser):
"""Custom user model extending AbstractUser."""
email = models.EmailField(unique=True)
phone = models.CharField(max_length=20, blank=True)
birth_date = models.DateField(null=True, blank=True)
USERNAME_FIELD = 'email'
REQUIRED_FIELDS = ['username']
class Meta:
db_table = 'users'
verbose_name = 'user'
verbose_name_plural = 'users'
ordering = ['-date_joined']
def __str__(self):
return self.email
def get_full_name(self):
return f"{self.first_name} {self.last_name}".strip()
class Product(models.Model):
"""Product model with proper field configuration."""
name = models.CharField(max_length=200)
slug = models.SlugField(unique=True, max_length=250)
description = models.TextField(blank=True)
price = models.DecimalField(
max_digits=10,
decimal_places=2,
validators=[MinValueValidator(0)]
)
stock = models.PositiveIntegerField(default=0)
is_active = models.BooleanField(default=True)
category = models.ForeignKey(
'Category',
on_delete=models.CASCADE,
related_name='products'
)
tags = models.ManyToManyField('Tag', blank=True, related_name='products')
created_at = models.DateTimeField(auto_now_add=True)
updated_at = models.DateTimeField(auto_now=True)
class Meta:
db_table = 'products'
ordering = ['-created_at']
indexes = [
models.Index(fields=['slug']),
models.Index(fields=['-created_at']),
models.Index(fields=['category', 'is_active']),
]
constraints = [
models.CheckConstraint(
check=models.Q(price__gte=0),
name='price_non_negative'
)
]
def __str__(self):
return self.name
def save(self, *args, **kwargs):
if not self.slug:
self.slug = slugify(self.name)
super().save(*args, **kwargs)
```
### QuerySet Best Practices
```python
from django.db import models
class ProductQuerySet(models.QuerySet):
"""Custom QuerySet for Product model."""
def active(self):
"""Return only active products."""
return self.filter(is_active=True)
def with_category(self):
"""Select related category to avoid N+1 queries."""
return self.select_related('category')
def with_tags(self):
"""Prefetch tags for many-to-many relationship."""
return self.prefetch_related('tags')
def in_stock(self):
"""Return products with stock > 0."""
return self.filter(stock__gt=0)
def search(self, query):
"""Search products by name or description."""
return self.filter(
models.Q(name__icontains=query) |
models.Q(description__icontains=query)
)
class Product(models.Model):
# ... fields ...
objects = ProductQuerySet.as_manager() # Use custom QuerySet
# Usage
Product.objects.active().with_category().in_stock()
```
### Manager Methods
```python
class ProductManager(models.Manager):
"""Custom manager for complex queries."""
def get_or_none(self, **kwargs):
"""Return object or None instead of DoesNotExist."""
try:
return self.get(**kwargs)
except self.model.DoesNotExist:
return None
def create_with_tags(self, name, price, tag_names):
"""Create product with associated tags."""
product = self.create(name=name, price=price)
tags = [Tag.objects.get_or_create(name=name)[0] for name in tag_names]
product.tags.set(tags)
return product
def bulk_update_stock(self, product_ids, quantity):
"""Bulk update stock for multiple products."""
return self.filter(id__in=product_ids).update(stock=quantity)
# In model
class Product(models.Model):
# ... fields ...
custom = ProductManager()
```
## Django REST Framework Patterns
### Serializer Patterns
```python
from rest_framework import serializers
from django.contrib.auth.password_validation import validate_password
from .models import Product, User
class ProductSerializer(serializers.ModelSerializer):
"""Serializer for Product model."""
category_name = serializers.CharField(source='category.name', read_only=True)
average_rating = serializers.FloatField(read_only=True)
discount_price = serializers.SerializerMethodField()
class Meta:
model = Product
fields = [
'id', 'name', 'slug', 'description', 'price',
'discount_price', 'stock', 'category_name',
'average_rating', 'created_at'
]
read_only_fields = ['id', 'slug', 'created_at']
def get_discount_price(self, obj):
"""Calculate discount price if applicable."""
if hasattr(obj, 'discount') and obj.discount:
return obj.price * (1 - obj.discount.percent / 100)
return obj.price
def validate_price(self, value):
"""Ensure price is non-negative."""
if value < 0:
raise serializers.ValidationError("Price cannot be negative.")
return value
class ProductCreateSerializer(serializers.ModelSerializer):
"""Serializer for creating products."""
class Meta:
model = Product
fields = ['name', 'description', 'price', 'stock', 'category']
def validate(self, data):
"""Custom validation for multiple fields."""
if data['price'] > 10000 and data['stock'] > 100:
raise serializers.ValidationError(
"Cannot have high-value products with large stock."
)
return data
class UserRegistrationSerializer(serializers.ModelSerializer):
"""Serializer for user registration."""
password = serializers.CharField(
write_only=True,
required=True,
validators=[validate_password],
style={'input_type': 'password'}
)
password_confirm = serializers.CharField(write_only=True, style={'input_type': 'password'})
class Meta:
model = User
fields = ['email', 'username', 'password', 'password_confirm']
def validate(self, data):
"""Validate passwords match."""
if data['password'] != data['password_confirm']:
raise serializers.ValidationError({
"password_confirm": "Password fields didn't match."
})
return data
def create(self, validated_data):
"""Create user with hashed password."""
validated_data.pop('password_confirm')
password = validated_data.pop('password')
user = User.objects.create(**validated_data)
user.set_password(password)
user.save()
return user
```
### ViewSet Patterns
```python
from rest_framework import viewsets, status, filters
from rest_framework.decorators import action
from rest_framework.response import Response
from rest_framework.permissions import IsAuthenticated, IsAdminUser
from django_filters.rest_framework import DjangoFilterBackend
from .models import Product
from .serializers import ProductSerializer, ProductCreateSerializer
from .permissions import IsOwnerOrReadOnly
from .filters import ProductFilter
from .services import ProductService
class ProductViewSet(viewsets.ModelViewSet):
"""ViewSet for Product model."""
queryset = Product.objects.select_related('category').prefetch_related('tags')
permission_classes = [IsAuthenticated, IsOwnerOrReadOnly]
filter_backends = [DjangoFilterBackend, filters.SearchFilter, filters.OrderingFilter]
filterset_class = ProductFilter
search_fields = ['name', 'description']
ordering_fields = ['price', 'created_at', 'name']
ordering = ['-created_at']
def get_serializer_class(self):
"""Return appropriate serializer based on action."""
if self.action == 'create':
return ProductCreateSerializer
return ProductSerializer
def perform_create(self, serializer):
"""Save with user context."""
serializer.save(created_by=self.request.user)
@action(detail=False, methods=['get'])
def featured(self, request):
"""Return featured products."""
featured = self.queryset.filter(is_featured=True)[:10]
serializer = self.get_serializer(featured, many=True)
return Response(serializer.data)
@action(detail=True, methods=['post'])
def purchase(self, request, pk=None):
"""Purchase a product."""
product = self.get_object()
service = ProductService()
result = service.purchase(product, request.user)
return Response(result, status=status.HTTP_201_CREATED)
@action(detail=False, methods=['get'], permission_classes=[IsAuthenticated])
def my_products(self, request):
"""Return products created by current user."""
products = self.queryset.filter(created_by=request.user)
page = self.paginate_queryset(products)
serializer = self.get_serializer(page, many=True)
return self.get_paginated_response(serializer.data)
```
### Custom Actions
```python
from rest_framework.decorators import api_view, permission_classes
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response
@api_view(['POST'])
@permission_classes([IsAuthenticated])
def add_to_cart(request):
"""Add product to user cart."""
product_id = request.data.get('product_id')
quantity = request.data.get('quantity', 1)
try:
product = Product.objects.get(id=product_id)
except Product.DoesNotExist:
return Response(
{'error': 'Product not found'},
status=status.HTTP_404_NOT_FOUND
)
cart, _ = Cart.objects.get_or_create(user=request.user)
CartItem.objects.create(
cart=cart,
product=product,
quantity=quantity
)
return Response({'message': 'Added to cart'}, status=status.HTTP_201_CREATED)
```
## Service Layer Pattern
```python
# apps/orders/services.py
from typing import Optional
from django.db import transaction
from .models import Order, OrderItem
class OrderService:
"""Service layer for order-related business logic."""
@staticmethod
@transaction.atomic
def create_order(user, cart: Cart) -> Order:
"""Create order from cart."""
order = Order.objects.create(
user=user,
total_price=cart.total_price
)
for item in cart.items.all():
OrderItem.objects.create(
order=order,
product=item.product,
quantity=item.quantity,
price=item.product.price
)
# Clear cart
cart.items.all().delete()
return order
@staticmethod
def process_payment(order: Order, payment_data: dict) -> bool:
"""Process payment for order."""
# Integration with payment gateway
payment = PaymentGateway.charge(
amount=order.total_price,
token=payment_data['token']
)
if payment.success:
order.status = Order.Status.PAID
order.save()
# Send confirmation email
OrderService.send_confirmation_email(order)
return True
return False
@staticmethod
def send_confirmation_email(order: Order):
"""Send order confirmation email."""
# Email sending logic
pass
```
## Caching Strategies
### View-Level Caching
```python
from django.views.decorators.cache import cache_page
from django.utils.decorators import method_decorator
@method_decorator(cache_page(60 * 15), name='dispatch') # 15 minutes
class ProductListView(generic.ListView):
model = Product
template_name = 'products/list.html'
context_object_name = 'products'
```
### Template Fragment Caching
```django
{% load cache %}
{% cache 500 sidebar %}
... expensive sidebar content ...
{% endcache %}
```
### Low-Level Caching
```python
from django.core.cache import cache
def get_featured_products():
"""Get featured products with caching."""
cache_key = 'featured_products'
products = cache.get(cache_key)
if products is None:
products = list(Product.objects.filter(is_featured=True))
cache.set(cache_key, products, timeout=60 * 15) # 15 minutes
return products
```
### QuerySet Caching
```python
from django.core.cache import cache
def get_popular_categories():
cache_key = 'popular_categories'
categories = cache.get(cache_key)
if categories is None:
categories = list(Category.objects.annotate(
product_count=Count('products')
).filter(product_count__gt=10).order_by('-product_count')[:20])
cache.set(cache_key, categories, timeout=60 * 60) # 1 hour
return categories
```
## Signals
### Signal Patterns
```python
# apps/users/signals.py
from django.db.models.signals import post_save
from django.dispatch import receiver
from django.contrib.auth import get_user_model
from .models import Profile
User = get_user_model()
@receiver(post_save, sender=User)
def create_user_profile(sender, instance, created, **kwargs):
"""Create profile when user is created."""
if created:
Profile.objects.create(user=instance)
@receiver(post_save, sender=User)
def save_user_profile(sender, instance, **kwargs):
"""Save profile when user is saved."""
instance.profile.save()
# apps/users/apps.py
from django.apps import AppConfig
class UsersConfig(AppConfig):
default_auto_field = 'django.db.models.BigAutoField'
name = 'apps.users'
def ready(self):
"""Import signals when app is ready."""
import apps.users.signals
```
## Middleware
### Custom Middleware
```python
# middleware/active_user_middleware.py
import time
from django.utils.deprecation import MiddlewareMixin
class ActiveUserMiddleware(MiddlewareMixin):
"""Middleware to track active users."""
def process_request(self, request):
"""Process incoming request."""
if request.user.is_authenticated:
# Update last active time
request.user.last_active = timezone.now()
request.user.save(update_fields=['last_active'])
class RequestLoggingMiddleware(MiddlewareMixin):
"""Middleware for logging requests."""
def process_request(self, request):
"""Log request start time."""
request.start_time = time.time()
def process_response(self, request, response):
"""Log request duration."""
if hasattr(request, 'start_time'):
duration = time.time() - request.start_time
logger.info(f'{request.method} {request.path} - {response.status_code} - {duration:.3f}s')
return response
```
## Performance Optimization
### N+1 Query Prevention
```python
# Bad - N+1 queries
products = Product.objects.all()
for product in products:
print(product.category.name) # Separate query for each product
# Good - Single query with select_related
products = Product.objects.select_related('category').all()
for product in products:
print(product.category.name)
# Good - Prefetch for many-to-many
products = Product.objects.prefetch_related('tags').all()
for product in products:
for tag in product.tags.all():
print(tag.name)
```
### Database Indexing
```python
class Product(models.Model):
name = models.CharField(max_length=200, db_index=True)
slug = models.SlugField(unique=True)
category = models.ForeignKey('Category', on_delete=models.CASCADE)
created_at = models.DateTimeField(auto_now_add=True)
class Meta:
indexes = [
models.Index(fields=['name']),
models.Index(fields=['-created_at']),
models.Index(fields=['category', 'created_at']),
]
```
### Bulk Operations
```python
# Bulk create
Product.objects.bulk_create([
Product(name=f'Product {i}', price=10.00)
for i in range(1000)
])
# Bulk update
products = Product.objects.all()[:100]
for product in products:
product.is_active = True
Product.objects.bulk_update(products, ['is_active'])
# Bulk delete
Product.objects.filter(stock=0).delete()
```
## Quick Reference
| Pattern | Description |
|---------|-------------|
| Split settings | Separate dev/prod/test settings |
| Custom QuerySet | Reusable query methods |
| Service Layer | Business logic separation |
| ViewSet | REST API endpoints |
| Serializer validation | Request/response transformation |
| select_related | Foreign key optimization |
| prefetch_related | Many-to-many optimization |
| Cache first | Cache expensive operations |
| Signals | Event-driven actions |
| Middleware | Request/response processing |
Remember: Django provides many shortcuts, but for production applications, structure and organization matter more than concise code. Build for maintainability.

View File

@@ -0,0 +1,592 @@
---
name: django-security
description: Django security best practices, authentication, authorization, CSRF protection, SQL injection prevention, XSS prevention, and secure deployment configurations.
---
# Django Security Best Practices
Comprehensive security guidelines for Django applications to protect against common vulnerabilities.
## When to Activate
- Setting up Django authentication and authorization
- Implementing user permissions and roles
- Configuring production security settings
- Reviewing Django application for security issues
- Deploying Django applications to production
## Core Security Settings
### Production Settings Configuration
```python
# settings/production.py
import os
DEBUG = False # CRITICAL: Never use True in production
ALLOWED_HOSTS = os.environ.get('ALLOWED_HOSTS', '').split(',')
# Security headers
SECURE_SSL_REDIRECT = True
SESSION_COOKIE_SECURE = True
CSRF_COOKIE_SECURE = True
SECURE_HSTS_SECONDS = 31536000 # 1 year
SECURE_HSTS_INCLUDE_SUBDOMAINS = True
SECURE_HSTS_PRELOAD = True
SECURE_CONTENT_TYPE_NOSNIFF = True
SECURE_BROWSER_XSS_FILTER = True
X_FRAME_OPTIONS = 'DENY'
# HTTPS and Cookies
SESSION_COOKIE_HTTPONLY = True
CSRF_COOKIE_HTTPONLY = True
SESSION_COOKIE_SAMESITE = 'Lax'
CSRF_COOKIE_SAMESITE = 'Lax'
# Secret key (must be set via environment variable)
SECRET_KEY = os.environ.get('DJANGO_SECRET_KEY')
if not SECRET_KEY:
raise ImproperlyConfigured('DJANGO_SECRET_KEY environment variable is required')
# Password validation
AUTH_PASSWORD_VALIDATORS = [
{
'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
},
{
'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
'OPTIONS': {
'min_length': 12,
}
},
{
'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
},
{
'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
},
]
```
## Authentication
### Custom User Model
```python
# apps/users/models.py
from django.contrib.auth.models import AbstractUser
from django.db import models
class User(AbstractUser):
"""Custom user model for better security."""
email = models.EmailField(unique=True)
phone = models.CharField(max_length=20, blank=True)
USERNAME_FIELD = 'email' # Use email as username
REQUIRED_FIELDS = ['username']
class Meta:
db_table = 'users'
verbose_name = 'User'
verbose_name_plural = 'Users'
def __str__(self):
return self.email
# settings/base.py
AUTH_USER_MODEL = 'users.User'
```
### Password Hashing
```python
# Django uses PBKDF2 by default. For stronger security:
PASSWORD_HASHERS = [
'django.contrib.auth.hashers.Argon2PasswordHasher',
'django.contrib.auth.hashers.PBKDF2PasswordHasher',
'django.contrib.auth.hashers.PBKDF2SHA1PasswordHasher',
'django.contrib.auth.hashers.BCryptSHA256PasswordHasher',
]
```
### Session Management
```python
# Session configuration
SESSION_ENGINE = 'django.contrib.sessions.backends.cache' # Or 'db'
SESSION_CACHE_ALIAS = 'default'
SESSION_COOKIE_AGE = 3600 * 24 * 7 # 1 week
SESSION_SAVE_EVERY_REQUEST = False
SESSION_EXPIRE_AT_BROWSER_CLOSE = False # Better UX, but less secure
```
## Authorization
### Permissions
```python
# models.py
from django.db import models
from django.contrib.auth.models import Permission
class Post(models.Model):
title = models.CharField(max_length=200)
content = models.TextField()
author = models.ForeignKey(User, on_delete=models.CASCADE)
class Meta:
permissions = [
('can_publish', 'Can publish posts'),
('can_edit_others', 'Can edit posts of others'),
]
def user_can_edit(self, user):
"""Check if user can edit this post."""
return self.author == user or user.has_perm('app.can_edit_others')
# views.py
from django.contrib.auth.mixins import LoginRequiredMixin, PermissionRequiredMixin
from django.views.generic import UpdateView
class PostUpdateView(LoginRequiredMixin, PermissionRequiredMixin, UpdateView):
model = Post
permission_required = 'app.can_edit_others'
raise_exception = True # Return 403 instead of redirect
def get_queryset(self):
"""Only allow users to edit their own posts."""
return Post.objects.filter(author=self.request.user)
```
### Custom Permissions
```python
# permissions.py
from rest_framework import permissions
class IsOwnerOrReadOnly(permissions.BasePermission):
"""Allow only owners to edit objects."""
def has_object_permission(self, request, view, obj):
# Read permissions allowed for any request
if request.method in permissions.SAFE_METHODS:
return True
# Write permissions only for owner
return obj.author == request.user
class IsAdminOrReadOnly(permissions.BasePermission):
"""Allow admins to do anything, others read-only."""
def has_permission(self, request, view):
if request.method in permissions.SAFE_METHODS:
return True
return request.user and request.user.is_staff
class IsVerifiedUser(permissions.BasePermission):
"""Allow only verified users."""
def has_permission(self, request, view):
return request.user and request.user.is_authenticated and request.user.is_verified
```
### Role-Based Access Control (RBAC)
```python
# models.py
from django.contrib.auth.models import AbstractUser, Group
class User(AbstractUser):
ROLE_CHOICES = [
('admin', 'Administrator'),
('moderator', 'Moderator'),
('user', 'Regular User'),
]
role = models.CharField(max_length=20, choices=ROLE_CHOICES, default='user')
def is_admin(self):
return self.role == 'admin' or self.is_superuser
def is_moderator(self):
return self.role in ['admin', 'moderator']
# Mixins
class AdminRequiredMixin:
"""Mixin to require admin role."""
def dispatch(self, request, *args, **kwargs):
if not request.user.is_authenticated or not request.user.is_admin():
from django.core.exceptions import PermissionDenied
raise PermissionDenied
return super().dispatch(request, *args, **kwargs)
```
## SQL Injection Prevention
### Django ORM Protection
```python
# GOOD: Django ORM automatically escapes parameters
def get_user(username):
return User.objects.get(username=username) # Safe
# GOOD: Using parameters with raw()
def search_users(query):
return User.objects.raw('SELECT * FROM users WHERE username = %s', [query])
# BAD: Never directly interpolate user input
def get_user_bad(username):
return User.objects.raw(f'SELECT * FROM users WHERE username = {username}') # VULNERABLE!
# GOOD: Using filter with proper escaping
def get_users_by_email(email):
return User.objects.filter(email__iexact=email) # Safe
# GOOD: Using Q objects for complex queries
from django.db.models import Q
def search_users_complex(query):
return User.objects.filter(
Q(username__icontains=query) |
Q(email__icontains=query)
) # Safe
```
### Extra Security with raw()
```python
# If you must use raw SQL, always use parameters
User.objects.raw(
'SELECT * FROM users WHERE email = %s AND status = %s',
[user_input_email, status]
)
```
## XSS Prevention
### Template Escaping
```django
{# Django auto-escapes variables by default - SAFE #}
{{ user_input }} {# Escaped HTML #}
{# Explicitly mark safe only for trusted content #}
{{ trusted_html|safe }} {# Not escaped #}
{# Use template filters for safe HTML #}
{{ user_input|escape }} {# Same as default #}
{{ user_input|striptags }} {# Remove all HTML tags #}
{# JavaScript escaping #}
<script>
var username = {{ username|escapejs }};
</script>
```
### Safe String Handling
```python
from django.utils.safestring import mark_safe
from django.utils.html import escape
# BAD: Never mark user input as safe without escaping
def render_bad(user_input):
return mark_safe(user_input) # VULNERABLE!
# GOOD: Escape first, then mark safe
def render_good(user_input):
return mark_safe(escape(user_input))
# GOOD: Use format_html for HTML with variables
from django.utils.html import format_html
def greet_user(username):
return format_html('<span class="user">{}</span>', escape(username))
```
### HTTP Headers
```python
# settings.py
SECURE_CONTENT_TYPE_NOSNIFF = True # Prevent MIME sniffing
SECURE_BROWSER_XSS_FILTER = True # Enable XSS filter
X_FRAME_OPTIONS = 'DENY' # Prevent clickjacking
# Custom middleware
from django.conf import settings
class SecurityHeaderMiddleware:
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
response = self.get_response(request)
response['X-Content-Type-Options'] = 'nosniff'
response['X-Frame-Options'] = 'DENY'
response['X-XSS-Protection'] = '1; mode=block'
response['Content-Security-Policy'] = "default-src 'self'"
return response
```
## CSRF Protection
### Default CSRF Protection
```python
# settings.py - CSRF is enabled by default
CSRF_COOKIE_SECURE = True # Only send over HTTPS
CSRF_COOKIE_HTTPONLY = True # Prevent JavaScript access
CSRF_COOKIE_SAMESITE = 'Lax' # Prevent CSRF in some cases
CSRF_TRUSTED_ORIGINS = ['https://example.com'] # Trusted domains
# Template usage
<form method="post">
{% csrf_token %}
{{ form.as_p }}
<button type="submit">Submit</button>
</form>
# AJAX requests
function getCookie(name) {
let cookieValue = null;
if (document.cookie && document.cookie !== '') {
const cookies = document.cookie.split(';');
for (let i = 0; i < cookies.length; i++) {
const cookie = cookies[i].trim();
if (cookie.substring(0, name.length + 1) === (name + '=')) {
cookieValue = decodeURIComponent(cookie.substring(name.length + 1));
break;
}
}
}
return cookieValue;
}
fetch('/api/endpoint/', {
method: 'POST',
headers: {
'X-CSRFToken': getCookie('csrftoken'),
'Content-Type': 'application/json',
},
body: JSON.stringify(data)
});
```
### Exempting Views (Use Carefully)
```python
from django.views.decorators.csrf import csrf_exempt
@csrf_exempt # Only use when absolutely necessary!
def webhook_view(request):
# Webhook from external service
pass
```
## File Upload Security
### File Validation
```python
import os
from django.core.exceptions import ValidationError
def validate_file_extension(value):
"""Validate file extension."""
ext = os.path.splitext(value.name)[1]
valid_extensions = ['.jpg', '.jpeg', '.png', '.gif', '.pdf']
if not ext.lower() in valid_extensions:
raise ValidationError('Unsupported file extension.')
def validate_file_size(value):
"""Validate file size (max 5MB)."""
filesize = value.size
if filesize > 5 * 1024 * 1024:
raise ValidationError('File too large. Max size is 5MB.')
# models.py
class Document(models.Model):
file = models.FileField(
upload_to='documents/',
validators=[validate_file_extension, validate_file_size]
)
```
### Secure File Storage
```python
# settings.py
MEDIA_ROOT = '/var/www/media/'
MEDIA_URL = '/media/'
# Use a separate domain for media in production
MEDIA_DOMAIN = 'https://media.example.com'
# Don't serve user uploads directly
# Use whitenoise or a CDN for static files
# Use a separate server or S3 for media files
```
## API Security
### Rate Limiting
```python
# settings.py
REST_FRAMEWORK = {
'DEFAULT_THROTTLE_CLASSES': [
'rest_framework.throttling.AnonRateThrottle',
'rest_framework.throttling.UserRateThrottle'
],
'DEFAULT_THROTTLE_RATES': {
'anon': '100/day',
'user': '1000/day',
'upload': '10/hour',
}
}
# Custom throttle
from rest_framework.throttling import UserRateThrottle
class BurstRateThrottle(UserRateThrottle):
scope = 'burst'
rate = '60/min'
class SustainedRateThrottle(UserRateThrottle):
scope = 'sustained'
rate = '1000/day'
```
### Authentication for APIs
```python
# settings.py
REST_FRAMEWORK = {
'DEFAULT_AUTHENTICATION_CLASSES': [
'rest_framework.authentication.TokenAuthentication',
'rest_framework.authentication.SessionAuthentication',
'rest_framework_simplejwt.authentication.JWTAuthentication',
],
'DEFAULT_PERMISSION_CLASSES': [
'rest_framework.permissions.IsAuthenticated',
],
}
# views.py
from rest_framework.decorators import api_view, permission_classes
from rest_framework.permissions import IsAuthenticated
@api_view(['GET', 'POST'])
@permission_classes([IsAuthenticated])
def protected_view(request):
return Response({'message': 'You are authenticated'})
```
## Security Headers
### Content Security Policy
```python
# settings.py
CSP_DEFAULT_SRC = "'self'"
CSP_SCRIPT_SRC = "'self' https://cdn.example.com"
CSP_STYLE_SRC = "'self' 'unsafe-inline'"
CSP_IMG_SRC = "'self' data: https:"
CSP_CONNECT_SRC = "'self' https://api.example.com"
# Middleware
class CSPMiddleware:
def __init__(self, get_response):
self.get_response = get_response
def __call__(self, request):
response = self.get_response(request)
response['Content-Security-Policy'] = (
f"default-src {CSP_DEFAULT_SRC}; "
f"script-src {CSP_SCRIPT_SRC}; "
f"style-src {CSP_STYLE_SRC}; "
f"img-src {CSP_IMG_SRC}; "
f"connect-src {CSP_CONNECT_SRC}"
)
return response
```
## Environment Variables
### Managing Secrets
```python
# Use python-decouple or django-environ
import environ
env = environ.Env(
# set casting, default value
DEBUG=(bool, False)
)
# reading .env file
environ.Env.read_env()
SECRET_KEY = env('DJANGO_SECRET_KEY')
DATABASE_URL = env('DATABASE_URL')
ALLOWED_HOSTS = env.list('ALLOWED_HOSTS')
# .env file (never commit this)
DEBUG=False
SECRET_KEY=your-secret-key-here
DATABASE_URL=postgresql://user:password@localhost:5432/dbname
ALLOWED_HOSTS=example.com,www.example.com
```
## Logging Security Events
```python
# settings.py
LOGGING = {
'version': 1,
'disable_existing_loggers': False,
'handlers': {
'file': {
'level': 'WARNING',
'class': 'logging.FileHandler',
'filename': '/var/log/django/security.log',
},
'console': {
'level': 'INFO',
'class': 'logging.StreamHandler',
},
},
'loggers': {
'django.security': {
'handlers': ['file', 'console'],
'level': 'WARNING',
'propagate': True,
},
'django.request': {
'handlers': ['file'],
'level': 'ERROR',
'propagate': False,
},
},
}
```
## Quick Security Checklist
| Check | Description |
|-------|-------------|
| `DEBUG = False` | Never run with DEBUG in production |
| HTTPS only | Force SSL, secure cookies |
| Strong secrets | Use environment variables for SECRET_KEY |
| Password validation | Enable all password validators |
| CSRF protection | Enabled by default, don't disable |
| XSS prevention | Django auto-escapes, don't use `&#124;safe` with user input |
| SQL injection | Use ORM, never concatenate strings in queries |
| File uploads | Validate file type and size |
| Rate limiting | Throttle API endpoints |
| Security headers | CSP, X-Frame-Options, HSTS |
| Logging | Log security events |
| Updates | Keep Django and dependencies updated |
Remember: Security is a process, not a product. Regularly review and update your security practices.

728
skills/django-tdd/SKILL.md Normal file
View File

@@ -0,0 +1,728 @@
---
name: django-tdd
description: Django testing strategies with pytest-django, TDD methodology, factory_boy, mocking, coverage, and testing Django REST Framework APIs.
---
# Django Testing with TDD
Test-driven development for Django applications using pytest, factory_boy, and Django REST Framework.
## When to Activate
- Writing new Django applications
- Implementing Django REST Framework APIs
- Testing Django models, views, and serializers
- Setting up testing infrastructure for Django projects
## TDD Workflow for Django
### Red-Green-Refactor Cycle
```python
# Step 1: RED - Write failing test
def test_user_creation():
user = User.objects.create_user(email='test@example.com', password='testpass123')
assert user.email == 'test@example.com'
assert user.check_password('testpass123')
assert not user.is_staff
# Step 2: GREEN - Make test pass
# Create User model or factory
# Step 3: REFACTOR - Improve while keeping tests green
```
## Setup
### pytest Configuration
```ini
# pytest.ini
[pytest]
DJANGO_SETTINGS_MODULE = config.settings.test
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts =
--reuse-db
--nomigrations
--cov=apps
--cov-report=html
--cov-report=term-missing
--strict-markers
markers =
slow: marks tests as slow
integration: marks tests as integration tests
```
### Test Settings
```python
# config/settings/test.py
from .base import *
DEBUG = True
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.sqlite3',
'NAME': ':memory:',
}
}
# Disable migrations for speed
class DisableMigrations:
def __contains__(self, item):
return True
def __getitem__(self, item):
return None
MIGRATION_MODULES = DisableMigrations()
# Faster password hashing
PASSWORD_HASHERS = [
'django.contrib.auth.hashers.MD5PasswordHasher',
]
# Email backend
EMAIL_BACKEND = 'django.core.mail.backends.console.EmailBackend'
# Celery always eager
CELERY_TASK_ALWAYS_EAGER = True
CELERY_TASK_EAGER_PROPAGATES = True
```
### conftest.py
```python
# tests/conftest.py
import pytest
from django.utils import timezone
from django.contrib.auth import get_user_model
User = get_user_model()
@pytest.fixture(autouse=True)
def timezone_settings(settings):
"""Ensure consistent timezone."""
settings.TIME_ZONE = 'UTC'
@pytest.fixture
def user(db):
"""Create a test user."""
return User.objects.create_user(
email='test@example.com',
password='testpass123',
username='testuser'
)
@pytest.fixture
def admin_user(db):
"""Create an admin user."""
return User.objects.create_superuser(
email='admin@example.com',
password='adminpass123',
username='admin'
)
@pytest.fixture
def authenticated_client(client, user):
"""Return authenticated client."""
client.force_login(user)
return client
@pytest.fixture
def api_client():
"""Return DRF API client."""
from rest_framework.test import APIClient
return APIClient()
@pytest.fixture
def authenticated_api_client(api_client, user):
"""Return authenticated API client."""
api_client.force_authenticate(user=user)
return api_client
```
## Factory Boy
### Factory Setup
```python
# tests/factories.py
import factory
from factory import fuzzy
from datetime import datetime, timedelta
from django.contrib.auth import get_user_model
from apps.products.models import Product, Category
User = get_user_model()
class UserFactory(factory.django.DjangoModelFactory):
"""Factory for User model."""
class Meta:
model = User
email = factory.Sequence(lambda n: f"user{n}@example.com")
username = factory.Sequence(lambda n: f"user{n}")
password = factory.PostGenerationMethodCall('set_password', 'testpass123')
first_name = factory.Faker('first_name')
last_name = factory.Faker('last_name')
is_active = True
class CategoryFactory(factory.django.DjangoModelFactory):
"""Factory for Category model."""
class Meta:
model = Category
name = factory.Faker('word')
slug = factory.LazyAttribute(lambda obj: obj.name.lower())
description = factory.Faker('text')
class ProductFactory(factory.django.DjangoModelFactory):
"""Factory for Product model."""
class Meta:
model = Product
name = factory.Faker('sentence', nb_words=3)
slug = factory.LazyAttribute(lambda obj: obj.name.lower().replace(' ', '-'))
description = factory.Faker('text')
price = fuzzy.FuzzyDecimal(10.00, 1000.00, 2)
stock = fuzzy.FuzzyInteger(0, 100)
is_active = True
category = factory.SubFactory(CategoryFactory)
created_by = factory.SubFactory(UserFactory)
@factory.post_generation
def tags(self, create, extracted, **kwargs):
"""Add tags to product."""
if not create:
return
if extracted:
for tag in extracted:
self.tags.add(tag)
```
### Using Factories
```python
# tests/test_models.py
import pytest
from tests.factories import ProductFactory, UserFactory
def test_product_creation():
"""Test product creation using factory."""
product = ProductFactory(price=100.00, stock=50)
assert product.price == 100.00
assert product.stock == 50
assert product.is_active is True
def test_product_with_tags():
"""Test product with tags."""
tags = [TagFactory(name='electronics'), TagFactory(name='new')]
product = ProductFactory(tags=tags)
assert product.tags.count() == 2
def test_multiple_products():
"""Test creating multiple products."""
products = ProductFactory.create_batch(10)
assert len(products) == 10
```
## Model Testing
### Model Tests
```python
# tests/test_models.py
import pytest
from django.core.exceptions import ValidationError
from tests.factories import UserFactory, ProductFactory
class TestUserModel:
"""Test User model."""
def test_create_user(self, db):
"""Test creating a regular user."""
user = UserFactory(email='test@example.com')
assert user.email == 'test@example.com'
assert user.check_password('testpass123')
assert not user.is_staff
assert not user.is_superuser
def test_create_superuser(self, db):
"""Test creating a superuser."""
user = UserFactory(
email='admin@example.com',
is_staff=True,
is_superuser=True
)
assert user.is_staff
assert user.is_superuser
def test_user_str(self, db):
"""Test user string representation."""
user = UserFactory(email='test@example.com')
assert str(user) == 'test@example.com'
class TestProductModel:
"""Test Product model."""
def test_product_creation(self, db):
"""Test creating a product."""
product = ProductFactory()
assert product.id is not None
assert product.is_active is True
assert product.created_at is not None
def test_product_slug_generation(self, db):
"""Test automatic slug generation."""
product = ProductFactory(name='Test Product')
assert product.slug == 'test-product'
def test_product_price_validation(self, db):
"""Test price cannot be negative."""
product = ProductFactory(price=-10)
with pytest.raises(ValidationError):
product.full_clean()
def test_product_manager_active(self, db):
"""Test active manager method."""
ProductFactory.create_batch(5, is_active=True)
ProductFactory.create_batch(3, is_active=False)
active_count = Product.objects.active().count()
assert active_count == 5
def test_product_stock_management(self, db):
"""Test stock management."""
product = ProductFactory(stock=10)
product.reduce_stock(5)
product.refresh_from_db()
assert product.stock == 5
with pytest.raises(ValueError):
product.reduce_stock(10) # Not enough stock
```
## View Testing
### Django View Testing
```python
# tests/test_views.py
import pytest
from django.urls import reverse
from tests.factories import ProductFactory, UserFactory
class TestProductViews:
"""Test product views."""
def test_product_list(self, client, db):
"""Test product list view."""
ProductFactory.create_batch(10)
response = client.get(reverse('products:list'))
assert response.status_code == 200
assert len(response.context['products']) == 10
def test_product_detail(self, client, db):
"""Test product detail view."""
product = ProductFactory()
response = client.get(reverse('products:detail', kwargs={'slug': product.slug}))
assert response.status_code == 200
assert response.context['product'] == product
def test_product_create_requires_login(self, client, db):
"""Test product creation requires authentication."""
response = client.get(reverse('products:create'))
assert response.status_code == 302
assert response.url.startswith('/accounts/login/')
def test_product_create_authenticated(self, authenticated_client, db):
"""Test product creation as authenticated user."""
response = authenticated_client.get(reverse('products:create'))
assert response.status_code == 200
def test_product_create_post(self, authenticated_client, db, category):
"""Test creating a product via POST."""
data = {
'name': 'Test Product',
'description': 'A test product',
'price': '99.99',
'stock': 10,
'category': category.id,
}
response = authenticated_client.post(reverse('products:create'), data)
assert response.status_code == 302
assert Product.objects.filter(name='Test Product').exists()
```
## DRF API Testing
### Serializer Testing
```python
# tests/test_serializers.py
import pytest
from rest_framework.exceptions import ValidationError
from apps.products.serializers import ProductSerializer
from tests.factories import ProductFactory
class TestProductSerializer:
"""Test ProductSerializer."""
def test_serialize_product(self, db):
"""Test serializing a product."""
product = ProductFactory()
serializer = ProductSerializer(product)
data = serializer.data
assert data['id'] == product.id
assert data['name'] == product.name
assert data['price'] == str(product.price)
def test_deserialize_product(self, db):
"""Test deserializing product data."""
data = {
'name': 'Test Product',
'description': 'Test description',
'price': '99.99',
'stock': 10,
'category': 1,
}
serializer = ProductSerializer(data=data)
assert serializer.is_valid()
product = serializer.save()
assert product.name == 'Test Product'
assert float(product.price) == 99.99
def test_price_validation(self, db):
"""Test price validation."""
data = {
'name': 'Test Product',
'price': '-10.00',
'stock': 10,
}
serializer = ProductSerializer(data=data)
assert not serializer.is_valid()
assert 'price' in serializer.errors
def test_stock_validation(self, db):
"""Test stock cannot be negative."""
data = {
'name': 'Test Product',
'price': '99.99',
'stock': -5,
}
serializer = ProductSerializer(data=data)
assert not serializer.is_valid()
assert 'stock' in serializer.errors
```
### API ViewSet Testing
```python
# tests/test_api.py
import pytest
from rest_framework.test import APIClient
from rest_framework import status
from django.urls import reverse
from tests.factories import ProductFactory, UserFactory
class TestProductAPI:
"""Test Product API endpoints."""
@pytest.fixture
def api_client(self):
"""Return API client."""
return APIClient()
def test_list_products(self, api_client, db):
"""Test listing products."""
ProductFactory.create_batch(10)
url = reverse('api:product-list')
response = api_client.get(url)
assert response.status_code == status.HTTP_200_OK
assert response.data['count'] == 10
def test_retrieve_product(self, api_client, db):
"""Test retrieving a product."""
product = ProductFactory()
url = reverse('api:product-detail', kwargs={'pk': product.id})
response = api_client.get(url)
assert response.status_code == status.HTTP_200_OK
assert response.data['id'] == product.id
def test_create_product_unauthorized(self, api_client, db):
"""Test creating product without authentication."""
url = reverse('api:product-list')
data = {'name': 'Test Product', 'price': '99.99'}
response = api_client.post(url, data)
assert response.status_code == status.HTTP_401_UNAUTHORIZED
def test_create_product_authorized(self, authenticated_api_client, db):
"""Test creating product as authenticated user."""
url = reverse('api:product-list')
data = {
'name': 'Test Product',
'description': 'Test',
'price': '99.99',
'stock': 10,
}
response = authenticated_api_client.post(url, data)
assert response.status_code == status.HTTP_201_CREATED
assert response.data['name'] == 'Test Product'
def test_update_product(self, authenticated_api_client, db):
"""Test updating a product."""
product = ProductFactory(created_by=authenticated_api_client.user)
url = reverse('api:product-detail', kwargs={'pk': product.id})
data = {'name': 'Updated Product'}
response = authenticated_api_client.patch(url, data)
assert response.status_code == status.HTTP_200_OK
assert response.data['name'] == 'Updated Product'
def test_delete_product(self, authenticated_api_client, db):
"""Test deleting a product."""
product = ProductFactory(created_by=authenticated_api_client.user)
url = reverse('api:product-detail', kwargs={'pk': product.id})
response = authenticated_api_client.delete(url)
assert response.status_code == status.HTTP_204_NO_CONTENT
def test_filter_products_by_price(self, api_client, db):
"""Test filtering products by price."""
ProductFactory(price=50)
ProductFactory(price=150)
url = reverse('api:product-list')
response = api_client.get(url, {'price_min': 100})
assert response.status_code == status.HTTP_200_OK
assert response.data['count'] == 1
def test_search_products(self, api_client, db):
"""Test searching products."""
ProductFactory(name='Apple iPhone')
ProductFactory(name='Samsung Galaxy')
url = reverse('api:product-list')
response = api_client.get(url, {'search': 'Apple'})
assert response.status_code == status.HTTP_200_OK
assert response.data['count'] == 1
```
## Mocking and Patching
### Mocking External Services
```python
# tests/test_views.py
from unittest.mock import patch, Mock
import pytest
class TestPaymentView:
"""Test payment view with mocked payment gateway."""
@patch('apps.payments.services.stripe')
def test_successful_payment(self, mock_stripe, client, user, product):
"""Test successful payment with mocked Stripe."""
# Configure mock
mock_stripe.Charge.create.return_value = {
'id': 'ch_123',
'status': 'succeeded',
'amount': 9999,
}
client.force_login(user)
response = client.post(reverse('payments:process'), {
'product_id': product.id,
'token': 'tok_visa',
})
assert response.status_code == 302
mock_stripe.Charge.create.assert_called_once()
@patch('apps.payments.services.stripe')
def test_failed_payment(self, mock_stripe, client, user, product):
"""Test failed payment."""
mock_stripe.Charge.create.side_effect = Exception('Card declined')
client.force_login(user)
response = client.post(reverse('payments:process'), {
'product_id': product.id,
'token': 'tok_visa',
})
assert response.status_code == 302
assert 'error' in response.url
```
### Mocking Email Sending
```python
# tests/test_email.py
from django.core import mail
from django.test import override_settings
@override_settings(EMAIL_BACKEND='django.core.mail.backends.locmem.EmailBackend')
def test_order_confirmation_email(db, order):
"""Test order confirmation email."""
order.send_confirmation_email()
assert len(mail.outbox) == 1
assert order.user.email in mail.outbox[0].to
assert 'Order Confirmation' in mail.outbox[0].subject
```
## Integration Testing
### Full Flow Testing
```python
# tests/test_integration.py
import pytest
from django.urls import reverse
from tests.factories import UserFactory, ProductFactory
class TestCheckoutFlow:
"""Test complete checkout flow."""
def test_guest_to_purchase_flow(self, client, db):
"""Test complete flow from guest to purchase."""
# Step 1: Register
response = client.post(reverse('users:register'), {
'email': 'test@example.com',
'password': 'testpass123',
'password_confirm': 'testpass123',
})
assert response.status_code == 302
# Step 2: Login
response = client.post(reverse('users:login'), {
'email': 'test@example.com',
'password': 'testpass123',
})
assert response.status_code == 302
# Step 3: Browse products
product = ProductFactory(price=100)
response = client.get(reverse('products:detail', kwargs={'slug': product.slug}))
assert response.status_code == 200
# Step 4: Add to cart
response = client.post(reverse('cart:add'), {
'product_id': product.id,
'quantity': 1,
})
assert response.status_code == 302
# Step 5: Checkout
response = client.get(reverse('checkout:review'))
assert response.status_code == 200
assert product.name in response.content.decode()
# Step 6: Complete purchase
with patch('apps.checkout.services.process_payment') as mock_payment:
mock_payment.return_value = True
response = client.post(reverse('checkout:complete'))
assert response.status_code == 302
assert Order.objects.filter(user__email='test@example.com').exists()
```
## Testing Best Practices
### DO
- **Use factories**: Instead of manual object creation
- **One assertion per test**: Keep tests focused
- **Descriptive test names**: `test_user_cannot_delete_others_post`
- **Test edge cases**: Empty inputs, None values, boundary conditions
- **Mock external services**: Don't depend on external APIs
- **Use fixtures**: Eliminate duplication
- **Test permissions**: Ensure authorization works
- **Keep tests fast**: Use `--reuse-db` and `--nomigrations`
### DON'T
- **Don't test Django internals**: Trust Django to work
- **Don't test third-party code**: Trust libraries to work
- **Don't ignore failing tests**: All tests must pass
- **Don't make tests dependent**: Tests should run in any order
- **Don't over-mock**: Mock only external dependencies
- **Don't test private methods**: Test public interface
- **Don't use production database**: Always use test database
## Coverage
### Coverage Configuration
```bash
# Run tests with coverage
pytest --cov=apps --cov-report=html --cov-report=term-missing
# Generate HTML report
open htmlcov/index.html
```
### Coverage Goals
| Component | Target Coverage |
|-----------|-----------------|
| Models | 90%+ |
| Serializers | 85%+ |
| Views | 80%+ |
| Services | 90%+ |
| Utilities | 80%+ |
| Overall | 80%+ |
## Quick Reference
| Pattern | Usage |
|---------|-------|
| `@pytest.mark.django_db` | Enable database access |
| `client` | Django test client |
| `api_client` | DRF API client |
| `factory.create_batch(n)` | Create multiple objects |
| `patch('module.function')` | Mock external dependencies |
| `override_settings` | Temporarily change settings |
| `force_authenticate()` | Bypass authentication in tests |
| `assertRedirects` | Check for redirects |
| `assertTemplateUsed` | Verify template usage |
| `mail.outbox` | Check sent emails |
Remember: Tests are documentation. Good tests explain how your code should work. Keep them simple, readable, and maintainable.

View File

@@ -0,0 +1,468 @@
---
name: django-verification
description: "Verification loop for Django projects: migrations, linting, tests with coverage, security scans, and deployment readiness checks before release or PR."
---
# Django Verification Loop
Run before PRs, after major changes, and pre-deploy to ensure Django application quality and security.
## When to Activate
- Before opening a pull request for a Django project
- After major model changes, migration updates, or dependency upgrades
- Pre-deployment verification for staging or production
- Running full environment → lint → test → security → deploy readiness pipeline
- Validating migration safety and test coverage
## Phase 1: Environment Check
```bash
# Verify Python version
python --version # Should match project requirements
# Check virtual environment
which python
pip list --outdated
# Verify environment variables
python -c "import os; import environ; print('DJANGO_SECRET_KEY set' if os.environ.get('DJANGO_SECRET_KEY') else 'MISSING: DJANGO_SECRET_KEY')"
```
If environment is misconfigured, stop and fix.
## Phase 2: Code Quality & Formatting
```bash
# Type checking
mypy . --config-file pyproject.toml
# Linting with ruff
ruff check . --fix
# Formatting with black
black . --check
black . # Auto-fix
# Import sorting
isort . --check-only
isort . # Auto-fix
# Django-specific checks
python manage.py check --deploy
```
Common issues:
- Missing type hints on public functions
- PEP 8 formatting violations
- Unsorted imports
- Debug settings left in production configuration
## Phase 3: Migrations
```bash
# Check for unapplied migrations
python manage.py showmigrations
# Create missing migrations
python manage.py makemigrations --check
# Dry-run migration application
python manage.py migrate --plan
# Apply migrations (test environment)
python manage.py migrate
# Check for migration conflicts
python manage.py makemigrations --merge # Only if conflicts exist
```
Report:
- Number of pending migrations
- Any migration conflicts
- Model changes without migrations
## Phase 4: Tests + Coverage
```bash
# Run all tests with pytest
pytest --cov=apps --cov-report=html --cov-report=term-missing --reuse-db
# Run specific app tests
pytest apps/users/tests/
# Run with markers
pytest -m "not slow" # Skip slow tests
pytest -m integration # Only integration tests
# Coverage report
open htmlcov/index.html
```
Report:
- Total tests: X passed, Y failed, Z skipped
- Overall coverage: XX%
- Per-app coverage breakdown
Coverage targets:
| Component | Target |
|-----------|--------|
| Models | 90%+ |
| Serializers | 85%+ |
| Views | 80%+ |
| Services | 90%+ |
| Overall | 80%+ |
## Phase 5: Security Scan
```bash
# Dependency vulnerabilities
pip-audit
safety check --full-report
# Django security checks
python manage.py check --deploy
# Bandit security linter
bandit -r . -f json -o bandit-report.json
# Secret scanning (if gitleaks is installed)
gitleaks detect --source . --verbose
# Environment variable check
python -c "from django.core.exceptions import ImproperlyConfigured; from django.conf import settings; settings.DEBUG"
```
Report:
- Vulnerable dependencies found
- Security configuration issues
- Hardcoded secrets detected
- DEBUG mode status (should be False in production)
## Phase 6: Django Management Commands
```bash
# Check for model issues
python manage.py check
# Collect static files
python manage.py collectstatic --noinput --clear
# Create superuser (if needed for tests)
echo "from apps.users.models import User; User.objects.create_superuser('admin@example.com', 'admin')" | python manage.py shell
# Database integrity
python manage.py check --database default
# Cache verification (if using Redis)
python -c "from django.core.cache import cache; cache.set('test', 'value', 10); print(cache.get('test'))"
```
## Phase 7: Performance Checks
```bash
# Django Debug Toolbar output (check for N+1 queries)
# Run in dev mode with DEBUG=True and access a page
# Look for duplicate queries in SQL panel
# Query count analysis
django-admin debugsqlshell # If django-debug-sqlshell installed
# Check for missing indexes
python manage.py shell << EOF
from django.db import connection
with connection.cursor() as cursor:
cursor.execute("SELECT table_name, index_name FROM information_schema.statistics WHERE table_schema = 'public'")
print(cursor.fetchall())
EOF
```
Report:
- Number of queries per page (should be < 50 for typical pages)
- Missing database indexes
- Duplicate queries detected
## Phase 8: Static Assets
```bash
# Check for npm dependencies (if using npm)
npm audit
npm audit fix
# Build static files (if using webpack/vite)
npm run build
# Verify static files
ls -la staticfiles/
python manage.py findstatic css/style.css
```
## Phase 9: Configuration Review
```python
# Run in Python shell to verify settings
python manage.py shell << EOF
from django.conf import settings
import os
# Critical checks
checks = {
'DEBUG is False': not settings.DEBUG,
'SECRET_KEY set': bool(settings.SECRET_KEY and len(settings.SECRET_KEY) > 30),
'ALLOWED_HOSTS set': len(settings.ALLOWED_HOSTS) > 0,
'HTTPS enabled': getattr(settings, 'SECURE_SSL_REDIRECT', False),
'HSTS enabled': getattr(settings, 'SECURE_HSTS_SECONDS', 0) > 0,
'Database configured': settings.DATABASES['default']['ENGINE'] != 'django.db.backends.sqlite3',
}
for check, result in checks.items():
status = '' if result else ''
print(f"{status} {check}")
EOF
```
## Phase 10: Logging Configuration
```bash
# Test logging output
python manage.py shell << EOF
import logging
logger = logging.getLogger('django')
logger.warning('Test warning message')
logger.error('Test error message')
EOF
# Check log files (if configured)
tail -f /var/log/django/django.log
```
## Phase 11: API Documentation (if DRF)
```bash
# Generate schema
python manage.py generateschema --format openapi-json > schema.json
# Validate schema
# Check if schema.json is valid JSON
python -c "import json; json.load(open('schema.json'))"
# Access Swagger UI (if using drf-yasg)
# Visit http://localhost:8000/swagger/ in browser
```
## Phase 12: Diff Review
```bash
# Show diff statistics
git diff --stat
# Show actual changes
git diff
# Show changed files
git diff --name-only
# Check for common issues
git diff | grep -i "todo\|fixme\|hack\|xxx"
git diff | grep "print(" # Debug statements
git diff | grep "DEBUG = True" # Debug mode
git diff | grep "import pdb" # Debugger
```
Checklist:
- No debugging statements (print, pdb, breakpoint())
- No TODO/FIXME comments in critical code
- No hardcoded secrets or credentials
- Database migrations included for model changes
- Configuration changes documented
- Error handling present for external calls
- Transaction management where needed
## Output Template
```
DJANGO VERIFICATION REPORT
==========================
Phase 1: Environment Check
✓ Python 3.11.5
✓ Virtual environment active
✓ All environment variables set
Phase 2: Code Quality
✓ mypy: No type errors
✗ ruff: 3 issues found (auto-fixed)
✓ black: No formatting issues
✓ isort: Imports properly sorted
✓ manage.py check: No issues
Phase 3: Migrations
✓ No unapplied migrations
✓ No migration conflicts
✓ All models have migrations
Phase 4: Tests + Coverage
Tests: 247 passed, 0 failed, 5 skipped
Coverage:
Overall: 87%
users: 92%
products: 89%
orders: 85%
payments: 91%
Phase 5: Security Scan
✗ pip-audit: 2 vulnerabilities found (fix required)
✓ safety check: No issues
✓ bandit: No security issues
✓ No secrets detected
✓ DEBUG = False
Phase 6: Django Commands
✓ collectstatic completed
✓ Database integrity OK
✓ Cache backend reachable
Phase 7: Performance
✓ No N+1 queries detected
✓ Database indexes configured
✓ Query count acceptable
Phase 8: Static Assets
✓ npm audit: No vulnerabilities
✓ Assets built successfully
✓ Static files collected
Phase 9: Configuration
✓ DEBUG = False
✓ SECRET_KEY configured
✓ ALLOWED_HOSTS set
✓ HTTPS enabled
✓ HSTS enabled
✓ Database configured
Phase 10: Logging
✓ Logging configured
✓ Log files writable
Phase 11: API Documentation
✓ Schema generated
✓ Swagger UI accessible
Phase 12: Diff Review
Files changed: 12
+450, -120 lines
✓ No debug statements
✓ No hardcoded secrets
✓ Migrations included
RECOMMENDATION: ⚠️ Fix pip-audit vulnerabilities before deploying
NEXT STEPS:
1. Update vulnerable dependencies
2. Re-run security scan
3. Deploy to staging for final testing
```
## Pre-Deployment Checklist
- [ ] All tests passing
- [ ] Coverage ≥ 80%
- [ ] No security vulnerabilities
- [ ] No unapplied migrations
- [ ] DEBUG = False in production settings
- [ ] SECRET_KEY properly configured
- [ ] ALLOWED_HOSTS set correctly
- [ ] Database backups enabled
- [ ] Static files collected and served
- [ ] Logging configured and working
- [ ] Error monitoring (Sentry, etc.) configured
- [ ] CDN configured (if applicable)
- [ ] Redis/cache backend configured
- [ ] Celery workers running (if applicable)
- [ ] HTTPS/SSL configured
- [ ] Environment variables documented
## Continuous Integration
### GitHub Actions Example
```yaml
# .github/workflows/django-verification.yml
name: Django Verification
on: [push, pull_request]
jobs:
verify:
runs-on: ubuntu-latest
services:
postgres:
image: postgres:14
env:
POSTGRES_PASSWORD: postgres
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Cache pip
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
- name: Install dependencies
run: |
pip install -r requirements.txt
pip install ruff black mypy pytest pytest-django pytest-cov bandit safety pip-audit
- name: Code quality checks
run: |
ruff check .
black . --check
isort . --check-only
mypy .
- name: Security scan
run: |
bandit -r . -f json -o bandit-report.json
safety check --full-report
pip-audit
- name: Run tests
env:
DATABASE_URL: postgres://postgres:postgres@localhost:5432/test
DJANGO_SECRET_KEY: test-secret-key
run: |
pytest --cov=apps --cov-report=xml --cov-report=term-missing
- name: Upload coverage
uses: codecov/codecov-action@v3
```
## Quick Reference
| Check | Command |
|-------|---------|
| Environment | `python --version` |
| Type checking | `mypy .` |
| Linting | `ruff check .` |
| Formatting | `black . --check` |
| Migrations | `python manage.py makemigrations --check` |
| Tests | `pytest --cov=apps` |
| Security | `pip-audit && bandit -r .` |
| Django check | `python manage.py check --deploy` |
| Collectstatic | `python manage.py collectstatic --noinput` |
| Diff stats | `git diff --stat` |
Remember: Automated verification catches common issues but doesn't replace manual code review and testing in staging environment.

View File

@@ -0,0 +1,363 @@
---
name: docker-patterns
description: Docker and Docker Compose patterns for local development, container security, networking, volume strategies, and multi-service orchestration.
---
# Docker Patterns
Docker and Docker Compose best practices for containerized development.
## When to Activate
- Setting up Docker Compose for local development
- Designing multi-container architectures
- Troubleshooting container networking or volume issues
- Reviewing Dockerfiles for security and size
- Migrating from local dev to containerized workflow
## Docker Compose for Local Development
### Standard Web App Stack
```yaml
# docker-compose.yml
services:
app:
build:
context: .
target: dev # Use dev stage of multi-stage Dockerfile
ports:
- "3000:3000"
volumes:
- .:/app # Bind mount for hot reload
- /app/node_modules # Anonymous volume -- preserves container deps
environment:
- DATABASE_URL=postgres://postgres:postgres@db:5432/app_dev
- REDIS_URL=redis://redis:6379/0
- NODE_ENV=development
depends_on:
db:
condition: service_healthy
redis:
condition: service_started
command: npm run dev
db:
image: postgres:16-alpine
ports:
- "5432:5432"
environment:
POSTGRES_USER: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_DB: app_dev
volumes:
- pgdata:/var/lib/postgresql/data
- ./scripts/init-db.sql:/docker-entrypoint-initdb.d/init.sql
healthcheck:
test: ["CMD-SHELL", "pg_isready -U postgres"]
interval: 5s
timeout: 3s
retries: 5
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redisdata:/data
mailpit: # Local email testing
image: axllent/mailpit
ports:
- "8025:8025" # Web UI
- "1025:1025" # SMTP
volumes:
pgdata:
redisdata:
```
### Development vs Production Dockerfile
```dockerfile
# Stage: dependencies
FROM node:22-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
# Stage: dev (hot reload, debug tools)
FROM node:22-alpine AS dev
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
EXPOSE 3000
CMD ["npm", "run", "dev"]
# Stage: build
FROM node:22-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build && npm prune --production
# Stage: production (minimal image)
FROM node:22-alpine AS production
WORKDIR /app
RUN addgroup -g 1001 -S appgroup && adduser -S appuser -u 1001
USER appuser
COPY --from=build --chown=appuser:appgroup /app/dist ./dist
COPY --from=build --chown=appuser:appgroup /app/node_modules ./node_modules
COPY --from=build --chown=appuser:appgroup /app/package.json ./
ENV NODE_ENV=production
EXPOSE 3000
HEALTHCHECK --interval=30s --timeout=3s CMD wget -qO- http://localhost:3000/health || exit 1
CMD ["node", "dist/server.js"]
```
### Override Files
```yaml
# docker-compose.override.yml (auto-loaded, dev-only settings)
services:
app:
environment:
- DEBUG=app:*
- LOG_LEVEL=debug
ports:
- "9229:9229" # Node.js debugger
# docker-compose.prod.yml (explicit for production)
services:
app:
build:
target: production
restart: always
deploy:
resources:
limits:
cpus: "1.0"
memory: 512M
```
```bash
# Development (auto-loads override)
docker compose up
# Production
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
```
## Networking
### Service Discovery
Services in the same Compose network resolve by service name:
```
# From "app" container:
postgres://postgres:postgres@db:5432/app_dev # "db" resolves to the db container
redis://redis:6379/0 # "redis" resolves to the redis container
```
### Custom Networks
```yaml
services:
frontend:
networks:
- frontend-net
api:
networks:
- frontend-net
- backend-net
db:
networks:
- backend-net # Only reachable from api, not frontend
networks:
frontend-net:
backend-net:
```
### Exposing Only What's Needed
```yaml
services:
db:
ports:
- "127.0.0.1:5432:5432" # Only accessible from host, not network
# Omit ports entirely in production -- accessible only within Docker network
```
## Volume Strategies
```yaml
volumes:
# Named volume: persists across container restarts, managed by Docker
pgdata:
# Bind mount: maps host directory into container (for development)
# - ./src:/app/src
# Anonymous volume: preserves container-generated content from bind mount override
# - /app/node_modules
```
### Common Patterns
```yaml
services:
app:
volumes:
- .:/app # Source code (bind mount for hot reload)
- /app/node_modules # Protect container's node_modules from host
- /app/.next # Protect build cache
db:
volumes:
- pgdata:/var/lib/postgresql/data # Persistent data
- ./scripts/init.sql:/docker-entrypoint-initdb.d/init.sql # Init scripts
```
## Container Security
### Dockerfile Hardening
```dockerfile
# 1. Use specific tags (never :latest)
FROM node:22.12-alpine3.20
# 2. Run as non-root
RUN addgroup -g 1001 -S app && adduser -S app -u 1001
USER app
# 3. Drop capabilities (in compose)
# 4. Read-only root filesystem where possible
# 5. No secrets in image layers
```
### Compose Security
```yaml
services:
app:
security_opt:
- no-new-privileges:true
read_only: true
tmpfs:
- /tmp
- /app/.cache
cap_drop:
- ALL
cap_add:
- NET_BIND_SERVICE # Only if binding to ports < 1024
```
### Secret Management
```yaml
# GOOD: Use environment variables (injected at runtime)
services:
app:
env_file:
- .env # Never commit .env to git
environment:
- API_KEY # Inherits from host environment
# GOOD: Docker secrets (Swarm mode)
secrets:
db_password:
file: ./secrets/db_password.txt
services:
db:
secrets:
- db_password
# BAD: Hardcoded in image
# ENV API_KEY=sk-proj-xxxxx # NEVER DO THIS
```
## .dockerignore
```
node_modules
.git
.env
.env.*
dist
coverage
*.log
.next
.cache
docker-compose*.yml
Dockerfile*
README.md
tests/
```
## Debugging
### Common Commands
```bash
# View logs
docker compose logs -f app # Follow app logs
docker compose logs --tail=50 db # Last 50 lines from db
# Execute commands in running container
docker compose exec app sh # Shell into app
docker compose exec db psql -U postgres # Connect to postgres
# Inspect
docker compose ps # Running services
docker compose top # Processes in each container
docker stats # Resource usage
# Rebuild
docker compose up --build # Rebuild images
docker compose build --no-cache app # Force full rebuild
# Clean up
docker compose down # Stop and remove containers
docker compose down -v # Also remove volumes (DESTRUCTIVE)
docker system prune # Remove unused images/containers
```
### Debugging Network Issues
```bash
# Check DNS resolution inside container
docker compose exec app nslookup db
# Check connectivity
docker compose exec app wget -qO- http://api:3000/health
# Inspect network
docker network ls
docker network inspect <project>_default
```
## Anti-Patterns
```
# BAD: Using docker compose in production without orchestration
# Use Kubernetes, ECS, or Docker Swarm for production multi-container workloads
# BAD: Storing data in containers without volumes
# Containers are ephemeral -- all data lost on restart without volumes
# BAD: Running as root
# Always create and use a non-root user
# BAD: Using :latest tag
# Pin to specific versions for reproducible builds
# BAD: One giant container with all services
# Separate concerns: one process per container
# BAD: Putting secrets in docker-compose.yml
# Use .env files (gitignored) or Docker secrets
```

View File

@@ -0,0 +1,673 @@
---
name: golang-patterns
description: Idiomatic Go patterns, best practices, and conventions for building robust, efficient, and maintainable Go applications.
---
# Go Development Patterns
Idiomatic Go patterns and best practices for building robust, efficient, and maintainable applications.
## When to Activate
- Writing new Go code
- Reviewing Go code
- Refactoring existing Go code
- Designing Go packages/modules
## Core Principles
### 1. Simplicity and Clarity
Go favors simplicity over cleverness. Code should be obvious and easy to read.
```go
// Good: Clear and direct
func GetUser(id string) (*User, error) {
user, err := db.FindUser(id)
if err != nil {
return nil, fmt.Errorf("get user %s: %w", id, err)
}
return user, nil
}
// Bad: Overly clever
func GetUser(id string) (*User, error) {
return func() (*User, error) {
if u, e := db.FindUser(id); e == nil {
return u, nil
} else {
return nil, e
}
}()
}
```
### 2. Make the Zero Value Useful
Design types so their zero value is immediately usable without initialization.
```go
// Good: Zero value is useful
type Counter struct {
mu sync.Mutex
count int // zero value is 0, ready to use
}
func (c *Counter) Inc() {
c.mu.Lock()
c.count++
c.mu.Unlock()
}
// Good: bytes.Buffer works with zero value
var buf bytes.Buffer
buf.WriteString("hello")
// Bad: Requires initialization
type BadCounter struct {
counts map[string]int // nil map will panic
}
```
### 3. Accept Interfaces, Return Structs
Functions should accept interface parameters and return concrete types.
```go
// Good: Accepts interface, returns concrete type
func ProcessData(r io.Reader) (*Result, error) {
data, err := io.ReadAll(r)
if err != nil {
return nil, err
}
return &Result{Data: data}, nil
}
// Bad: Returns interface (hides implementation details unnecessarily)
func ProcessData(r io.Reader) (io.Reader, error) {
// ...
}
```
## Error Handling Patterns
### Error Wrapping with Context
```go
// Good: Wrap errors with context
func LoadConfig(path string) (*Config, error) {
data, err := os.ReadFile(path)
if err != nil {
return nil, fmt.Errorf("load config %s: %w", path, err)
}
var cfg Config
if err := json.Unmarshal(data, &cfg); err != nil {
return nil, fmt.Errorf("parse config %s: %w", path, err)
}
return &cfg, nil
}
```
### Custom Error Types
```go
// Define domain-specific errors
type ValidationError struct {
Field string
Message string
}
func (e *ValidationError) Error() string {
return fmt.Sprintf("validation failed on %s: %s", e.Field, e.Message)
}
// Sentinel errors for common cases
var (
ErrNotFound = errors.New("resource not found")
ErrUnauthorized = errors.New("unauthorized")
ErrInvalidInput = errors.New("invalid input")
)
```
### Error Checking with errors.Is and errors.As
```go
func HandleError(err error) {
// Check for specific error
if errors.Is(err, sql.ErrNoRows) {
log.Println("No records found")
return
}
// Check for error type
var validationErr *ValidationError
if errors.As(err, &validationErr) {
log.Printf("Validation error on field %s: %s",
validationErr.Field, validationErr.Message)
return
}
// Unknown error
log.Printf("Unexpected error: %v", err)
}
```
### Never Ignore Errors
```go
// Bad: Ignoring error with blank identifier
result, _ := doSomething()
// Good: Handle or explicitly document why it's safe to ignore
result, err := doSomething()
if err != nil {
return err
}
// Acceptable: When error truly doesn't matter (rare)
_ = writer.Close() // Best-effort cleanup, error logged elsewhere
```
## Concurrency Patterns
### Worker Pool
```go
func WorkerPool(jobs <-chan Job, results chan<- Result, numWorkers int) {
var wg sync.WaitGroup
for i := 0; i < numWorkers; i++ {
wg.Add(1)
go func() {
defer wg.Done()
for job := range jobs {
results <- process(job)
}
}()
}
wg.Wait()
close(results)
}
```
### Context for Cancellation and Timeouts
```go
func FetchWithTimeout(ctx context.Context, url string) ([]byte, error) {
ctx, cancel := context.WithTimeout(ctx, 5*time.Second)
defer cancel()
req, err := http.NewRequestWithContext(ctx, "GET", url, nil)
if err != nil {
return nil, fmt.Errorf("create request: %w", err)
}
resp, err := http.DefaultClient.Do(req)
if err != nil {
return nil, fmt.Errorf("fetch %s: %w", url, err)
}
defer resp.Body.Close()
return io.ReadAll(resp.Body)
}
```
### Graceful Shutdown
```go
func GracefulShutdown(server *http.Server) {
quit := make(chan os.Signal, 1)
signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)
<-quit
log.Println("Shutting down server...")
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
if err := server.Shutdown(ctx); err != nil {
log.Fatalf("Server forced to shutdown: %v", err)
}
log.Println("Server exited")
}
```
### errgroup for Coordinated Goroutines
```go
import "golang.org/x/sync/errgroup"
func FetchAll(ctx context.Context, urls []string) ([][]byte, error) {
g, ctx := errgroup.WithContext(ctx)
results := make([][]byte, len(urls))
for i, url := range urls {
i, url := i, url // Capture loop variables
g.Go(func() error {
data, err := FetchWithTimeout(ctx, url)
if err != nil {
return err
}
results[i] = data
return nil
})
}
if err := g.Wait(); err != nil {
return nil, err
}
return results, nil
}
```
### Avoiding Goroutine Leaks
```go
// Bad: Goroutine leak if context is cancelled
func leakyFetch(ctx context.Context, url string) <-chan []byte {
ch := make(chan []byte)
go func() {
data, _ := fetch(url)
ch <- data // Blocks forever if no receiver
}()
return ch
}
// Good: Properly handles cancellation
func safeFetch(ctx context.Context, url string) <-chan []byte {
ch := make(chan []byte, 1) // Buffered channel
go func() {
data, err := fetch(url)
if err != nil {
return
}
select {
case ch <- data:
case <-ctx.Done():
}
}()
return ch
}
```
## Interface Design
### Small, Focused Interfaces
```go
// Good: Single-method interfaces
type Reader interface {
Read(p []byte) (n int, err error)
}
type Writer interface {
Write(p []byte) (n int, err error)
}
type Closer interface {
Close() error
}
// Compose interfaces as needed
type ReadWriteCloser interface {
Reader
Writer
Closer
}
```
### Define Interfaces Where They're Used
```go
// In the consumer package, not the provider
package service
// UserStore defines what this service needs
type UserStore interface {
GetUser(id string) (*User, error)
SaveUser(user *User) error
}
type Service struct {
store UserStore
}
// Concrete implementation can be in another package
// It doesn't need to know about this interface
```
### Optional Behavior with Type Assertions
```go
type Flusher interface {
Flush() error
}
func WriteAndFlush(w io.Writer, data []byte) error {
if _, err := w.Write(data); err != nil {
return err
}
// Flush if supported
if f, ok := w.(Flusher); ok {
return f.Flush()
}
return nil
}
```
## Package Organization
### Standard Project Layout
```text
myproject/
├── cmd/
│ └── myapp/
│ └── main.go # Entry point
├── internal/
│ ├── handler/ # HTTP handlers
│ ├── service/ # Business logic
│ ├── repository/ # Data access
│ └── config/ # Configuration
├── pkg/
│ └── client/ # Public API client
├── api/
│ └── v1/ # API definitions (proto, OpenAPI)
├── testdata/ # Test fixtures
├── go.mod
├── go.sum
└── Makefile
```
### Package Naming
```go
// Good: Short, lowercase, no underscores
package http
package json
package user
// Bad: Verbose, mixed case, or redundant
package httpHandler
package json_parser
package userService // Redundant 'Service' suffix
```
### Avoid Package-Level State
```go
// Bad: Global mutable state
var db *sql.DB
func init() {
db, _ = sql.Open("postgres", os.Getenv("DATABASE_URL"))
}
// Good: Dependency injection
type Server struct {
db *sql.DB
}
func NewServer(db *sql.DB) *Server {
return &Server{db: db}
}
```
## Struct Design
### Functional Options Pattern
```go
type Server struct {
addr string
timeout time.Duration
logger *log.Logger
}
type Option func(*Server)
func WithTimeout(d time.Duration) Option {
return func(s *Server) {
s.timeout = d
}
}
func WithLogger(l *log.Logger) Option {
return func(s *Server) {
s.logger = l
}
}
func NewServer(addr string, opts ...Option) *Server {
s := &Server{
addr: addr,
timeout: 30 * time.Second, // default
logger: log.Default(), // default
}
for _, opt := range opts {
opt(s)
}
return s
}
// Usage
server := NewServer(":8080",
WithTimeout(60*time.Second),
WithLogger(customLogger),
)
```
### Embedding for Composition
```go
type Logger struct {
prefix string
}
func (l *Logger) Log(msg string) {
fmt.Printf("[%s] %s\n", l.prefix, msg)
}
type Server struct {
*Logger // Embedding - Server gets Log method
addr string
}
func NewServer(addr string) *Server {
return &Server{
Logger: &Logger{prefix: "SERVER"},
addr: addr,
}
}
// Usage
s := NewServer(":8080")
s.Log("Starting...") // Calls embedded Logger.Log
```
## Memory and Performance
### Preallocate Slices When Size is Known
```go
// Bad: Grows slice multiple times
func processItems(items []Item) []Result {
var results []Result
for _, item := range items {
results = append(results, process(item))
}
return results
}
// Good: Single allocation
func processItems(items []Item) []Result {
results := make([]Result, 0, len(items))
for _, item := range items {
results = append(results, process(item))
}
return results
}
```
### Use sync.Pool for Frequent Allocations
```go
var bufferPool = sync.Pool{
New: func() interface{} {
return new(bytes.Buffer)
},
}
func ProcessRequest(data []byte) []byte {
buf := bufferPool.Get().(*bytes.Buffer)
defer func() {
buf.Reset()
bufferPool.Put(buf)
}()
buf.Write(data)
// Process...
return buf.Bytes()
}
```
### Avoid String Concatenation in Loops
```go
// Bad: Creates many string allocations
func join(parts []string) string {
var result string
for _, p := range parts {
result += p + ","
}
return result
}
// Good: Single allocation with strings.Builder
func join(parts []string) string {
var sb strings.Builder
for i, p := range parts {
if i > 0 {
sb.WriteString(",")
}
sb.WriteString(p)
}
return sb.String()
}
// Best: Use standard library
func join(parts []string) string {
return strings.Join(parts, ",")
}
```
## Go Tooling Integration
### Essential Commands
```bash
# Build and run
go build ./...
go run ./cmd/myapp
# Testing
go test ./...
go test -race ./...
go test -cover ./...
# Static analysis
go vet ./...
staticcheck ./...
golangci-lint run
# Module management
go mod tidy
go mod verify
# Formatting
gofmt -w .
goimports -w .
```
### Recommended Linter Configuration (.golangci.yml)
```yaml
linters:
enable:
- errcheck
- gosimple
- govet
- ineffassign
- staticcheck
- unused
- gofmt
- goimports
- misspell
- unconvert
- unparam
linters-settings:
errcheck:
check-type-assertions: true
govet:
check-shadowing: true
issues:
exclude-use-default: false
```
## Quick Reference: Go Idioms
| Idiom | Description |
|-------|-------------|
| Accept interfaces, return structs | Functions accept interface params, return concrete types |
| Errors are values | Treat errors as first-class values, not exceptions |
| Don't communicate by sharing memory | Use channels for coordination between goroutines |
| Make the zero value useful | Types should work without explicit initialization |
| A little copying is better than a little dependency | Avoid unnecessary external dependencies |
| Clear is better than clever | Prioritize readability over cleverness |
| gofmt is no one's favorite but everyone's friend | Always format with gofmt/goimports |
| Return early | Handle errors first, keep happy path unindented |
## Anti-Patterns to Avoid
```go
// Bad: Naked returns in long functions
func process() (result int, err error) {
// ... 50 lines ...
return // What is being returned?
}
// Bad: Using panic for control flow
func GetUser(id string) *User {
user, err := db.Find(id)
if err != nil {
panic(err) // Don't do this
}
return user
}
// Bad: Passing context in struct
type Request struct {
ctx context.Context // Context should be first param
ID string
}
// Good: Context as first parameter
func ProcessRequest(ctx context.Context, id string) error {
// ...
}
// Bad: Mixing value and pointer receivers
type Counter struct{ n int }
func (c Counter) Value() int { return c.n } // Value receiver
func (c *Counter) Increment() { c.n++ } // Pointer receiver
// Pick one style and be consistent
```
**Remember**: Go code should be boring in the best way - predictable, consistent, and easy to understand. When in doubt, keep it simple.

View File

@@ -0,0 +1,719 @@
---
name: golang-testing
description: Go testing patterns including table-driven tests, subtests, benchmarks, fuzzing, and test coverage. Follows TDD methodology with idiomatic Go practices.
---
# Go Testing Patterns
Comprehensive Go testing patterns for writing reliable, maintainable tests following TDD methodology.
## When to Activate
- Writing new Go functions or methods
- Adding test coverage to existing code
- Creating benchmarks for performance-critical code
- Implementing fuzz tests for input validation
- Following TDD workflow in Go projects
## TDD Workflow for Go
### The RED-GREEN-REFACTOR Cycle
```
RED → Write a failing test first
GREEN → Write minimal code to pass the test
REFACTOR → Improve code while keeping tests green
REPEAT → Continue with next requirement
```
### Step-by-Step TDD in Go
```go
// Step 1: Define the interface/signature
// calculator.go
package calculator
func Add(a, b int) int {
panic("not implemented") // Placeholder
}
// Step 2: Write failing test (RED)
// calculator_test.go
package calculator
import "testing"
func TestAdd(t *testing.T) {
got := Add(2, 3)
want := 5
if got != want {
t.Errorf("Add(2, 3) = %d; want %d", got, want)
}
}
// Step 3: Run test - verify FAIL
// $ go test
// --- FAIL: TestAdd (0.00s)
// panic: not implemented
// Step 4: Implement minimal code (GREEN)
func Add(a, b int) int {
return a + b
}
// Step 5: Run test - verify PASS
// $ go test
// PASS
// Step 6: Refactor if needed, verify tests still pass
```
## Table-Driven Tests
The standard pattern for Go tests. Enables comprehensive coverage with minimal code.
```go
func TestAdd(t *testing.T) {
tests := []struct {
name string
a, b int
expected int
}{
{"positive numbers", 2, 3, 5},
{"negative numbers", -1, -2, -3},
{"zero values", 0, 0, 0},
{"mixed signs", -1, 1, 0},
{"large numbers", 1000000, 2000000, 3000000},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := Add(tt.a, tt.b)
if got != tt.expected {
t.Errorf("Add(%d, %d) = %d; want %d",
tt.a, tt.b, got, tt.expected)
}
})
}
}
```
### Table-Driven Tests with Error Cases
```go
func TestParseConfig(t *testing.T) {
tests := []struct {
name string
input string
want *Config
wantErr bool
}{
{
name: "valid config",
input: `{"host": "localhost", "port": 8080}`,
want: &Config{Host: "localhost", Port: 8080},
},
{
name: "invalid JSON",
input: `{invalid}`,
wantErr: true,
},
{
name: "empty input",
input: "",
wantErr: true,
},
{
name: "minimal config",
input: `{}`,
want: &Config{}, // Zero value config
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got, err := ParseConfig(tt.input)
if tt.wantErr {
if err == nil {
t.Error("expected error, got nil")
}
return
}
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if !reflect.DeepEqual(got, tt.want) {
t.Errorf("got %+v; want %+v", got, tt.want)
}
})
}
}
```
## Subtests and Sub-benchmarks
### Organizing Related Tests
```go
func TestUser(t *testing.T) {
// Setup shared by all subtests
db := setupTestDB(t)
t.Run("Create", func(t *testing.T) {
user := &User{Name: "Alice"}
err := db.CreateUser(user)
if err != nil {
t.Fatalf("CreateUser failed: %v", err)
}
if user.ID == "" {
t.Error("expected user ID to be set")
}
})
t.Run("Get", func(t *testing.T) {
user, err := db.GetUser("alice-id")
if err != nil {
t.Fatalf("GetUser failed: %v", err)
}
if user.Name != "Alice" {
t.Errorf("got name %q; want %q", user.Name, "Alice")
}
})
t.Run("Update", func(t *testing.T) {
// ...
})
t.Run("Delete", func(t *testing.T) {
// ...
})
}
```
### Parallel Subtests
```go
func TestParallel(t *testing.T) {
tests := []struct {
name string
input string
}{
{"case1", "input1"},
{"case2", "input2"},
{"case3", "input3"},
}
for _, tt := range tests {
tt := tt // Capture range variable
t.Run(tt.name, func(t *testing.T) {
t.Parallel() // Run subtests in parallel
result := Process(tt.input)
// assertions...
_ = result
})
}
}
```
## Test Helpers
### Helper Functions
```go
func setupTestDB(t *testing.T) *sql.DB {
t.Helper() // Marks this as a helper function
db, err := sql.Open("sqlite3", ":memory:")
if err != nil {
t.Fatalf("failed to open database: %v", err)
}
// Cleanup when test finishes
t.Cleanup(func() {
db.Close()
})
// Run migrations
if _, err := db.Exec(schema); err != nil {
t.Fatalf("failed to create schema: %v", err)
}
return db
}
func assertNoError(t *testing.T, err error) {
t.Helper()
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
}
func assertEqual[T comparable](t *testing.T, got, want T) {
t.Helper()
if got != want {
t.Errorf("got %v; want %v", got, want)
}
}
```
### Temporary Files and Directories
```go
func TestFileProcessing(t *testing.T) {
// Create temp directory - automatically cleaned up
tmpDir := t.TempDir()
// Create test file
testFile := filepath.Join(tmpDir, "test.txt")
err := os.WriteFile(testFile, []byte("test content"), 0644)
if err != nil {
t.Fatalf("failed to create test file: %v", err)
}
// Run test
result, err := ProcessFile(testFile)
if err != nil {
t.Fatalf("ProcessFile failed: %v", err)
}
// Assert...
_ = result
}
```
## Golden Files
Testing against expected output files stored in `testdata/`.
```go
var update = flag.Bool("update", false, "update golden files")
func TestRender(t *testing.T) {
tests := []struct {
name string
input Template
}{
{"simple", Template{Name: "test"}},
{"complex", Template{Name: "test", Items: []string{"a", "b"}}},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := Render(tt.input)
golden := filepath.Join("testdata", tt.name+".golden")
if *update {
// Update golden file: go test -update
err := os.WriteFile(golden, got, 0644)
if err != nil {
t.Fatalf("failed to update golden file: %v", err)
}
}
want, err := os.ReadFile(golden)
if err != nil {
t.Fatalf("failed to read golden file: %v", err)
}
if !bytes.Equal(got, want) {
t.Errorf("output mismatch:\ngot:\n%s\nwant:\n%s", got, want)
}
})
}
}
```
## Mocking with Interfaces
### Interface-Based Mocking
```go
// Define interface for dependencies
type UserRepository interface {
GetUser(id string) (*User, error)
SaveUser(user *User) error
}
// Production implementation
type PostgresUserRepository struct {
db *sql.DB
}
func (r *PostgresUserRepository) GetUser(id string) (*User, error) {
// Real database query
}
// Mock implementation for tests
type MockUserRepository struct {
GetUserFunc func(id string) (*User, error)
SaveUserFunc func(user *User) error
}
func (m *MockUserRepository) GetUser(id string) (*User, error) {
return m.GetUserFunc(id)
}
func (m *MockUserRepository) SaveUser(user *User) error {
return m.SaveUserFunc(user)
}
// Test using mock
func TestUserService(t *testing.T) {
mock := &MockUserRepository{
GetUserFunc: func(id string) (*User, error) {
if id == "123" {
return &User{ID: "123", Name: "Alice"}, nil
}
return nil, ErrNotFound
},
}
service := NewUserService(mock)
user, err := service.GetUserProfile("123")
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if user.Name != "Alice" {
t.Errorf("got name %q; want %q", user.Name, "Alice")
}
}
```
## Benchmarks
### Basic Benchmarks
```go
func BenchmarkProcess(b *testing.B) {
data := generateTestData(1000)
b.ResetTimer() // Don't count setup time
for i := 0; i < b.N; i++ {
Process(data)
}
}
// Run: go test -bench=BenchmarkProcess -benchmem
// Output: BenchmarkProcess-8 10000 105234 ns/op 4096 B/op 10 allocs/op
```
### Benchmark with Different Sizes
```go
func BenchmarkSort(b *testing.B) {
sizes := []int{100, 1000, 10000, 100000}
for _, size := range sizes {
b.Run(fmt.Sprintf("size=%d", size), func(b *testing.B) {
data := generateRandomSlice(size)
b.ResetTimer()
for i := 0; i < b.N; i++ {
// Make a copy to avoid sorting already sorted data
tmp := make([]int, len(data))
copy(tmp, data)
sort.Ints(tmp)
}
})
}
}
```
### Memory Allocation Benchmarks
```go
func BenchmarkStringConcat(b *testing.B) {
parts := []string{"hello", "world", "foo", "bar", "baz"}
b.Run("plus", func(b *testing.B) {
for i := 0; i < b.N; i++ {
var s string
for _, p := range parts {
s += p
}
_ = s
}
})
b.Run("builder", func(b *testing.B) {
for i := 0; i < b.N; i++ {
var sb strings.Builder
for _, p := range parts {
sb.WriteString(p)
}
_ = sb.String()
}
})
b.Run("join", func(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = strings.Join(parts, "")
}
})
}
```
## Fuzzing (Go 1.18+)
### Basic Fuzz Test
```go
func FuzzParseJSON(f *testing.F) {
// Add seed corpus
f.Add(`{"name": "test"}`)
f.Add(`{"count": 123}`)
f.Add(`[]`)
f.Add(`""`)
f.Fuzz(func(t *testing.T, input string) {
var result map[string]interface{}
err := json.Unmarshal([]byte(input), &result)
if err != nil {
// Invalid JSON is expected for random input
return
}
// If parsing succeeded, re-encoding should work
_, err = json.Marshal(result)
if err != nil {
t.Errorf("Marshal failed after successful Unmarshal: %v", err)
}
})
}
// Run: go test -fuzz=FuzzParseJSON -fuzztime=30s
```
### Fuzz Test with Multiple Inputs
```go
func FuzzCompare(f *testing.F) {
f.Add("hello", "world")
f.Add("", "")
f.Add("abc", "abc")
f.Fuzz(func(t *testing.T, a, b string) {
result := Compare(a, b)
// Property: Compare(a, a) should always equal 0
if a == b && result != 0 {
t.Errorf("Compare(%q, %q) = %d; want 0", a, b, result)
}
// Property: Compare(a, b) and Compare(b, a) should have opposite signs
reverse := Compare(b, a)
if (result > 0 && reverse >= 0) || (result < 0 && reverse <= 0) {
if result != 0 || reverse != 0 {
t.Errorf("Compare(%q, %q) = %d, Compare(%q, %q) = %d; inconsistent",
a, b, result, b, a, reverse)
}
}
})
}
```
## Test Coverage
### Running Coverage
```bash
# Basic coverage
go test -cover ./...
# Generate coverage profile
go test -coverprofile=coverage.out ./...
# View coverage in browser
go tool cover -html=coverage.out
# View coverage by function
go tool cover -func=coverage.out
# Coverage with race detection
go test -race -coverprofile=coverage.out ./...
```
### Coverage Targets
| Code Type | Target |
|-----------|--------|
| Critical business logic | 100% |
| Public APIs | 90%+ |
| General code | 80%+ |
| Generated code | Exclude |
### Excluding Generated Code from Coverage
```go
//go:generate mockgen -source=interface.go -destination=mock_interface.go
// In coverage profile, exclude with build tags:
// go test -cover -tags=!generate ./...
```
## HTTP Handler Testing
```go
func TestHealthHandler(t *testing.T) {
// Create request
req := httptest.NewRequest(http.MethodGet, "/health", nil)
w := httptest.NewRecorder()
// Call handler
HealthHandler(w, req)
// Check response
resp := w.Result()
defer resp.Body.Close()
if resp.StatusCode != http.StatusOK {
t.Errorf("got status %d; want %d", resp.StatusCode, http.StatusOK)
}
body, _ := io.ReadAll(resp.Body)
if string(body) != "OK" {
t.Errorf("got body %q; want %q", body, "OK")
}
}
func TestAPIHandler(t *testing.T) {
tests := []struct {
name string
method string
path string
body string
wantStatus int
wantBody string
}{
{
name: "get user",
method: http.MethodGet,
path: "/users/123",
wantStatus: http.StatusOK,
wantBody: `{"id":"123","name":"Alice"}`,
},
{
name: "not found",
method: http.MethodGet,
path: "/users/999",
wantStatus: http.StatusNotFound,
},
{
name: "create user",
method: http.MethodPost,
path: "/users",
body: `{"name":"Bob"}`,
wantStatus: http.StatusCreated,
},
}
handler := NewAPIHandler()
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
var body io.Reader
if tt.body != "" {
body = strings.NewReader(tt.body)
}
req := httptest.NewRequest(tt.method, tt.path, body)
req.Header.Set("Content-Type", "application/json")
w := httptest.NewRecorder()
handler.ServeHTTP(w, req)
if w.Code != tt.wantStatus {
t.Errorf("got status %d; want %d", w.Code, tt.wantStatus)
}
if tt.wantBody != "" && w.Body.String() != tt.wantBody {
t.Errorf("got body %q; want %q", w.Body.String(), tt.wantBody)
}
})
}
}
```
## Testing Commands
```bash
# Run all tests
go test ./...
# Run tests with verbose output
go test -v ./...
# Run specific test
go test -run TestAdd ./...
# Run tests matching pattern
go test -run "TestUser/Create" ./...
# Run tests with race detector
go test -race ./...
# Run tests with coverage
go test -cover -coverprofile=coverage.out ./...
# Run short tests only
go test -short ./...
# Run tests with timeout
go test -timeout 30s ./...
# Run benchmarks
go test -bench=. -benchmem ./...
# Run fuzzing
go test -fuzz=FuzzParse -fuzztime=30s ./...
# Count test runs (for flaky test detection)
go test -count=10 ./...
```
## Best Practices
**DO:**
- Write tests FIRST (TDD)
- Use table-driven tests for comprehensive coverage
- Test behavior, not implementation
- Use `t.Helper()` in helper functions
- Use `t.Parallel()` for independent tests
- Clean up resources with `t.Cleanup()`
- Use meaningful test names that describe the scenario
**DON'T:**
- Test private functions directly (test through public API)
- Use `time.Sleep()` in tests (use channels or conditions)
- Ignore flaky tests (fix or remove them)
- Mock everything (prefer integration tests when possible)
- Skip error path testing
## Integration with CI/CD
```yaml
# GitHub Actions example
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-go@v5
with:
go-version: '1.22'
- name: Run tests
run: go test -race -coverprofile=coverage.out ./...
- name: Check coverage
run: |
go tool cover -func=coverage.out | grep total | awk '{print $3}' | \
awk -F'%' '{if ($1 < 80) exit 1}'
```
**Remember**: Tests are documentation. They show how your code is meant to be used. Write them clearly and keep them up to date.

View File

@@ -0,0 +1,146 @@
---
name: java-coding-standards
description: "Java coding standards for Spring Boot services: naming, immutability, Optional usage, streams, exceptions, generics, and project layout."
---
# Java Coding Standards
Standards for readable, maintainable Java (17+) code in Spring Boot services.
## When to Activate
- Writing or reviewing Java code in Spring Boot projects
- Enforcing naming, immutability, or exception handling conventions
- Working with records, sealed classes, or pattern matching (Java 17+)
- Reviewing use of Optional, streams, or generics
- Structuring packages and project layout
## Core Principles
- Prefer clarity over cleverness
- Immutable by default; minimize shared mutable state
- Fail fast with meaningful exceptions
- Consistent naming and package structure
## Naming
```java
// ✅ Classes/Records: PascalCase
public class MarketService {}
public record Money(BigDecimal amount, Currency currency) {}
// ✅ Methods/fields: camelCase
private final MarketRepository marketRepository;
public Market findBySlug(String slug) {}
// ✅ Constants: UPPER_SNAKE_CASE
private static final int MAX_PAGE_SIZE = 100;
```
## Immutability
```java
// ✅ Favor records and final fields
public record MarketDto(Long id, String name, MarketStatus status) {}
public class Market {
private final Long id;
private final String name;
// getters only, no setters
}
```
## Optional Usage
```java
// ✅ Return Optional from find* methods
Optional<Market> market = marketRepository.findBySlug(slug);
// ✅ Map/flatMap instead of get()
return market
.map(MarketResponse::from)
.orElseThrow(() -> new EntityNotFoundException("Market not found"));
```
## Streams Best Practices
```java
// ✅ Use streams for transformations, keep pipelines short
List<String> names = markets.stream()
.map(Market::name)
.filter(Objects::nonNull)
.toList();
// ❌ Avoid complex nested streams; prefer loops for clarity
```
## Exceptions
- Use unchecked exceptions for domain errors; wrap technical exceptions with context
- Create domain-specific exceptions (e.g., `MarketNotFoundException`)
- Avoid broad `catch (Exception ex)` unless rethrowing/logging centrally
```java
throw new MarketNotFoundException(slug);
```
## Generics and Type Safety
- Avoid raw types; declare generic parameters
- Prefer bounded generics for reusable utilities
```java
public <T extends Identifiable> Map<Long, T> indexById(Collection<T> items) { ... }
```
## Project Structure (Maven/Gradle)
```
src/main/java/com/example/app/
config/
controller/
service/
repository/
domain/
dto/
util/
src/main/resources/
application.yml
src/test/java/... (mirrors main)
```
## Formatting and Style
- Use 2 or 4 spaces consistently (project standard)
- One public top-level type per file
- Keep methods short and focused; extract helpers
- Order members: constants, fields, constructors, public methods, protected, private
## Code Smells to Avoid
- Long parameter lists → use DTO/builders
- Deep nesting → early returns
- Magic numbers → named constants
- Static mutable state → prefer dependency injection
- Silent catch blocks → log and act or rethrow
## Logging
```java
private static final Logger log = LoggerFactory.getLogger(MarketService.class);
log.info("fetch_market slug={}", slug);
log.error("failed_fetch_market slug={}", slug, ex);
```
## Null Handling
- Accept `@Nullable` only when unavoidable; otherwise use `@NonNull`
- Use Bean Validation (`@NotNull`, `@NotBlank`) on inputs
## Testing Expectations
- JUnit 5 + AssertJ for fluent assertions
- Mockito for mocking; avoid partial mocks where possible
- Favor deterministic tests; no hidden sleeps
**Remember**: Keep code intentional, typed, and observable. Optimize for maintainability over micro-optimizations unless proven necessary.

View File

@@ -0,0 +1,150 @@
---
name: jpa-patterns
description: JPA/Hibernate patterns for entity design, relationships, query optimization, transactions, auditing, indexing, pagination, and pooling in Spring Boot.
---
# JPA/Hibernate Patterns
Use for data modeling, repositories, and performance tuning in Spring Boot.
## When to Activate
- Designing JPA entities and table mappings
- Defining relationships (@OneToMany, @ManyToOne, @ManyToMany)
- Optimizing queries (N+1 prevention, fetch strategies, projections)
- Configuring transactions, auditing, or soft deletes
- Setting up pagination, sorting, or custom repository methods
- Tuning connection pooling (HikariCP) or second-level caching
## Entity Design
```java
@Entity
@Table(name = "markets", indexes = {
@Index(name = "idx_markets_slug", columnList = "slug", unique = true)
})
@EntityListeners(AuditingEntityListener.class)
public class MarketEntity {
@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@Column(nullable = false, length = 200)
private String name;
@Column(nullable = false, unique = true, length = 120)
private String slug;
@Enumerated(EnumType.STRING)
private MarketStatus status = MarketStatus.ACTIVE;
@CreatedDate private Instant createdAt;
@LastModifiedDate private Instant updatedAt;
}
```
Enable auditing:
```java
@Configuration
@EnableJpaAuditing
class JpaConfig {}
```
## Relationships and N+1 Prevention
```java
@OneToMany(mappedBy = "market", cascade = CascadeType.ALL, orphanRemoval = true)
private List<PositionEntity> positions = new ArrayList<>();
```
- Default to lazy loading; use `JOIN FETCH` in queries when needed
- Avoid `EAGER` on collections; use DTO projections for read paths
```java
@Query("select m from MarketEntity m left join fetch m.positions where m.id = :id")
Optional<MarketEntity> findWithPositions(@Param("id") Long id);
```
## Repository Patterns
```java
public interface MarketRepository extends JpaRepository<MarketEntity, Long> {
Optional<MarketEntity> findBySlug(String slug);
@Query("select m from MarketEntity m where m.status = :status")
Page<MarketEntity> findByStatus(@Param("status") MarketStatus status, Pageable pageable);
}
```
- Use projections for lightweight queries:
```java
public interface MarketSummary {
Long getId();
String getName();
MarketStatus getStatus();
}
Page<MarketSummary> findAllBy(Pageable pageable);
```
## Transactions
- Annotate service methods with `@Transactional`
- Use `@Transactional(readOnly = true)` for read paths to optimize
- Choose propagation carefully; avoid long-running transactions
```java
@Transactional
public Market updateStatus(Long id, MarketStatus status) {
MarketEntity entity = repo.findById(id)
.orElseThrow(() -> new EntityNotFoundException("Market"));
entity.setStatus(status);
return Market.from(entity);
}
```
## Pagination
```java
PageRequest page = PageRequest.of(pageNumber, pageSize, Sort.by("createdAt").descending());
Page<MarketEntity> markets = repo.findByStatus(MarketStatus.ACTIVE, page);
```
For cursor-like pagination, include `id > :lastId` in JPQL with ordering.
## Indexing and Performance
- Add indexes for common filters (`status`, `slug`, foreign keys)
- Use composite indexes matching query patterns (`status, created_at`)
- Avoid `select *`; project only needed columns
- Batch writes with `saveAll` and `hibernate.jdbc.batch_size`
## Connection Pooling (HikariCP)
Recommended properties:
```
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.minimum-idle=5
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.validation-timeout=5000
```
For PostgreSQL LOB handling, add:
```
spring.jpa.properties.hibernate.jdbc.lob.non_contextual_creation=true
```
## Caching
- 1st-level cache is per EntityManager; avoid keeping entities across transactions
- For read-heavy entities, consider second-level cache cautiously; validate eviction strategy
## Migrations
- Use Flyway or Liquibase; never rely on Hibernate auto DDL in production
- Keep migrations idempotent and additive; avoid dropping columns without plan
## Testing Data Access
- Prefer `@DataJpaTest` with Testcontainers to mirror production
- Assert SQL efficiency using logs: set `logging.level.org.hibernate.SQL=DEBUG` and `logging.level.org.hibernate.orm.jdbc.bind=TRACE` for parameter values
**Remember**: Keep entities lean, queries intentional, and transactions short. Prevent N+1 with fetch strategies and projections, and index for your read/write paths.

View File

@@ -0,0 +1,348 @@
---
name: project-guidelines-example
description: "Example project-specific skill template based on a real production application."
---
# Project Guidelines Skill (Example)
This is an example of a project-specific skill. Use this as a template for your own projects.
Based on a real production application: [Zenith](https://zenith.chat) - AI-powered customer discovery platform.
## When to Use
Reference this skill when working on the specific project it's designed for. Project skills contain:
- Architecture overview
- File structure
- Code patterns
- Testing requirements
- Deployment workflow
---
## Architecture Overview
**Tech Stack:**
- **Frontend**: Next.js 15 (App Router), TypeScript, React
- **Backend**: FastAPI (Python), Pydantic models
- **Database**: Supabase (PostgreSQL)
- **AI**: Claude API with tool calling and structured output
- **Deployment**: Google Cloud Run
- **Testing**: Playwright (E2E), pytest (backend), React Testing Library
**Services:**
```
┌─────────────────────────────────────────────────────────────┐
│ Frontend │
│ Next.js 15 + TypeScript + TailwindCSS │
│ Deployed: Vercel / Cloud Run │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Backend │
│ FastAPI + Python 3.11 + Pydantic │
│ Deployed: Cloud Run │
└─────────────────────────────────────────────────────────────┘
┌───────────────┼───────────────┐
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────┐
│ Supabase │ │ Claude │ │ Redis │
│ Database │ │ API │ │ Cache │
└──────────┘ └──────────┘ └──────────┘
```
---
## File Structure
```
project/
├── frontend/
│ └── src/
│ ├── app/ # Next.js app router pages
│ │ ├── api/ # API routes
│ │ ├── (auth)/ # Auth-protected routes
│ │ └── workspace/ # Main app workspace
│ ├── components/ # React components
│ │ ├── ui/ # Base UI components
│ │ ├── forms/ # Form components
│ │ └── layouts/ # Layout components
│ ├── hooks/ # Custom React hooks
│ ├── lib/ # Utilities
│ ├── types/ # TypeScript definitions
│ └── config/ # Configuration
├── backend/
│ ├── routers/ # FastAPI route handlers
│ ├── models.py # Pydantic models
│ ├── main.py # FastAPI app entry
│ ├── auth_system.py # Authentication
│ ├── database.py # Database operations
│ ├── services/ # Business logic
│ └── tests/ # pytest tests
├── deploy/ # Deployment configs
├── docs/ # Documentation
└── scripts/ # Utility scripts
```
---
## Code Patterns
### API Response Format (FastAPI)
```python
from pydantic import BaseModel
from typing import Generic, TypeVar, Optional
T = TypeVar('T')
class ApiResponse(BaseModel, Generic[T]):
success: bool
data: Optional[T] = None
error: Optional[str] = None
@classmethod
def ok(cls, data: T) -> "ApiResponse[T]":
return cls(success=True, data=data)
@classmethod
def fail(cls, error: str) -> "ApiResponse[T]":
return cls(success=False, error=error)
```
### Frontend API Calls (TypeScript)
```typescript
interface ApiResponse<T> {
success: boolean
data?: T
error?: string
}
async function fetchApi<T>(
endpoint: string,
options?: RequestInit
): Promise<ApiResponse<T>> {
try {
const response = await fetch(`/api${endpoint}`, {
...options,
headers: {
'Content-Type': 'application/json',
...options?.headers,
},
})
if (!response.ok) {
return { success: false, error: `HTTP ${response.status}` }
}
return await response.json()
} catch (error) {
return { success: false, error: String(error) }
}
}
```
### Claude AI Integration (Structured Output)
```python
from anthropic import Anthropic
from pydantic import BaseModel
class AnalysisResult(BaseModel):
summary: str
key_points: list[str]
confidence: float
async def analyze_with_claude(content: str) -> AnalysisResult:
client = Anthropic()
response = client.messages.create(
model="claude-sonnet-4-5-20250514",
max_tokens=1024,
messages=[{"role": "user", "content": content}],
tools=[{
"name": "provide_analysis",
"description": "Provide structured analysis",
"input_schema": AnalysisResult.model_json_schema()
}],
tool_choice={"type": "tool", "name": "provide_analysis"}
)
# Extract tool use result
tool_use = next(
block for block in response.content
if block.type == "tool_use"
)
return AnalysisResult(**tool_use.input)
```
### Custom Hooks (React)
```typescript
import { useState, useCallback } from 'react'
interface UseApiState<T> {
data: T | null
loading: boolean
error: string | null
}
export function useApi<T>(
fetchFn: () => Promise<ApiResponse<T>>
) {
const [state, setState] = useState<UseApiState<T>>({
data: null,
loading: false,
error: null,
})
const execute = useCallback(async () => {
setState(prev => ({ ...prev, loading: true, error: null }))
const result = await fetchFn()
if (result.success) {
setState({ data: result.data!, loading: false, error: null })
} else {
setState({ data: null, loading: false, error: result.error! })
}
}, [fetchFn])
return { ...state, execute }
}
```
---
## Testing Requirements
### Backend (pytest)
```bash
# Run all tests
poetry run pytest tests/
# Run with coverage
poetry run pytest tests/ --cov=. --cov-report=html
# Run specific test file
poetry run pytest tests/test_auth.py -v
```
**Test structure:**
```python
import pytest
from httpx import AsyncClient
from main import app
@pytest.fixture
async def client():
async with AsyncClient(app=app, base_url="http://test") as ac:
yield ac
@pytest.mark.asyncio
async def test_health_check(client: AsyncClient):
response = await client.get("/health")
assert response.status_code == 200
assert response.json()["status"] == "healthy"
```
### Frontend (React Testing Library)
```bash
# Run tests
npm run test
# Run with coverage
npm run test -- --coverage
# Run E2E tests
npm run test:e2e
```
**Test structure:**
```typescript
import { render, screen, fireEvent } from '@testing-library/react'
import { WorkspacePanel } from './WorkspacePanel'
describe('WorkspacePanel', () => {
it('renders workspace correctly', () => {
render(<WorkspacePanel />)
expect(screen.getByRole('main')).toBeInTheDocument()
})
it('handles session creation', async () => {
render(<WorkspacePanel />)
fireEvent.click(screen.getByText('New Session'))
expect(await screen.findByText('Session created')).toBeInTheDocument()
})
})
```
---
## Deployment Workflow
### Pre-Deployment Checklist
- [ ] All tests passing locally
- [ ] `npm run build` succeeds (frontend)
- [ ] `poetry run pytest` passes (backend)
- [ ] No hardcoded secrets
- [ ] Environment variables documented
- [ ] Database migrations ready
### Deployment Commands
```bash
# Build and deploy frontend
cd frontend && npm run build
gcloud run deploy frontend --source .
# Build and deploy backend
cd backend
gcloud run deploy backend --source .
```
### Environment Variables
```bash
# Frontend (.env.local)
NEXT_PUBLIC_API_URL=https://api.example.com
NEXT_PUBLIC_SUPABASE_URL=https://xxx.supabase.co
NEXT_PUBLIC_SUPABASE_ANON_KEY=eyJ...
# Backend (.env)
DATABASE_URL=postgresql://...
ANTHROPIC_API_KEY=sk-ant-...
SUPABASE_URL=https://xxx.supabase.co
SUPABASE_KEY=eyJ...
```
---
## Critical Rules
1. **No emojis** in code, comments, or documentation
2. **Immutability** - never mutate objects or arrays
3. **TDD** - write tests before implementation
4. **80% coverage** minimum
5. **Many small files** - 200-400 lines typical, 800 max
6. **No console.log** in production code
7. **Proper error handling** with try/catch
8. **Input validation** with Pydantic/Zod
---
## Related Skills
- `coding-standards.md` - General coding best practices
- `backend-patterns.md` - API and database patterns
- `frontend-patterns.md` - React and Next.js patterns
- `tdd-workflow/` - Test-driven development methodology

View File

@@ -0,0 +1,749 @@
---
name: python-patterns
description: Pythonic idioms, PEP 8 standards, type hints, and best practices for building robust, efficient, and maintainable Python applications.
---
# Python Development Patterns
Idiomatic Python patterns and best practices for building robust, efficient, and maintainable applications.
## When to Activate
- Writing new Python code
- Reviewing Python code
- Refactoring existing Python code
- Designing Python packages/modules
## Core Principles
### 1. Readability Counts
Python prioritizes readability. Code should be obvious and easy to understand.
```python
# Good: Clear and readable
def get_active_users(users: list[User]) -> list[User]:
"""Return only active users from the provided list."""
return [user for user in users if user.is_active]
# Bad: Clever but confusing
def get_active_users(u):
return [x for x in u if x.a]
```
### 2. Explicit is Better Than Implicit
Avoid magic; be clear about what your code does.
```python
# Good: Explicit configuration
import logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# Bad: Hidden side effects
import some_module
some_module.setup() # What does this do?
```
### 3. EAFP - Easier to Ask Forgiveness Than Permission
Python prefers exception handling over checking conditions.
```python
# Good: EAFP style
def get_value(dictionary: dict, key: str) -> Any:
try:
return dictionary[key]
except KeyError:
return default_value
# Bad: LBYL (Look Before You Leap) style
def get_value(dictionary: dict, key: str) -> Any:
if key in dictionary:
return dictionary[key]
else:
return default_value
```
## Type Hints
### Basic Type Annotations
```python
from typing import Optional, List, Dict, Any
def process_user(
user_id: str,
data: Dict[str, Any],
active: bool = True
) -> Optional[User]:
"""Process a user and return the updated User or None."""
if not active:
return None
return User(user_id, data)
```
### Modern Type Hints (Python 3.9+)
```python
# Python 3.9+ - Use built-in types
def process_items(items: list[str]) -> dict[str, int]:
return {item: len(item) for item in items}
# Python 3.8 and earlier - Use typing module
from typing import List, Dict
def process_items(items: List[str]) -> Dict[str, int]:
return {item: len(item) for item in items}
```
### Type Aliases and TypeVar
```python
from typing import TypeVar, Union
# Type alias for complex types
JSON = Union[dict[str, Any], list[Any], str, int, float, bool, None]
def parse_json(data: str) -> JSON:
return json.loads(data)
# Generic types
T = TypeVar('T')
def first(items: list[T]) -> T | None:
"""Return the first item or None if list is empty."""
return items[0] if items else None
```
### Protocol-Based Duck Typing
```python
from typing import Protocol
class Renderable(Protocol):
def render(self) -> str:
"""Render the object to a string."""
def render_all(items: list[Renderable]) -> str:
"""Render all items that implement the Renderable protocol."""
return "\n".join(item.render() for item in items)
```
## Error Handling Patterns
### Specific Exception Handling
```python
# Good: Catch specific exceptions
def load_config(path: str) -> Config:
try:
with open(path) as f:
return Config.from_json(f.read())
except FileNotFoundError as e:
raise ConfigError(f"Config file not found: {path}") from e
except json.JSONDecodeError as e:
raise ConfigError(f"Invalid JSON in config: {path}") from e
# Bad: Bare except
def load_config(path: str) -> Config:
try:
with open(path) as f:
return Config.from_json(f.read())
except:
return None # Silent failure!
```
### Exception Chaining
```python
def process_data(data: str) -> Result:
try:
parsed = json.loads(data)
except json.JSONDecodeError as e:
# Chain exceptions to preserve the traceback
raise ValueError(f"Failed to parse data: {data}") from e
```
### Custom Exception Hierarchy
```python
class AppError(Exception):
"""Base exception for all application errors."""
pass
class ValidationError(AppError):
"""Raised when input validation fails."""
pass
class NotFoundError(AppError):
"""Raised when a requested resource is not found."""
pass
# Usage
def get_user(user_id: str) -> User:
user = db.find_user(user_id)
if not user:
raise NotFoundError(f"User not found: {user_id}")
return user
```
## Context Managers
### Resource Management
```python
# Good: Using context managers
def process_file(path: str) -> str:
with open(path, 'r') as f:
return f.read()
# Bad: Manual resource management
def process_file(path: str) -> str:
f = open(path, 'r')
try:
return f.read()
finally:
f.close()
```
### Custom Context Managers
```python
from contextlib import contextmanager
@contextmanager
def timer(name: str):
"""Context manager to time a block of code."""
start = time.perf_counter()
yield
elapsed = time.perf_counter() - start
print(f"{name} took {elapsed:.4f} seconds")
# Usage
with timer("data processing"):
process_large_dataset()
```
### Context Manager Classes
```python
class DatabaseTransaction:
def __init__(self, connection):
self.connection = connection
def __enter__(self):
self.connection.begin_transaction()
return self
def __exit__(self, exc_type, exc_val, exc_tb):
if exc_type is None:
self.connection.commit()
else:
self.connection.rollback()
return False # Don't suppress exceptions
# Usage
with DatabaseTransaction(conn):
user = conn.create_user(user_data)
conn.create_profile(user.id, profile_data)
```
## Comprehensions and Generators
### List Comprehensions
```python
# Good: List comprehension for simple transformations
names = [user.name for user in users if user.is_active]
# Bad: Manual loop
names = []
for user in users:
if user.is_active:
names.append(user.name)
# Complex comprehensions should be expanded
# Bad: Too complex
result = [x * 2 for x in items if x > 0 if x % 2 == 0]
# Good: Use a generator function
def filter_and_transform(items: Iterable[int]) -> list[int]:
result = []
for x in items:
if x > 0 and x % 2 == 0:
result.append(x * 2)
return result
```
### Generator Expressions
```python
# Good: Generator for lazy evaluation
total = sum(x * x for x in range(1_000_000))
# Bad: Creates large intermediate list
total = sum([x * x for x in range(1_000_000)])
```
### Generator Functions
```python
def read_large_file(path: str) -> Iterator[str]:
"""Read a large file line by line."""
with open(path) as f:
for line in f:
yield line.strip()
# Usage
for line in read_large_file("huge.txt"):
process(line)
```
## Data Classes and Named Tuples
### Data Classes
```python
from dataclasses import dataclass, field
from datetime import datetime
@dataclass
class User:
"""User entity with automatic __init__, __repr__, and __eq__."""
id: str
name: str
email: str
created_at: datetime = field(default_factory=datetime.now)
is_active: bool = True
# Usage
user = User(
id="123",
name="Alice",
email="alice@example.com"
)
```
### Data Classes with Validation
```python
@dataclass
class User:
email: str
age: int
def __post_init__(self):
# Validate email format
if "@" not in self.email:
raise ValueError(f"Invalid email: {self.email}")
# Validate age range
if self.age < 0 or self.age > 150:
raise ValueError(f"Invalid age: {self.age}")
```
### Named Tuples
```python
from typing import NamedTuple
class Point(NamedTuple):
"""Immutable 2D point."""
x: float
y: float
def distance(self, other: 'Point') -> float:
return ((self.x - other.x) ** 2 + (self.y - other.y) ** 2) ** 0.5
# Usage
p1 = Point(0, 0)
p2 = Point(3, 4)
print(p1.distance(p2)) # 5.0
```
## Decorators
### Function Decorators
```python
import functools
import time
def timer(func: Callable) -> Callable:
"""Decorator to time function execution."""
@functools.wraps(func)
def wrapper(*args, **kwargs):
start = time.perf_counter()
result = func(*args, **kwargs)
elapsed = time.perf_counter() - start
print(f"{func.__name__} took {elapsed:.4f}s")
return result
return wrapper
@timer
def slow_function():
time.sleep(1)
# slow_function() prints: slow_function took 1.0012s
```
### Parameterized Decorators
```python
def repeat(times: int):
"""Decorator to repeat a function multiple times."""
def decorator(func: Callable) -> Callable:
@functools.wraps(func)
def wrapper(*args, **kwargs):
results = []
for _ in range(times):
results.append(func(*args, **kwargs))
return results
return wrapper
return decorator
@repeat(times=3)
def greet(name: str) -> str:
return f"Hello, {name}!"
# greet("Alice") returns ["Hello, Alice!", "Hello, Alice!", "Hello, Alice!"]
```
### Class-Based Decorators
```python
class CountCalls:
"""Decorator that counts how many times a function is called."""
def __init__(self, func: Callable):
functools.update_wrapper(self, func)
self.func = func
self.count = 0
def __call__(self, *args, **kwargs):
self.count += 1
print(f"{self.func.__name__} has been called {self.count} times")
return self.func(*args, **kwargs)
@CountCalls
def process():
pass
# Each call to process() prints the call count
```
## Concurrency Patterns
### Threading for I/O-Bound Tasks
```python
import concurrent.futures
import threading
def fetch_url(url: str) -> str:
"""Fetch a URL (I/O-bound operation)."""
import urllib.request
with urllib.request.urlopen(url) as response:
return response.read().decode()
def fetch_all_urls(urls: list[str]) -> dict[str, str]:
"""Fetch multiple URLs concurrently using threads."""
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
future_to_url = {executor.submit(fetch_url, url): url for url in urls}
results = {}
for future in concurrent.futures.as_completed(future_to_url):
url = future_to_url[future]
try:
results[url] = future.result()
except Exception as e:
results[url] = f"Error: {e}"
return results
```
### Multiprocessing for CPU-Bound Tasks
```python
def process_data(data: list[int]) -> int:
"""CPU-intensive computation."""
return sum(x ** 2 for x in data)
def process_all(datasets: list[list[int]]) -> list[int]:
"""Process multiple datasets using multiple processes."""
with concurrent.futures.ProcessPoolExecutor() as executor:
results = list(executor.map(process_data, datasets))
return results
```
### Async/Await for Concurrent I/O
```python
import asyncio
async def fetch_async(url: str) -> str:
"""Fetch a URL asynchronously."""
import aiohttp
async with aiohttp.ClientSession() as session:
async with session.get(url) as response:
return await response.text()
async def fetch_all(urls: list[str]) -> dict[str, str]:
"""Fetch multiple URLs concurrently."""
tasks = [fetch_async(url) for url in urls]
results = await asyncio.gather(*tasks, return_exceptions=True)
return dict(zip(urls, results))
```
## Package Organization
### Standard Project Layout
```
myproject/
├── src/
│ └── mypackage/
│ ├── __init__.py
│ ├── main.py
│ ├── api/
│ │ ├── __init__.py
│ │ └── routes.py
│ ├── models/
│ │ ├── __init__.py
│ │ └── user.py
│ └── utils/
│ ├── __init__.py
│ └── helpers.py
├── tests/
│ ├── __init__.py
│ ├── conftest.py
│ ├── test_api.py
│ └── test_models.py
├── pyproject.toml
├── README.md
└── .gitignore
```
### Import Conventions
```python
# Good: Import order - stdlib, third-party, local
import os
import sys
from pathlib import Path
import requests
from fastapi import FastAPI
from mypackage.models import User
from mypackage.utils import format_name
# Good: Use isort for automatic import sorting
# pip install isort
```
### __init__.py for Package Exports
```python
# mypackage/__init__.py
"""mypackage - A sample Python package."""
__version__ = "1.0.0"
# Export main classes/functions at package level
from mypackage.models import User, Post
from mypackage.utils import format_name
__all__ = ["User", "Post", "format_name"]
```
## Memory and Performance
### Using __slots__ for Memory Efficiency
```python
# Bad: Regular class uses __dict__ (more memory)
class Point:
def __init__(self, x: float, y: float):
self.x = x
self.y = y
# Good: __slots__ reduces memory usage
class Point:
__slots__ = ['x', 'y']
def __init__(self, x: float, y: float):
self.x = x
self.y = y
```
### Generator for Large Data
```python
# Bad: Returns full list in memory
def read_lines(path: str) -> list[str]:
with open(path) as f:
return [line.strip() for line in f]
# Good: Yields lines one at a time
def read_lines(path: str) -> Iterator[str]:
with open(path) as f:
for line in f:
yield line.strip()
```
### Avoid String Concatenation in Loops
```python
# Bad: O(n²) due to string immutability
result = ""
for item in items:
result += str(item)
# Good: O(n) using join
result = "".join(str(item) for item in items)
# Good: Using StringIO for building
from io import StringIO
buffer = StringIO()
for item in items:
buffer.write(str(item))
result = buffer.getvalue()
```
## Python Tooling Integration
### Essential Commands
```bash
# Code formatting
black .
isort .
# Linting
ruff check .
pylint mypackage/
# Type checking
mypy .
# Testing
pytest --cov=mypackage --cov-report=html
# Security scanning
bandit -r .
# Dependency management
pip-audit
safety check
```
### pyproject.toml Configuration
```toml
[project]
name = "mypackage"
version = "1.0.0"
requires-python = ">=3.9"
dependencies = [
"requests>=2.31.0",
"pydantic>=2.0.0",
]
[project.optional-dependencies]
dev = [
"pytest>=7.4.0",
"pytest-cov>=4.1.0",
"black>=23.0.0",
"ruff>=0.1.0",
"mypy>=1.5.0",
]
[tool.black]
line-length = 88
target-version = ['py39']
[tool.ruff]
line-length = 88
select = ["E", "F", "I", "N", "W"]
[tool.mypy]
python_version = "3.9"
warn_return_any = true
warn_unused_configs = true
disallow_untyped_defs = true
[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "--cov=mypackage --cov-report=term-missing"
```
## Quick Reference: Python Idioms
| Idiom | Description |
|-------|-------------|
| EAFP | Easier to Ask Forgiveness than Permission |
| Context managers | Use `with` for resource management |
| List comprehensions | For simple transformations |
| Generators | For lazy evaluation and large datasets |
| Type hints | Annotate function signatures |
| Dataclasses | For data containers with auto-generated methods |
| `__slots__` | For memory optimization |
| f-strings | For string formatting (Python 3.6+) |
| `pathlib.Path` | For path operations (Python 3.4+) |
| `enumerate` | For index-element pairs in loops |
## Anti-Patterns to Avoid
```python
# Bad: Mutable default arguments
def append_to(item, items=[]):
items.append(item)
return items
# Good: Use None and create new list
def append_to(item, items=None):
if items is None:
items = []
items.append(item)
return items
# Bad: Checking type with type()
if type(obj) == list:
process(obj)
# Good: Use isinstance
if isinstance(obj, list):
process(obj)
# Bad: Comparing to None with ==
if value == None:
process()
# Good: Use is
if value is None:
process()
# Bad: from module import *
from os.path import *
# Good: Explicit imports
from os.path import join, exists
# Bad: Bare except
try:
risky_operation()
except:
pass
# Good: Specific exception
try:
risky_operation()
except SpecificError as e:
logger.error(f"Operation failed: {e}")
```
__Remember__: Python code should be readable, explicit, and follow the principle of least surprise. When in doubt, prioritize clarity over cleverness.

View File

@@ -0,0 +1,815 @@
---
name: python-testing
description: Python testing strategies using pytest, TDD methodology, fixtures, mocking, parametrization, and coverage requirements.
---
# Python Testing Patterns
Comprehensive testing strategies for Python applications using pytest, TDD methodology, and best practices.
## When to Activate
- Writing new Python code (follow TDD: red, green, refactor)
- Designing test suites for Python projects
- Reviewing Python test coverage
- Setting up testing infrastructure
## Core Testing Philosophy
### Test-Driven Development (TDD)
Always follow the TDD cycle:
1. **RED**: Write a failing test for the desired behavior
2. **GREEN**: Write minimal code to make the test pass
3. **REFACTOR**: Improve code while keeping tests green
```python
# Step 1: Write failing test (RED)
def test_add_numbers():
result = add(2, 3)
assert result == 5
# Step 2: Write minimal implementation (GREEN)
def add(a, b):
return a + b
# Step 3: Refactor if needed (REFACTOR)
```
### Coverage Requirements
- **Target**: 80%+ code coverage
- **Critical paths**: 100% coverage required
- Use `pytest --cov` to measure coverage
```bash
pytest --cov=mypackage --cov-report=term-missing --cov-report=html
```
## pytest Fundamentals
### Basic Test Structure
```python
import pytest
def test_addition():
"""Test basic addition."""
assert 2 + 2 == 4
def test_string_uppercase():
"""Test string uppercasing."""
text = "hello"
assert text.upper() == "HELLO"
def test_list_append():
"""Test list append."""
items = [1, 2, 3]
items.append(4)
assert 4 in items
assert len(items) == 4
```
### Assertions
```python
# Equality
assert result == expected
# Inequality
assert result != unexpected
# Truthiness
assert result # Truthy
assert not result # Falsy
assert result is True # Exactly True
assert result is False # Exactly False
assert result is None # Exactly None
# Membership
assert item in collection
assert item not in collection
# Comparisons
assert result > 0
assert 0 <= result <= 100
# Type checking
assert isinstance(result, str)
# Exception testing (preferred approach)
with pytest.raises(ValueError):
raise ValueError("error message")
# Check exception message
with pytest.raises(ValueError, match="invalid input"):
raise ValueError("invalid input provided")
# Check exception attributes
with pytest.raises(ValueError) as exc_info:
raise ValueError("error message")
assert str(exc_info.value) == "error message"
```
## Fixtures
### Basic Fixture Usage
```python
import pytest
@pytest.fixture
def sample_data():
"""Fixture providing sample data."""
return {"name": "Alice", "age": 30}
def test_sample_data(sample_data):
"""Test using the fixture."""
assert sample_data["name"] == "Alice"
assert sample_data["age"] == 30
```
### Fixture with Setup/Teardown
```python
@pytest.fixture
def database():
"""Fixture with setup and teardown."""
# Setup
db = Database(":memory:")
db.create_tables()
db.insert_test_data()
yield db # Provide to test
# Teardown
db.close()
def test_database_query(database):
"""Test database operations."""
result = database.query("SELECT * FROM users")
assert len(result) > 0
```
### Fixture Scopes
```python
# Function scope (default) - runs for each test
@pytest.fixture
def temp_file():
with open("temp.txt", "w") as f:
yield f
os.remove("temp.txt")
# Module scope - runs once per module
@pytest.fixture(scope="module")
def module_db():
db = Database(":memory:")
db.create_tables()
yield db
db.close()
# Session scope - runs once per test session
@pytest.fixture(scope="session")
def shared_resource():
resource = ExpensiveResource()
yield resource
resource.cleanup()
```
### Fixture with Parameters
```python
@pytest.fixture(params=[1, 2, 3])
def number(request):
"""Parameterized fixture."""
return request.param
def test_numbers(number):
"""Test runs 3 times, once for each parameter."""
assert number > 0
```
### Using Multiple Fixtures
```python
@pytest.fixture
def user():
return User(id=1, name="Alice")
@pytest.fixture
def admin():
return User(id=2, name="Admin", role="admin")
def test_user_admin_interaction(user, admin):
"""Test using multiple fixtures."""
assert admin.can_manage(user)
```
### Autouse Fixtures
```python
@pytest.fixture(autouse=True)
def reset_config():
"""Automatically runs before every test."""
Config.reset()
yield
Config.cleanup()
def test_without_fixture_call():
# reset_config runs automatically
assert Config.get_setting("debug") is False
```
### Conftest.py for Shared Fixtures
```python
# tests/conftest.py
import pytest
@pytest.fixture
def client():
"""Shared fixture for all tests."""
app = create_app(testing=True)
with app.test_client() as client:
yield client
@pytest.fixture
def auth_headers(client):
"""Generate auth headers for API testing."""
response = client.post("/api/login", json={
"username": "test",
"password": "test"
})
token = response.json["token"]
return {"Authorization": f"Bearer {token}"}
```
## Parametrization
### Basic Parametrization
```python
@pytest.mark.parametrize("input,expected", [
("hello", "HELLO"),
("world", "WORLD"),
("PyThOn", "PYTHON"),
])
def test_uppercase(input, expected):
"""Test runs 3 times with different inputs."""
assert input.upper() == expected
```
### Multiple Parameters
```python
@pytest.mark.parametrize("a,b,expected", [
(2, 3, 5),
(0, 0, 0),
(-1, 1, 0),
(100, 200, 300),
])
def test_add(a, b, expected):
"""Test addition with multiple inputs."""
assert add(a, b) == expected
```
### Parametrize with IDs
```python
@pytest.mark.parametrize("input,expected", [
("valid@email.com", True),
("invalid", False),
("@no-domain.com", False),
], ids=["valid-email", "missing-at", "missing-domain"])
def test_email_validation(input, expected):
"""Test email validation with readable test IDs."""
assert is_valid_email(input) is expected
```
### Parametrized Fixtures
```python
@pytest.fixture(params=["sqlite", "postgresql", "mysql"])
def db(request):
"""Test against multiple database backends."""
if request.param == "sqlite":
return Database(":memory:")
elif request.param == "postgresql":
return Database("postgresql://localhost/test")
elif request.param == "mysql":
return Database("mysql://localhost/test")
def test_database_operations(db):
"""Test runs 3 times, once for each database."""
result = db.query("SELECT 1")
assert result is not None
```
## Markers and Test Selection
### Custom Markers
```python
# Mark slow tests
@pytest.mark.slow
def test_slow_operation():
time.sleep(5)
# Mark integration tests
@pytest.mark.integration
def test_api_integration():
response = requests.get("https://api.example.com")
assert response.status_code == 200
# Mark unit tests
@pytest.mark.unit
def test_unit_logic():
assert calculate(2, 3) == 5
```
### Run Specific Tests
```bash
# Run only fast tests
pytest -m "not slow"
# Run only integration tests
pytest -m integration
# Run integration or slow tests
pytest -m "integration or slow"
# Run tests marked as unit but not slow
pytest -m "unit and not slow"
```
### Configure Markers in pytest.ini
```ini
[pytest]
markers =
slow: marks tests as slow
integration: marks tests as integration tests
unit: marks tests as unit tests
django: marks tests as requiring Django
```
## Mocking and Patching
### Mocking Functions
```python
from unittest.mock import patch, Mock
@patch("mypackage.external_api_call")
def test_with_mock(api_call_mock):
"""Test with mocked external API."""
api_call_mock.return_value = {"status": "success"}
result = my_function()
api_call_mock.assert_called_once()
assert result["status"] == "success"
```
### Mocking Return Values
```python
@patch("mypackage.Database.connect")
def test_database_connection(connect_mock):
"""Test with mocked database connection."""
connect_mock.return_value = MockConnection()
db = Database()
db.connect()
connect_mock.assert_called_once_with("localhost")
```
### Mocking Exceptions
```python
@patch("mypackage.api_call")
def test_api_error_handling(api_call_mock):
"""Test error handling with mocked exception."""
api_call_mock.side_effect = ConnectionError("Network error")
with pytest.raises(ConnectionError):
api_call()
api_call_mock.assert_called_once()
```
### Mocking Context Managers
```python
@patch("builtins.open", new_callable=mock_open)
def test_file_reading(mock_file):
"""Test file reading with mocked open."""
mock_file.return_value.read.return_value = "file content"
result = read_file("test.txt")
mock_file.assert_called_once_with("test.txt", "r")
assert result == "file content"
```
### Using Autospec
```python
@patch("mypackage.DBConnection", autospec=True)
def test_autospec(db_mock):
"""Test with autospec to catch API misuse."""
db = db_mock.return_value
db.query("SELECT * FROM users")
# This would fail if DBConnection doesn't have query method
db_mock.assert_called_once()
```
### Mock Class Instances
```python
class TestUserService:
@patch("mypackage.UserRepository")
def test_create_user(self, repo_mock):
"""Test user creation with mocked repository."""
repo_mock.return_value.save.return_value = User(id=1, name="Alice")
service = UserService(repo_mock.return_value)
user = service.create_user(name="Alice")
assert user.name == "Alice"
repo_mock.return_value.save.assert_called_once()
```
### Mock Property
```python
@pytest.fixture
def mock_config():
"""Create a mock with a property."""
config = Mock()
type(config).debug = PropertyMock(return_value=True)
type(config).api_key = PropertyMock(return_value="test-key")
return config
def test_with_mock_config(mock_config):
"""Test with mocked config properties."""
assert mock_config.debug is True
assert mock_config.api_key == "test-key"
```
## Testing Async Code
### Async Tests with pytest-asyncio
```python
import pytest
@pytest.mark.asyncio
async def test_async_function():
"""Test async function."""
result = await async_add(2, 3)
assert result == 5
@pytest.mark.asyncio
async def test_async_with_fixture(async_client):
"""Test async with async fixture."""
response = await async_client.get("/api/users")
assert response.status_code == 200
```
### Async Fixture
```python
@pytest.fixture
async def async_client():
"""Async fixture providing async test client."""
app = create_app()
async with app.test_client() as client:
yield client
@pytest.mark.asyncio
async def test_api_endpoint(async_client):
"""Test using async fixture."""
response = await async_client.get("/api/data")
assert response.status_code == 200
```
### Mocking Async Functions
```python
@pytest.mark.asyncio
@patch("mypackage.async_api_call")
async def test_async_mock(api_call_mock):
"""Test async function with mock."""
api_call_mock.return_value = {"status": "ok"}
result = await my_async_function()
api_call_mock.assert_awaited_once()
assert result["status"] == "ok"
```
## Testing Exceptions
### Testing Expected Exceptions
```python
def test_divide_by_zero():
"""Test that dividing by zero raises ZeroDivisionError."""
with pytest.raises(ZeroDivisionError):
divide(10, 0)
def test_custom_exception():
"""Test custom exception with message."""
with pytest.raises(ValueError, match="invalid input"):
validate_input("invalid")
```
### Testing Exception Attributes
```python
def test_exception_with_details():
"""Test exception with custom attributes."""
with pytest.raises(CustomError) as exc_info:
raise CustomError("error", code=400)
assert exc_info.value.code == 400
assert "error" in str(exc_info.value)
```
## Testing Side Effects
### Testing File Operations
```python
import tempfile
import os
def test_file_processing():
"""Test file processing with temp file."""
with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.txt') as f:
f.write("test content")
temp_path = f.name
try:
result = process_file(temp_path)
assert result == "processed: test content"
finally:
os.unlink(temp_path)
```
### Testing with pytest's tmp_path Fixture
```python
def test_with_tmp_path(tmp_path):
"""Test using pytest's built-in temp path fixture."""
test_file = tmp_path / "test.txt"
test_file.write_text("hello world")
result = process_file(str(test_file))
assert result == "hello world"
# tmp_path automatically cleaned up
```
### Testing with tmpdir Fixture
```python
def test_with_tmpdir(tmpdir):
"""Test using pytest's tmpdir fixture."""
test_file = tmpdir.join("test.txt")
test_file.write("data")
result = process_file(str(test_file))
assert result == "data"
```
## Test Organization
### Directory Structure
```
tests/
├── conftest.py # Shared fixtures
├── __init__.py
├── unit/ # Unit tests
│ ├── __init__.py
│ ├── test_models.py
│ ├── test_utils.py
│ └── test_services.py
├── integration/ # Integration tests
│ ├── __init__.py
│ ├── test_api.py
│ └── test_database.py
└── e2e/ # End-to-end tests
├── __init__.py
└── test_user_flow.py
```
### Test Classes
```python
class TestUserService:
"""Group related tests in a class."""
@pytest.fixture(autouse=True)
def setup(self):
"""Setup runs before each test in this class."""
self.service = UserService()
def test_create_user(self):
"""Test user creation."""
user = self.service.create_user("Alice")
assert user.name == "Alice"
def test_delete_user(self):
"""Test user deletion."""
user = User(id=1, name="Bob")
self.service.delete_user(user)
assert not self.service.user_exists(1)
```
## Best Practices
### DO
- **Follow TDD**: Write tests before code (red-green-refactor)
- **Test one thing**: Each test should verify a single behavior
- **Use descriptive names**: `test_user_login_with_invalid_credentials_fails`
- **Use fixtures**: Eliminate duplication with fixtures
- **Mock external dependencies**: Don't depend on external services
- **Test edge cases**: Empty inputs, None values, boundary conditions
- **Aim for 80%+ coverage**: Focus on critical paths
- **Keep tests fast**: Use marks to separate slow tests
### DON'T
- **Don't test implementation**: Test behavior, not internals
- **Don't use complex conditionals in tests**: Keep tests simple
- **Don't ignore test failures**: All tests must pass
- **Don't test third-party code**: Trust libraries to work
- **Don't share state between tests**: Tests should be independent
- **Don't catch exceptions in tests**: Use `pytest.raises`
- **Don't use print statements**: Use assertions and pytest output
- **Don't write tests that are too brittle**: Avoid over-specific mocks
## Common Patterns
### Testing API Endpoints (FastAPI/Flask)
```python
@pytest.fixture
def client():
app = create_app(testing=True)
return app.test_client()
def test_get_user(client):
response = client.get("/api/users/1")
assert response.status_code == 200
assert response.json["id"] == 1
def test_create_user(client):
response = client.post("/api/users", json={
"name": "Alice",
"email": "alice@example.com"
})
assert response.status_code == 201
assert response.json["name"] == "Alice"
```
### Testing Database Operations
```python
@pytest.fixture
def db_session():
"""Create a test database session."""
session = Session(bind=engine)
session.begin_nested()
yield session
session.rollback()
session.close()
def test_create_user(db_session):
user = User(name="Alice", email="alice@example.com")
db_session.add(user)
db_session.commit()
retrieved = db_session.query(User).filter_by(name="Alice").first()
assert retrieved.email == "alice@example.com"
```
### Testing Class Methods
```python
class TestCalculator:
@pytest.fixture
def calculator(self):
return Calculator()
def test_add(self, calculator):
assert calculator.add(2, 3) == 5
def test_divide_by_zero(self, calculator):
with pytest.raises(ZeroDivisionError):
calculator.divide(10, 0)
```
## pytest Configuration
### pytest.ini
```ini
[pytest]
testpaths = tests
python_files = test_*.py
python_classes = Test*
python_functions = test_*
addopts =
--strict-markers
--disable-warnings
--cov=mypackage
--cov-report=term-missing
--cov-report=html
markers =
slow: marks tests as slow
integration: marks tests as integration tests
unit: marks tests as unit tests
```
### pyproject.toml
```toml
[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]
python_classes = ["Test*"]
python_functions = ["test_*"]
addopts = [
"--strict-markers",
"--cov=mypackage",
"--cov-report=term-missing",
"--cov-report=html",
]
markers = [
"slow: marks tests as slow",
"integration: marks tests as integration tests",
"unit: marks tests as unit tests",
]
```
## Running Tests
```bash
# Run all tests
pytest
# Run specific file
pytest tests/test_utils.py
# Run specific test
pytest tests/test_utils.py::test_function
# Run with verbose output
pytest -v
# Run with coverage
pytest --cov=mypackage --cov-report=html
# Run only fast tests
pytest -m "not slow"
# Run until first failure
pytest -x
# Run and stop on N failures
pytest --maxfail=3
# Run last failed tests
pytest --lf
# Run tests with pattern
pytest -k "test_user"
# Run with debugger on failure
pytest --pdb
```
## Quick Reference
| Pattern | Usage |
|---------|-------|
| `pytest.raises()` | Test expected exceptions |
| `@pytest.fixture()` | Create reusable test fixtures |
| `@pytest.mark.parametrize()` | Run tests with multiple inputs |
| `@pytest.mark.slow` | Mark slow tests |
| `pytest -m "not slow"` | Skip slow tests |
| `@patch()` | Mock functions and classes |
| `tmp_path` fixture | Automatic temp directory |
| `pytest --cov` | Generate coverage report |
| `assert` | Simple and readable assertions |
**Remember**: Tests are code too. Keep them clean, readable, and maintainable. Good tests catch bugs; great tests prevent them.

View File

@@ -0,0 +1,219 @@
---
name: regex-vs-llm-structured-text
description: Decision framework for choosing between regex and LLM when parsing structured text — start with regex, add LLM only for low-confidence edge cases.
---
# Regex vs LLM for Structured Text Parsing
A practical decision framework for parsing structured text (quizzes, forms, invoices, documents). The key insight: regex handles 95-98% of cases cheaply and deterministically. Reserve expensive LLM calls for the remaining edge cases.
## When to Activate
- Parsing structured text with repeating patterns (questions, forms, tables)
- Deciding between regex and LLM for text extraction
- Building hybrid pipelines that combine both approaches
- Optimizing cost/accuracy tradeoffs in text processing
## Decision Framework
```
Is the text format consistent and repeating?
├── Yes (>90% follows a pattern) → Start with Regex
│ ├── Regex handles 95%+ → Done, no LLM needed
│ └── Regex handles <95% → Add LLM for edge cases only
└── No (free-form, highly variable) → Use LLM directly
```
## Architecture Pattern
```
Source Text
[Regex Parser] ─── Extracts structure (95-98% accuracy)
[Text Cleaner] ─── Removes noise (markers, page numbers, artifacts)
[Confidence Scorer] ─── Flags low-confidence extractions
├── High confidence (≥0.95) → Direct output
└── Low confidence (<0.95) → [LLM Validator] → Output
```
## Implementation
### 1. Regex Parser (Handles the Majority)
```python
import re
from dataclasses import dataclass
@dataclass(frozen=True)
class ParsedItem:
id: str
text: str
choices: tuple[str, ...]
answer: str
confidence: float = 1.0
def parse_structured_text(content: str) -> list[ParsedItem]:
"""Parse structured text using regex patterns."""
pattern = re.compile(
r"(?P<id>\d+)\.\s*(?P<text>.+?)\n"
r"(?P<choices>(?:[A-D]\..+?\n)+)"
r"Answer:\s*(?P<answer>[A-D])",
re.MULTILINE | re.DOTALL,
)
items = []
for match in pattern.finditer(content):
choices = tuple(
c.strip() for c in re.findall(r"[A-D]\.\s*(.+)", match.group("choices"))
)
items.append(ParsedItem(
id=match.group("id"),
text=match.group("text").strip(),
choices=choices,
answer=match.group("answer"),
))
return items
```
### 2. Confidence Scoring
Flag items that may need LLM review:
```python
@dataclass(frozen=True)
class ConfidenceFlag:
item_id: str
score: float
reasons: tuple[str, ...]
def score_confidence(item: ParsedItem) -> ConfidenceFlag:
"""Score extraction confidence and flag issues."""
reasons = []
score = 1.0
if len(item.choices) < 3:
reasons.append("few_choices")
score -= 0.3
if not item.answer:
reasons.append("missing_answer")
score -= 0.5
if len(item.text) < 10:
reasons.append("short_text")
score -= 0.2
return ConfidenceFlag(
item_id=item.id,
score=max(0.0, score),
reasons=tuple(reasons),
)
def identify_low_confidence(
items: list[ParsedItem],
threshold: float = 0.95,
) -> list[ConfidenceFlag]:
"""Return items below confidence threshold."""
flags = [score_confidence(item) for item in items]
return [f for f in flags if f.score < threshold]
```
### 3. LLM Validator (Edge Cases Only)
```python
def validate_with_llm(
item: ParsedItem,
original_text: str,
client,
) -> ParsedItem:
"""Use LLM to fix low-confidence extractions."""
response = client.messages.create(
model="claude-haiku-4-5-20251001", # Cheapest model for validation
max_tokens=500,
messages=[{
"role": "user",
"content": (
f"Extract the question, choices, and answer from this text.\n\n"
f"Text: {original_text}\n\n"
f"Current extraction: {item}\n\n"
f"Return corrected JSON if needed, or 'CORRECT' if accurate."
),
}],
)
# Parse LLM response and return corrected item...
return corrected_item
```
### 4. Hybrid Pipeline
```python
def process_document(
content: str,
*,
llm_client=None,
confidence_threshold: float = 0.95,
) -> list[ParsedItem]:
"""Full pipeline: regex -> confidence check -> LLM for edge cases."""
# Step 1: Regex extraction (handles 95-98%)
items = parse_structured_text(content)
# Step 2: Confidence scoring
low_confidence = identify_low_confidence(items, confidence_threshold)
if not low_confidence or llm_client is None:
return items
# Step 3: LLM validation (only for flagged items)
low_conf_ids = {f.item_id for f in low_confidence}
result = []
for item in items:
if item.id in low_conf_ids:
result.append(validate_with_llm(item, content, llm_client))
else:
result.append(item)
return result
```
## Real-World Metrics
From a production quiz parsing pipeline (410 items):
| Metric | Value |
|--------|-------|
| Regex success rate | 98.0% |
| Low confidence items | 8 (2.0%) |
| LLM calls needed | ~5 |
| Cost savings vs all-LLM | ~95% |
| Test coverage | 93% |
## Best Practices
- **Start with regex** — even imperfect regex gives you a baseline to improve
- **Use confidence scoring** to programmatically identify what needs LLM help
- **Use the cheapest LLM** for validation (Haiku-class models are sufficient)
- **Never mutate** parsed items — return new instances from cleaning/validation steps
- **TDD works well** for parsers — write tests for known patterns first, then edge cases
- **Log metrics** (regex success rate, LLM call count) to track pipeline health
## Anti-Patterns to Avoid
- Sending all text to an LLM when regex handles 95%+ of cases (expensive and slow)
- Using regex for free-form, highly variable text (LLM is better here)
- Skipping confidence scoring and hoping regex "just works"
- Mutating parsed objects during cleaning/validation steps
- Not testing edge cases (malformed input, missing fields, encoding issues)
## When to Use
- Quiz/exam question parsing
- Form data extraction
- Invoice/receipt processing
- Document structure parsing (headers, sections, tables)
- Any structured text with repeating patterns where cost matters

View File

@@ -0,0 +1,313 @@
---
name: springboot-patterns
description: Spring Boot architecture patterns, REST API design, layered services, data access, caching, async processing, and logging. Use for Java Spring Boot backend work.
---
# Spring Boot Development Patterns
Spring Boot architecture and API patterns for scalable, production-grade services.
## When to Activate
- Building REST APIs with Spring MVC or WebFlux
- Structuring controller → service → repository layers
- Configuring Spring Data JPA, caching, or async processing
- Adding validation, exception handling, or pagination
- Setting up profiles for dev/staging/production environments
- Implementing event-driven patterns with Spring Events or Kafka
## REST API Structure
```java
@RestController
@RequestMapping("/api/markets")
@Validated
class MarketController {
private final MarketService marketService;
MarketController(MarketService marketService) {
this.marketService = marketService;
}
@GetMapping
ResponseEntity<Page<MarketResponse>> list(
@RequestParam(defaultValue = "0") int page,
@RequestParam(defaultValue = "20") int size) {
Page<Market> markets = marketService.list(PageRequest.of(page, size));
return ResponseEntity.ok(markets.map(MarketResponse::from));
}
@PostMapping
ResponseEntity<MarketResponse> create(@Valid @RequestBody CreateMarketRequest request) {
Market market = marketService.create(request);
return ResponseEntity.status(HttpStatus.CREATED).body(MarketResponse.from(market));
}
}
```
## Repository Pattern (Spring Data JPA)
```java
public interface MarketRepository extends JpaRepository<MarketEntity, Long> {
@Query("select m from MarketEntity m where m.status = :status order by m.volume desc")
List<MarketEntity> findActive(@Param("status") MarketStatus status, Pageable pageable);
}
```
## Service Layer with Transactions
```java
@Service
public class MarketService {
private final MarketRepository repo;
public MarketService(MarketRepository repo) {
this.repo = repo;
}
@Transactional
public Market create(CreateMarketRequest request) {
MarketEntity entity = MarketEntity.from(request);
MarketEntity saved = repo.save(entity);
return Market.from(saved);
}
}
```
## DTOs and Validation
```java
public record CreateMarketRequest(
@NotBlank @Size(max = 200) String name,
@NotBlank @Size(max = 2000) String description,
@NotNull @FutureOrPresent Instant endDate,
@NotEmpty List<@NotBlank String> categories) {}
public record MarketResponse(Long id, String name, MarketStatus status) {
static MarketResponse from(Market market) {
return new MarketResponse(market.id(), market.name(), market.status());
}
}
```
## Exception Handling
```java
@ControllerAdvice
class GlobalExceptionHandler {
@ExceptionHandler(MethodArgumentNotValidException.class)
ResponseEntity<ApiError> handleValidation(MethodArgumentNotValidException ex) {
String message = ex.getBindingResult().getFieldErrors().stream()
.map(e -> e.getField() + ": " + e.getDefaultMessage())
.collect(Collectors.joining(", "));
return ResponseEntity.badRequest().body(ApiError.validation(message));
}
@ExceptionHandler(AccessDeniedException.class)
ResponseEntity<ApiError> handleAccessDenied() {
return ResponseEntity.status(HttpStatus.FORBIDDEN).body(ApiError.of("Forbidden"));
}
@ExceptionHandler(Exception.class)
ResponseEntity<ApiError> handleGeneric(Exception ex) {
// Log unexpected errors with stack traces
return ResponseEntity.status(HttpStatus.INTERNAL_SERVER_ERROR)
.body(ApiError.of("Internal server error"));
}
}
```
## Caching
Requires `@EnableCaching` on a configuration class.
```java
@Service
public class MarketCacheService {
private final MarketRepository repo;
public MarketCacheService(MarketRepository repo) {
this.repo = repo;
}
@Cacheable(value = "market", key = "#id")
public Market getById(Long id) {
return repo.findById(id)
.map(Market::from)
.orElseThrow(() -> new EntityNotFoundException("Market not found"));
}
@CacheEvict(value = "market", key = "#id")
public void evict(Long id) {}
}
```
## Async Processing
Requires `@EnableAsync` on a configuration class.
```java
@Service
public class NotificationService {
@Async
public CompletableFuture<Void> sendAsync(Notification notification) {
// send email/SMS
return CompletableFuture.completedFuture(null);
}
}
```
## Logging (SLF4J)
```java
@Service
public class ReportService {
private static final Logger log = LoggerFactory.getLogger(ReportService.class);
public Report generate(Long marketId) {
log.info("generate_report marketId={}", marketId);
try {
// logic
} catch (Exception ex) {
log.error("generate_report_failed marketId={}", marketId, ex);
throw ex;
}
return new Report();
}
}
```
## Middleware / Filters
```java
@Component
public class RequestLoggingFilter extends OncePerRequestFilter {
private static final Logger log = LoggerFactory.getLogger(RequestLoggingFilter.class);
@Override
protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
FilterChain filterChain) throws ServletException, IOException {
long start = System.currentTimeMillis();
try {
filterChain.doFilter(request, response);
} finally {
long duration = System.currentTimeMillis() - start;
log.info("req method={} uri={} status={} durationMs={}",
request.getMethod(), request.getRequestURI(), response.getStatus(), duration);
}
}
}
```
## Pagination and Sorting
```java
PageRequest page = PageRequest.of(pageNumber, pageSize, Sort.by("createdAt").descending());
Page<Market> results = marketService.list(page);
```
## Error-Resilient External Calls
```java
public <T> T withRetry(Supplier<T> supplier, int maxRetries) {
int attempts = 0;
while (true) {
try {
return supplier.get();
} catch (Exception ex) {
attempts++;
if (attempts >= maxRetries) {
throw ex;
}
try {
Thread.sleep((long) Math.pow(2, attempts) * 100L);
} catch (InterruptedException ie) {
Thread.currentThread().interrupt();
throw ex;
}
}
}
}
```
## Rate Limiting (Filter + Bucket4j)
**Security Note**: The `X-Forwarded-For` header is untrusted by default because clients can spoof it.
Only use forwarded headers when:
1. Your app is behind a trusted reverse proxy (nginx, AWS ALB, etc.)
2. You have registered `ForwardedHeaderFilter` as a bean
3. You have configured `server.forward-headers-strategy=NATIVE` or `FRAMEWORK` in application properties
4. Your proxy is configured to overwrite (not append to) the `X-Forwarded-For` header
When `ForwardedHeaderFilter` is properly configured, `request.getRemoteAddr()` will automatically
return the correct client IP from the forwarded headers. Without this configuration, use
`request.getRemoteAddr()` directly—it returns the immediate connection IP, which is the only
trustworthy value.
```java
@Component
public class RateLimitFilter extends OncePerRequestFilter {
private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
/*
* SECURITY: This filter uses request.getRemoteAddr() to identify clients for rate limiting.
*
* If your application is behind a reverse proxy (nginx, AWS ALB, etc.), you MUST configure
* Spring to handle forwarded headers properly for accurate client IP detection:
*
* 1. Set server.forward-headers-strategy=NATIVE (for cloud platforms) or FRAMEWORK in
* application.properties/yaml
* 2. If using FRAMEWORK strategy, register ForwardedHeaderFilter:
*
* @Bean
* ForwardedHeaderFilter forwardedHeaderFilter() {
* return new ForwardedHeaderFilter();
* }
*
* 3. Ensure your proxy overwrites (not appends) the X-Forwarded-For header to prevent spoofing
* 4. Configure server.tomcat.remoteip.trusted-proxies or equivalent for your container
*
* Without this configuration, request.getRemoteAddr() returns the proxy IP, not the client IP.
* Do NOT read X-Forwarded-For directly—it is trivially spoofable without trusted proxy handling.
*/
@Override
protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
FilterChain filterChain) throws ServletException, IOException {
// Use getRemoteAddr() which returns the correct client IP when ForwardedHeaderFilter
// is configured, or the direct connection IP otherwise. Never trust X-Forwarded-For
// headers directly without proper proxy configuration.
String clientIp = request.getRemoteAddr();
Bucket bucket = buckets.computeIfAbsent(clientIp,
k -> Bucket.builder()
.addLimit(Bandwidth.classic(100, Refill.greedy(100, Duration.ofMinutes(1))))
.build());
if (bucket.tryConsume(1)) {
filterChain.doFilter(request, response);
} else {
response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
}
}
}
```
## Background Jobs
Use Springs `@Scheduled` or integrate with queues (e.g., Kafka, SQS, RabbitMQ). Keep handlers idempotent and observable.
## Observability
- Structured logging (JSON) via Logback encoder
- Metrics: Micrometer + Prometheus/OTel
- Tracing: Micrometer Tracing with OpenTelemetry or Brave backend
## Production Defaults
- Prefer constructor injection, avoid field injection
- Enable `spring.mvc.problemdetails.enabled=true` for RFC 7807 errors (Spring Boot 3+)
- Configure HikariCP pool sizes for workload, set timeouts
- Use `@Transactional(readOnly = true)` for queries
- Enforce null-safety via `@NonNull` and `Optional` where appropriate
**Remember**: Keep controllers thin, services focused, repositories simple, and errors handled centrally. Optimize for maintainability and testability.

View File

@@ -0,0 +1,271 @@
---
name: springboot-security
description: Spring Security best practices for authn/authz, validation, CSRF, secrets, headers, rate limiting, and dependency security in Java Spring Boot services.
---
# Spring Boot Security Review
Use when adding auth, handling input, creating endpoints, or dealing with secrets.
## When to Activate
- Adding authentication (JWT, OAuth2, session-based)
- Implementing authorization (@PreAuthorize, role-based access)
- Validating user input (Bean Validation, custom validators)
- Configuring CORS, CSRF, or security headers
- Managing secrets (Vault, environment variables)
- Adding rate limiting or brute-force protection
- Scanning dependencies for CVEs
## Authentication
- Prefer stateless JWT or opaque tokens with revocation list
- Use `httpOnly`, `Secure`, `SameSite=Strict` cookies for sessions
- Validate tokens with `OncePerRequestFilter` or resource server
```java
@Component
public class JwtAuthFilter extends OncePerRequestFilter {
private final JwtService jwtService;
public JwtAuthFilter(JwtService jwtService) {
this.jwtService = jwtService;
}
@Override
protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
FilterChain chain) throws ServletException, IOException {
String header = request.getHeader(HttpHeaders.AUTHORIZATION);
if (header != null && header.startsWith("Bearer ")) {
String token = header.substring(7);
Authentication auth = jwtService.authenticate(token);
SecurityContextHolder.getContext().setAuthentication(auth);
}
chain.doFilter(request, response);
}
}
```
## Authorization
- Enable method security: `@EnableMethodSecurity`
- Use `@PreAuthorize("hasRole('ADMIN')")` or `@PreAuthorize("@authz.canEdit(#id)")`
- Deny by default; expose only required scopes
```java
@RestController
@RequestMapping("/api/admin")
public class AdminController {
@PreAuthorize("hasRole('ADMIN')")
@GetMapping("/users")
public List<UserDto> listUsers() {
return userService.findAll();
}
@PreAuthorize("@authz.isOwner(#id, authentication)")
@DeleteMapping("/users/{id}")
public ResponseEntity<Void> deleteUser(@PathVariable Long id) {
userService.delete(id);
return ResponseEntity.noContent().build();
}
}
```
## Input Validation
- Use Bean Validation with `@Valid` on controllers
- Apply constraints on DTOs: `@NotBlank`, `@Email`, `@Size`, custom validators
- Sanitize any HTML with a whitelist before rendering
```java
// BAD: No validation
@PostMapping("/users")
public User createUser(@RequestBody UserDto dto) {
return userService.create(dto);
}
// GOOD: Validated DTO
public record CreateUserDto(
@NotBlank @Size(max = 100) String name,
@NotBlank @Email String email,
@NotNull @Min(0) @Max(150) Integer age
) {}
@PostMapping("/users")
public ResponseEntity<UserDto> createUser(@Valid @RequestBody CreateUserDto dto) {
return ResponseEntity.status(HttpStatus.CREATED)
.body(userService.create(dto));
}
```
## SQL Injection Prevention
- Use Spring Data repositories or parameterized queries
- For native queries, use `:param` bindings; never concatenate strings
```java
// BAD: String concatenation in native query
@Query(value = "SELECT * FROM users WHERE name = '" + name + "'", nativeQuery = true)
// GOOD: Parameterized native query
@Query(value = "SELECT * FROM users WHERE name = :name", nativeQuery = true)
List<User> findByName(@Param("name") String name);
// GOOD: Spring Data derived query (auto-parameterized)
List<User> findByEmailAndActiveTrue(String email);
```
## Password Encoding
- Always hash passwords with BCrypt or Argon2 — never store plaintext
- Use `PasswordEncoder` bean, not manual hashing
```java
@Bean
public PasswordEncoder passwordEncoder() {
return new BCryptPasswordEncoder(12); // cost factor 12
}
// In service
public User register(CreateUserDto dto) {
String hashedPassword = passwordEncoder.encode(dto.password());
return userRepository.save(new User(dto.email(), hashedPassword));
}
```
## CSRF Protection
- For browser session apps, keep CSRF enabled; include token in forms/headers
- For pure APIs with Bearer tokens, disable CSRF and rely on stateless auth
```java
http
.csrf(csrf -> csrf.disable())
.sessionManagement(sm -> sm.sessionCreationPolicy(SessionCreationPolicy.STATELESS));
```
## Secrets Management
- No secrets in source; load from env or vault
- Keep `application.yml` free of credentials; use placeholders
- Rotate tokens and DB credentials regularly
```yaml
# BAD: Hardcoded in application.yml
spring:
datasource:
password: mySecretPassword123
# GOOD: Environment variable placeholder
spring:
datasource:
password: ${DB_PASSWORD}
# GOOD: Spring Cloud Vault integration
spring:
cloud:
vault:
uri: https://vault.example.com
token: ${VAULT_TOKEN}
```
## Security Headers
```java
http
.headers(headers -> headers
.contentSecurityPolicy(csp -> csp
.policyDirectives("default-src 'self'"))
.frameOptions(HeadersConfigurer.FrameOptionsConfig::sameOrigin)
.xssProtection(Customizer.withDefaults())
.referrerPolicy(rp -> rp.policy(ReferrerPolicyHeaderWriter.ReferrerPolicy.NO_REFERRER)));
```
## CORS Configuration
- Configure CORS at the security filter level, not per-controller
- Restrict allowed origins — never use `*` in production
```java
@Bean
public CorsConfigurationSource corsConfigurationSource() {
CorsConfiguration config = new CorsConfiguration();
config.setAllowedOrigins(List.of("https://app.example.com"));
config.setAllowedMethods(List.of("GET", "POST", "PUT", "DELETE"));
config.setAllowedHeaders(List.of("Authorization", "Content-Type"));
config.setAllowCredentials(true);
config.setMaxAge(3600L);
UrlBasedCorsConfigurationSource source = new UrlBasedCorsConfigurationSource();
source.registerCorsConfiguration("/api/**", config);
return source;
}
// In SecurityFilterChain:
http.cors(cors -> cors.configurationSource(corsConfigurationSource()));
```
## Rate Limiting
- Apply Bucket4j or gateway-level limits on expensive endpoints
- Log and alert on bursts; return 429 with retry hints
```java
// Using Bucket4j for per-endpoint rate limiting
@Component
public class RateLimitFilter extends OncePerRequestFilter {
private final Map<String, Bucket> buckets = new ConcurrentHashMap<>();
private Bucket createBucket() {
return Bucket.builder()
.addLimit(Bandwidth.classic(100, Refill.intervally(100, Duration.ofMinutes(1))))
.build();
}
@Override
protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response,
FilterChain chain) throws ServletException, IOException {
String clientIp = request.getRemoteAddr();
Bucket bucket = buckets.computeIfAbsent(clientIp, k -> createBucket());
if (bucket.tryConsume(1)) {
chain.doFilter(request, response);
} else {
response.setStatus(HttpStatus.TOO_MANY_REQUESTS.value());
response.getWriter().write("{\"error\": \"Rate limit exceeded\"}");
}
}
}
```
## Dependency Security
- Run OWASP Dependency Check / Snyk in CI
- Keep Spring Boot and Spring Security on supported versions
- Fail builds on known CVEs
## Logging and PII
- Never log secrets, tokens, passwords, or full PAN data
- Redact sensitive fields; use structured JSON logging
## File Uploads
- Validate size, content type, and extension
- Store outside web root; scan if required
## Checklist Before Release
- [ ] Auth tokens validated and expired correctly
- [ ] Authorization guards on every sensitive path
- [ ] All inputs validated and sanitized
- [ ] No string-concatenated SQL
- [ ] CSRF posture correct for app type
- [ ] Secrets externalized; none committed
- [ ] Security headers configured
- [ ] Rate limiting on APIs
- [ ] Dependencies scanned and up to date
- [ ] Logs free of sensitive data
**Remember**: Deny by default, validate inputs, least privilege, and secure-by-configuration first.

View File

@@ -0,0 +1,157 @@
---
name: springboot-tdd
description: Test-driven development for Spring Boot using JUnit 5, Mockito, MockMvc, Testcontainers, and JaCoCo. Use when adding features, fixing bugs, or refactoring.
---
# Spring Boot TDD Workflow
TDD guidance for Spring Boot services with 80%+ coverage (unit + integration).
## When to Use
- New features or endpoints
- Bug fixes or refactors
- Adding data access logic or security rules
## Workflow
1) Write tests first (they should fail)
2) Implement minimal code to pass
3) Refactor with tests green
4) Enforce coverage (JaCoCo)
## Unit Tests (JUnit 5 + Mockito)
```java
@ExtendWith(MockitoExtension.class)
class MarketServiceTest {
@Mock MarketRepository repo;
@InjectMocks MarketService service;
@Test
void createsMarket() {
CreateMarketRequest req = new CreateMarketRequest("name", "desc", Instant.now(), List.of("cat"));
when(repo.save(any())).thenAnswer(inv -> inv.getArgument(0));
Market result = service.create(req);
assertThat(result.name()).isEqualTo("name");
verify(repo).save(any());
}
}
```
Patterns:
- Arrange-Act-Assert
- Avoid partial mocks; prefer explicit stubbing
- Use `@ParameterizedTest` for variants
## Web Layer Tests (MockMvc)
```java
@WebMvcTest(MarketController.class)
class MarketControllerTest {
@Autowired MockMvc mockMvc;
@MockBean MarketService marketService;
@Test
void returnsMarkets() throws Exception {
when(marketService.list(any())).thenReturn(Page.empty());
mockMvc.perform(get("/api/markets"))
.andExpect(status().isOk())
.andExpect(jsonPath("$.content").isArray());
}
}
```
## Integration Tests (SpringBootTest)
```java
@SpringBootTest
@AutoConfigureMockMvc
@ActiveProfiles("test")
class MarketIntegrationTest {
@Autowired MockMvc mockMvc;
@Test
void createsMarket() throws Exception {
mockMvc.perform(post("/api/markets")
.contentType(MediaType.APPLICATION_JSON)
.content("""
{"name":"Test","description":"Desc","endDate":"2030-01-01T00:00:00Z","categories":["general"]}
"""))
.andExpect(status().isCreated());
}
}
```
## Persistence Tests (DataJpaTest)
```java
@DataJpaTest
@AutoConfigureTestDatabase(replace = AutoConfigureTestDatabase.Replace.NONE)
@Import(TestContainersConfig.class)
class MarketRepositoryTest {
@Autowired MarketRepository repo;
@Test
void savesAndFinds() {
MarketEntity entity = new MarketEntity();
entity.setName("Test");
repo.save(entity);
Optional<MarketEntity> found = repo.findByName("Test");
assertThat(found).isPresent();
}
}
```
## Testcontainers
- Use reusable containers for Postgres/Redis to mirror production
- Wire via `@DynamicPropertySource` to inject JDBC URLs into Spring context
## Coverage (JaCoCo)
Maven snippet:
```xml
<plugin>
<groupId>org.jacoco</groupId>
<artifactId>jacoco-maven-plugin</artifactId>
<version>0.8.14</version>
<executions>
<execution>
<goals><goal>prepare-agent</goal></goals>
</execution>
<execution>
<id>report</id>
<phase>verify</phase>
<goals><goal>report</goal></goals>
</execution>
</executions>
</plugin>
```
## Assertions
- Prefer AssertJ (`assertThat`) for readability
- For JSON responses, use `jsonPath`
- For exceptions: `assertThatThrownBy(...)`
## Test Data Builders
```java
class MarketBuilder {
private String name = "Test";
MarketBuilder withName(String name) { this.name = name; return this; }
Market build() { return new Market(null, name, MarketStatus.ACTIVE); }
}
```
## CI Commands
- Maven: `mvn -T 4 test` or `mvn verify`
- Gradle: `./gradlew test jacocoTestReport`
**Remember**: Keep tests fast, isolated, and deterministic. Test behavior, not implementation details.

View File

@@ -0,0 +1,230 @@
---
name: springboot-verification
description: "Verification loop for Spring Boot projects: build, static analysis, tests with coverage, security scans, and diff review before release or PR."
---
# Spring Boot Verification Loop
Run before PRs, after major changes, and pre-deploy.
## When to Activate
- Before opening a pull request for a Spring Boot service
- After major refactoring or dependency upgrades
- Pre-deployment verification for staging or production
- Running full build → lint → test → security scan pipeline
- Validating test coverage meets thresholds
## Phase 1: Build
```bash
mvn -T 4 clean verify -DskipTests
# or
./gradlew clean assemble -x test
```
If build fails, stop and fix.
## Phase 2: Static Analysis
Maven (common plugins):
```bash
mvn -T 4 spotbugs:check pmd:check checkstyle:check
```
Gradle (if configured):
```bash
./gradlew checkstyleMain pmdMain spotbugsMain
```
## Phase 3: Tests + Coverage
```bash
mvn -T 4 test
mvn jacoco:report # verify 80%+ coverage
# or
./gradlew test jacocoTestReport
```
Report:
- Total tests, passed/failed
- Coverage % (lines/branches)
### Unit Tests
Test service logic in isolation with mocked dependencies:
```java
@ExtendWith(MockitoExtension.class)
class UserServiceTest {
@Mock private UserRepository userRepository;
@InjectMocks private UserService userService;
@Test
void createUser_validInput_returnsUser() {
var dto = new CreateUserDto("Alice", "alice@example.com");
var expected = new User(1L, "Alice", "alice@example.com");
when(userRepository.save(any(User.class))).thenReturn(expected);
var result = userService.create(dto);
assertThat(result.name()).isEqualTo("Alice");
verify(userRepository).save(any(User.class));
}
@Test
void createUser_duplicateEmail_throwsException() {
var dto = new CreateUserDto("Alice", "existing@example.com");
when(userRepository.existsByEmail(dto.email())).thenReturn(true);
assertThatThrownBy(() -> userService.create(dto))
.isInstanceOf(DuplicateEmailException.class);
}
}
```
### Integration Tests with Testcontainers
Test against a real database instead of H2:
```java
@SpringBootTest
@Testcontainers
class UserRepositoryIntegrationTest {
@Container
static PostgreSQLContainer<?> postgres = new PostgreSQLContainer<>("postgres:16-alpine")
.withDatabaseName("testdb");
@DynamicPropertySource
static void configureProperties(DynamicPropertyRegistry registry) {
registry.add("spring.datasource.url", postgres::getJdbcUrl);
registry.add("spring.datasource.username", postgres::getUsername);
registry.add("spring.datasource.password", postgres::getPassword);
}
@Autowired private UserRepository userRepository;
@Test
void findByEmail_existingUser_returnsUser() {
userRepository.save(new User("Alice", "alice@example.com"));
var found = userRepository.findByEmail("alice@example.com");
assertThat(found).isPresent();
assertThat(found.get().getName()).isEqualTo("Alice");
}
}
```
### API Tests with MockMvc
Test controller layer with full Spring context:
```java
@WebMvcTest(UserController.class)
class UserControllerTest {
@Autowired private MockMvc mockMvc;
@MockBean private UserService userService;
@Test
void createUser_validInput_returns201() throws Exception {
var user = new UserDto(1L, "Alice", "alice@example.com");
when(userService.create(any())).thenReturn(user);
mockMvc.perform(post("/api/users")
.contentType(MediaType.APPLICATION_JSON)
.content("""
{"name": "Alice", "email": "alice@example.com"}
"""))
.andExpect(status().isCreated())
.andExpect(jsonPath("$.name").value("Alice"));
}
@Test
void createUser_invalidEmail_returns400() throws Exception {
mockMvc.perform(post("/api/users")
.contentType(MediaType.APPLICATION_JSON)
.content("""
{"name": "Alice", "email": "not-an-email"}
"""))
.andExpect(status().isBadRequest());
}
}
```
## Phase 4: Security Scan
```bash
# Dependency CVEs
mvn org.owasp:dependency-check-maven:check
# or
./gradlew dependencyCheckAnalyze
# Secrets in source
grep -rn "password\s*=\s*\"" src/ --include="*.java" --include="*.yml" --include="*.properties"
grep -rn "sk-\|api_key\|secret" src/ --include="*.java" --include="*.yml"
# Secrets (git history)
git secrets --scan # if configured
```
### Common Security Findings
```
# Check for System.out.println (use logger instead)
grep -rn "System\.out\.print" src/main/ --include="*.java"
# Check for raw exception messages in responses
grep -rn "e\.getMessage()" src/main/ --include="*.java"
# Check for wildcard CORS
grep -rn "allowedOrigins.*\*" src/main/ --include="*.java"
```
## Phase 5: Lint/Format (optional gate)
```bash
mvn spotless:apply # if using Spotless plugin
./gradlew spotlessApply
```
## Phase 6: Diff Review
```bash
git diff --stat
git diff
```
Checklist:
- No debugging logs left (`System.out`, `log.debug` without guards)
- Meaningful errors and HTTP statuses
- Transactions and validation present where needed
- Config changes documented
## Output Template
```
VERIFICATION REPORT
===================
Build: [PASS/FAIL]
Static: [PASS/FAIL] (spotbugs/pmd/checkstyle)
Tests: [PASS/FAIL] (X/Y passed, Z% coverage)
Security: [PASS/FAIL] (CVE findings: N)
Diff: [X files changed]
Overall: [READY / NOT READY]
Issues to Fix:
1. ...
2. ...
```
## Continuous Mode
- Re-run phases on significant changes or every 3060 minutes in long sessions
- Keep a short loop: `mvn -T 4 test` + spotbugs for quick feedback
**Remember**: Fast feedback beats late surprises. Keep the gate strict—treat warnings as defects in production systems.

View File

@@ -0,0 +1,102 @@
---
name: strategic-compact
description: Suggests manual context compaction at logical intervals to preserve context through task phases rather than arbitrary auto-compaction.
---
# Strategic Compact Skill
Suggests manual `/compact` at strategic points in your workflow rather than relying on arbitrary auto-compaction.
## When to Activate
- Running long sessions that approach context limits (200K+ tokens)
- Working on multi-phase tasks (research → plan → implement → test)
- Switching between unrelated tasks within the same session
- After completing a major milestone and starting new work
- When responses slow down or become less coherent (context pressure)
## Why Strategic Compaction?
Auto-compaction triggers at arbitrary points:
- Often mid-task, losing important context
- No awareness of logical task boundaries
- Can interrupt complex multi-step operations
Strategic compaction at logical boundaries:
- **After exploration, before execution** — Compact research context, keep implementation plan
- **After completing a milestone** — Fresh start for next phase
- **Before major context shifts** — Clear exploration context before different task
## How It Works
The `suggest-compact.js` script runs on PreToolUse (Edit/Write) and:
1. **Tracks tool calls** — Counts tool invocations in session
2. **Threshold detection** — Suggests at configurable threshold (default: 50 calls)
3. **Periodic reminders** — Reminds every 25 calls after threshold
## Hook Setup
Add to your `~/.claude/settings.json`:
```json
{
"hooks": {
"PreToolUse": [
{
"matcher": "Edit",
"hooks": [{ "type": "command", "command": "node ~/.claude/skills/strategic-compact/suggest-compact.js" }]
},
{
"matcher": "Write",
"hooks": [{ "type": "command", "command": "node ~/.claude/skills/strategic-compact/suggest-compact.js" }]
}
]
}
}
```
## Configuration
Environment variables:
- `COMPACT_THRESHOLD` — Tool calls before first suggestion (default: 50)
## Compaction Decision Guide
Use this table to decide when to compact:
| Phase Transition | Compact? | Why |
|-----------------|----------|-----|
| Research → Planning | Yes | Research context is bulky; plan is the distilled output |
| Planning → Implementation | Yes | Plan is in TodoWrite or a file; free up context for code |
| Implementation → Testing | Maybe | Keep if tests reference recent code; compact if switching focus |
| Debugging → Next feature | Yes | Debug traces pollute context for unrelated work |
| Mid-implementation | No | Losing variable names, file paths, and partial state is costly |
| After a failed approach | Yes | Clear the dead-end reasoning before trying a new approach |
## What Survives Compaction
Understanding what persists helps you compact with confidence:
| Persists | Lost |
|----------|------|
| CLAUDE.md instructions | Intermediate reasoning and analysis |
| TodoWrite task list | File contents you previously read |
| Memory files (`~/.claude/memory/`) | Multi-step conversation context |
| Git state (commits, branches) | Tool call history and counts |
| Files on disk | Nuanced user preferences stated verbally |
## Best Practices
1. **Compact after planning** — Once plan is finalized in TodoWrite, compact to start fresh
2. **Compact after debugging** — Clear error-resolution context before continuing
3. **Don't compact mid-implementation** — Preserve context for related changes
4. **Read the suggestion** — The hook tells you *when*, you decide *if*
5. **Write before compacting** — Save important context to files or memory before compacting
6. **Use `/compact` with a summary** — Add a custom message: `/compact Focus on implementing auth middleware next`
## Related
- [The Longform Guide](https://x.com/affaanmustafa/status/2014040193557471352) — Token optimization section
- Memory persistence hooks — For state that survives compaction
- `continuous-learning` skill — Extracts patterns before session ends

View File

@@ -0,0 +1,54 @@
#!/bin/bash
# Strategic Compact Suggester
# Runs on PreToolUse or periodically to suggest manual compaction at logical intervals
#
# Why manual over auto-compact:
# - Auto-compact happens at arbitrary points, often mid-task
# - Strategic compacting preserves context through logical phases
# - Compact after exploration, before execution
# - Compact after completing a milestone, before starting next
#
# Hook config (in ~/.claude/settings.json):
# {
# "hooks": {
# "PreToolUse": [{
# "matcher": "Edit|Write",
# "hooks": [{
# "type": "command",
# "command": "~/.claude/skills/strategic-compact/suggest-compact.sh"
# }]
# }]
# }
# }
#
# Criteria for suggesting compact:
# - Session has been running for extended period
# - Large number of tool calls made
# - Transitioning from research/exploration to implementation
# - Plan has been finalized
# Track tool call count (increment in a temp file)
# Use CLAUDE_SESSION_ID for session-specific counter (not $$ which changes per invocation)
SESSION_ID="${CLAUDE_SESSION_ID:-${PPID:-default}}"
COUNTER_FILE="/tmp/claude-tool-count-${SESSION_ID}"
THRESHOLD=${COMPACT_THRESHOLD:-50}
# Initialize or increment counter
if [ -f "$COUNTER_FILE" ]; then
count=$(cat "$COUNTER_FILE")
count=$((count + 1))
echo "$count" > "$COUNTER_FILE"
else
echo "1" > "$COUNTER_FILE"
count=1
fi
# Suggest compact after threshold tool calls
if [ "$count" -eq "$THRESHOLD" ]; then
echo "[StrategicCompact] $THRESHOLD tool calls reached - consider /compact if transitioning phases" >&2
fi
# Suggest at regular intervals after threshold
if [ "$count" -gt "$THRESHOLD" ] && [ $((count % 25)) -eq 0 ]; then
echo "[StrategicCompact] $count tool calls - good checkpoint for /compact if context is stale" >&2
fi

View File

@@ -0,0 +1,142 @@
---
name: swift-actor-persistence
description: Thread-safe data persistence in Swift using actors — in-memory cache with file-backed storage, eliminating data races by design.
---
# Swift Actors for Thread-Safe Persistence
Patterns for building thread-safe data persistence layers using Swift actors. Combines in-memory caching with file-backed storage, leveraging the actor model to eliminate data races at compile time.
## When to Activate
- Building a data persistence layer in Swift 5.5+
- Need thread-safe access to shared mutable state
- Want to eliminate manual synchronization (locks, DispatchQueues)
- Building offline-first apps with local storage
## Core Pattern
### Actor-Based Repository
The actor model guarantees serialized access — no data races, enforced by the compiler.
```swift
public actor LocalRepository<T: Codable & Identifiable> where T.ID == String {
private var cache: [String: T] = [:]
private let fileURL: URL
public init(directory: URL = .documentsDirectory, filename: String = "data.json") {
self.fileURL = directory.appendingPathComponent(filename)
// Synchronous load during init (actor isolation not yet active)
self.cache = Self.loadSynchronously(from: fileURL)
}
// MARK: - Public API
public func save(_ item: T) throws {
cache[item.id] = item
try persistToFile()
}
public func delete(_ id: String) throws {
cache[id] = nil
try persistToFile()
}
public func find(by id: String) -> T? {
cache[id]
}
public func loadAll() -> [T] {
Array(cache.values)
}
// MARK: - Private
private func persistToFile() throws {
let data = try JSONEncoder().encode(Array(cache.values))
try data.write(to: fileURL, options: .atomic)
}
private static func loadSynchronously(from url: URL) -> [String: T] {
guard let data = try? Data(contentsOf: url),
let items = try? JSONDecoder().decode([T].self, from: data) else {
return [:]
}
return Dictionary(uniqueKeysWithValues: items.map { ($0.id, $0) })
}
}
```
### Usage
All calls are automatically async due to actor isolation:
```swift
let repository = LocalRepository<Question>()
// Read fast O(1) lookup from in-memory cache
let question = await repository.find(by: "q-001")
let allQuestions = await repository.loadAll()
// Write updates cache and persists to file atomically
try await repository.save(newQuestion)
try await repository.delete("q-001")
```
### Combining with @Observable ViewModel
```swift
@Observable
final class QuestionListViewModel {
private(set) var questions: [Question] = []
private let repository: LocalRepository<Question>
init(repository: LocalRepository<Question> = LocalRepository()) {
self.repository = repository
}
func load() async {
questions = await repository.loadAll()
}
func add(_ question: Question) async throws {
try await repository.save(question)
questions = await repository.loadAll()
}
}
```
## Key Design Decisions
| Decision | Rationale |
|----------|-----------|
| Actor (not class + lock) | Compiler-enforced thread safety, no manual synchronization |
| In-memory cache + file persistence | Fast reads from cache, durable writes to disk |
| Synchronous init loading | Avoids async initialization complexity |
| Dictionary keyed by ID | O(1) lookups by identifier |
| Generic over `Codable & Identifiable` | Reusable across any model type |
| Atomic file writes (`.atomic`) | Prevents partial writes on crash |
## Best Practices
- **Use `Sendable` types** for all data crossing actor boundaries
- **Keep the actor's public API minimal** — only expose domain operations, not persistence details
- **Use `.atomic` writes** to prevent data corruption if the app crashes mid-write
- **Load synchronously in `init`** — async initializers add complexity with minimal benefit for local files
- **Combine with `@Observable`** ViewModels for reactive UI updates
## Anti-Patterns to Avoid
- Using `DispatchQueue` or `NSLock` instead of actors for new Swift concurrency code
- Exposing the internal cache dictionary to external callers
- Making the file URL configurable without validation
- Forgetting that all actor method calls are `await` — callers must handle async context
- Using `nonisolated` to bypass actor isolation (defeats the purpose)
## When to Use
- Local data storage in iOS/macOS apps (user data, settings, cached content)
- Offline-first architectures that sync to a server later
- Any shared mutable state that multiple parts of the app access concurrently
- Replacing legacy `DispatchQueue`-based thread safety with modern Swift concurrency

View File

@@ -0,0 +1,189 @@
---
name: swift-protocol-di-testing
description: Protocol-based dependency injection for testable Swift code — mock file system, network, and external APIs using focused protocols and Swift Testing.
---
# Swift Protocol-Based Dependency Injection for Testing
Patterns for making Swift code testable by abstracting external dependencies (file system, network, iCloud) behind small, focused protocols. Enables deterministic tests without I/O.
## When to Activate
- Writing Swift code that accesses file system, network, or external APIs
- Need to test error handling paths without triggering real failures
- Building modules that work across environments (app, test, SwiftUI preview)
- Designing testable architecture with Swift concurrency (actors, Sendable)
## Core Pattern
### 1. Define Small, Focused Protocols
Each protocol handles exactly one external concern.
```swift
// File system access
public protocol FileSystemProviding: Sendable {
func containerURL(for purpose: Purpose) -> URL?
}
// File read/write operations
public protocol FileAccessorProviding: Sendable {
func read(from url: URL) throws -> Data
func write(_ data: Data, to url: URL) throws
func fileExists(at url: URL) -> Bool
}
// Bookmark storage (e.g., for sandboxed apps)
public protocol BookmarkStorageProviding: Sendable {
func saveBookmark(_ data: Data, for key: String) throws
func loadBookmark(for key: String) throws -> Data?
}
```
### 2. Create Default (Production) Implementations
```swift
public struct DefaultFileSystemProvider: FileSystemProviding {
public init() {}
public func containerURL(for purpose: Purpose) -> URL? {
FileManager.default.url(forUbiquityContainerIdentifier: nil)
}
}
public struct DefaultFileAccessor: FileAccessorProviding {
public init() {}
public func read(from url: URL) throws -> Data {
try Data(contentsOf: url)
}
public func write(_ data: Data, to url: URL) throws {
try data.write(to: url, options: .atomic)
}
public func fileExists(at url: URL) -> Bool {
FileManager.default.fileExists(atPath: url.path)
}
}
```
### 3. Create Mock Implementations for Testing
```swift
public final class MockFileAccessor: FileAccessorProviding, @unchecked Sendable {
public var files: [URL: Data] = [:]
public var readError: Error?
public var writeError: Error?
public init() {}
public func read(from url: URL) throws -> Data {
if let error = readError { throw error }
guard let data = files[url] else {
throw CocoaError(.fileReadNoSuchFile)
}
return data
}
public func write(_ data: Data, to url: URL) throws {
if let error = writeError { throw error }
files[url] = data
}
public func fileExists(at url: URL) -> Bool {
files[url] != nil
}
}
```
### 4. Inject Dependencies with Default Parameters
Production code uses defaults; tests inject mocks.
```swift
public actor SyncManager {
private let fileSystem: FileSystemProviding
private let fileAccessor: FileAccessorProviding
public init(
fileSystem: FileSystemProviding = DefaultFileSystemProvider(),
fileAccessor: FileAccessorProviding = DefaultFileAccessor()
) {
self.fileSystem = fileSystem
self.fileAccessor = fileAccessor
}
public func sync() async throws {
guard let containerURL = fileSystem.containerURL(for: .sync) else {
throw SyncError.containerNotAvailable
}
let data = try fileAccessor.read(
from: containerURL.appendingPathComponent("data.json")
)
// Process data...
}
}
```
### 5. Write Tests with Swift Testing
```swift
import Testing
@Test("Sync manager handles missing container")
func testMissingContainer() async {
let mockFileSystem = MockFileSystemProvider(containerURL: nil)
let manager = SyncManager(fileSystem: mockFileSystem)
await #expect(throws: SyncError.containerNotAvailable) {
try await manager.sync()
}
}
@Test("Sync manager reads data correctly")
func testReadData() async throws {
let mockFileAccessor = MockFileAccessor()
mockFileAccessor.files[testURL] = testData
let manager = SyncManager(fileAccessor: mockFileAccessor)
let result = try await manager.loadData()
#expect(result == expectedData)
}
@Test("Sync manager handles read errors gracefully")
func testReadError() async {
let mockFileAccessor = MockFileAccessor()
mockFileAccessor.readError = CocoaError(.fileReadCorruptFile)
let manager = SyncManager(fileAccessor: mockFileAccessor)
await #expect(throws: SyncError.self) {
try await manager.sync()
}
}
```
## Best Practices
- **Single Responsibility**: Each protocol should handle one concern — don't create "god protocols" with many methods
- **Sendable conformance**: Required when protocols are used across actor boundaries
- **Default parameters**: Let production code use real implementations by default; only tests need to specify mocks
- **Error simulation**: Design mocks with configurable error properties for testing failure paths
- **Only mock boundaries**: Mock external dependencies (file system, network, APIs), not internal types
## Anti-Patterns to Avoid
- Creating a single large protocol that covers all external access
- Mocking internal types that have no external dependencies
- Using `#if DEBUG` conditionals instead of proper dependency injection
- Forgetting `Sendable` conformance when used with actors
- Over-engineering: if a type has no external dependencies, it doesn't need a protocol
## When to Use
- Any Swift code that touches file system, network, or external APIs
- Testing error handling paths that are hard to trigger in real environments
- Building modules that need to work in app, test, and SwiftUI preview contexts
- Apps using Swift concurrency (actors, structured concurrency) that need testable architecture