Robin Singh
|
e4cb5a14b3
|
feat(skill): add data-scraper-agent — AI-powered public data collection for any source (#503)
* feat(skill): add data-scraper-agent skill
Workflow skill for building AI-powered public data collection agents.
Covers any scraping target: job boards, prices, news, GitHub, sports, events.
- Full architecture guide (config.yaml, scraper/, ai/, storage/)
- Gemini Flash free tier client with 4-model fallback chain
- Batch API pattern (5 items/call) — stays within free tier
- Feedback learning loop from user decisions
- Notion / Sheets / Supabase storage templates
- GitHub Actions cron schedule (100% free)
- Anti-patterns table, free tier limits reference, quality checklist
- Real-world examples and reference implementation (job-hunt-agent)
* fix(skill): address PR #503 review violations in data-scraper-agent
- Read batch_size from config.yaml instead of hardcoded constant
- Branch main.py on storage.provider; label example as Notion-only
- Replace undefined sync_feedback() with load_feedback() + comment
- Add commented Playwright browser install step to CI workflow
- Add permissions: contents: write; remove silent `git push || true`
- Remove external unvetted repo link from Reference Implementation
- Move import json to top of pipeline.py block (was after usage)
- Guard context.md read with exists() check; fall back to empty string
- Replace deprecated datetime.utcnow() with datetime.now(timezone.utc)
- Remove duplicate config.yaml entry from project directory template
|
2026-03-16 13:35:44 -07:00 |
|