| Agent / pipeline | What | State |
|---|---|---|
| AI Scraper 2 · Schnucks | Next.js _next/data JSON → fullUpc + nutrition, $0 direct; 19,736 staged | done |
| AI Scraper 2 · Foodtown | Instacart SSR DOM-UPC, multizone (8 sessions+Decodo) — now running 24/7 on DO droplet 159.223.164.186 (systemd), ~112K to go | running |
| AI Scraper 2 · Pathmark | Instacart SSR DOM-UPC, multizone (4 sessions+Decodo) — now running 24/7 on DO droplet (systemd), ~62K to go | running |
| AI Scraper 2 · multizone fix | built multizone_harvest.py — beats Instacart per-ZONE-SESSION throttle via N independent store cookies, resume-aware | done |
| AI Scraper 2 · Edgar-reject recovery | FINDING: reject pool = OCR gap not scrape gap. ~200K HEB/Walmart/CVS have images, just need Edgar re-OCR (owner). Source re-scrape = Incapsula-locked, not worth cost. → EDGAR_REJECT_RECOVERY.md | done |
| AI Scraper 3 · Wegmans | open Algolia (hardcoded keys) → real UPC + STRUCTURED nutrition + ingredients, $0; brand-facet sharding | running |
| AI Scraper 3 · Hy-Vee | self-hosted Next.js __NEXT_DATA__ → item.ean13 + structured ingredients, ~102K, $0 | running |
| AI Scraper 3 · Woodman's | Instacart zone-gated UPC + nutrition/ingredients via Decodo rotating proxy; autorun | running |
| AI Scraper 3 · QC + playbook | audit_staging.py (100% scannable GTIN) + verify_correctness.py (label vs Open Food Facts) + AI_SCRAPING_PLAYBOOK.md | done |
| Sprouts SSR harvest | shop.sprouts.com ~94K PDPs, UPC+nutrition | running |
| Freshop new keys | 21 AWG/regional app_keys, private labels | running |
| Instacart-SSR discovery | validating ~50 shop.<retailer>.com storefronts | running |
| Wynshop chains discovery | finding new Mi9/storefrontgateway chains | running |
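
The QC row mentions a "100% scannable GTIN" audit. As a rough illustration of what such a check could involve, the sketch below validates the standard GS1 mod-10 check digit on a GTIN-8/12/13/14 string. It is an assumption about the kind of test involved, not the contents of `audit_staging.py`; the function names `gtin_check_digit` and `is_valid_gtin` are hypothetical.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the GS1 mod-10 check digit for a GTIN without its last digit.

    Weights alternate 3, 1, 3, 1, ... starting from the rightmost body digit.
    """
    total = 0
    for i, ch in enumerate(reversed(body)):
        weight = 3 if i % 2 == 0 else 1
        total += int(ch) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Return True if code is an all-digit GTIN-8/12/13/14 with a correct check digit."""
    # isascii() guards against Unicode digits that isdigit() accepts but int() rejects.
    if not (code.isascii() and code.isdigit()):
        return False
    if len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])


if __name__ == "__main__":
    # Hypothetical sample values, not rows from the staged data.
    for sample in ("036000291452", "4006381333931", "036000291453"):
        print(sample, is_valid_gtin(sample))
```

A check like this only confirms that a code is well-formed. It says nothing about whether the code belongs to the right product, which is the separate question the label comparison against Open Food Facts addresses.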