3 Commits

Author SHA1 Message Date
ab875cd4d9 fix: JV scraper broken by brotli encoding + improved robustness
ROOT CAUSE: _browser_headers() included 'Accept-Encoding: gzip, deflate, br'
but the container has no brotli decoder. Server sent compressed response
that requests couldn't decode → garbled HTML → empty title → 'Could not
find product' error on Demo A and Demo C.
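
A minimal sketch of the header issue described above, not the project's actual `_browser_headers()`. It advertises `br` only when a brotli decoder is importable, which is a variation on the committed fix (the commit simply drops `br`). The function names and the User-Agent string are illustrative assumptions.

```python
import importlib.util


def supported_encodings() -> str:
    """Return an Accept-Encoding value limited to what this runtime can decode."""
    encodings = ["gzip", "deflate"]  # decodable with the standard library alone
    # urllib3 (used by requests) can only decode brotli when one of these
    # optional packages is installed; otherwise the body arrives undecoded.
    if any(importlib.util.find_spec(name) for name in ("brotli", "brotlicffi")):
        encodings.append("br")
    return ", ".join(encodings)


def browser_headers() -> dict:
    """Browser-like request headers (hypothetical stand-in for _browser_headers)."""
    return {
        # Illustrative UA string, not taken from the repository.
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept": "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-GB,en;q=0.9",
        "Accept-Encoding": supported_encodings(),
    }
```

In a container without a brotli package this yields `gzip, deflate`, matching the committed behaviour.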

FIXES:
- Remove 'br' from Accept-Encoding (use 'gzip, deflate' only)
- Price extraction: try itemprop on any element, then .pricec class, then regex
- Image extraction: multi-strategy (itemprop, gallery links, CDN pattern, OG)
- Detect homepage redirect (product removed/renamed) → clear error message
- Increase timeout from 15s to 20s for JV product scraping
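
The price fallback order and the homepage-redirect check listed above could look roughly like this. It is a regex-only sketch under stated assumptions: the real scraper may use an HTML parser, and `extract_price`, `redirected_to_homepage`, and the sample URLs are hypothetical names chosen for illustration.

```python
import re
from typing import Optional
from urllib.parse import urlparse

AMOUNT_RE = re.compile(r"(\d+(?:\.\d{1,2})?)")


def extract_price(html: str) -> Optional[str]:
    """Try itemprop="price" on any element, then the .pricec class, then a bare regex."""
    # Strategy 1: any opening tag carrying itemprop="price".
    tag = re.search(r'<[^>]*\bitemprop=["\']price["\'][^>]*>', html, re.I)
    if tag:
        content = re.search(r'\bcontent=["\']([^"\']+)["\']', tag.group(0), re.I)
        # Prefer the content attribute; otherwise read the text right after the tag.
        candidate = content.group(1) if content else html[tag.end():tag.end() + 40]
        amount = AMOUNT_RE.search(candidate)
        if amount:
            return amount.group(1)
    # Strategy 2: an element whose class list includes "pricec".
    tag = re.search(r'<[^>]*\bclass=["\'][^"\']*\bpricec\b[^"\']*["\'][^>]*>', html, re.I)
    if tag:
        amount = AMOUNT_RE.search(html[tag.end():tag.end() + 40])
        if amount:
            return amount.group(1)
    # Strategy 3: first pound amount anywhere in the page.
    amount = re.search(r"£\s*(\d+(?:\.\d{1,2})?)", html)
    return amount.group(1) if amount else None


def redirected_to_homepage(requested_url: str, final_url: str) -> bool:
    """True when a product URL ended up at the site root after redirects."""
    requested_path = urlparse(requested_url).path.strip("/")
    final_path = urlparse(final_url).path.strip("/")
    return requested_path != "" and final_path == ""
```

With `requests`, `final_url` would typically come from `response.url` after redirects have been followed.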

TESTED:
- D3+K2: Title ✓, Price £12.95 ✓, 10 benefits ✓, 3 images ✓
- Vitamin D3 4000iu: Title ✓, Price £8.95 ✓, 6 benefits ✓, 7 images ✓
- B12: Title ✓, Price £11.95 ✓, 10 benefits ✓, 7 images ✓
- Removed product: clean error 'redirected to homepage'
2026-03-02 22:43:38 +08:00
ccfc9ceeb1 fix: bulletproof competitor scraper — 4-tier fallback chain
Tier 1-3: HTTP with Chrome/Firefox/Safari UAs + full browser headers
Tier 4: Gemini + Google Search grounding (no direct request to the target site, so 404s, Cloudflare and CAPTCHAs don't block it)

- Dead URLs (404): skips straight to Gemini, finds product via Google
- Cloudflare/CAPTCHA: detected and routed to Gemini
- JS-rendered pages: Gemini reads them via Google's infrastructure
- Updated default competitor URL to Vitabiotics (works direct)
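
The tiering described above can be sketched as follows. This is an illustration under assumptions, not the committed code: `http_fetch` and `ai_fetch` are hypothetical injected callables (the latter standing in for the Gemini grounding call), and the block-page markers are guesses at what a Cloudflare or CAPTCHA page might contain.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Assumed markers for challenge pages; tune against real responses.
BLOCK_MARKERS = ("captcha", "cf-challenge", "just a moment")
DEFAULT_UAS = ("chrome", "firefox", "safari")  # tiers 1-3


def looks_blocked(html: str) -> bool:
    """Heuristic check for a Cloudflare/CAPTCHA interstitial."""
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


def scrape_with_fallback(
    url: str,
    http_fetch: Callable[[str, str], Tuple[int, str]],  # (url, ua) -> (status, html)
    ai_fetch: Callable[[str], Any],                      # url -> product data
    user_agents: Sequence[str] = DEFAULT_UAS,
) -> Dict[str, Any]:
    """Try each browser profile over HTTP, then fall back to the AI tier."""
    for tier, ua in enumerate(user_agents, start=1):
        status, html = http_fetch(url, ua)
        # Dead URLs and challenge pages won't improve with another UA,
        # so skip straight to the final tier.
        if status == 404 or looks_blocked(html):
            break
        if status == 200 and html.strip():
            return {"tier": tier, "via": ua, "content": html}
    return {"tier": len(user_agents) + 1, "via": "ai", "content": ai_fetch(url)}
```

Passing the fetchers in as arguments keeps the chain testable without network access or an API key.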

Tested against:
- H&B dead URL (404) → Gemini found full product data
- Boots (Cloudflare) → Gemini returned £4.00, 4.6★, 8 bullets
- Vitabiotics → direct Chrome scrape, 9 bullets
- Amazon (CAPTCHA) → Gemini grounding fallback
2026-03-02 21:12:55 +08:00
09d837a660 v2: Live Flask app — real Gemini AI demos, Nano Banana image gen, real £19.4M data dashboard
- Flask + gunicorn backend replacing static nginx
- 3 live AI demos powered by Gemini 2.5 Flash
- Nano Banana + Nano Banana Pro for product image generation
- Real JV ecommerce dashboard (728K orders, 230K customers, 4MB data)
- AI Infrastructure Proposal + Offer pages
- Live product scraper for justvitamins.co.uk + competitor pages
- API: /api/scrape, /api/generate-pack, /api/competitor-xray, /api/pdp-surgeon, /api/generate-images
2026-03-02 20:02:25 +08:00