fix: harden competitor scraper with 4-tier fallback chain

Tier 1-3: HTTP with Chrome/Firefox/Safari UAs + full browser headers
Tier 4: Gemini + Google Search grounding (no direct request to the competitor site, so blocks and dead URLs don't stop it)
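
A minimal sketch of how the tier ordering above could be wired together. It is not the code from this commit: `scrape_with_fallback`, `SkipToGemini`, `ScrapeResult`, and the two injected fetchers are hypothetical names, and the browser labels stand in for real User-Agent strings and header sets.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical labels for tiers 1-3; real UA strings and headers would be keyed by these.
BROWSER_TIERS = ["chrome", "firefox", "safari"]


class SkipToGemini(Exception):
    """Hypothetical signal: further HTTP tiers are pointless (e.g. a 404)."""


@dataclass
class ScrapeResult:
    tier: int    # 1-3 = direct HTTP, 4 = Gemini fallback
    source: str  # browser label or "gemini"
    data: dict


def scrape_with_fallback(
    url: str,
    http_fetch: Callable[[str, str], Optional[dict]],
    gemini_fetch: Callable[[str], dict],
) -> ScrapeResult:
    """Try each browser profile in order, then fall back to the Gemini tier."""
    for tier, browser in enumerate(BROWSER_TIERS, start=1):
        try:
            data = http_fetch(url, browser)
        except SkipToGemini:
            break  # dead URL: stop retrying with other browser profiles
        except Exception:
            data = None  # any other failure counts as a failed tier
        if data:
            return ScrapeResult(tier, browser, data)
    return ScrapeResult(4, "gemini", gemini_fetch(url))
```

Passing the fetchers in as arguments keeps the ordering logic testable without network access or an API key.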

- Dead URLs (404): skips straight to Gemini, finds product via Google
- Cloudflare/CAPTCHA: detected and routed to Gemini
- JS-rendered pages: Gemini reads them via Google's infrastructure
- Updated default competitor URL to Vitabiotics (works direct)
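
The routing cases above could be decided by a small classifier over the HTTP response, sketched below. This is an illustration under stated assumptions, not the detection logic in this commit: `classify_response`, the marker strings, the status-code sets, and the 200-character threshold are all hypothetical and would need tuning against real block pages.

```python
import re

# Assumed marker strings; real challenge pages vary, so treat these as illustrative.
BLOCK_MARKERS = ("just a moment", "cf-chl", "captcha", "verify you are human")


def classify_response(status: int, html: str, min_text_chars: int = 200) -> str:
    """Return 'dead', 'blocked', 'js_rendered', or 'ok' for a fetched page."""
    if status in (404, 410):
        return "dead"  # skip remaining HTTP tiers
    lowered = html.lower()
    if status in (403, 429, 503) or any(m in lowered for m in BLOCK_MARKERS):
        return "blocked"  # Cloudflare challenge or CAPTCHA wall
    # Drop scripts, styles, and tags; very little visible text left
    # suggests the content is rendered client-side.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", html)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    if len(" ".join(text.split())) < min_text_chars:
        return "js_rendered"
    return "ok"
```

Anything other than `'ok'` would be handed to the Gemini tier. Note that the plain `captcha` marker is a blunt heuristic and can misfire on ordinary pages that merely embed a CAPTCHA widget.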

Tested against:
- H&B dead URL (404) → Gemini found full product data
- Boots (Cloudflare) → Gemini returned £4.00, 4.6★, 8 bullets
- Vitabiotics → direct Chrome scrape, 9 bullets
- Amazon (CAPTCHA) → Gemini grounding fallback
2026-03-02 21:12:55 +08:00
parent 88fb443f63
commit ccfc9ceeb1
2 changed files with 388 additions and 31 deletions

@@ -80,7 +80,7 @@
<div class="input-row">
<div class="input-group">
<label>COMPETITOR PRODUCT URL</label>
-<input type="url" id="demoB-url" placeholder="https://www.competitor.com/product..." value="https://www.hollandandbarrett.com/shop/product/holland-barrett-vitamin-d3-tablets-25ug-1000-i-u--60001496">
+<input type="url" id="demoB-url" placeholder="https://www.competitor.com/product..." value="https://www.vitabiotics.com/products/ultra-vitamin-d-1000iu">
</div>
<button class="btn-gen blue" id="demoB-btn" onclick="runDemoB()">🔍 X-Ray This Competitor</button>
</div>