# Session 2026-06-07 (cont.) — Phase 2: Playwright in, Indeed blocked by Cloudflare

## What works now
- `jobscraper-api.service` installed at `/etc/systemd/system/`, enabled, running as `m3ac`
  - `/health`, `/jobs`, `/admin/stats`, `/admin/runs`, `/search` all live on 127.0.0.1:8081
- Playwright 1.60 installed into `.venv2/lib/python3.10/site-packages`
- Chromium-headless-shell installed at `.cache/ms-playwright/chromium_headless_shell-1223` (project-local, m3ac-owned)
- System libs for headless Chrome installed via apt (`libnss3`, `libatk*`, `libcups2`, etc.)
- `utils/playwright_pool.py` — shared singleton browser + context with anti-detection init script
- `scrapers/indeed.py` — full search + detail implementation with Cloudflare-detection guard
- Claude Desktop docs written to `docs/CLAUDE_DESKTOP.md` — SSH + stdio config
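
For orientation, a shared singleton browser and context with an init script might look roughly like the sketch below. This is not the contents of `utils/playwright_pool.py`: the class name `PlaywrightPool`, the constant `ANTI_DETECTION_JS`, and the use of Playwright's sync API are all assumptions made for illustration.

```python
import threading
from typing import Any, Optional

# Runs before any page script; hides the most obvious automation flag.
# The real init script is not shown in these notes, so this one-liner is assumed.
ANTI_DETECTION_JS = (
    "Object.defineProperty(navigator, 'webdriver', {get: () => undefined});"
)


class PlaywrightPool:
    """Process-wide holder for one browser and one context (hypothetical name)."""

    _instance: Optional["PlaywrightPool"] = None
    _lock = threading.Lock()

    def __init__(self) -> None:
        self._playwright: Any = None
        self._browser: Any = None
        self._context: Any = None

    @classmethod
    def instance(cls) -> "PlaywrightPool":
        # Create the pool object once; later calls return the same object.
        with cls._lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def context(self) -> Any:
        """Launch Chromium on first use and return the shared context."""
        if self._context is None:
            # Imported lazily so importing this module never starts a browser.
            from playwright.sync_api import sync_playwright

            self._playwright = sync_playwright().start()
            self._browser = self._playwright.chromium.launch(headless=True)
            self._context = self._browser.new_context()
            self._context.add_init_script(ANTI_DETECTION_JS)
        return self._context

    def close(self) -> None:
        """Tear down context, browser, and driver in that order."""
        if self._context is not None:
            self._context.close()
        if self._browser is not None:
            self._browser.close()
        if self._playwright is not None:
            self._playwright.stop()
        self._playwright = self._browser = self._context = None
```

A scraper would call `PlaywrightPool.instance().context().new_page()`. Note that the sync API is bound to the thread that started it, so an async service would more likely use `playwright.async_api` instead.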

## What's blocked
- **Indeed returns 403 from the VPS IP.** A test run against `https://www.indeed.com/jobs?q=python+developer&...` got a Cloudflare 403. The scraper handled it cleanly (no exception, 0 hits, run logged) and bailed out
- Indeed source row flipped back to `enabled=0` in DB until a workaround is in place
- This is **expected** without a proxy — Indeed is the canonical Cloudflare-protected site
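
A Cloudflare-detection guard of the kind described above can be sketched as a pure function over the response status, headers, and body. The function name and the marker strings are assumptions for illustration; the actual guard in `scrapers/indeed.py` may check different signals.

```python
from typing import Mapping

# Titles commonly seen on Cloudflare challenge and block pages.
# Assumed and not exhaustive; tune against real responses.
_CHALLENGE_MARKERS = ("just a moment", "attention required")


def looks_like_cloudflare_block(
    status: int, headers: Mapping[str, str], body: str
) -> bool:
    """Return True when a response looks like a Cloudflare block or challenge."""
    lowered = {k.lower(): v.lower() for k, v in headers.items()}
    served_by_cloudflare = (
        "cloudflare" in lowered.get("server", "") or "cf-ray" in lowered
    )
    if not served_by_cloudflare:
        return False
    # Hard refusals and rate limits from the edge.
    if status in (403, 429, 503):
        return True
    # Interstitial challenge pages can also arrive with other status codes.
    text = body.lower()
    return any(marker in text for marker in _CHALLENGE_MARKERS)
```

With Playwright this could be fed from `response.status`, `response.headers`, and `page.content()`. When it returns `True`, the scraper can record zero hits and stop instead of raising.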

## Workarounds, ranked
1. **Residential proxy provider** (Bright Data, IPRoyal, Smartproxy) — most reliable. ~$50–500/mo. Plugs into `PROXY_URL` env, no code changes needed. Recommendation when ready.
2. **playwright-stealth** — masks `navigator.webdriver` and ~30 other automation tells. Free. Sometimes enough for low-volume Indeed. Not yet wired in
3. **Skip Indeed, use other sources** — Greenhouse/Lever ATSes have NO anti-bot (public JSON APIs), give you the actual employer apply URL, and ship in a few hours. Recommended next step
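
As a rough illustration of option 1, the sketch below turns a `PROXY_URL` value into the `proxy` dict that Playwright's `launch()` accepts (`server`, `username`, `password`). The URL shape `scheme://user:pass@host:port` and the helper name are assumptions; the notes do not show how the project actually reads `PROXY_URL`.

```python
import os
from typing import Dict, Optional
from urllib.parse import unquote, urlsplit


def proxy_settings_from_env(env_var: str = "PROXY_URL") -> Optional[Dict[str, str]]:
    """Build a Playwright-style proxy dict from an env var, or None if unset."""
    raw = os.environ.get(env_var, "").strip()
    if not raw:
        return None
    parts = urlsplit(raw)
    if not parts.scheme or not parts.hostname:
        raise ValueError(f"{env_var} must look like scheme://[user:pass@]host:port")
    # Credentials go in separate keys, so strip them from the server URL.
    server = f"{parts.scheme}://{parts.hostname}"
    if parts.port:
        server += f":{parts.port}"
    settings = {"server": server}
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings
```

The result can be passed as `chromium.launch(headless=True, proxy=settings)` when it is not `None`, which keeps the no-proxy path unchanged.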

## Notes / quirks
- By default, the pip-installed Playwright downloads browsers to `$HOME/.cache/ms-playwright`. Because root ran the install, that default would have placed them under root's home, which the service user typically cannot read. We re-ran the install with `PLAYWRIGHT_BROWSERS_PATH` pointed at `<project>/.cache/ms-playwright` and chowned the result to `m3ac` so the service user can launch the browser. `PLAYWRIGHT_BROWSERS_PATH` is now also set in `.env`
- The sandbox blocks `sudo`, `su -`, and inline `VAR=val cmd` invocations during this session, so we run everything as root and rely on file ownership to keep the service user working
- `chown -R` is unreliable under the sandbox; per-file `chown m3ac <paths...>` works
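
One way to script the per-file ownership fix without `chown -R` is to walk the tree and change each path individually. This is a minimal sketch, not something run in this session; the helper names are hypothetical, and the process needs permission to change ownership (it runs as root here).

```python
import os
import shutil
from pathlib import Path
from typing import Iterator


def iter_tree(root: Path) -> Iterator[Path]:
    """Yield root, then every directory and file below it."""
    yield root
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            yield Path(dirpath) / name


def chown_each(root: Path, user: str) -> int:
    """Change the owner of each path one at a time; return how many were changed."""
    count = 0
    for path in iter_tree(root):
        # shutil.chown follows symlinks, like plain chown without -h.
        shutil.chown(path, user=user)
        count += 1
    return count
```

For example, `chown_each(Path(".cache/ms-playwright"), "m3ac")` would cover the browser cache described above.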

## Phase 2 file deltas
- `src/jobscraper/utils/playwright_pool.py` — full implementation
- `src/jobscraper/scrapers/indeed.py` — full implementation
- `scripts/install_playwright_browsers.py` — runs `playwright install chromium chrome` with project-local cache path
- `/etc/systemd/system/jobscraper-api.service` — installed + enabled
- `.env` — added `PLAYWRIGHT_BROWSERS_PATH`
- `docs/CLAUDE_DESKTOP.md` — Claude Desktop SSH+stdio setup guide
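
Since inline `VAR=val cmd` is unavailable in this sandbox, an install script can set the variable through the subprocess environment instead. The sketch below shows that idea only; it is not the actual `scripts/install_playwright_browsers.py`, and the function names and the `python -m playwright` invocation are assumptions.

```python
import os
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Tuple


def build_install_command(project_root: Path) -> Tuple[List[str], Dict[str, str]]:
    """Return the install command and an environment with a project-local cache."""
    env = dict(os.environ)
    env["PLAYWRIGHT_BROWSERS_PATH"] = str(project_root / ".cache" / "ms-playwright")
    # Use the current interpreter so the venv's Playwright is the one invoked.
    cmd = [sys.executable, "-m", "playwright", "install", "chromium", "chrome"]
    return cmd, env


def run_install(project_root: Path) -> None:
    """Download the browsers; raises CalledProcessError if the install fails."""
    cmd, env = build_install_command(project_root)
    subprocess.run(cmd, env=env, check=True)
```

Calling `run_install(Path(__file__).resolve().parent.parent)` from a file under `scripts/` would resolve the project root, assuming that layout.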

## Next
- **Proxy decision** — pick a provider OR live without Indeed for now
- **Phase 3: Greenhouse + Lever** — clean, fast, no anti-bot, gives real apply URLs. Strong recommendation
- **playwright-stealth** as a Phase 2.1 nice-to-have if you want to keep trying Indeed without a proxy
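
To give a sense of the Phase 3 scope, the sketch below normalizes job listings from the public Greenhouse job board API and the Lever postings API into one record shape. The output keys (`source`, `external_id`, `apply_url`, and so on) are hypothetical and not taken from the project schema. The input field names follow the public API responses as I understand them and should be checked against live payloads before relying on them.

```python
from typing import Any, Dict, List

# Public, unauthenticated listing endpoints; fill in the board or company slug.
GREENHOUSE_JOBS_URL = "https://boards-api.greenhouse.io/v1/boards/{board}/jobs"
LEVER_POSTINGS_URL = "https://api.lever.co/v0/postings/{company}?mode=json"


def normalize_greenhouse(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Map a Greenhouse `{"jobs": [...]}` payload to common records."""
    return [
        {
            "source": "greenhouse",
            "external_id": str(job["id"]),
            "title": job["title"],
            "location": (job.get("location") or {}).get("name"),
            "apply_url": job["absolute_url"],
        }
        for job in payload.get("jobs", [])
    ]


def normalize_lever(postings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Map a Lever postings list to the same common records."""
    return [
        {
            "source": "lever",
            "external_id": str(posting["id"]),
            "title": posting["text"],
            "location": (posting.get("categories") or {}).get("location"),
            # Prefer the direct apply link, fall back to the hosted posting page.
            "apply_url": posting.get("applyUrl") or posting["hostedUrl"],
        }
        for posting in postings
    ]
```

Fetching is a plain HTTP GET of the formatted URL followed by `json.loads`, so no browser or proxy is involved.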
