# Session 2026-06-08 — Phase 9: multi-ATS scrapers + Playwright applier framework

## What's new

### Scrapers (9 total)
- **workday** — public `/wday/cxs/{tenant}/{site}/jobs` POST API; per-tenant config
- **icims** — Playwright on `careers-{employer}.icims.com/jobs/search`; per-employer config
- **taleo** — Playwright on `{org}.taleo.net/careersection/{section}/jobsearch.ftl`
- (existing: linkedin, indeed, greenhouse, lever, usajobs, adzuna)
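
A minimal sketch of how the workday scraper's request could be assembled from per-tenant config. The path comes from the notes above; the `myworkdayjobs.com` host pattern, the body fields, and the names `WorkdayTenant` and `build_jobs_request` are assumptions for illustration, not the scraper's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkdayTenant:
    tenant: str  # employer's Workday tenant id
    wd_pod: str  # pod label such as "wd1" or "wd5"
    site: str    # career-site name


def build_jobs_request(cfg: WorkdayTenant, offset: int = 0, limit: int = 20,
                       search_text: str = "") -> tuple[str, dict]:
    """Return (url, json_body) for one page of postings; no network I/O."""
    # Assumed host pattern; only the /wday/cxs/... path is given in the notes.
    url = (f"https://{cfg.tenant}.{cfg.wd_pod}.myworkdayjobs.com"
           f"/wday/cxs/{cfg.tenant}/{cfg.site}/jobs")
    # Assumed body fields for paging and free-text search.
    body = {"appliedFacets": {}, "limit": limit, "offset": offset,
            "searchText": search_text}
    return url, body


if __name__ == "__main__":
    url, body = build_jobs_request(WorkdayTenant("acme", "wd5", "External"))
    print(url)
    print(body)
```

The pair would be sent as a JSON POST; paging is a matter of bumping `offset` by `limit` until a page comes back empty.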

### Appliers (6 total)
- **playwright_base.PlaywrightApplier** — abstract scaffold: `dry_run` + `submit` shells, screenshot per step, payload persistence, confirm_token gate, daily cap. Subclasses only implement `walk_form(page, ctx)`
- **WorkdayApplier** — click Apply → optional sign-in (creds in `application_answers.workday_credentials[tenant]`) → fill form via answer-matchers + heuristics → resume upload → stop at Review (does NOT submit). `submit()` re-opens and clicks the final Submit button only with a valid confirm_token
- **ICIMSApplier**, **TaleoApplier** — similar shape
- **GenericApplier** — catch-all for ATSes we haven't specifically modeled
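
The scaffold/subclass split is a template-method pattern. A rough sketch of the shape, with screenshots, payload persistence, the token gate, and the daily cap left out; `ApplyContext`, `PlaywrightApplierSketch`, and `EchoApplier` are hypothetical names, and the real `ctx` fields are not documented here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ApplyContext:
    # Hypothetical context object; stands in for whatever ctx really carries.
    application_id: int
    fills: list = field(default_factory=list)
    needs_review: list = field(default_factory=list)


class PlaywrightApplierSketch(ABC):
    def dry_run(self, page, ctx: ApplyContext) -> dict:
        """Shared shell: walk the form, then return a preview payload."""
        self.walk_form(page, ctx)
        return {
            "application_id": ctx.application_id,
            "fills": ctx.fills,
            "needs_review": ctx.needs_review,
        }

    @abstractmethod
    def walk_form(self, page, ctx: ApplyContext) -> None:
        """ATS-specific navigation and filling; must stop before submitting."""


class EchoApplier(PlaywrightApplierSketch):
    # Toy subclass: records one fill without touching a browser.
    def walk_form(self, page, ctx: ApplyContext) -> None:
        ctx.fills.append(
            {"label": "Email", "value": "a@example.com", "confidence": "known"}
        )


if __name__ == "__main__":
    preview = EchoApplier().dry_run(page=None, ctx=ApplyContext(application_id=1))
    print(preview)
```

Because the shells live in the base class, a new ATS adapter only has to describe how to walk its own form.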

### Profile changes
- `confidence_mode` column added — `known_only | review_unknowns | auto_send` (default `review_unknowns`)
- `exclude_companies` updated: JPMorgan, JPMorgan Chase, JP Morgan, JPMC, Chase, Chase Bank (per Michael's request; he works at JPMC now)

### Smart question routing (the answer to "make best guess")
`appliers/playwright_base.fill_known_or_guess(page, elem, label, ctx)`:
1. **Matchers first** — checks label text against the 19 `application_answers` matchers ("work auth", "sponsorship", "salary expectation", "LinkedIn URL", etc.). When matched → fills with known answer, `confidence=known`
2. **Mode check** — if profile is `known_only` and no match, the field is skipped and added to `needs_review`
3. **Heuristic guess** — in `review_unknowns` and `auto_send` modes, simple patterns fill what they can: first/last name, email, phone, city/state/country, generic Y/N for consent-style questions, and `years_experience_total` for experience-years fields that have no more specific label. `confidence=guessed`
4. **Otherwise** — flagged in `needs_review` for dashboard review

Every fill ends up in the dry-run preview with its `label`, `value`, matched `key`, and `confidence` so you can see at a glance which fields the agent was guessing vs answering with conviction.
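
The routing order above, reduced to a pure function with no Playwright calls. This is only a sketch: the two pattern tables are a hypothetical subset (the real code has 19 matchers), and `route_field` plus the `answers` and `profile` dicts are illustrative names.

```python
import re

# Hypothetical subset of application_answers matchers: (label pattern, answer key).
MATCHERS = [
    (re.compile(r"sponsor", re.I), "sponsorship"),
    (re.compile(r"authoriz|work auth", re.I), "work_auth"),
    (re.compile(r"linkedin", re.I), "linkedin_url"),
]

# Hypothetical heuristic patterns: (label pattern, profile key).
HEURISTICS = [
    (re.compile(r"first name", re.I), "first_name"),
    (re.compile(r"e-?mail", re.I), "email"),
]


def route_field(label: str, answers: dict, profile: dict, mode: str) -> dict:
    """Decide how to fill one field; returns a preview record."""
    # 1. Matchers first: a known answer always wins.
    for pattern, key in MATCHERS:
        if pattern.search(label) and key in answers:
            return {"label": label, "value": answers[key],
                    "key": key, "confidence": "known"}
    review = {"label": label, "value": None, "key": None,
              "confidence": "needs_review"}
    # 2. Mode check: known_only never guesses.
    if mode == "known_only":
        return review
    # 3. Heuristic guess from profile data.
    for pattern, key in HEURISTICS:
        if pattern.search(label) and key in profile:
            return {"label": label, "value": profile[key],
                    "key": key, "confidence": "guessed"}
    # 4. Otherwise: leave it for dashboard review.
    return review


if __name__ == "__main__":
    answers = {"sponsorship": "No"}
    profile = {"first_name": "Ada"}
    for label in ("Do you require visa sponsorship?", "First Name", "Favorite color"):
        print(route_field(label, answers, profile, "review_unknowns"))
```

Returning a record instead of filling the element directly keeps the decision testable without a browser; the caller would apply `value` to the page element and append the record to the preview.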

## Confidence modes

| Mode | What happens |
|---|---|
| `known_only` | refuses unknown fields; conservative |
| `review_unknowns` *(default)* | guesses everything plausible; dry-run preview highlights guesses; you OK them before submit |
| `auto_send` | guesses + auto-submits if required fields present (intended for high-confidence API ATSes only — Greenhouse/Lever) |

The `confirm_token` rail still holds for `auto_send`: the token is still issued by `dry_run_apply` and checked at submit time; the user just doesn't have to call `submit_apply` manually.

## ATS detection (URL-based routing)

`appliers/registry.detect_for_url(url)` walks detectors in order:
greenhouse → lever → workday → icims → taleo → generic. First match wins.
When `plan_apply` runs against a job, the right adapter is selected automatically.
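
A minimal sketch of first-match-wins routing. The detector order is the one listed above; the host suffixes are assumptions based on each ATS's usual public domains, and `detect_ats` is an illustrative name rather than the registry's real function.

```python
from urllib.parse import urlparse

# Order matters: the first detector whose suffix matches wins.
# Host suffixes are assumptions, not copied from appliers/registry.
DETECTORS = [
    ("greenhouse", ("greenhouse.io",)),
    ("lever", ("lever.co",)),
    ("workday", ("myworkdayjobs.com",)),
    ("icims", ("icims.com",)),
    ("taleo", ("taleo.net",)),
]


def detect_ats(url: str) -> str:
    """Return the adapter name for a job URL, falling back to 'generic'."""
    host = (urlparse(url).hostname or "").lower()
    for name, suffixes in DETECTORS:
        # Match the bare domain or any subdomain of it.
        if any(host == s or host.endswith("." + s) for s in suffixes):
            return name
    return "generic"


if __name__ == "__main__":
    for u in (
        "https://boards.greenhouse.io/acme/jobs/1",
        "https://careers-acme.icims.com/jobs/search",
        "https://example.com/careers",
    ):
        print(u, "->", detect_ats(u))
```

Matching on the parsed hostname rather than the raw string avoids false positives when an ATS domain merely appears in a path or query parameter.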

## What's enabled now / next steps for you

**Enabled & live:** linkedin, greenhouse (6 tokens), lever (3 tokens), usajobs (pending API key)

**Scaffolded, disabled until you populate config:**
- workday — needs `tenants: [{tenant, wd_pod, site}, ...]` — see commented examples in `scripts/seed_sources.py`
- icims — needs `employers: [{slug, careers_url}, ...]`
- taleo — same shape
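
For reference, a sketch of what populated workday and icims config might look like, with a small key check. All values are placeholders, not real tenants or employers, and `missing_keys` is a hypothetical helper; the commented examples in `scripts/seed_sources.py` remain the authoritative shape.

```python
# Placeholder values only; none of these are real tenants or employers.
WORKDAY_CONFIG = {
    "tenants": [
        {"tenant": "example-tenant", "wd_pod": "wd1", "site": "ExampleCareers"},
    ]
}

ICIMS_CONFIG = {
    "employers": [
        {"slug": "example",
         "careers_url": "https://careers-example.icims.com/jobs/search"},
    ]
}

# Keys each list entry must carry, per the notes above.
REQUIRED_KEYS = {
    "tenants": {"tenant", "wd_pod", "site"},
    "employers": {"slug", "careers_url"},
}


def missing_keys(config: dict) -> list[str]:
    """List human-readable problems; an empty list means the shape is fine."""
    problems = []
    for list_name, required in REQUIRED_KEYS.items():
        for i, entry in enumerate(config.get(list_name, [])):
            for key in sorted(required - set(entry)):
                problems.append(f"{list_name}[{i}] missing {key}")
    return problems


if __name__ == "__main__":
    print(missing_keys(WORKDAY_CONFIG))
    print(missing_keys({"tenants": [{"tenant": "x"}]}))
```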

I can look up the real Workday tenant URLs for Columbus employers (Nationwide, Cardinal Health, Huntington, Kroger, OhioHealth, Battelle, etc.) and bulk-load them — those are public info. Or you can paste them and I'll add them.

## Hard rules still enforced
- No `submit_apply` without fresh `confirm_token` from `dry_run_apply`
- 30-minute token TTL
- Max 10 real submits / 24h
- Dry-run saves a screenshot of every step to `data/screenshots/<application_id>/`
- All inputs the agent fills are logged with `(label, value, confidence)` triples on the application row
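
The token TTL and daily cap can be pictured as one gate function. This is a sketch under assumptions: the token format, the in-memory storage, the `issue_token` and `may_submit` names, and treating "24h" as a rolling window are all illustrative choices, not the worker's actual implementation.

```python
import secrets
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(minutes=30)   # from the rules above
MAX_SUBMITS_PER_DAY = 10            # from the rules above


def issue_token(now: datetime) -> dict:
    """What a dry run might hand back alongside the preview."""
    return {"token": secrets.token_urlsafe(16), "issued_at": now}


def may_submit(issued, presented_token: str, recent_submits: list,
               now: datetime) -> tuple[bool, str]:
    """Return (allowed, reason) for a real submit attempt."""
    if issued is None or not secrets.compare_digest(issued["token"], presented_token):
        return False, "invalid confirm_token"
    if now - issued["issued_at"] > TOKEN_TTL:
        return False, "confirm_token expired"
    # Assumption: the cap is a rolling 24h window, not a calendar day.
    window_start = now - timedelta(hours=24)
    if sum(1 for t in recent_submits if t > window_start) >= MAX_SUBMITS_PER_DAY:
        return False, "daily cap reached"
    return True, "ok"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    tok = issue_token(now)
    print(may_submit(tok, tok["token"], [], now))
    print(may_submit(tok, tok["token"], [], now + timedelta(minutes=31)))
```

Passing `now` in explicitly keeps the gate deterministic under test; the real worker would presumably read the clock and the submit history from its own store.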

## Open
- LLM cover-letter / free-text generation (Michael deferred — appliers fall back to template + heuristic)
- Real resume PDF (still a 1-line stub at `data/resumes/_dummy.pdf`)
- Per-tenant Workday accounts (each employer requires registration → credentials live in `application_answers.workday_credentials[tenant]`)
- **Worker restart still pending** — sandbox blocked the previous `systemctl restart`; the running worker is still on yesterday's code (no timezone fix, no new scrapers, no answers routing)
