Featured Project

Pioneer License Verification Agent

A three-service system that automatically verifies healthcare professional licenses against state licensing boards and national credential registries. A Salesforce webhook triggers the pipeline, a Playwright scraper service bypasses CAPTCHAs via Bright Data and 2Captcha, and results, with PDF evidence, are written back to Salesforce and surfaced in a live compliance dashboard.

Tech Stack

FastAPI, Playwright, Bright Data, Next.js, Supabase, Salesforce API, AWS S3, TypeScript, Anthropic Claude

Overview

Pioneer's compliance team used to verify every healthcare candidate's license by hand on state board websites, a time-consuming, error-prone process that left no audit trail. This system replaces that manual work with three coordinated services: a FastAPI backend that orchestrates the flow and owns the Salesforce integration; a headless Playwright scraper service that navigates state and national licensing boards (bypassing Cloudflare, Turnstile, reCAPTCHA, and hCaptcha via Bright Data's residential proxy network, with 2Captcha as a fallback); and a Next.js compliance dashboard at license.pioneercommission.com where the team monitors every verification job, reviews PDF evidence, and receives real-time notifications.

When a Job Applicant advances to 'Submitted to Account Manager' in Salesforce and their license is unverified, an Apex trigger fires a webhook, the pipeline runs fully automated, and License__c records with attached PDF screenshots land back in Salesforce, all without a human touching a browser.
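To make the trigger condition concrete, here is a minimal Python sketch of how the backend might decide whether an incoming webhook should start any work, and how it could turn the semicolon-separated License_States__c value into one job per state. The stage name comes from the description above; the payload keys (`stage`, `license_verified`, `license_states`, `applicant_id`, `discipline`), the `VerificationJob` type, and the `plan_jobs` function are hypothetical names chosen for illustration, not the project's actual schema.

```python
from dataclasses import dataclass

# Stage name is taken from the overview; all payload keys below are hypothetical.
TRIGGER_STAGE = "Submitted to Account Manager"


@dataclass(frozen=True)
class VerificationJob:
    applicant_id: str
    discipline: str
    state: str


def plan_jobs(payload: dict) -> list[VerificationJob]:
    """Return one job per licensed state, or an empty list if the event is ignored."""
    # Only the configured stage change starts a verification.
    if payload.get("stage") != TRIGGER_STAGE:
        return []
    # Already-verified licenses do not need to be re-checked.
    if payload.get("license_verified", False):
        return []

    # Split on semicolons, trim whitespace, drop blanks and duplicates,
    # and keep the order in which states first appear.
    states: list[str] = []
    for part in (payload.get("license_states") or "").split(";"):
        code = part.strip().upper()
        if code and code not in states:
            states.append(code)

    return [
        VerificationJob(payload["applicant_id"], payload["discipline"], state)
        for state in states
    ]
```

With this sketch, a payload carrying `"license_states": "TX; ca;;TX"` yields two jobs (TX and CA), while a payload for any other stage yields none.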

Key Features

Fully automated end-to-end pipeline: Salesforce stage change → webhook → scraper → License__c records + PDF attachments back in Salesforce, zero human involvement
13 discipline types supported: PT, PTA, OT, COTA, SLP, SLPA, RN, LPN, BCBA, RBT, NCSP, Psychologist, SPEDTeacher — across clinical and school settings
Multi-state and Compact license handling: semicolon-separated License_States__c spawns one verification job per state; Compact licenses auto-route through PT Compact / Nursys NLC
National registry coverage: ASHA (SLP/SLPA), NBCOT (OT/COTA), Nursys NLC (RN/LPN), PT Compact (PT/PTA), NASP NCSP (Psychologist), BACB (BCBA/RBT) — layered on top of state checks for clinical candidates
4-layer CAPTCHA bypass stack: Bright Data Scraping Browser (Cloudflare / Turnstile / reCAPTCHA / hCaptcha / AWS WAF) → 2Captcha Turnstile solver (BACB, CA DCA) → 2Captcha reCAPTCHA solver (Nursys) → fresh residential IP per session to avoid re-block
playwright-stealth fingerprint patching on non-Bright Data scrapers (navigator.webdriver, WebGL vendor, sec-ch-ua headers, Chrome runtime flags)
Transient-failure retry with fresh browser context (up to 3 attempts) on timeouts, proxy errors, and site outages
Full-page PNG → PDF capture pipeline: dismisses cookie banners, chat widgets, and promo overlays before screenshot; wraps PNG in a single-page PDF sized to the image — the evidence PDF shows exactly what was on screen
6-status result model: verified, expired, inactive, not_found, captcha, site_unavailable — each maps to a distinct Salesforce and dashboard outcome
Real-time dashboard notifications: success (license verified), warning (CAPTCHA block), error (no record / scraper failure) — all linked to the job record
Salesforce-native evidence: License__c records created per result with license number, status, expiration, notes, and verification URL; PDF screenshot attached as a Salesforce ContentDocument
Hosted on AWS EC2 with S3 storage (7-day presigned PDF URLs) and Supabase PostgreSQL for job/result/notification persistence
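
The retry rule and the six-status result model from the list above can be sketched together in a few lines of Python. This is an illustration under stated assumptions, not the project's code: `TransientScrapeError`, `run_with_retry`, and `NOTIFICATION_LEVEL` are hypothetical names, the choice to report `site_unavailable` once retries are exhausted is an assumption, and the notification mapping covers only the three pairings the list states explicitly.

```python
from enum import Enum
from typing import Callable


class Status(str, Enum):
    """The six result statuses named in the feature list."""
    VERIFIED = "verified"
    EXPIRED = "expired"
    INACTIVE = "inactive"
    NOT_FOUND = "not_found"
    CAPTCHA = "captcha"
    SITE_UNAVAILABLE = "site_unavailable"


class TransientScrapeError(Exception):
    """Hypothetical marker for timeouts, proxy errors, and site outages."""


MAX_ATTEMPTS = 3  # "up to 3 attempts" per the feature list

# Only the pairings the feature list spells out; other statuses are
# deliberately left unmapped rather than guessed.
NOTIFICATION_LEVEL = {
    Status.VERIFIED: "success",
    Status.CAPTCHA: "warning",
    Status.NOT_FOUND: "error",
}


def run_with_retry(
    attempt: Callable[[int], Status],
    max_attempts: int = MAX_ATTEMPTS,
) -> Status:
    """Call attempt(n) for n = 1..max_attempts until one returns a status.

    Each call is expected to open its own fresh browser context, so a
    poisoned session from a failed attempt is never reused.
    """
    for n in range(1, max_attempts + 1):
        try:
            return attempt(n)
        except TransientScrapeError:
            continue  # transient failure: try again with a fresh context
    # Assumption: exhausted retries are reported as site_unavailable.
    return Status.SITE_UNAVAILABLE
```

A scraper that times out twice and then succeeds would return `Status.VERIFIED` on the third attempt; one that fails all three times would come back as `Status.SITE_UNAVAILABLE`. The sketch omits any delay between attempts; a real service would likely add backoff.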