Captcha Handling

VoidCrawl takes a hard line on captchas: surface, do not solve. When a page trips a bot wall, VoidCrawl raises a typed exception with enough context for your pipeline to rotate, back off, or escalate to a human. It does not click checkboxes, submit tokens, or call third-party solving services.

This keeps the library safe, predictable, and on the right side of every major service’s terms of use.

What gets detected

As of 0.3.0, VoidCrawl probes the DOM for known markers. Detection is best-effort and deliberately conservative. False positives are worse than false negatives when the remediation is “rotate the upstream”.

| Provider | DOM signal |
| --- | --- |
| Google reCAPTCHA | iframe[src*="recaptcha"], .g-recaptcha |
| hCaptcha | iframe[src*="hcaptcha"], .h-captcha |
| Cloudflare Turnstile | iframe[src*="challenges.cloudflare.com"], .cf-turnstile |
| Cloudflare interstitial | #cf-challenge, title contains “Just a moment” |

Catching it in Python

    import asyncio

    from voidcrawl import BrowserPool, PoolConfig
    from voidcrawl.profiles import CaptchaDetected

    async def main():
        async with BrowserPool(PoolConfig()) as pool, pool.acquire() as tab:
            try:
                await tab.goto("https://example.com/gated-page", fail_on_captcha=True)
                html = await tab.content()
            except CaptchaDetected as e:
                # e.args[0] -> "captcha detected: recaptcha"
                # Rotate the upstream (new proxy, new profile), then retry.
                raise

    asyncio.run(main())

The fail_on_captcha=True flag is opt-in. Existing 0.2.x callers keep the silent-pass behaviour. Without the flag, goto returns normally and you can detect manually via content() or a selector probe.
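For callers who leave the flag off, a manual probe over the string returned by content() could look roughly like the sketch below. It is not VoidCrawl's detector: it only mirrors the selectors from the table above using the standard library's HTMLParser. The helper names and the "hcaptcha" and "cloudflare-interstitial" labels are assumptions made for illustration.

```python
from html.parser import HTMLParser

# (kind label, iframe src substring, CSS class), taken from the detection table.
# "hcaptcha" and "cloudflare-interstitial" are assumed labels, not documented ones.
_WIDGETS = [
    ("recaptcha", "recaptcha", "g-recaptcha"),
    ("hcaptcha", "hcaptcha", "h-captcha"),
    ("turnstile", "challenges.cloudflare.com", "cf-turnstile"),
]


class _CaptchaProbe(HTMLParser):
    """Records the first captcha marker seen while parsing a page."""

    def __init__(self):
        super().__init__()
        self.kind = None
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        if self.kind is not None:
            return  # keep the first match
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        src = attr.get("src") or ""
        for kind, src_part, css_class in _WIDGETS:
            if (tag == "iframe" and src_part in src) or css_class in classes:
                self.kind = kind
                return
        if attr.get("id") == "cf-challenge":
            self.kind = "cloudflare-interstitial"

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            title = "".join(self._title_parts)
            if self.kind is None and "Just a moment" in title:
                self.kind = "cloudflare-interstitial"


def detect_captcha_kind(html):
    """Return a kind label for the first marker found, or None."""
    probe = _CaptchaProbe()
    probe.feed(html)
    probe.close()
    return probe.kind
```

Like the built-in detection, this stays conservative: a page that only mentions a provider in its text, without one of the listed markers, returns None.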

Catching it over MCP

MCP tool errors include a structured data envelope so your agent can dispatch without string-matching messages:

    {
      "code": -32001,
      "message": "captcha detected: turnstile",
      "data": {
        "exception": "CaptchaDetected",
        "kind": "turnstile"
      }
    }
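On the client side, dispatching on the envelope rather than the message might look like this minimal sketch. It assumes the tool error has already been decoded into a Python dict with the shape shown above; captcha_kind is a hypothetical helper, not part of VoidCrawl or any MCP SDK.

```python
from typing import Optional


def captcha_kind(error: dict) -> Optional[str]:
    """Return the captcha kind from an MCP tool error, or None if it is not one.

    Reads only the structured "data" envelope and never parses "message".
    """
    data = error.get("data")
    if isinstance(data, dict) and data.get("exception") == "CaptchaDetected":
        return data.get("kind")
    return None
```

A caller can then branch on the returned kind (for example "turnstile") and treat None as an unrelated error.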

In a Claude Code skill file, the idiomatic recipe is:

  1. Navigate to the URL.
  2. If the response carries data.exception === "CaptchaDetected", do not re-hit the same URL on the same fingerprint.
  3. Change something before retrying — go headful first (see below), then a cleaner IP. Don’t just stop: managed Turnstile often passes once headful.

Passing managed Turnstile (headful)

Detecting a captcha is not the same as failing it. Cloudflare Turnstile in managed mode — the most common embed — scores the browser and, for a sufficiently real one, issues a token with no interaction. VoidCrawl’s hardened fingerprint (hardware GPU, consistent UA/Client-Hints, no JS injection — see Stealth) clears it when headful:

| Mode | Managed Turnstile |
| --- | --- |
| Headful | Passes non-interactively (verified server-side: siteverify success:true, interactive:false) |
| Headless | Gated: stalls at before-interactive, no token |

So the first move on a Turnstile wall is to go headful (CHROME_HEADLESS=0, or the headful Docker container), not to rotate. This is pass, don’t solve: Cloudflare issues the token to a browser it scores as human; VoidCrawl never forges or auto-submits one.

A site can also gate its results behind a login/account wall after Turnstile passes (e.g. “sign in to view”). That’s an auth gate, not bot detection — passing Turnstile doesn’t bypass it.

Rotation strategies

When headful still doesn’t clear the wall, the lever is usually IP reputation — a flagged or datacenter IP gets challenged even with a clean, headful browser. Your pipeline picks the response:

  • New IP. Swap to a residential/different proxy in BrowserConfig.proxy — the biggest lever once the fingerprint is clean.
  • New profile. Lease a different warm profile via with_profile. (A warm profile’s cf_clearance helps with the Cloudflare edge gate, not an inline managed-Turnstile widget.)
  • Back off. Sleep and retry later. Some walls are rate-based and clear on their own.
  • Abort. Record the URL as uncrawlable and move on.
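One way a pipeline might encode this ordering is a small, pure decision function that it consults after each CaptchaDetected. The sketch below is illustrative only: Step, Attempt, next_step, and backoff_seconds are hypothetical names, and the rotation and back-off limits are arbitrary assumptions rather than VoidCrawl defaults.

```python
from dataclasses import dataclass
from enum import Enum


class Step(Enum):
    GO_HEADFUL = "go_headful"
    NEW_IP = "new_ip"
    NEW_PROFILE = "new_profile"
    BACK_OFF = "back_off"
    ABORT = "abort"


@dataclass
class Attempt:
    """What has already been tried for one URL."""
    headful: bool = False
    ip_rotations: int = 0
    profile_rotations: int = 0
    backoffs: int = 0


def next_step(kind, attempt, max_rotations=2, max_backoffs=3):
    """Pick the next response to a captcha of the given kind."""
    # Managed Turnstile often passes once headful, so try that before rotating.
    if kind == "turnstile" and not attempt.headful:
        return Step.GO_HEADFUL
    # IP reputation is the biggest lever once the fingerprint is clean.
    if attempt.ip_rotations < max_rotations:
        return Step.NEW_IP
    if attempt.profile_rotations < max_rotations:
        return Step.NEW_PROFILE
    # Some walls are rate-based and clear on their own.
    if attempt.backoffs < max_backoffs:
        return Step.BACK_OFF
    return Step.ABORT


def backoff_seconds(backoffs, base=30.0, cap=900.0):
    """Exponential delay before the next retry, capped at `cap` seconds."""
    return min(cap, base * 2 ** backoffs)
```

The caller performs the chosen step (for example, building a new BrowserConfig with a different proxy), updates the Attempt counters, and retries; on Step.ABORT it records the URL as uncrawlable.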

What VoidCrawl will not do: feed images to an OCR service, solve reCAPTCHA puzzles, or forge/inject Turnstile tokens. A managed Turnstile that issues its own token to a real-enough browser is a pass, not solving (above) — but VoidCrawl never fabricates one.

FAQs

Does VoidCrawl solve captchas?

No, and it will not. VoidCrawl’s philosophy is “surface, do not solve”. When a page trips a bot wall, VoidCrawl raises CaptchaDetected with enough context for your pipeline to rotate, back off, or escalate to a human. It never clicks checkboxes, submits tokens, or calls third-party solving services.

Which captcha providers does VoidCrawl detect?

As of 0.3.0, VoidCrawl detects Google reCAPTCHA (v2 and v3), hCaptcha, Cloudflare Turnstile, and Cloudflare interstitial challenges. Detection is DOM-only and deliberately conservative.

What DOM signals does VoidCrawl use for detection?

See the table above for the exact selectors per provider.

Does VoidCrawl detect canvas-only (visual) captchas?

No. 0.3.0 is DOM-only. A captcha rendered purely to canvas with no DOM fingerprint is not detected. Opt-in visual detection is tracked for 0.4.

How do I catch CaptchaDetected in Python?

Call goto(url, fail_on_captcha=True) and wrap it in try: ... except CaptchaDetected: .... Without the flag, goto returns normally (silent behaviour is preserved for 0.2.x compatibility), and you can detect manually via content() or a selector probe.

How does the MCP server report captchas?

Every tool call that hits a captcha returns an MCP error with code -32001, a human-readable message, and a structured data envelope: { "exception": "CaptchaDetected", "kind": "recaptcha" }. Agent middleware dispatches on data.exception without string-matching the message.

What should my pipeline do when it sees CaptchaDetected?

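Change something before retrying. For managed Turnstile, go headful first. If the wall persists, the options are: switch to a new proxy (usually the biggest lever once the fingerprint is clean), lease a different warm profile, back off and retry later, or mark the URL as uncrawlable and move on. VoidCrawl is the detector, not the recovery mechanism.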

References

reCAPTCHA. Google. Developer documentation for Google’s reCAPTCHA v2 and v3. https://developers.google.com/recaptcha

hCaptcha. Intuition Machines. Official hCaptcha integration documentation. https://docs.hcaptcha.com/

Turnstile. Cloudflare. Documentation for Cloudflare’s privacy-first captcha alternative. https://developers.cloudflare.com/turnstile/

VoidCrawl. CascadingLabs. VoidCrawl source repository on GitHub. https://github.com/CascadingLabs/VoidCrawl