Native Chrome Profiles
Sometimes a scrape needs the same Chrome you use day to day, logged into LinkedIn, your SSO provider, or a SaaS dashboard. VoidCrawl’s profile API leases one of your real Chrome profiles exclusively for a session, launches Chrome against it, and returns the lease when you are done.
This is a Python-only surface. The MCP server pins to a profile at startup, but never exposes profile management to agents.
Listing installed profiles
```python
from voidcrawl.profiles import list_profiles

# Print every profile VoidCrawl can find on this machine.
for p in list_profiles():
    print(p.name, p.path)
```

`list_profiles()` is synchronous and safe to call from any thread. It discovers every directory under the platform's Chrome discovery roots (listed below) that contains a `Preferences` file.
| OS | Discovery root |
|---|---|
| Linux | ~/.config/google-chrome/, ~/.config/chromium/, ~/.config/google-chrome-beta/ |
| macOS | ~/Library/Application Support/Google/Chrome/, ~/Library/Application Support/Chromium/ |
| Windows | %LOCALAPPDATA%\Google\Chrome\User Data\, %LOCALAPPDATA%\Chromium\User Data\ |
Common profile names: Default, Profile 1, Profile 2, Guest Profile.
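The discovery rule (a directory counts as a profile if it contains a `Preferences` file) can be illustrated with a standard-library sketch. This is not VoidCrawl's implementation: `chrome_roots` and `discover_profiles` are hypothetical names, the roots simply mirror the table above, and treating profiles as immediate children of a root is an assumption.

```python
import os
import sys
from collections.abc import Iterable, Iterator
from pathlib import Path


def chrome_roots() -> list[Path]:
    """Candidate user-data roots for this OS, copied from the discovery table."""
    home = Path.home()
    if sys.platform.startswith("linux"):
        base = home / ".config"
        return [base / "google-chrome", base / "chromium", base / "google-chrome-beta"]
    if sys.platform == "darwin":
        base = home / "Library" / "Application Support"
        return [base / "Google" / "Chrome", base / "Chromium"]
    if sys.platform == "win32":
        local = os.environ.get("LOCALAPPDATA")
        if not local:
            return []
        base = Path(local)
        return [base / "Google" / "Chrome" / "User Data", base / "Chromium" / "User Data"]
    return []


def discover_profiles(roots: Iterable[Path]) -> Iterator[tuple[str, Path]]:
    """Yield (name, path) for each directory under a root that holds a Preferences file.

    Assumption: profiles are immediate children of the root (Default, Profile 1, ...).
    """
    for root in roots:
        if not root.is_dir():
            continue  # this Chrome flavour is not installed
        for child in sorted(root.iterdir()):
            if child.is_dir() and (child / "Preferences").is_file():
                yield child.name, child


for name, path in discover_profiles(chrome_roots()):
    print(name, path)
```

Files that live beside the profiles in a user data root (such as `Local State`) and helper directories without a `Preferences` file are skipped by this check.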
Leasing a profile
The happy path is the with_profile async context manager:
```python
import asyncio

from voidcrawl.profiles import with_profile


async def main():
    # Lease "Profile 1" exclusively; the lease is returned when the block exits.
    async with with_profile("Profile 1") as handle:
        page = await handle.session.new_page("https://linkedin.com/feed")
        html = await page.content()
        print(len(html))


asyncio.run(main())
```

On entry, VoidCrawl:
- Resolves `"Profile 1"` to its on-disk path.
- Opens `<profile>/.voidcrawl.lock` and takes an exclusive advisory lock on it (cross-process, cross-user).
- Launches Chrome headful with `--user-data-dir=<profile>`.
- Hands back a `ProfileHandle` whose `.session` attribute is a live `BrowserSession`.
On exit, Chrome is closed cleanly and the lock file is released.
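The lease hinges on that exclusive advisory lock. VoidCrawl's arbitration uses the Rust `fs2` crate; the following is only a POSIX-only Python sketch of the same mechanism built on `fcntl.flock`, with hypothetical helper names `try_lock` and `unlock`. It does not cover Windows.

```python
import errno
import fcntl  # POSIX only
import os


def try_lock(lock_path: str):
    """Try to take an exclusive, non-blocking flock on lock_path.

    Returns the open file descriptor on success (keep it open to hold the lock),
    or None if another open file description already holds the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
            return None  # someone else holds the lease
        raise
    return fd


def unlock(fd: int) -> None:
    """Release the lock explicitly, then close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Because `flock` locks are advisory, they only exclude processes that also ask for the lock; a process that never calls `try_lock` is not stopped by the file's existence.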
Lease errors
| Exception | When it fires |
|---|---|
| `ProfileNotFound` | No directory matching `name` under any discovery root. |
| `ProfileBusy` | Another VoidCrawl process holds the lock and `lease_timeout=0`. |
| `ProfileLeaseExpired` | The lock was held for the full `lease_timeout` window and we gave up waiting. |
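All three inherit from `VoidCrawlError`, so a broad `except VoidCrawlError:` catches any lease failure. Catch the subclasses individually when you need to react to each one differently:

```python
import asyncio

from voidcrawl.profiles import with_profile, ProfileBusy, ProfileNotFound


async def main():
    try:
        # lease_timeout=0.0 fails fast with ProfileBusy instead of waiting.
        async with with_profile("Profile 1", lease_timeout=0.0) as handle:
            ...
    except ProfileBusy:
        print("Another pipeline is already using this profile")
    except ProfileNotFound as e:
        print(f"No such profile: {e}")


asyncio.run(main())
```

`lease_timeout` is the number of seconds to poll before giving up (default `300.0`). Use `0.0` for a fail-fast check; with a positive timeout, a lock that stays held raises `ProfileLeaseExpired` instead of `ProfileBusy`.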
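The `lease_timeout` semantics in the table can be modelled as a simple poll loop. This is a sketch under assumptions, not VoidCrawl's code: `poll_for_lease`, the 0.5 s poll interval, and the `LeaseBusy`/`LeaseExpired` classes are hypothetical stand-ins (the latter two for `ProfileBusy` and `ProfileLeaseExpired`). `try_acquire` can be any non-blocking lock attempt that returns `None` while the lock is held.

```python
import asyncio
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class LeaseBusy(Exception):
    """Hypothetical stand-in for ProfileBusy."""


class LeaseExpired(Exception):
    """Hypothetical stand-in for ProfileLeaseExpired."""


async def poll_for_lease(
    try_acquire: Callable[[], Optional[T]],
    lease_timeout: float = 300.0,
    interval: float = 0.5,  # assumed poll interval
) -> T:
    """Call try_acquire until it returns a value or lease_timeout seconds pass."""
    deadline = time.monotonic() + lease_timeout
    while True:
        lease = try_acquire()
        if lease is not None:
            return lease
        if lease_timeout <= 0:
            # Fail fast: exactly one attempt.
            raise LeaseBusy("lock is held and lease_timeout is 0")
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise LeaseExpired(f"lock still held after {lease_timeout}s")
        # Sleep without blocking the event loop, never past the deadline.
        await asyncio.sleep(min(interval, remaining))
```

Polling asynchronously keeps other coroutines in the pipeline running while one task waits for a busy profile.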
Low-level API
If you need explicit lifecycle control, for example when embedding into another context manager:
```python
import asyncio

from voidcrawl.profiles import acquire_profile, release_profile


async def main():
    handle = await acquire_profile("Profile 1", lease_timeout=30.0)
    try:
        page = await handle.session.new_page("https://example.com")
        ...
    finally:
        # Always return the lease, even if the body raises.
        await release_profile(handle)


asyncio.run(main())
```

Design notes
- One Chrome process per profile. Every lease spawns its own `--user-data-dir` Chrome. This costs roughly 200 MB of RAM per active profile but gives you clean per-profile crash isolation and sidesteps Chrome's `--profile-directory` singleton locking.
- Warm profiles only. VoidCrawl never creates or seeds profiles. Log into whatever service you need in real Chrome first; VoidCrawl borrows it.
- MCP scope. Profile management is pipeline-only; agents drive whatever profile you pinned at server launch. See the MCP server guide.
Rotation (bot-wall hygiene)
Hitting a bot-managed domain repeatedly from one identity raises that identity's risk score. Rotation is a pipeline pattern, not an agent action. Keep a pool of (proxy, profile) identities and pick one per task:
- Proxy first. A residential or rotating exit (`BrowserConfig.proxy`) is the biggest lever for IP reputation, and the only remaining lever once the fingerprint is already clean (see Stealth).
- Profile per identity. Give each identity its own `user_data_dir` and round-robin through them. Don't fan one profile out across concurrent leases; Chrome locks it (one process per profile).
- Pace. Reuse one session for same-origin work, and space out `fetch_many` batches against managed domains rather than firing them back-to-back.
There is no built-in rotator; it’s a caller/pipeline concern.
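Since rotation is left to the pipeline, here is one minimal way to write it: a round-robin pool of (proxy, profile) identities that never hands the same profile to two tasks at once. `Identity` and `IdentityPool` are hypothetical names for illustration, not part of VoidCrawl.

```python
import asyncio
import itertools
from contextlib import asynccontextmanager
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """One rotation slot: an exit proxy paired with its own profile (hypothetical)."""
    proxy: str
    profile: str


class IdentityPool:
    """Round-robin over identities; one profile is never leased to two tasks at once."""

    def __init__(self, identities):
        self._identities = list(identities)
        profiles = [i.profile for i in self._identities]
        if not profiles or len(set(profiles)) != len(profiles):
            raise ValueError("need at least one identity, each with its own profile")
        self._order = itertools.cycle(self._identities)
        # Locks are created lazily, inside the running event loop.
        self._locks: dict[str, asyncio.Lock] = {}

    @asynccontextmanager
    async def lease(self):
        # Take the next identity in rotation order. If it is still busy, wait for it
        # rather than skipping ahead (kept simple on purpose).
        identity = next(self._order)
        lock = self._locks.setdefault(identity.profile, asyncio.Lock())
        async with lock:
            yield identity
```

In a pipeline you would nest the profile lease inside the identity lease, for example `async with pool.lease() as ident:` followed by `async with with_profile(ident.profile) as handle:`. How `ident.proxy` reaches Chrome depends on `BrowserConfig`, which this page does not document, so the sketch leaves it out.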
Warm profiles & Cloudflare cf_clearance
A profile that’s browsed a Cloudflare-fronted site carries a cf_clearance cookie. Be precise about what that buys you:
- ✅ It satisfies the Cloudflare edge challenge (the “checking your browser” interstitial) on revisits.
- ❌ It does not satisfy an inline managed-Turnstile widget. That widget issues a separate `cf-turnstile-response` token, scored fresh per request, so a banked `cf_clearance` does nothing for it.
So a warm profile helps you reach a Cloudflare-fronted page, but clearing an on-page Turnstile is a job for a headful browser with a hardware GPU (see Captcha handling). `cf_clearance` is also bound to the User-Agent, so it only helps if the session presents the same UA that earned it. To persist it across runs, mount a persistent `user_data_dir` (or a Docker volume over the profile directory).
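To check whether a warm profile has actually banked `cf_clearance` for a domain, you can inspect Chrome's cookie database directly. This is a sketch that relies on Chrome's on-disk layout rather than any VoidCrawl API: it assumes the SQLite store sits at `<profile>/Network/Cookies` (older Chrome versions used `<profile>/Cookies`) with a `cookies` table that has `host_key` and `name` columns. `has_cf_clearance` is a hypothetical helper; run it with Chrome closed.

```python
import sqlite3
from pathlib import Path


def has_cf_clearance(profile_dir, domain: str) -> bool:
    """Return True if the profile's cookie store holds a cf_clearance cookie for domain.

    Assumed layout: <profile>/Network/Cookies (newer Chrome) or <profile>/Cookies (older).
    Only presence is checked; the encrypted value, expiry and UA binding are not.
    """
    profile_dir = Path(profile_dir)
    for candidate in (profile_dir / "Network" / "Cookies", profile_dir / "Cookies"):
        if not candidate.is_file():
            continue
        # Open read-only via a URI; as_uri() percent-encodes spaces like "Profile 1".
        uri = candidate.resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
        try:
            row = conn.execute(
                "SELECT 1 FROM cookies WHERE name = 'cf_clearance' "
                "AND (host_key = ? OR host_key = ?) LIMIT 1",
                (domain, "." + domain),
            ).fetchone()
        finally:
            conn.close()
        if row is not None:
            return True
    return False
```

A `True` result only tells you the cookie exists. It still has to be unexpired and presented with the UA that earned it.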
FAQs
What is a native Chrome profile in VoidCrawl?
A directory under Chrome’s user data root (for example ~/.config/google-chrome/Profile 1) that already contains your cookies, logged-in sessions, and installed extensions. VoidCrawl can lease one of these exclusively and launch Chrome against it, so your scrape inherits the same logged-in state your real browser has.
Where does VoidCrawl look for installed profiles?
See the platform table above. Any directory under those roots containing a Preferences file is recognised as a profile.
What happens if two processes try to lease the same profile?
The second caller sees ProfileBusy immediately (if lease_timeout=0) or ProfileLeaseExpired after waiting the full timeout. VoidCrawl uses a cross-process advisory lock at <profile>/.voidcrawl.lock, so the arbitration works even across separate Python interpreters.
Do I need to close my real Chrome before leasing a profile?
Yes, for the specific profile you want to lease. Chrome itself holds a SingletonLock on any profile it is actively using, independent of VoidCrawl’s lock. If your daily-driver Chrome has that profile open, close the window first.
Why one Chrome process per profile instead of one Chrome with `--profile-directory`?
Per-profile isolation. Separate --user-data-dir processes each crash, restart, and lease independently, and they sidestep Chrome’s internal singleton behaviour. The trade-off is roughly 200 MB RAM per active profile, which is acceptable for pipeline workloads.
Is the profile API available over MCP?
No, it is Python-only. The MCP server picks a profile at startup with --profile NAME or VOIDCRAWL_PROFILE=NAME, and agents drive whatever Chrome the operator handed them. That keeps profile selection a human decision.
Does VoidCrawl create new profiles or seed them with logins?
No. VoidCrawl borrows warm profiles you already maintain in real Chrome. Log into the service you need normally, then point VoidCrawl at that profile name.
References
- User Data Directory. Chromium. Reference for Chrome's user data root and profile layout. https://chromium.googlesource.com/chromium/src/+/HEAD/docs/user_data_dir.md
- fs2. danburkert. Cross-platform advisory file locks in Rust, used for profile arbitration. https://docs.rs/fs2/latest/fs2/
- VoidCrawl. CascadingLabs. VoidCrawl source repository on GitHub. https://github.com/CascadingLabs/VoidCrawl