Profiles

Sometimes a scrape needs persistent browser state: cookies, local storage, extensions, or a warmed anti-bot identity. VoidCrawl supports two profile models:

Native Chrome profile leasing borrows an existing daily Chrome profile such as Default or Profile 1. This is a Python-only surface, except that the MCP server can be pinned to one native profile at launch.
VoidCrawl-managed profiles are standalone Chrome user_data_dir roots under VOIDCRAWL_PROFILE_ROOT. Python and MCP can create, clone, list, pool, and lease them per session.

Use native leasing when a human has already logged into a real Chrome profile. Use managed profiles when a crawler or MCP agent needs a bounded set of scrape-owned identities.

Listing installed profiles

from voidcrawl.profiles import list_profiles

for p in list_profiles():
    print(p.name, p.path)

list_profiles() is synchronous and safe to call from any thread. It discovers every directory under the platform-specific Chrome root that contains a Preferences file.

OS	Discovery root
Linux	`~/.config/google-chrome/`, `~/.config/chromium/`, `~/.config/google-chrome-beta/`
macOS	`~/Library/Application Support/Google/Chrome/`, `~/Library/Application Support/Chromium/`
Windows	`%LOCALAPPDATA%\Google\Chrome\User Data\`, `%LOCALAPPDATA%\Chromium\User Data\`

Common profile names: Default, Profile 1, Profile 2, Guest Profile.
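The discovery rule (a directory counts as a profile when it holds a Preferences file) can be approximated with the standard library alone. This is a minimal sketch, not VoidCrawl's implementation; `find_profiles` is a hypothetical helper name, and only the Linux roots from the table are wired in.

```python
from pathlib import Path


def find_profiles(root: Path) -> list[Path]:
    """Return subdirectories of `root` that contain a Preferences file."""
    if not root.is_dir():
        return []
    return sorted(
        child
        for child in root.iterdir()
        if child.is_dir() and (child / "Preferences").is_file()
    )


if __name__ == "__main__":
    # Linux roots from the table above; swap in the macOS or Windows roots as needed.
    for root in (
        Path.home() / ".config" / "google-chrome",
        Path.home() / ".config" / "chromium",
        Path.home() / ".config" / "google-chrome-beta",
    ):
        for profile in find_profiles(root):
            print(profile.name, profile)
```

Sibling directories without a Preferences file are skipped, which is why only real profiles show up in the listing.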

Leasing a profile

The happy path is the with_profile async context manager:

import asyncio
from voidcrawl.profiles import with_profile

async def main():
    async with with_profile("Profile 1") as handle:
        page = await handle.session.new_page("https://linkedin.com/feed")
        html = await page.content()
        print(len(html))

asyncio.run(main())

On entry, VoidCrawl:

Resolves "Profile 1" to its on-disk path.
Opens <profile>/.voidcrawl.lock and takes an exclusive advisory lock (cross-process, cross-user).
Launches Chrome headful with --user-data-dir=<profile>.
Hands back a ProfileHandle exposing .session (a live BrowserSession).

On exit, Chrome is closed cleanly and the advisory lock is released.
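The locking step can be illustrated with a POSIX-only sketch built on `fcntl.flock`. It is an analogy under stated assumptions, not VoidCrawl's code: `acquire_lock`, `release_lock`, `LockBusy`, and `LockTimedOut` are hypothetical names standing in for the lease logic and for ProfileBusy and ProfileLeaseExpired.

```python
import fcntl
import os
import time


class LockBusy(Exception):
    """Hypothetical stand-in for ProfileBusy (fail-fast, timeout of 0)."""


class LockTimedOut(Exception):
    """Hypothetical stand-in for ProfileLeaseExpired (waited the full timeout)."""


def acquire_lock(lock_path: str, timeout: float, poll: float = 0.1) -> int:
    """Take an exclusive advisory lock on lock_path and return the open fd."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # LOCK_NB makes the call fail instead of blocking when the lock is held.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return fd
        except BlockingIOError:
            if timeout <= 0:
                os.close(fd)
                raise LockBusy(lock_path) from None
            if time.monotonic() >= deadline:
                os.close(fd)
                raise LockTimedOut(lock_path) from None
            time.sleep(poll)


def release_lock(fd: int) -> None:
    """Drop the lock and close the descriptor."""
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)
```

Because the lock is advisory, it only arbitrates between processes that also take it; anything that ignores the lock file is not stopped by it.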

Lease errors

Exception	When it fires
`ProfileNotFound`	No directory matching `name` under any discovery root.
`ProfileBusy`	Another voidcrawl process holds the lock and `lease_timeout=0`.
`ProfileLeaseExpired`	Lock was held for the full `lease_timeout` window and we gave up waiting.

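All three inherit from VoidCrawlError, so a broad except VoidCrawlError: catches them all. To handle each case separately:

import asyncio
# The ProfileLeaseExpired import path is assumed to match the other two.
from voidcrawl.profiles import ProfileBusy, ProfileLeaseExpired, ProfileNotFound, with_profile

async def main():
    try:
        async with with_profile("Profile 1", lease_timeout=10.0) as handle:
            ...
    # With a non-zero lease_timeout, contention surfaces as ProfileLeaseExpired.
    except (ProfileBusy, ProfileLeaseExpired):
        print("Another pipeline is already using this profile")
    except ProfileNotFound as e:
        print(f"No such profile: {e}")

asyncio.run(main())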

lease_timeout is the number of seconds to poll before giving up (default 300.0). Use 0.0 for a fail-fast check.

Low-level API

If you need explicit lifecycle control, for example when embedding into another context manager:

import asyncio
from voidcrawl.profiles import acquire_profile, release_profile

async def main():
    handle = await acquire_profile("Profile 1", lease_timeout=30.0)
    try:
        page = await handle.session.new_page("https://example.com")
        ...
    finally:
        # Always release the lease, even if the scrape raises.
        await release_profile(handle)

asyncio.run(main())
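One way to embed the acquire/release pair into a larger context manager is `contextlib.AsyncExitStack`. The sketch below is an illustration only: `lease_all` is a hypothetical helper, and the acquire and release coroutines are passed in as parameters so the block stays self-contained.

```python
from contextlib import AsyncExitStack, asynccontextmanager


@asynccontextmanager
async def lease_all(names, acquire, release, lease_timeout=30.0):
    """Lease every name in order; release in reverse order on exit or error."""
    async with AsyncExitStack() as stack:
        handles = []
        for name in names:
            handle = await acquire(name, lease_timeout=lease_timeout)
            # Register the release immediately so a later failure still frees it.
            stack.push_async_callback(release, handle)
            handles.append(handle)
        yield handles
```

With VoidCrawl this would presumably be called as `async with lease_all(["Profile 1", "Profile 2"], acquire_profile, release_profile) as handles:`. If the second lease fails, the first is still released before the error propagates.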

Design notes

One Chrome process per profile. Every lease spawns its own --user-data-dir Chrome. This costs roughly 200 MB RAM per active profile but gives you clean per-profile crash isolation and sidesteps Chrome’s --profile-directory singleton locking.
Native profiles are warm only. VoidCrawl never creates or seeds your daily Chrome profiles. Log into whatever service you need in real Chrome first, close that profile, then let VoidCrawl borrow it.
Managed profiles are scrape-owned. VoidCrawl can create and clone standalone managed profiles, but it still does not fabricate logins or bypass site policy. You seed them by normal browsing, a trusted operator flow, or your own pipeline.
MCP scope. MCP clients cannot enumerate native Chrome profiles. They can use managed-profile tools inside the configured managed root. See the MCP server guide.

VoidCrawl-managed profiles

Managed profiles are standalone Chromium user_data_dir directories owned by VoidCrawl. They live under VOIDCRAWL_PROFILE_ROOT; if unset, VoidCrawl uses the platform data dir, for example ~/.local/share/voidcrawl/profiles on Linux.

Python API:

from voidcrawl.profiles import ProfileRegistry

registry = ProfileRegistry.default()
registry.create_profile(
    "research-1",
    description="warm identity for research tasks",
    labels=["research"],
)
registry.create_pool("research", ["research-1"], max_active=1)

Split one profile across Chrome instances

Chrome refuses to let two processes write one physical user_data_dir. split_profile turns one quiesced managed profile into isolated temporary copies in a single operation:

import asyncio
from voidcrawl import BrowserConfig, BrowserSession
from voidcrawl.profiles import ProfileRegistry

async def main():
    registry = ProfileRegistry.default()
    async with registry.split_profile("research-1", copies=2) as split:
        first_path, second_path = split.paths
        async with (
            BrowserSession(BrowserConfig(user_data_dir=first_path)) as first,
            BrowserSession(BrowserConfig(user_data_dir=second_path)) as second,
        ):
            ...  # two separate Chrome instances from the same starting profile

asyncio.run(main())

VoidCrawl holds one source lease across the complete split. Both copies therefore contain the same starting cookies, storage, extensions, bookmarks, and profile identity. Each copy has a unique directory and its own Chrome SingletonLock, like two Docker volumes initialized from one profile.

This is copy-on-start isolation, not live synchronization. Writes made by one Chrome do not appear in the other and are not merged back into the source. The temporary copies are deleted when the context exits. Splits are limited to 2 through 16 copies as a disk-usage guardrail. Use snapshot_profile when only one disposable copy is needed.
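The copy-on-start semantics can be illustrated without Chrome at all. The sketch below is not how split_profile is implemented; `split_copies` is a hypothetical helper that assumes a quiesced source directory (no live sockets or lock symlinks) and mirrors the 2 through 16 guardrail described above.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def split_copies(source: Path, copies: int):
    """Yield `copies` independent temporary copies of `source`."""
    if not 2 <= copies <= 16:
        raise ValueError("copies must be between 2 and 16")
    workdir = Path(tempfile.mkdtemp(prefix="profile-split-"))
    try:
        paths = []
        for index in range(copies):
            target = workdir / f"copy-{index}"
            shutil.copytree(source, target)
            paths.append(target)
        yield paths
    finally:
        # Copies are disposable: nothing is merged back into the source.
        shutil.rmtree(workdir, ignore_errors=True)
```

A write to one copy stays in that copy, and every copy disappears when the context exits, which is the behaviour to expect from the real split.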

Close regular Chrome, then fork its installed native Default profile into two visible Chrome instances with the repository’s runnable demo:

uv run python examples/profile_split_headful.py --hold-seconds 20

Use --source "Profile 1" or an explicit native profile-directory path to select a different profile. fork_profile copies that profile and Chrome’s root Local State into each standalone worker directory.

MCP tools expose the same registry:

Tool	Purpose
`profile_create`	Create a standalone managed profile.
`profile_clone`	Clone a managed profile id or explicit `user_data_dir` path into a new managed profile.
`profile_list` / `profile_describe`	Return metadata only: id, path, labels, description, size, status.
`profile_delete`	Delete an unlocked managed profile.
`profile_pool_create`	Create or replace a named round-robin pool.
`profile_pool_list` / `profile_pool_describe`	Inspect pools and member profile metadata.

Lease a managed profile into a stateful MCP session by passing profile_id to session_open:

{ "profile_id": "research-1", "headful": true }

Or lease from a pool:

{ "profile_pool": "research", "headful": false }

The response includes the selected profile_id. Managed-profile tools never expose cookie or storage values, but sessions using those profiles can act with their authenticated state, so scope VOIDCRAWL_PROFILE_ROOT deliberately.

Dedicated persistent profiles (`BrowserConfig.user_data_dir`)

Leasing borrows a profile your real Chrome maintains. When you instead want a dedicated, scrape-owned profile that persists across runs — a warm identity for anti-bot validation, or a long-lived local session that banks cookies like cf_clearance — set user_data_dir directly on BrowserConfig:

import asyncio
from voidcrawl import BrowserConfig, BrowserSession

async def main():
    cfg = BrowserConfig(user_data_dir="/var/lib/voidcrawl/warm-profile")
    async with BrowserSession(cfg) as browser:
        page = await browser.new_page("https://example.com")

asyncio.run(main())

The directory is created by Chrome on first launch and reused on every subsequent one, so cookies, local storage, and site state survive between runs. It works with BrowserSession and with a single-browser BrowserPool; a pool with browsers > 1 or chrome_ws_urls raises ValueError — one Chrome process per profile is a hard rule (Chrome locks the directory).

Rotation (bot-wall hygiene)

Hitting a bot-managed domain repeatedly from one identity raises its risk score. Rotation is a pipeline pattern, not an agent action — keep a pool of (proxy, profile) identities and pick one per task:

Proxy first. A residential / rotating exit (BrowserConfig.proxy) is the biggest lever for IP reputation — and the only lever once the fingerprint is already clean (see Stealth).
Profile per identity. Give each identity its own user_data_dir and round-robin. Don’t fan one profile across concurrent leases — Chrome locks it (one process per profile).
Pace. Reuse one session for same-origin work; space fetch_many batches against managed domains rather than firing them back-to-back.

For MCP sessions, create a managed pool with profile_pool_create and call session_open with profile_pool. For Python-only flows, rotation can still be a caller pattern over (proxy, user_data_dir) identities.
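For the Python-only case, the caller pattern can be as small as a round-robin iterator over identities. This is a sketch under assumptions: `Identity` and `rotate` are hypothetical names, and the string form of the proxy value is a guess at what BrowserConfig.proxy accepts.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator


@dataclass(frozen=True)
class Identity:
    proxy: str          # assumed to be a proxy URL string
    user_data_dir: str  # one durable directory per identity


def rotate(identities: list[Identity]) -> Iterator[Identity]:
    """Return an endless round-robin iterator over the given identities."""
    if not identities:
        raise ValueError("need at least one identity")
    return cycle(identities)
```

Each task would take `next(rotation)` and build its BrowserConfig from that identity's proxy and user_data_dir. Round-robin alone does not stop two concurrent tasks from landing on the same directory, so concurrency limits remain the caller's job.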

Warm profiles & Cloudflare `cf_clearance`

A profile that’s browsed a Cloudflare-fronted site carries a cf_clearance cookie. Be precise about what that buys you:

✅ It satisfies the Cloudflare edge challenge (the “checking your browser” interstitial) on revisits.
❌ It does not satisfy an inline managed-Turnstile widget — that issues a separate cf-turnstile-response token, scored fresh per request. A banked cf_clearance does nothing for it.

So a warm profile helps you reach a Cloudflare-fronted page, but clearing an on-page Turnstile still requires a headful session with a hardware GPU (see Captcha handling). cf_clearance is also bound to the user agent (UA), so it only helps if the session presents the same UA that earned it. To persist it, point BrowserConfig.user_data_dir at a durable directory (or a Docker volume over the profile dir).
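Because the cookie only helps when the UA matches, a pipeline may want to remember which UA warmed each profile. The sketch below is one possible bookkeeping approach, not a VoidCrawl feature: the sidecar file name and both helper functions are hypothetical.

```python
import json
from pathlib import Path

SIDECAR = "voidcrawl-ua.json"  # hypothetical file name, not part of VoidCrawl


def record_user_agent(profile_dir: Path, user_agent: str) -> None:
    """Store the UA that warmed this profile next to the profile data."""
    profile_dir.mkdir(parents=True, exist_ok=True)
    (profile_dir / SIDECAR).write_text(json.dumps({"user_agent": user_agent}))


def same_user_agent(profile_dir: Path, user_agent: str) -> bool:
    """Return True only if the recorded UA matches the one about to be used."""
    sidecar = profile_dir / SIDECAR
    if not sidecar.is_file():
        return False
    return json.loads(sidecar.read_text()).get("user_agent") == user_agent
```

A mismatch suggests treating the profile as cold for Cloudflare purposes rather than relying on the banked cookie.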

FAQs

What is a native Chrome profile in VoidCrawl?

A directory under Chrome’s user data root (for example ~/.config/google-chrome/Profile 1) that already contains your cookies, logged-in sessions, and installed extensions. VoidCrawl can lease one of these exclusively and launch Chrome against it, so your scrape inherits the same logged-in state your real browser has.

Where does VoidCrawl look for installed profiles?

See the platform table above. Any directory under those roots containing a Preferences file is recognised as a profile.

What happens if two processes try to lease the same profile?

The second caller sees ProfileBusy immediately (if lease_timeout=0) or ProfileLeaseExpired after waiting the full timeout. VoidCrawl uses a cross-process advisory lock at <profile>/.voidcrawl.lock, so the arbitration works even across separate Python interpreters.

Do I need to close my real Chrome before leasing a profile?

Yes, for the specific profile you want to lease. Chrome itself holds a SingletonLock on any profile it is actively using, independent of VoidCrawl’s lock. If your daily-driver Chrome has that profile open, close the window first.
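A quick pre-flight check can catch the common case before a lease fails. This sketch assumes Linux or macOS, where Chrome typically leaves a SingletonLock entry in the user data root (the parent of Default or Profile 1); `chrome_seems_running` is a hypothetical helper, and a stale entry left by a crash can produce a false positive.

```python
import os
from pathlib import Path


def chrome_seems_running(user_data_dir: Path) -> bool:
    """Return True if a SingletonLock entry exists in the user data root."""
    # lexists is used because the lock is typically a symlink whose target
    # is not a real file, so a plain exists() check would report False.
    return os.path.lexists(user_data_dir / "SingletonLock")
```

Treat the result as a hint only: the authoritative answer is whether Chrome itself refuses to start against the directory.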

Why one Chrome process per profile instead of one Chrome with --profile-directory?

Per-profile isolation. Separate --user-data-dir processes each crash, restart, and lease independently, and they sidestep Chrome’s internal singleton behaviour. The trade-off is roughly 200 MB RAM per active profile, which is acceptable for pipeline workloads.

How do I get a persistent profile that VoidCrawl owns, without leasing?

Set BrowserConfig.user_data_dir to a durable directory. Chrome creates it on first launch and reuses it on every run, so cookies and site state persist. It works with BrowserSession and a single-browser BrowserPool, but takes no lease lock — cross-process arbitration is on you.

Is the profile API available over MCP?

Native Chrome profile discovery is not exposed over MCP. The server can be pinned to one native profile at startup with --profile NAME or VOIDCRAWL_PROFILE=NAME.

VoidCrawl-managed profiles are exposed over MCP as profile_* and profile_pool_* tools, and session_open can lease one with profile_id or profile_pool. Those tools return metadata only, but a session launched with the profile can use its authenticated browser state.

Does VoidCrawl create new profiles or seed them with logins?

For native Chrome profile leasing, no. VoidCrawl borrows warm profiles you already maintain in real Chrome.

For managed profiles, yes: VoidCrawl can create an empty standalone profile or clone an existing managed profile or explicit source path. It does not seed logins for you.

References

User Data Directory. Chromium. Reference for Chrome’s user data root and profile layout. https://chromium.googlesource.com/chromium/src/+/HEAD/docs/user_data_dir.md

fs2. danburkert. Cross-platform advisory file locks in Rust, used for profile arbitration. https://docs.rs/fs2/latest/fs2/

VoidCrawl. CascadingLabs. VoidCrawl source repository on GitHub. https://github.com/CascadingLabs/VoidCrawl