Operations API

The current public surface is operation-first. Python helpers and CLI commands compile user input into typed request models, execute the same runtime, and return typed result envelopes that are safe to serialize.

Operation | Python helper | Request | Result | CLI
Scrape URLs with contracts | ys.scrape(...) | ys.ScrapeRequest | ys.ScrapeResult | yosoi scrape
Search the web for source URLs | ys.search(...) | ys.SearchRequest | ys.SearchResult | yosoi search
Crawl seed URLs | ys.crawl(...) or ys.run_crawl(...) | ys.CrawlRequest | ys.CrawlResult or CrawlRunSummary | yosoi crawl
Map sitemap URLs or subdomains | ys.map(...) | ys.MapRequest | ys.MapResult | yosoi map

Scrape

ys.scrape(...) accepts one URL or many URLs and one contract or many contracts. It returns a ScrapeResult envelope with one ScrapeUnitResult per URL and contract pair.

import yosoi as ys

result = await ys.scrape(
    'https://qscrape.dev/l1/news/articles/',
    ys.NewsArticle,
    model='groq:llama-3.3-70b-versatile',
)
for unit in result.results:
    print(unit.url, unit.contract, unit.status, unit.record_count)
    print(unit.records)
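
The URL and contract pairing described above can be sketched without the library. This is a minimal illustration, assuming the envelope covers every combination of the inputs; the helper name expected_units and the second contract name Product are hypothetical, and the order in which Yosoi actually returns units is not specified here.

```python
from itertools import product


def expected_units(urls, contracts):
    """Return the (url, contract) pairs a multi-input scrape would cover."""
    # Treat a single string as shorthand for a one-element list.
    if isinstance(urls, str):
        urls = [urls]
    if isinstance(contracts, str):
        contracts = [contracts]
    return list(product(urls, contracts))


pairs = expected_units(
    ['https://qscrape.dev/l1/news/articles/', 'https://qscrape.dev/'],
    ['NewsArticle', 'Product'],  # 'Product' is a made-up contract name
)
print(len(pairs))  # 2 URLs x 2 contracts = 4 pairs
```

Counting the pairs up front gives a quick sanity check against len(result.results) after a run.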

Stable fields on each unit include selector_source, cache_decision, llm_used, llm_reason, quality_status, quality_issues, expected_record_count, record_count, records, and error.
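
One way to use these fields is to triage units after a run. The sketch below works on stand-in objects rather than real ScrapeUnitResult instances, and it assumes that error is None on success and that expected_record_count may be None when no expectation exists; both are assumptions, not documented behavior. The function name triage is hypothetical.

```python
from types import SimpleNamespace


def triage(units):
    """Split units into failed, short (fewer records than expected), and ok."""
    failed, short, ok = [], [], []
    for unit in units:
        if unit.error is not None:
            failed.append(unit)
        elif (unit.expected_record_count is not None
              and unit.record_count < unit.expected_record_count):
            short.append(unit)
        else:
            ok.append(unit)
    return failed, short, ok


# Stand-in units that carry only the fields the helper reads.
units = [
    SimpleNamespace(url='https://example.com/a', error=None,
                    expected_record_count=10, record_count=10),
    SimpleNamespace(url='https://example.com/b', error=None,
                    expected_record_count=10, record_count=4),
    SimpleNamespace(url='https://example.com/c', error='timeout',
                    expected_record_count=None, record_count=0),
]
failed, short, ok = triage(units)
print(len(failed), len(short), len(ok))
```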

CLI equivalent:

uvx yosoi scrape https://qscrape.dev/l1/news/articles/ \
  --contract @NewsArticle \
  --model groq:llama-3.3-70b-versatile \
  --json

Use --request request.json when an agent or another service already produced a ScrapeRequest JSON document. Use --dump-request to inspect what the CLI will execute.
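
When another program drives the CLI, the command shown above can be assembled as an argument list instead of a shell string. This is a rough sketch: the helper names scrape_argv and run_scrape are hypothetical, and it assumes that --json writes a single JSON document to stdout, which the text above implies but does not spell out.

```python
import json
import subprocess


def scrape_argv(url, contract, model):
    """Build the argument list for the scrape command shown above."""
    return ['uvx', 'yosoi', 'scrape', url,
            '--contract', contract,
            '--model', model,
            '--json']


def run_scrape(url, contract, model):
    """Run the CLI and parse stdout as JSON (assumed output shape)."""
    proc = subprocess.run(scrape_argv(url, contract, model),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)


argv = scrape_argv('https://qscrape.dev/l1/news/articles/',
                   '@NewsArticle',
                   'groq:llama-3.3-70b-versatile')
print(' '.join(argv))
```

Passing a list to subprocess.run avoids shell quoting problems when URLs contain characters such as & or ?.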

Search

ys.search(...) wraps DDGS search and normalizes provider rows into ranked hits plus a URL list. Policy can provide durable defaults for backend, region, safesearch, page, time limit, and result count.

import yosoi as ys

result = await ys.search(
    'Cascading Labs Yosoi selector discovery',
    limit=5,
    backend='google,bing,brave',
)
for hit in result.hits:
    print(hit.rank, hit.title, hit.url)

CLI equivalent:

uvx yosoi search "Cascading Labs Yosoi selector discovery" \
  --limit 5 \
  --backend google,bing,brave \
  --json
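
The normalization step mentioned at the start of this section, turning provider rows into ranked hits plus a URL list, might look roughly like the sketch below. It is not Yosoi's implementation. The row keys title and href, the fallback to url, and the de-duplication by URL are all assumptions made for illustration, and normalize_rows is a hypothetical name.

```python
def normalize_rows(rows, limit=5):
    """Turn raw provider rows into ranked hits and a flat URL list."""
    hits, seen = [], set()
    for row in rows:
        # Providers may name the link field differently (assumed keys).
        url = (row.get('href') or row.get('url') or '').strip()
        if not url or url in seen:
            continue  # skip rows without a link and repeated links
        seen.add(url)
        hits.append({'rank': len(hits) + 1,
                     'title': row.get('title', ''),
                     'url': url})
        if len(hits) == limit:
            break
    return hits, [hit['url'] for hit in hits]


rows = [
    {'title': 'Docs', 'href': 'https://example.com/docs'},
    {'title': 'Docs again', 'href': 'https://example.com/docs'},
    {'title': 'No link'},
    {'title': 'Blog', 'url': 'https://example.com/blog'},
]
hits, urls = normalize_rows(rows)
for hit in hits:
    print(hit['rank'], hit['title'], hit['url'])
```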

Crawl

ys.crawl(...) runs the crawl coordinator over one or more seeds. For machine-safe output, build a CrawlRequest and call ys.run_crawl(...).

import yosoi as ys

request = ys.CrawlRequest.from_axes(
    ['https://qscrape.dev/'],
    limit=25,
    compact=True,
    policy=ys.Policy(crawl=ys.CrawlPolicy(budget=ys.CrawlBudget(max_pages=25))),
)
result = await ys.run_crawl(request)
print(result.status)
print(result.summary)

CLI equivalent:

uvx yosoi crawl https://qscrape.dev/ \
  --limit 25 \
  --compact \
  --json

Use --stress --run-id ID to store crawl-run metrics and emit compact, stress-friendly output.
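
To make the idea of a page budget concrete, here is a small breadth-first crawl over an in-memory link graph that stops once max_pages pages have been visited. It only illustrates the concept behind CrawlBudget(max_pages=...); the function name, the links dictionary, and the breadth-first order are assumptions and say nothing about how the Yosoi coordinator schedules work.

```python
from collections import deque


def crawl_with_budget(seeds, links, max_pages):
    """Visit pages breadth-first until the page budget is spent."""
    queue = deque(seeds)
    seen = set(seeds)
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        # 'links' stands in for fetching the page and extracting its links.
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return visited


# Hypothetical site structure used only for this example.
links = {
    'https://example.com/': ['https://example.com/a', 'https://example.com/b'],
    'https://example.com/a': ['https://example.com/c'],
    'https://example.com/b': ['https://example.com/c'],
}
visited = crawl_with_budget(['https://example.com/'], links, max_pages=3)
print(visited)
```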

Map

ys.map(...) discovers sitemap URLs from robots.txt, default sitemap locations, and nested sitemap indexes. With subdomain discovery enabled, it shells out to subfinder and returns a normalized host inventory.

import yosoi as ys

site = await ys.map('qscrape.dev', max_urls=200)
print(site.status, site.root_host)
for row in site.urls[:10]:
    print(row.url)

CLI equivalent:

uvx yosoi map qscrape.dev --max-urls 200 --json
uvx yosoi map qscrape.dev --subdomains --json

ys.MapResult includes sitemaps, urls, hosts, subdomains, and errors, so it is a good source-discovery step before crawl or scrape planning.
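
For readers curious about what sitemap discovery involves, the sketch below reads Sitemap: lines from a robots.txt body and follows nested sitemap indexes down to page URLs, using only the standard library. It is an independent illustration, not Yosoi's code. The fetch argument is a hypothetical callable that returns the XML text for a URL, and the sample documents are invented.

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {'sm': 'http://www.sitemaps.org/schemas/sitemap/0.9'}


def sitemaps_from_robots(robots_txt):
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    found = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(':')
        if sep and key.strip().lower() == 'sitemap':
            found.append(value.strip())
    return found


def collect_urls(sitemap_url, fetch, max_urls=200, seen=None):
    """Resolve a sitemap or sitemap index into at most max_urls page URLs."""
    seen = set() if seen is None else seen
    if sitemap_url in seen:
        return []  # guard against indexes that reference each other
    seen.add(sitemap_url)
    root = ET.fromstring(fetch(sitemap_url))
    urls = []
    if root.tag.endswith('sitemapindex'):
        for loc in root.findall('sm:sitemap/sm:loc', NS):
            remaining = max_urls - len(urls)
            if remaining <= 0:
                break
            urls.extend(collect_urls(loc.text.strip(), fetch, remaining, seen))
    else:
        for loc in root.findall('sm:url/sm:loc', NS):
            if len(urls) >= max_urls:
                break
            urls.append(loc.text.strip())
    return urls


# Invented documents standing in for real HTTP responses.
PAGES = {
    'https://example.com/sitemap.xml':
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        '<sitemap><loc>https://example.com/news.xml</loc></sitemap>'
        '</sitemapindex>',
    'https://example.com/news.xml':
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        '<url><loc>https://example.com/a</loc></url>'
        '<url><loc>https://example.com/b</loc></url>'
        '</urlset>',
}
robots = 'User-agent: *\nSitemap: https://example.com/sitemap.xml\n'

found = []
for sitemap in sitemaps_from_robots(robots):
    found.extend(collect_urls(sitemap, PAGES.__getitem__))
print(found)
```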