Functions

Generated from yosoi v0.0.2a19. Only symbols in __all__ are listed.

File

File(trigger: str | None = ..., href: str | None = ..., url: str | None = ..., description: str | None = ..., allowed_types: Iterable[str] | None = ..., max_bytes: int | None = ..., **kwargs: Any) -> Any

attr

attr(value: str, name: str) -> SelectorEntry

check_policy

check_policy(policy: str | CrawlPolicy | Policy | None = ..., seeds: tuple[str, ...] = ...) -> PolicyCheck

claude_sdk

claude_sdk(model_name: str = ..., **kwargs: Any) -> ModelPolicy

crawl

crawl(seeds: str | Sequence[str], contracts: Sequence[type[Contract] | str] | type[Contract] | str | None = ..., limit: int | None = ..., policy: Policy | None = ..., fetcher_type: str | None = ..., persist: bool = ..., progress: bool | None = ..., console: Any | None = ...) -> CrawlRunSummary

css

css(value: str) -> SelectorEntry

discover

discover() -> SelectorEntry

fingerprint

fingerprint(source: object, ax_snapshot: Any = ..., headers: dict[str, str] | None = ..., endpoints: Sequence[str] | None = ...) -> PageFingerprint

global_id

global_id(value: str, name: str) -> SelectorEntry

js

js(script: str | None = None, description: str | None = None, **kwargs: Any) -> Any

Declare a contract field extracted by a JS program run in the live browser tab.

Two modes:

Hand-authored — provide script. The expression is evaluated as-is on every fetch. No LLM involved::

signals: dict = ys.js("(() => ({ has_alita: !!window.__alita__ }))()")

Discovery-driven — omit script, provide description. Yosoi’s :class:JsDiscoveryOrchestrator writes and verifies the script once per domain, then caches it (CAS-92)::

signals: dict = ys.js(description="Detect Alita embed and competitor widgets")

Args:

  • script str | None — JavaScript IIFE to evaluate. None triggers JS discovery.
  • description str | None — Human-readable description used by the LLM during discovery. Required when script is None.
  • **kwargs Any — Additional arguments forwarded to pydantic.Field (e.g. default, description as a pydantic field description).

Returns: Any — A pydantic FieldInfo with yosoi_action metadata.

Raises:

  • ValueError — When neither script nor description is provided.
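
The mode selection and the ValueError rule described above can be sketched without importing yosoi or pydantic. Everything here is illustrative: `js_field` is a hypothetical stand-in, it returns a plain dict rather than a pydantic FieldInfo, and letting `script` take precedence when both arguments are given is an assumption rather than documented behavior.

```python
from __future__ import annotations

from typing import Any


def js_field(
    script: str | None = None,
    description: str | None = None,
    **kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical stand-in for ys.js; returns a dict, not a FieldInfo."""
    if script is None and description is None:
        # Mirrors the documented rule: one of the two must be supplied.
        raise ValueError("provide either script or description")
    # Assumption: an explicit script wins over discovery when both are given.
    mode = "hand-authored" if script is not None else "discovery"
    return {"mode": mode, "script": script, "description": description, **kwargs}


hand = js_field(script="(() => ({ has_alita: !!window.__alita__ }))()")
found = js_field(description="Detect Alita embed and competitor widgets")
```

With these definitions, `hand["mode"]` is `"hand-authored"`, `found["mode"]` is `"discovery"`, and calling `js_field()` with no arguments raises ValueError.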

jsonld

jsonld(value: str) -> SelectorEntry

load_urls_from_file

load_urls_from_file(filepath: str) -> list[str]

Load URLs from a file (JSON, plain text, CSV, Excel, Parquet, or Markdown).

Args:

  • filepath str — Path to file containing URLs.

Returns: list[str] — List of URL strings.

Raises:

  • FileNotFoundError — If file does not exist.
  • ValueError — If file format requires unavailable dependencies.
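
As a rough illustration of the contract above (a path in, a list of URL strings out, FileNotFoundError for a missing file), here is a standard-library sketch that handles only JSON and plain-text inputs. It is not yosoi's implementation: `load_urls_sketch` is a hypothetical name, and the JSON layout (a top-level array of strings) and the plain-text rules (one URL per line, `#` comments skipped) are assumptions.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_urls_sketch(filepath: str) -> list[str]:
    """Hypothetical stand-in covering only .json and plain-text files."""
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(f"URL file not found: {filepath}")
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        data = json.loads(text)
        # Assumed layout: a top-level JSON array of URL strings.
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of URLs")
        return [str(item) for item in data]
    # Assumed plain-text rules: one URL per line, blanks and '#' lines skipped.
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]
```

Formats such as Excel or Parquet would need third-party readers, which is presumably why the real function documents a ValueError for unavailable dependencies.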

map

map(url: str, max_sitemaps: int = ..., max_urls: int = ..., max_subdomains: int = ..., subfinder_bin: str = ..., subfinder_timeout: int = ..., include_robots: bool = ..., include_default_sitemaps: bool = ..., include_subdomains: bool = ..., discover_subdomains: bool = ...) -> MapResult

opencode

opencode(model_name: str = ..., **kwargs: Any) -> ModelPolicy

policy_arn

policy_arn(namespace: str, name: str) -> str

regex

regex(value: str) -> SelectorEntry

register_coercion

register_coercion(type_name: str, description: str = '', semantic: SemanticRule | None = None, **config_defaults: Any) -> Callable[[Callable[..., CoercedValue]], Callable[..., Any]]

Decorator that registers a coercion function and returns a Field factory.

The decorated function becomes the Field factory — its name is what you use in a Contract. The coercion logic is stored internally in the registry.

Decorator kwargs define the config schema:

  • description: default field description

  • all other kwargs: config keys that appear in json_schema_extra and are forwarded to the coerce function via config

Args:

  • type_name str — The yosoi_type identifier (e.g. 'price').

  • description str — Default field description shown in manifests and to the AI.

  • semantic SemanticRule | None — Optional :class:SemanticRule describing the shape a correct value should have. Used by the discovery semantic-retry loop.

  • **config_defaults Any — Config keys with their default values. These become keyword arguments on the generated factory function.

Example::

@register_coercion('phone', description='A phone number', country_code='+1')
def PhoneNumber(v, config, source_url=None):
    import re
    digits = re.sub(r'\D', '', str(v))
    return config.get('country_code', '+1') + digits

# PhoneNumber is now a Field factory:
# PhoneNumber(country_code='+44') -> Field(json_schema_extra={...})
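
To make the "decorated function becomes the Field factory" behavior concrete, the pattern can be sketched in plain Python. This is a simplified stand-in, not yosoi's code: `register_coercion_sketch` and `_COERCIONS` are hypothetical names, a dict stands in for the pydantic Field, the `semantic` parameter is omitted, and the exact keys placed in json_schema_extra are an assumption.

```python
from __future__ import annotations

import re
from typing import Any, Callable

# Hypothetical registry mapping type_name to its coercion function.
_COERCIONS: dict[str, Callable[..., Any]] = {}


def register_coercion_sketch(type_name: str, description: str = "", **config_defaults: Any):
    """Stand-in decorator: stores the coercion, hands back a factory."""

    def decorator(func: Callable[..., Any]) -> Callable[..., dict[str, Any]]:
        _COERCIONS[type_name] = func  # the coercion logic lives in the registry

        def factory(**overrides: Any) -> dict[str, Any]:
            unknown = set(overrides) - set(config_defaults)
            if unknown:
                raise TypeError(f"unknown config keys: {sorted(unknown)}")
            # A plain dict stands in for Field(json_schema_extra={...}).
            extra = {"yosoi_type": type_name, **config_defaults, **overrides}
            return {"description": description, "json_schema_extra": extra}

        factory.__name__ = func.__name__
        return factory  # the decorated name now refers to the factory

    return decorator


@register_coercion_sketch("phone", description="A phone number", country_code="+1")
def PhoneNumber(v, config, source_url=None):
    digits = re.sub(r"\D", "", str(v))
    return config.get("country_code", "+1") + digits


field = PhoneNumber(country_code="+44")
coerced = _COERCIONS["phone"]("(020) 7946-0958", field["json_schema_extra"])
```

After decoration, `PhoneNumber` no longer coerces values itself. Calling it builds field metadata, while the original function is reachable only through the registry, which is where an extraction pipeline would presumably look it up by type name.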

resolve_contract

resolve_contract(name: str | dict[str, Any] | ContractSpec) -> type[Contract]

Resolve a contract to a Contract class.

This is the programmatic API. No fuzzy matching or file scanning is performed — those are CLI-only DX features in SchemaParamType.

Resolution order:

  1. ContractSpec / dict → rehydrate via ContractSpec.to_contract()
  2. Exact match in BUILTIN_SCHEMAS
  3. Exact match in _CONTRACT_REGISTRY (custom schemas)
  4. Dynamic import via path:ClassName

Args:

  • name str | dict[str, Any] | ContractSpec — Contract name, path:ClassName string, inline ContractSpec, or dict.

Returns: type[Contract] — The resolved Contract subclass.

Raises:

  • ValueError — If no matching contract is found.
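
The four-step resolution order can be sketched as a chain of lookups that falls through to a ValueError. The names below are hypothetical stand-ins (`BUILTIN` for BUILTIN_SCHEMAS, `REGISTRY` for _CONTRACT_REGISTRY, `resolve_sketch` for the function itself), the dict branch only imitates ContractSpec.to_contract(), and treating the `path` half of `path:ClassName` as a dotted module path is an assumption.

```python
from __future__ import annotations

import importlib
from typing import Any

BUILTIN: dict[str, type] = {}   # hypothetical stand-in for BUILTIN_SCHEMAS
REGISTRY: dict[str, type] = {}  # hypothetical stand-in for _CONTRACT_REGISTRY


def resolve_sketch(name: str | dict[str, Any]) -> type:
    # 1. Inline spec: build a class from the dict
    #    (imitating ContractSpec.to_contract()).
    if isinstance(name, dict):
        return type(name["name"], (), dict(name.get("fields", {})))
    # 2 and 3. Exact-match lookups, built-ins before custom registrations.
    for table in (BUILTIN, REGISTRY):
        if name in table:
            return table[name]
    # 4. Dynamic import; the part before the last ':' is assumed to be
    #    a dotted module path.
    if ":" in name:
        module_path, _, class_name = name.rpartition(":")
        try:
            return getattr(importlib.import_module(module_path), class_name)
        except (ImportError, AttributeError) as exc:
            raise ValueError(f"cannot import contract {name!r}") from exc
    raise ValueError(f"no contract matches {name!r}")
```

Because each step is an exact match, a misspelled name falls straight through to the ValueError, consistent with the note that fuzzy matching is reserved for the CLI.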

resolve_crawl_policy

resolve_crawl_policy(policy: str | CrawlPolicy | Policy | None = ...) -> CrawlPolicy

role

role(value: str, name: str, nth: int = ...) -> SelectorEntry

scrape

scrape(url: str | Sequence[str], contract: type[Contract] | str | Sequence[type[Contract] | str], model: _YosoiConfig | _LLMConfig | ModelPolicy | str | None = ..., **kwargs: Any) -> ScrapeResult

scrape_many

scrape_many(urls: list[str] | tuple[str, ...], contract: type[Contract] | str, model: _YosoiConfig | _LLMConfig | ModelPolicy | str | None = ..., **kwargs: Any) -> ScrapeResult

scrape_sync

scrape_sync(url: str, contract: type[Contract] | str, model: _YosoiConfig | _LLMConfig | ModelPolicy | str | None = ..., **kwargs: Any) -> ScrapeResult

search

search(query: str, kind: str | None = ..., provider: str | None = ..., backend: str | None = ..., region: str | None = ..., safesearch: str | None = ..., timelimit: str | None = ..., max_results: int | None = ..., limit: int | None = ..., page: int | None = ..., policy: Policy | None = ...) -> SearchResult

show

show(value: Any, format: Literal['auto', 'table', 'plain', 'json'] = ..., title: str | None = ..., console: Any = ..., fingerprint: object | bool | None = ...) -> None

visual

visual(x: float, y: float, value: str = ...) -> SelectorEntry

xpath

xpath(value: str) -> SelectorEntry