Functions
Generated from yosoi
v0.0.2a19. Only symbols in__all__are listed.
File
File(trigger: str | None = ..., href: str | None = ..., url: str | None = ..., description: str | None = ..., allowed_types: Iterable[str] | None = ..., max_bytes: int | None = ..., kwargs: Any = {}) -> Any
attr
attr(value: str, name: str) -> SelectorEntry
check_policy
check_policy(policy: str | CrawlPolicy | Policy | None = ..., seeds: tuple[str, ...] = ...) -> PolicyCheck
claude_sdk
claude_sdk(model_name: str = ..., kwargs: Any = {}) -> ModelPolicy
crawl
crawl(seeds: str | Sequence[str], contracts: Sequence[type[Contract] | str] | type[Contract] | str | None = ..., limit: int | None = ..., policy: Policy | None = ..., fetcher_type: str | None = ..., persist: bool = ..., progress: bool | None = ..., console: Any | None = ...) -> CrawlRunSummary
css
css(value: str) -> SelectorEntry
discover
discover() -> SelectorEntry
fingerprint
fingerprint(source: object, ax_snapshot: Any = ..., headers: dict[str, str] | None = ..., endpoints: Sequence[str] | None = ...) -> PageFingerprint
global_id
global_id(value: str, name: str) -> SelectorEntry
js
js(script: str | None = None, description: str | None = None, kwargs: Any = {}) -> Any
Declare a contract field extracted by a JS program run in the live browser tab.
Two modes:
Hand-authored — provide script. The expression is evaluated as-is
on every fetch. No LLM involved::
signals: dict = ys.js("(() => ({ has_alita: !!window.__alita__ }))()")Discovery-driven — omit script, provide description. Yosoi’s
:class:JsDiscoveryOrchestrator writes and verifies the script once per
domain, then caches it (CAS-92)::
signals: dict = ys.js(description="Detect Alita embed and competitor widgets")Args:
scriptstr | None— JavaScript IIFE to evaluate.Nonetriggers JS discovery.descriptionstr | None— Human-readable description used by the LLM during discovery. Required whenscriptisNone.**kwargsAny— Additional arguments forwarded topydantic.Field(e.g.default,descriptionas a pydantic field description).
Returns: Any — A pydantic FieldInfo with yosoi_action metadata.
Raises:
ValueError— When neitherscriptnordescriptionis provided.
jsonld
jsonld(value: str) -> SelectorEntry
load_urls_from_file
load_urls_from_file(filepath: str) -> list[str]
Load URLs from a file (JSON, plain text, CSV, Excel, Parquet, or Markdown). Args:
filepathstr— Path to file containing URLs.
Returns: list[str] — List of URL strings.
Raises:
FileNotFoundError— If file does not exist.ValueError— If file format requires unavailable dependencies.
map
map(url: str, max_sitemaps: int = ..., max_urls: int = ..., max_subdomains: int = ..., subfinder_bin: str = ..., subfinder_timeout: int = ..., include_robots: bool = ..., include_default_sitemaps: bool = ..., include_subdomains: bool = ..., discover_subdomains: bool = ...) -> MapResult
opencode
opencode(model_name: str = ..., kwargs: Any = {}) -> ModelPolicy
policy_arn
policy_arn(namespace: str, name: str) -> str
regex
regex(value: str) -> SelectorEntry
register_coercion
register_coercion(type_name: str, description: str = '', semantic: SemanticRule | None = None, config_defaults: Any = {}) -> Callable[[Callable[..., CoercedValue]], Callable[..., Any]]
Decorator that registers a coercion function and returns a Field factory.
The decorated function becomes the Field factory — its name is what you use in a Contract. The coercion logic is stored internally in the registry.
Decorator kwargs define the config schema:
-
description: default field description -
all other kwargs: config keys that appear in
json_schema_extraand are forwarded to the coerce function viaconfigArgs: -
type_namestr— Theyosoi_typeidentifier (e.g.'price'). -
descriptionstr— Default field description shown in manifests and to the AI. -
semanticSemanticRule | None— Optional :class:SemanticRuledescribing the shape a correct value should have. Used by the discovery semantic-retry loop. -
**config_defaultsAny— Config keys with their default values. These become keyword arguments on the generated factory function.
Example::
@register_coercion('phone', description='A phone number', country_code='+1')def PhoneNumber(v, config, source_url=None): import re digits = re.sub(r'\D', '', str(v)) return config.get('country_code', '+1') + digits
# PhoneNumber is now a Field factory:# PhoneNumber(country_code='+44') -> Field(json_schema_extra={...})resolve_contract
resolve_contract(name: str | dict[str, Any] | ContractSpec) -> type[Contract]
Resolve a contract to a Contract class.
This is the programmatic API. No fuzzy matching or file scanning is
performed — those are CLI-only DX features in SchemaParamType.
Resolution order:
- ContractSpec / dict → rehydrate via
ContractSpec.to_contract() - Exact match in BUILTIN_SCHEMAS
- Exact match in _CONTRACT_REGISTRY (custom schemas)
- Dynamic import via
path:ClassNameArgs:
namestr | dict[str, Any] | ContractSpec— Contract name,path:ClassNamestring, inline ContractSpec, or dict.
Returns: type[Contract] — The resolved Contract subclass.
Raises:
ValueError— If no matching contract is found.
resolve_crawl_policy
resolve_crawl_policy(policy: str | CrawlPolicy | Policy | None = ...) -> CrawlPolicy
role
role(value: str, name: str, nth: int = ...) -> SelectorEntry
scrape
scrape(url: str | Sequence[str], contract: type[Contract] | str | Sequence[type[Contract] | str], model: _YosoiConfig | _LLMConfig | ModelPolicy | str | None = ..., kwargs: Any = {}) -> ScrapeResult
scrape_many
scrape_many(urls: list[str] | tuple[str, ...], contract: type[Contract] | str, model: _YosoiConfig | _LLMConfig | ModelPolicy | str | None = ..., kwargs: Any = {}) -> ScrapeResult
scrape_sync
scrape_sync(url: str, contract: type[Contract] | str, model: _YosoiConfig | _LLMConfig | ModelPolicy | str | None = ..., kwargs: Any = {}) -> ScrapeResult
search
search(query: str, kind: str | None = ..., provider: str | None = ..., backend: str | None = ..., region: str | None = ..., safesearch: str | None = ..., timelimit: str | None = ..., max_results: int | None = ..., limit: int | None = ..., page: int | None = ..., policy: Policy | None = ...) -> SearchResult
show
show(value: Any, format: Literal['auto', 'table', 'plain', 'json'] = ..., title: str | None = ..., console: Any = ..., fingerprint: object | bool | None = ...) -> None
visual
visual(x: float, y: float, value: str = ...) -> SelectorEntry
xpath
xpath(value: str) -> SelectorEntry