
JSON Output

Set OutputPolicy(formats=...) in Python (or --output on the CLI) to persist extracted data automatically.

CLI

uvx yosoi --url https://qscrape.dev/l1/news --contract NewsArticle --output json

Pass a comma-separated list to write several formats in one run:

uvx yosoi --url https://qscrape.dev/l1/news --contract NewsArticle --output json,csv

Python

# output.py
import asyncio

import yosoi as ys


# Contract describing the fields to extract from each item.
class Article(ys.Contract):
    title: str = ys.Title()
    author: str = ys.Author()


async def main():
    # Combine the environment-based policy with an output policy that writes JSON.
    policy = ys.Policy.cascade(
        ys.Policy.from_env(),
        ys.Policy(output=ys.OutputPolicy(formats=('json',))),
    )
    for item in await ys.scrape('https://qscrape.dev/l1/news', Article, policy=policy):
        print(item.get('title'))


asyncio.run(main())

Run it:

uv run python output.py

With the json format, results are written to .yosoi/content/<domain>/results.json. Pages that yield multiple items are saved as a single object wrapping the list: {"items": [...]}.
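
A minimal sketch for reading the JSON output back into a list of records. It assumes that a single-item page is stored as one bare JSON object and that <domain> is the site's host name (for example qscrape.dev); the text above only specifies the {"items": [...]} shape for multi-item pages, so both are assumptions. The helper names normalize_results and load_results are hypothetical and not part of yosoi.

```python
import json
from pathlib import Path
from typing import Any


def normalize_results(data: Any) -> list[dict[str, Any]]:
    """Return extracted records as a list, whatever shape results.json has."""
    # Multi-item pages are documented as {"items": [...]}.
    if isinstance(data, dict) and isinstance(data.get("items"), list):
        return data["items"]
    # Assumption: a single-item page is stored as one bare object.
    if isinstance(data, dict):
        return [data]
    # Assumption: tolerate a top-level list as well.
    if isinstance(data, list):
        return data
    raise ValueError(f"unexpected results.json shape: {type(data).__name__}")


def load_results(domain: str, root: Path = Path(".yosoi/content")) -> list[dict[str, Any]]:
    """Read <root>/<domain>/results.json and normalize it to a list."""
    # Assumption: the directory is named after the host, e.g. "qscrape.dev".
    path = root / domain / "results.json"
    with path.open(encoding="utf-8") as fh:
        return normalize_results(json.load(fh))
```

For example, after running output.py you might iterate with `for record in load_results("qscrape.dev"): print(record.get("title"))`, adjusting the directory name to whatever yosoi actually created.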

Multiple Formats at Once

policy = ys.Policy.cascade(
    ys.Policy.from_env(),
    ys.Policy(output=ys.OutputPolicy(formats=('json', 'csv'))),
)
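
After a json plus csv run, the CSV output can be read with the standard library's csv module. This sketch assumes the file is written as results.csv next to results.json and that its columns are named after the contract fields (title and author in the Article example); neither the file name nor the column layout is specified on this page, so treat both as assumptions. The helper names read_csv_records and load_csv are hypothetical.

```python
import csv
from pathlib import Path
from typing import TextIO


def read_csv_records(stream: TextIO) -> list[dict[str, str]]:
    """Parse CSV rows into dicts keyed by the header row."""
    # DictReader takes column names from the first row; every value is returned as str.
    return list(csv.DictReader(stream))


def load_csv(domain: str, root: Path = Path(".yosoi/content")) -> list[dict[str, str]]:
    """Read the CSV export for one domain."""
    # Assumption: the CSV is written as results.csv beside results.json.
    path = root / domain / "results.csv"
    # The csv module documentation recommends newline="" when opening files for it.
    with path.open(newline="", encoding="utf-8") as fh:
        return read_csv_records(fh)
```

Unlike the JSON output, every CSV value comes back as a string, so convert numeric or date fields yourself if the contract declares them.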

Supported Formats

| Format | Extension | Notes |
|---|---|---|
| json | .json | One file per domain |
| jsonl | .jsonl | One JSON object per line (append-friendly) |
| ndjson | .jsonl | Alias for jsonl |
| csv | .csv | Flat tabular output |
| md | .md | Markdown table |
| xlsx | .xlsx | Requires openpyxl (uv add yosoi[tabular]) |
| parquet | .parquet | Requires pyarrow (uv add yosoi[tabular]) |
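
The jsonl format is append-friendly because each record is a self-contained line: adding a record means writing one more line, without parsing or rewriting the existing file. The following standard-library sketch illustrates that property; the helper names append_record and iter_records are hypothetical, and this is not yosoi's own writer.

```python
import json
from typing import Any, Iterator, TextIO


def append_record(stream: TextIO, record: dict[str, Any]) -> None:
    """Write one record as a single JSON line."""
    # json.dumps escapes newlines inside strings, so each record stays on one line.
    stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def iter_records(stream: TextIO) -> Iterator[dict[str, Any]]:
    """Yield one parsed record per non-blank line."""
    for line in stream:
        if line.strip():
            yield json.loads(line)
```

To add to an existing file, open it in append mode, for example `with open(path, "a", encoding="utf-8") as fh: append_record(fh, record)`; earlier lines are never touched.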