Output Formats
Lectito produces every output format during a single extraction.
All formats are derived from the same cleaned article root, so callers can store HTML for fidelity, use Markdown for display or editing, and use plain text for search without running extraction more than once.
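    // One extraction populates every output format.
    let article = extract(html, base_url, &ReadabilityOptions::default())?.unwrap();
    let html = article.content;          // cleaned article HTML
    let markdown = article.markdown;     // Markdown generated from that HTML
    let text = article.text_content;     // normalized plain text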
HTML
The content field holds the cleaned article HTML. Scripts, styles, navigation, sidebars, and other page chrome are removed where possible. Relative URLs are resolved when a base URL is provided.
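Lectito performs this resolution itself when you pass a base URL. To show what resolving means for the most common link forms, here is a minimal std-only Rust sketch. The resolve function is hypothetical and not part of Lectito. It assumes the base is an absolute URL without a query or fragment, and it deliberately skips ".." segments and other cases that a full RFC 3986 resolver (for example Url::join in the url crate) handles.

```rust
// Hypothetical std-only sketch, not Lectito's implementation.
// Assumes `base` is an absolute URL with no query or fragment.
// Does NOT normalize "." or ".." segments (see RFC 3986, section 5).
fn resolve(base: &str, href: &str) -> String {
    // Already absolute: keep as-is.
    if href.starts_with("http://") || href.starts_with("https://") {
        return href.to_string();
    }
    // Locate the end of the scheme ("https" in "https://...").
    let scheme_end = base.find("://").expect("base must be an absolute URL");
    // Protocol-relative ("//host/path"): borrow only the scheme from the base.
    if href.starts_with("//") {
        return format!("{}:{}", &base[..scheme_end], href);
    }
    // Split the base into its origin ("scheme://host") and its path.
    let host_start = scheme_end + 3;
    let path_start = base[host_start..]
        .find('/')
        .map_or(base.len(), |i| host_start + i);
    let (origin, path) = base.split_at(path_start);
    // Root-relative ("/static/a.png"): attach directly to the origin.
    if href.starts_with('/') {
        return format!("{origin}{href}");
    }
    // Document-relative ("img/a.png"): keep the base path up to its last '/'.
    let dir = path.rfind('/').map_or("/", |i| &path[..=i]);
    format!("{origin}{dir}{href}")
}

fn main() {
    let base = "https://example.com/blog/post.html";
    for href in ["img/a.png", "/static/b.png", "//cdn.example.com/c.png", "https://other.org/d"] {
        println!("{href} -> {}", resolve(base, href));
    }
}
```

With the base above, "img/a.png" becomes https://example.com/blog/img/a.png and "/static/b.png" becomes https://example.com/static/b.png.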
Use HTML when you need the closest representation of the extracted article. It keeps images, links, tables, inline markup, and other structure that can be lost in plain text.
Markdown
The markdown field is generated from the cleaned article HTML. It preserves common reader content:
- headings
- paragraphs
- links and images
- lists
- blockquotes
- code blocks
- tables
- math
- footnotes
The CLI Markdown output includes TOML frontmatter:
lectito parse article.html --format markdown
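A program that consumes this output usually needs to separate the frontmatter from the Markdown body. The following std-only Rust sketch does that. It assumes the TOML block is fenced by lines containing only +++ (the convention used by Hugo and Zola); this page does not state which delimiter Lectito emits, so check real CLI output before relying on it. The split_frontmatter function and the sample document are hypothetical.

```rust
// Hypothetical helper, not part of Lectito.
// Assumption: frontmatter is fenced by lines containing only "+++"
// (the Hugo/Zola TOML convention) and lines end with '\n'.
fn split_frontmatter(doc: &str) -> Option<(&str, &str)> {
    // The document must open with the delimiter line.
    let rest = doc.strip_prefix("+++\n")?;
    let mut offset = 0;
    // Walk the remaining lines (each keeps its '\n') to find the closing delimiter.
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "+++" {
            let frontmatter = &rest[..offset];
            let body = &rest[offset + line.len()..];
            return Some((frontmatter, body));
        }
        offset += line.len();
    }
    // No closing delimiter: treat the input as having no frontmatter.
    None
}

fn main() {
    // Made-up sample input, not real Lectito output.
    let doc = "+++\ntitle = \"Hello\"\n+++\n# Hello\n\nBody text.\n";
    if let Some((frontmatter, body)) = split_frontmatter(doc) {
        println!("frontmatter:\n{frontmatter}");
        println!("body:\n{body}");
    }
}
```

The returned frontmatter string can then be handed to a TOML parser, while the body goes to the Markdown renderer or editor.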
Markdown is useful when the next step is a reader view, note-taking system, static archive, or editor. It is also easier to diff in tests than HTML.
Plain Text
The text_content field holds normalized article text. Use it for indexing, previews, and readability checks.
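As an illustration of the preview and checking use cases, the sketch below trims normalized text to a character budget on a word boundary and counts words as a crude length signal. Both helpers (preview and word_count) are hypothetical, not Lectito APIs, and the word count is not Lectito's own readability heuristic.

```rust
// Hypothetical helpers, not Lectito APIs.

/// Trim text to at most `max_chars` characters, cutting back to the last
/// whitespace so no word is split, and append "..." when anything was cut.
fn preview(text: &str, max_chars: usize) -> String {
    // Count chars, not bytes, so multi-byte UTF-8 characters stay intact.
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    let cut: String = text.chars().take(max_chars).collect();
    // Prefer to end on a word boundary; fall back to the hard cut.
    let trimmed = match cut.rfind(char::is_whitespace) {
        Some(i) => cut[..i].trim_end(),
        None => cut.as_str(),
    };
    format!("{trimmed}...")
}

/// A crude length signal: the number of whitespace-separated words.
fn word_count(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    // Made-up sample text.
    let text = "Lectito extracts readable article text from web pages.";
    println!("{}", preview(text, 20)); // prints "Lectito extracts..."
    println!("{} words", word_count(text)); // prints "8 words"
}
```

Counting characters rather than bytes matters here: slicing a &str at an arbitrary byte offset would panic if it landed inside a multi-byte character.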
Plain text should not be treated as a rendering format. It discards links, images, and most document structure.
JSON
The CLI can also serialize the article, including its metadata, as JSON:
lectito parse article.html --format json --pretty
JSON is the best CLI format when another program needs metadata and content together.