Output Formats

Work with different output formats: Markdown, JSON, text, and HTML.

Overview

The Article struct provides several ways to render extracted content:

Method                       Format                          Requires Feature
to_markdown()                Markdown                        markdown
to_markdown_with_config()    Markdown with custom options    markdown
to_json()                    Serialized Article JSON         Always available
to_text()                    Plain text                      Always available
content field                Cleaned HTML                    Always available

Markdown

Convert an article to Markdown:

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let markdown = article.to_markdown()?;
    println!("{}", markdown);

    Ok(())
}

Markdown Configuration

Use MarkdownConfig to control frontmatter, references, image handling, and the title heading:

use lectito_core::{parse, MarkdownConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let config = MarkdownConfig {
        include_frontmatter: true,
        include_references: true,
        strip_images: false,
        include_title_heading: true,
    };

    let markdown = article.to_markdown_with_config(&config)?;
    println!("{}", markdown);

    Ok(())
}

Frontmatter Fields

When include_frontmatter is enabled, Lectito adds a TOML frontmatter block delimited by +++ lines. It can emit fields such as:

+++
title = "Article Title"
author = "John Doe"
date = "2025-01-17"
site = "Example"
image = "https://example.com/image.jpg"
favicon = "https://example.com/favicon.ico"
excerpt = "A brief description of the article"
word_count = 500
reading_time_minutes = 2.5
+++
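
Downstream tools often need the frontmatter and the Markdown body separately. The sketch below shows one way to split them using only the standard library. It assumes the rendered Markdown opens with the +++ block on its first line, as in the sample above. Both split_frontmatter and frontmatter_string are hypothetical helpers and are not part of lectito_core. The key lookup is a naive line scan, so a real TOML parser is the better choice for anything beyond simple quoted strings.

```rust
/// Hypothetical helper: splits a document that starts with a `+++`-delimited
/// frontmatter block into (frontmatter, body).
/// Returns None when no complete block is found.
fn split_frontmatter(markdown: &str) -> Option<(&str, &str)> {
    // The block must open with `+++` alone on the very first line.
    let rest = markdown.strip_prefix("+++")?;
    let rest = rest.strip_prefix("\r\n").or_else(|| rest.strip_prefix('\n'))?;

    // Walk line by line until the closing `+++` delimiter.
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "+++" {
            let frontmatter = &rest[..offset];
            let body = &rest[offset + line.len()..];
            return Some((frontmatter, body));
        }
        offset += line.len();
    }
    None
}

/// Hypothetical helper: looks up a `key = "value"` string entry.
/// This is a naive line scan, not a TOML parser; unquoted values
/// such as numbers are ignored.
fn frontmatter_string<'a>(frontmatter: &'a str, key: &str) -> Option<&'a str> {
    frontmatter.lines().find_map(|line| {
        let (k, v) = line.split_once('=')?;
        if k.trim() != key {
            return None;
        }
        v.trim().strip_prefix('"')?.strip_suffix('"')
    })
}
```

Passing the output of to_markdown_with_config() to split_frontmatter would yield the metadata block and the article body as two string slices, without copying either.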

JSON

Article::to_json() returns a serialized view of the article itself:

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let json = article.to_json()?;
    println!("{}", json);

    Ok(())
}

JSON Structure

{
  "content": "<div>Cleaned HTML content...</div>",
  "text_content": "Plain text content...",
  "metadata": {
    "title": "Article Title",
    "author": "John Doe",
    "date": "2025-01-17",
    "excerpt": "A brief description",
    "site_name": "Example",
    "language": "en"
  },
  "length": 1234,
  "word_count": 500,
  "reading_time": 2.5,
  "source_url": "https://example.com/article",
  "confidence": 0.92
}

Plain Text

Extract just the text content:

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let text = article.to_text();
    println!("{}", text);

    Ok(())
}

Plain text preserves the readable text content without HTML tags.
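
Because the plain-text output carries no markup, it can go straight into simple text analysis. As a minimal sketch, the hypothetical term_frequencies helper below (not part of lectito_core) counts lowercase word occurrences in a string such as the one returned by to_text(). Splitting on non-alphanumeric characters is a deliberately crude tokenizer chosen for brevity.

```rust
use std::collections::HashMap;

/// Hypothetical helper: counts how often each lowercased word appears.
/// Words are runs of alphanumeric characters; everything else separates them.
fn term_frequencies(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
    {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    counts
}
```

A search index or keyword extractor could start from a map like this, though production tokenizers usually also handle stemming and stop words.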

HTML

Access the cleaned HTML directly:

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let cleaned_html = &article.content;
    println!("{}", cleaned_html);

    Ok(())
}

The cleaned HTML:

  • removes clutter such as navigation and ads
  • keeps the main content structure
  • preserves images when preserve_images is enabled
  • preserves supported embeds when preserve_video_embeds is enabled

Choosing a Format

Format      Use Case
Markdown    Blog posts, docs, static publishing
JSON        APIs, storage, downstream processing
Text        Analysis, indexing, search
HTML        Web display or further HTML processing
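
When the format is chosen at runtime, for example from the name of an output file, the choice can be modeled as an enum. The sketch below is one possible approach. The OutputFormat enum, the format_for_path function, and the extension mapping are all assumptions made for illustration; none of them come from lectito_core.

```rust
use std::path::Path;

/// Hypothetical enum covering the four output formats described on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputFormat {
    Markdown,
    Json,
    Text,
    Html,
}

/// Hypothetical helper: guesses the output format from a destination
/// path's extension (case-insensitive). Returns None for unknown or
/// missing extensions.
fn format_for_path(path: &str) -> Option<OutputFormat> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "md" | "markdown" => Some(OutputFormat::Markdown),
        "json" => Some(OutputFormat::Json),
        "txt" => Some(OutputFormat::Text),
        "html" | "htm" => Some(OutputFormat::Html),
        _ => None,
    }
}
```

A caller could then match on the result and dispatch to to_markdown(), to_json(), to_text(), or the content field accordingly.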