Output Formats

Work with different output formats: Markdown, JSON, text, and HTML.

Overview

The `Article` struct exposes each format through a method or, for HTML, a field:

Method or field	Format	Requires Feature
`to_markdown()`	Markdown with frontmatter	`markdown`
`to_json()`	Structured JSON	Always available
`to_text()`	Plain text	Always available
`content` field	Cleaned HTML	Always available

Markdown

Convert an article to Markdown with TOML frontmatter (delimited by `+++`):

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let markdown = article.to_markdown()?;
    println!("{}", markdown);

    Ok(())
}

Output Format

+++
title = "Article Title"
author = "John Doe"
published_date = "2025-01-17"
excerpt = "A brief description of the article"
word_count = 500
+++

# Article Title

Article content here...

Paragraph with **bold** and _italic_ text.
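If you need the frontmatter and the body separately (for example, to hand the metadata to a static-site generator), the output can be split with the standard library alone. This is a minimal sketch that assumes the layout in the sample above: the output opens with a `+++` line, the frontmatter ends at the next `+++` line, and lines end in `\n`. `split_frontmatter` is a hypothetical helper, not part of lectito_core.

```rust
/// Splits `+++`-delimited frontmatter from the Markdown body.
/// Hypothetical helper: assumes the layout of the sample output and `\n` line endings.
/// Returns `None` if the text does not open with a `+++` frontmatter block.
fn split_frontmatter(markdown: &str) -> Option<(&str, &str)> {
    // The document must start with the opening `+++` delimiter line.
    let rest = markdown.strip_prefix("+++\n")?;
    // Locate the closing delimiter at the start of a line.
    let end = rest.find("\n+++")?;
    let frontmatter = &rest[..end];
    // Skip the closing delimiter and any blank lines before the body.
    let body = rest[end + "\n+++".len()..].trim_start_matches('\n');
    Some((frontmatter, body))
}
```

The frontmatter string can then be passed to a TOML parser of your choice.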

Customizing Markdown

Use MarkdownFormatter for more control:

use lectito_core::{parse, MarkdownFormatter, MarkdownConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let config = MarkdownConfig {
        frontmatter: true,
        // Remaining options keep their defaults (assumes MarkdownConfig implements Default)
        ..Default::default()
    };

    let formatter = MarkdownFormatter::new(config);
    let markdown = formatter.format(&article)?;

    println!("{}", markdown);

    Ok(())
}

JSON

Get structured JSON with all metadata:

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let json = article.to_json()?;
    println!("{}", json);

    Ok(())
}

JSON Structure

{
    "metadata": {
        "title": "Article Title",
        "author": "John Doe",
        "published_date": "2025-01-17",
        "excerpt": "A brief description",
        "language": "en"
    },
    "content": "<div>Cleaned HTML content...</div>",
    "text_content": "Plain text content...",
    "word_count": 500,
    "readability_score": 35.5
}

Parsing JSON

use lectito_core::parse;
use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let json = article.to_json()?;
    let value: Value = serde_json::from_str(&json)?;

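    // `as_str()` yields the bare string; printing the `Value` directly would include JSON quotes
    println!("Title: {}", value["metadata"]["title"].as_str().unwrap_or("(untitled)"));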

    Ok(())
}

Plain Text

Extract just the text content:

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let text = article.to_text();
    println!("{}", text);

    Ok(())
}

Output Format

Plain text includes:

Headings as lines with # prefixes
Paragraphs separated by blank lines
List items with * or 1. prefixes
No HTML tags and no inline Markdown formatting (such as bold or italic markers)
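Because blocks are separated by blank lines and headings and list items carry fixed prefixes, the plain-text output can be split back into structure without an HTML parser. The following is a minimal sketch that assumes exactly the layout listed above (`#` headings, `* ` or `N. ` list items, blank-line separators, `\n` line endings); `Block` and `classify_blocks` are hypothetical names, not part of lectito_core.

```rust
/// Kinds of blocks in the plain-text output, per the layout described above.
#[derive(Debug, PartialEq)]
enum Block<'a> {
    Heading(&'a str),
    ListItem(&'a str),
    Paragraph(&'a str),
}

/// Returns the item text if `line` starts with `* ` or an ordered marker like `12. `.
fn list_item_text(line: &str) -> Option<&str> {
    if let Some(rest) = line.strip_prefix("* ") {
        return Some(rest);
    }
    // Ordered items: one or more ASCII digits followed by ". "
    let digits = line.chars().take_while(|c| c.is_ascii_digit()).count();
    if digits > 0 {
        return line[digits..].strip_prefix(". ");
    }
    None
}

/// Splits text on blank lines and classifies each block (hypothetical helper).
fn classify_blocks(text: &str) -> Vec<Block<'_>> {
    let mut blocks = Vec::new();
    for chunk in text.split("\n\n").map(str::trim).filter(|c| !c.is_empty()) {
        if chunk.starts_with('#') {
            // Strip the `#` prefix and surrounding spaces from the heading text.
            blocks.push(Block::Heading(chunk.trim_start_matches('#').trim()));
        } else if chunk.lines().all(|l| list_item_text(l).is_some()) {
            // A chunk made only of list items yields one block per item.
            blocks.extend(chunk.lines().filter_map(list_item_text).map(Block::ListItem));
        } else {
            blocks.push(Block::Paragraph(chunk));
        }
    }
    blocks
}
```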

HTML

Access the cleaned HTML directly:

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    // Cleaned HTML is in the `content` field
    let cleaned_html = &article.content;
    println!("{}", cleaned_html);

    Ok(())
}

HTML Characteristics

The cleaned HTML:

Removes clutter (navigation, sidebars, ads)
Keeps main content structure
Preserves images (when `preserve_images` is enabled in the configuration)
Removes most scripts and styles
Maintains heading hierarchy
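To make "removes scripts and styles" concrete, here is a deliberately naive, std-only sketch that drops whole `<script>` or `<style>` elements by string search. It is an illustration of the effect, not how lectito_core does it; a robust cleaner should operate on a parsed DOM. It assumes lowercase tag names and no nested elements of the same kind, and `strip_element` is a hypothetical helper.

```rust
/// Naively removes every `<tag ...>...</tag>` element from `html`.
/// Assumes lowercase tag names; an unclosed element drops everything after it.
fn strip_element(html: &str, tag: &str) -> String {
    let open = format!("<{tag}");
    let close = format!("</{tag}>");
    let mut out = String::with_capacity(html.len());
    let mut rest = html;
    while let Some(start) = rest.find(&open) {
        // Keep everything before the opening tag.
        out.push_str(&rest[..start]);
        rest = match rest[start..].find(&close) {
            // Resume right after the closing tag.
            Some(end) => &rest[start + end + close.len()..],
            // No closing tag: drop the remainder.
            None => "",
        };
    }
    out.push_str(rest);
    out
}
```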

Choosing a Format

Format	Use Case
Markdown	Blog posts, documentation, static sites
JSON	APIs, databases, further processing
Text	Analysis, indexing, simple display
HTML	Web display, further HTML processing
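When the format is chosen at runtime (from a CLI flag or an output file extension, say), an enum keeps the choice explicit. This std-only sketch maps names and extensions onto the four formats in the table; `OutputFormat` and its methods are hypothetical, not part of lectito_core.

```rust
use std::str::FromStr;

/// The four output formats from the table above (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputFormat {
    Markdown,
    Json,
    Text,
    Html,
}

impl OutputFormat {
    /// Conventional file extension for each format.
    fn extension(self) -> &'static str {
        match self {
            OutputFormat::Markdown => "md",
            OutputFormat::Json => "json",
            OutputFormat::Text => "txt",
            OutputFormat::Html => "html",
        }
    }
}

impl FromStr for OutputFormat {
    type Err = String;

    /// Accepts format names or extensions, case-insensitively.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "markdown" | "md" => Ok(OutputFormat::Markdown),
            "json" => Ok(OutputFormat::Json),
            "text" | "txt" => Ok(OutputFormat::Text),
            "html" | "htm" => Ok(OutputFormat::Html),
            other => Err(format!("unknown output format: {other}")),
        }
    }
}
```

Each variant would then dispatch to `to_markdown()`, `to_json()`, `to_text()`, or the `content` field respectively.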

Format Conversion Examples

Markdown to File

use lectito_core::parse;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let markdown = article.to_markdown()?;
    fs::write("article.md", markdown)?;

    Ok(())
}
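The example above always writes to `article.md`, so each run overwrites the previous file. One option is to derive the filename from the article title. `slugify` below is a hypothetical std-only helper: it keeps alphanumeric characters (lowercased), collapses everything else into single hyphens, and falls back to `article` when nothing usable remains.

```rust
/// Turns a title into a filesystem-friendly slug, e.g. for `{slug}.md`.
/// Hypothetical helper, not part of lectito_core.
fn slugify(title: &str) -> String {
    let mut slug = String::new();
    for c in title.chars() {
        if c.is_alphanumeric() {
            // `to_lowercase` may yield several chars for some scripts.
            slug.extend(c.to_lowercase());
        } else if !slug.is_empty() && !slug.ends_with('-') {
            // Collapse runs of spaces/punctuation into one hyphen.
            slug.push('-');
        }
    }
    // Drop a trailing hyphen left by trailing punctuation or spaces.
    let trimmed = slug.trim_end_matches('-');
    if trimmed.is_empty() {
        "article".to_string()
    } else {
        trimmed.to_string()
    }
}
```

With a title in hand, the write becomes something like `fs::write(format!("{}.md", slugify(&title)), markdown)?`.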

JSON for API Response

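use lectito_core::parse;
use warp::http::StatusCode;

// warp handler: takes the raw HTML request body and replies with the article as JSON.
async fn extract_article(body: String) -> Result<impl warp::Reply, warp::Rejection> {
    // `to_json()` already returns serialized JSON, so send it as-is; passing the String
    // to `warp::reply::json` would encode it a second time as a quoted JSON string.
    let (json, status) = match parse(&body).ok().and_then(|article| article.to_json().ok()) {
        Some(json) => (json, StatusCode::OK),
        // Report extraction failures instead of panicking the handler with `unwrap()`.
        None => (r#"{"error":"extraction failed"}"#.to_string(), StatusCode::UNPROCESSABLE_ENTITY),
    };
    let reply = warp::reply::with_header(json, "content-type", "application/json");
    Ok(warp::reply::with_status(reply, status))
}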

Text for Analysis

use lectito_core::parse;

fn analyze_text(html: &str) -> Result<(), Box<dyn std::error::Error>> {
    let article = parse(html)?;
    let text = article.to_text();

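    // Count words (whitespace-separated tokens)
    let words: Vec<&str> = text.split_whitespace().collect();
    println!("Word count: {}", words.len());

    // Count sentences, ignoring the empty fragment after the final terminator
    // (rough heuristic: abbreviations such as "e.g." are over-counted)
    let sentences = text
        .split(&['.', '!', '?'][..])
        .filter(|s| !s.trim().is_empty())
        .count();
    println!("Sentence count: {}", sentences);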

    Ok(())
}
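For actual word-frequency analysis rather than a plain count, the text can be tallied in a `HashMap`. This is a minimal std-only sketch; `top_words` is a hypothetical helper, and it assumes a "word" is a run of alphanumeric characters or apostrophes, compared case-insensitively.

```rust
use std::collections::HashMap;

/// Returns the `n` most frequent words, case-insensitively, with their counts.
/// Ties are broken alphabetically so the result is deterministic.
fn top_words(text: &str, n: usize) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for word in text
        .split(|c: char| !(c.is_alphanumeric() || c == '\''))
        .filter(|w| !w.is_empty())
    {
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    let mut ranked: Vec<(String, usize)> = counts.into_iter().collect();
    // Highest count first, then alphabetical.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked.truncate(n);
    ranked
}
```

In `analyze_text`, this would be called as `top_words(&text, 10)` after `to_text()`.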

HTML for Display

use lectito_core::parse;

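fn display_article(html: &str) -> Result<String, Box<dyn std::error::Error>> {
    let article = parse(html)?;

    // Wrap the cleaned HTML in a minimal page template.
    // Note: the title is inserted unescaped; escape it if the source page is untrusted.
    let rendered = format!(
        r#"
        <!DOCTYPE html>
        <html>
        <head>
            <title>{}</title>
        </head>
        <body>
            <article>{}</article>
        </body>
        </html>
        "#,
        article.metadata.title.unwrap_or_default(),
        article.content
    );

    // Return the page so the caller can serve or save it.
    Ok(rendered)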
}
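The template above inserts `article.metadata.title` as-is. The `content` field is already HTML, but the title is plain text taken from the source page, so a `<` or `&` in it would be interpreted as markup. A minimal std-only escaping helper (hypothetical, not part of lectito_core) covers the characters that matter in element content and quoted attributes:

```rust
/// Escapes the five HTML-significant characters so plain text can be embedded
/// in element content or quoted attribute values.
fn escape_html(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}
```

The title argument in the template would then become `escape_html(&title)`, leaving `article.content` untouched.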

Next Steps

Configuration - Advanced configuration options
Basic Usage - Core usage patterns
Concepts - Understanding the algorithm

Lectito Documentation