Output Formats

Work with different output formats: Markdown, JSON, text, and HTML.

Overview

The Article struct exposes each format through a method or, for cleaned HTML, through its content field:

| Method | Format | Requires Feature |
| --- | --- | --- |
| to_markdown() | Markdown with frontmatter | markdown |
| to_json() | Structured JSON | Always available |
| to_text() | Plain text | Always available |
| content field | Cleaned HTML | Always available |

Markdown

Convert an article to Markdown with TOML frontmatter (delimited by +++ lines):

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let markdown = article.to_markdown()?;
    println!("{}", markdown);

    Ok(())
}

Output Format

+++
title = "Article Title"
author = "John Doe"
published_date = "2025-01-17"
excerpt = "A brief description of the article"
word_count = 500
+++

# Article Title

Article content here...

Paragraph with **bold** and _italic_ text.
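
If you need to read the frontmatter back out of the generated Markdown, the +++ delimiters shown above make it straightforward to split. The following is a minimal sketch using only the standard library; `split_frontmatter` and `frontmatter_str` are hypothetical helpers, not part of lectito_core, and the key lookup handles only the simple `key = "value"` shape shown in the sample. A real TOML parser is the more robust choice.

```rust
/// Splits a document with `+++`-delimited frontmatter into (frontmatter, body).
/// Returns `None` when the text does not start with a frontmatter block.
/// Hypothetical helper, not part of lectito_core.
fn split_frontmatter(markdown: &str) -> Option<(&str, &str)> {
    // The opening delimiter must be the very first line.
    let rest = markdown.strip_prefix("+++\n")?;
    // The closing delimiter is the next line that starts with `+++`.
    let end = rest.find("\n+++")?;
    let frontmatter = &rest[..end];
    // Skip the delimiter and the blank line(s) that follow it.
    let body = rest[end + "\n+++".len()..].trim_start_matches('\n');
    Some((frontmatter, body))
}

/// Looks up a `key = "value"` string entry in the frontmatter.
/// Unquoted values (such as numbers) are not handled and yield `None`.
fn frontmatter_str<'a>(frontmatter: &'a str, key: &str) -> Option<&'a str> {
    frontmatter.lines().find_map(|line| {
        let (k, v) = line.split_once('=')?;
        if k.trim() != key {
            return None;
        }
        v.trim().strip_prefix('"')?.strip_suffix('"')
    })
}
```

Calling `split_frontmatter` on the output of `to_markdown()` would give you the metadata block and the Markdown body as two separate string slices, without copying either.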

Customizing Markdown

Use MarkdownFormatter for more control:

use lectito_core::{parse, MarkdownFormatter, MarkdownConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let config = MarkdownConfig {
        frontmatter: true,
        // Add more options as available
    };

    let formatter = MarkdownFormatter::new(config);
    let markdown = formatter.format(&article)?;

    println!("{}", markdown);

    Ok(())
}

JSON

Get structured JSON with all metadata:

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let json = article.to_json()?;
    println!("{}", json);

    Ok(())
}

JSON Structure

{
    "metadata": {
        "title": "Article Title",
        "author": "John Doe",
        "published_date": "2025-01-17",
        "excerpt": "A brief description",
        "language": "en"
    },
    "content": "<div>Cleaned HTML content...</div>",
    "text_content": "Plain text content...",
    "word_count": 500,
    "readability_score": 35.5
}

Parsing JSON

use lectito_core::parse;
use serde_json::Value;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let json = article.to_json()?;
    let value: Value = serde_json::from_str(&json)?;

    println!("Title: {}", value["metadata"]["title"]);

    Ok(())
}

Plain Text

Extract just the text content:

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let text = article.to_text();
    println!("{}", text);

    Ok(())
}

Output Format

Plain text includes:

  • Headings as lines with # prefixes
  • Paragraphs separated by blank lines
  • List items with * or 1. prefixes
  • No HTML tags, and no Markdown syntax beyond the heading and list prefixes above
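
Because headings keep their # prefixes and paragraphs are separated by blank lines, the plain text can be post-processed with simple string operations. The sketch below assumes that the number of # characters reflects the heading level, which the list above does not state explicitly; `outline` and `paragraphs` are hypothetical helpers that operate on any string, such as the value returned by `to_text()`.

```rust
/// Returns (level, title) for every line that starts with one or more `#`
/// characters followed by a space. Assumes the `#` count is the heading level.
fn outline(text: &str) -> Vec<(usize, &str)> {
    text.lines()
        .filter_map(|line| {
            let level = line.chars().take_while(|&c| c == '#').count();
            if level == 0 {
                return None;
            }
            // '#' is a single byte, so `level` is also a valid byte offset.
            let title = line[level..].strip_prefix(' ')?;
            Some((level, title.trim()))
        })
        .collect()
}

/// Splits the text into blocks on blank lines, dropping empty blocks.
fn paragraphs(text: &str) -> Vec<&str> {
    text.split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .collect()
}
```

Note that `paragraphs` treats headings and whole lists as blocks too, since they are also separated by blank lines.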

HTML

Access the cleaned HTML directly:

use lectito_core::parse;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    // Cleaned HTML is in the `content` field
    let cleaned_html = &article.content;
    println!("{}", cleaned_html);

    Ok(())
}

HTML Characteristics

The cleaned HTML:

  • Removes clutter (navigation, sidebars, ads)
  • Keeps main content structure
  • Preserves images (if preserve_images is true)
  • Removes most scripts and styles
  • Maintains heading hierarchy

Choosing a Format

| Format | Use Case |
| --- | --- |
| Markdown | Blog posts, documentation, static sites |
| JSON | APIs, databases, further processing |
| Text | Analysis, indexing, simple display |
| HTML | Web display, further HTML processing |
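
If your application lets callers pick the format at runtime, one option is to model the choice as an enum and derive it from an output file extension. This is only a sketch: `OutputFormat`, `from_extension`, and `media_type` are hypothetical names, not part of lectito_core, and the extension mapping is an assumption for illustration.

```rust
/// Hypothetical enum naming the four outputs described on this page.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OutputFormat {
    Markdown,
    Json,
    Text,
    Html,
}

impl OutputFormat {
    /// Guesses the format from a file extension (case-insensitive).
    fn from_extension(ext: &str) -> Option<Self> {
        match ext.to_ascii_lowercase().as_str() {
            "md" | "markdown" => Some(Self::Markdown),
            "json" => Some(Self::Json),
            "txt" => Some(Self::Text),
            "html" | "htm" => Some(Self::Html),
            _ => None,
        }
    }

    /// Media type to send when serving the output over HTTP.
    fn media_type(self) -> &'static str {
        match self {
            Self::Markdown => "text/markdown",
            Self::Json => "application/json",
            Self::Text => "text/plain",
            Self::Html => "text/html",
        }
    }
}
```

A caller would then match on the enum and invoke `to_markdown()`, `to_json()`, `to_text()`, or read the `content` field accordingly.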

Format Conversion Examples

Markdown to File

use lectito_core::parse;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let html = "<html>...</html>";
    let article = parse(html)?;

    let markdown = article.to_markdown()?;
    fs::write("article.md", markdown)?;

    Ok(())
}

JSON for API Response

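use lectito_core::parse;

async fn extract_article(body: String) -> Result<impl warp::Reply, warp::Rejection> {
    // Reject instead of panicking on bad input; a production handler would
    // map these errors to a custom rejection with a more specific status.
    let article = parse(&body).map_err(|_| warp::reject::reject())?;
    let json = article.to_json().map_err(|_| warp::reject::reject())?;

    // to_json() already returns a serialized JSON string, so send it as-is.
    // Passing it to warp::reply::json would encode it a second time.
    Ok(warp::reply::with_header(json, "content-type", "application/json"))
}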

Text for Analysis

use lectito_core::parse;

fn analyze_text(html: &str) -> Result<(), Box<dyn std::error::Error>> {
    let article = parse(html)?;
    let text = article.to_text();

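    // Count whitespace-separated words
    let words: Vec<&str> = text.split_whitespace().collect();
    println!("Word count: {}", words.len());

    // Count sentences, skipping the empty pieces that split() yields
    // after the final terminator or between repeated ones ("...", "?!")
    let sentences = text
        .split(&['.', '!', '?'][..])
        .filter(|s| !s.trim().is_empty())
        .count();
    println!("Sentence count: {}", sentences);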

    Ok(())
}
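
To go from a raw word count to an actual word-frequency analysis, the plain text can be tallied with a HashMap. The following is a minimal sketch using only the standard library; `word_frequencies` and `top_words` are hypothetical helpers, and the normalization (lowercasing, trimming surrounding punctuation) is one reasonable choice among several.

```rust
use std::collections::HashMap;

/// Counts how often each word occurs, ignoring case and any punctuation
/// at the start or end of a word.
fn word_frequencies(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for raw in text.split_whitespace() {
        let word = raw
            .trim_matches(|c: char| !c.is_alphanumeric())
            .to_lowercase();
        if !word.is_empty() {
            *counts.entry(word).or_insert(0) += 1;
        }
    }
    counts
}

/// Returns the `n` most frequent words, most frequent first.
/// Ties are ordered alphabetically so the result is deterministic.
fn top_words(text: &str, n: usize) -> Vec<(String, usize)> {
    let mut entries: Vec<_> = word_frequencies(text).into_iter().collect();
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    entries.truncate(n);
    entries
}
```

Passing the result of `to_text()` to `top_words` would give a quick view of the dominant terms in an article.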

HTML for Display

use lectito_core::parse;

fn display_article(html: &str) -> Result<(), Box<dyn std::error::Error>> {
    let article = parse(html)?;

    // Use in a template
    let rendered = format!(
        r#"
        <!DOCTYPE html>
        <html>
        <head>
            <title>{}</title>
        </head>
        <body>
            <article>{}</article>
        </body>
        </html>
        "#,
        article.metadata.title.unwrap_or_default(),
        article.content
    );

    // Print the rendered page (or hand it to your web framework)
    println!("{}", rendered);

    Ok(())
}
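
When interpolating metadata into a template like this, keep in mind that the title is plain text rather than cleaned HTML, so it should be escaped before it lands inside a tag. The sketch below shows one way to do that with the standard library; `escape_html` and `render_page` are hypothetical helpers, and the article content is assumed to be the already-cleaned HTML from the `content` field, which is inserted unchanged.

```rust
/// Escapes the five characters that are significant in HTML text and attributes.
fn escape_html(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    out
}

/// Wraps already-cleaned article HTML in a minimal page, escaping only the title.
fn render_page(title: &str, content_html: &str) -> String {
    format!(
        "<!DOCTYPE html>\n<html>\n<head>\n<title>{}</title>\n</head>\n<body>\n<article>{}</article>\n</body>\n</html>\n",
        escape_html(title),
        content_html
    )
}
```

For anything beyond a quick preview, a templating crate with automatic escaping is usually the safer route.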

Next Steps