Lectito

Lectito is a Rust library and CLI tool for extracting readable article content from HTML.

Most web pages contain far more than the text a reader came for: ads, navigation, related links, comment areas, tracking markup, hidden elements, and presentation wrappers. Lectito tries to identify the main content root and return a smaller document that is useful for reading, storage, search, and conversion.

It returns:

cleaned article HTML
Markdown
plain text
page metadata
extraction diagnostics
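Lectito's own scoring is internal and is not described on this page. To give a rough idea of how a "main content root" can be picked out from surrounding clutter, here is a sketch of a simplified heuristic inspired by Mozilla's Readability.js. Each candidate block gets a small base score from its commas and length, and that score is then scaled by (1 - link density). `Candidate`, `link_density`, `score`, and `best_candidate` are hypothetical names used only for this illustration, not Lectito APIs, and the real scoring almost certainly considers more signals.

```rust
/// Hypothetical candidate block: its visible text and how many of those
/// characters sit inside links. Not a Lectito type.
struct Candidate {
    text: String,
    link_text_len: usize,
}

/// Fraction of the block's characters that are link text (0.0 to 1.0).
fn link_density(c: &Candidate) -> f64 {
    let total = c.text.chars().count();
    if total == 0 {
        return 1.0; // treat empty blocks as pure boilerplate
    }
    c.link_text_len as f64 / total as f64
}

/// Simplified Readability-style score: 1 base point, 1 per comma,
/// up to 3 for length (1 per full 100 chars), scaled by (1 - link density).
fn score(c: &Candidate) -> f64 {
    let len = c.text.chars().count();
    let commas = c.text.matches(',').count();
    let base = 1.0 + commas as f64 + (len / 100).min(3) as f64;
    base * (1.0 - link_density(c))
}

/// Index of the highest-scoring candidate, or None for an empty slice.
fn best_candidate(cands: &[Candidate]) -> Option<usize> {
    cands
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| score(a).total_cmp(&score(b)))
        .map(|(i, _)| i)
}

fn main() {
    let cands = vec![
        // A navigation bar: every character is link text.
        Candidate { text: "Home About Contact".to_string(), link_text_len: 18 },
        // A prose paragraph with no links.
        Candidate {
            text: "Lectito extracts readable content, strips ads, and keeps the text. ".repeat(3),
            link_text_len: 0,
        },
    ];
    println!("best candidate: {:?}", best_candidate(&cands));
}
```

The link-density factor is what pushes navigation bars and related-link lists toward zero even when they contain plenty of text.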

Lectito is parser-first. The core API accepts HTML and an optional base URL. URL fetching exists in the CLI for convenience, but the library does not require network access.

This keeps the library usable in environments that already have HTML available: crawlers, browser extensions, desktop apps, mobile apps, tests, and offline archives.

Main APIs

```rust
use lectito::{extract, ReadabilityOptions};

fn main() -> Result<(), lectito::Error> {
    let html = r#"<article><h1>Title</h1><p>Article text.</p></article>"#;

    // The base URL is optional; pass None when the page URL is unknown.
    let article = extract(html, Some("https://example.com/post"), &ReadabilityOptions::default())?;

    // Ok(None) means no article content was extracted.
    if let Some(article) = article {
        println!("{}", article.markdown);
    }
    Ok(())
}
```

Use extract_with_diagnostics when tuning extraction or debugging a bad page. Use is_probably_readable before extraction when you only need a quick yes/no answer. Application code should rely on these entry points and on the result, options, diagnostics, and Markdown helpers rather than on internal scoring or cleanup modules.
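For intuition about what a cheap yes/no readability check can look like, here is a sketch modeled on Mozilla Readability.js's `isProbablyReaderable`, which sums `sqrt(length - 140)` over sufficiently long text blocks and answers yes once the total exceeds 20. This is not Lectito's is_probably_readable: the function name `probably_readable`, the thresholds, and the use of pre-extracted block strings instead of a parsed DOM are all assumptions made for illustration.

```rust
/// Thresholds borrowed from Readability.js defaults (assumption: Lectito's
/// own values may differ).
const MIN_CONTENT_LENGTH: usize = 140;
const MIN_SCORE: f64 = 20.0;

/// Hypothetical check: `blocks` stands in for the text of <p>, <pre>, and
/// <article> elements that a real implementation would take from the DOM.
fn probably_readable(blocks: &[&str]) -> bool {
    let mut score = 0.0;
    for block in blocks {
        let len = block.trim().chars().count();
        if len < MIN_CONTENT_LENGTH {
            continue; // short blocks (menus, captions) contribute nothing
        }
        score += ((len - MIN_CONTENT_LENGTH) as f64).sqrt();
        if score > MIN_SCORE {
            return true; // stop as soon as the page looks readable
        }
    }
    false
}

fn main() {
    // sqrt(540 - 140) = 20, which does not exceed MIN_SCORE on its own.
    let long = "x".repeat(540);
    println!("{}", probably_readable(&[long.as_str()])); // false
    println!("{}", probably_readable(&[long.as_str(), long.as_str()])); // true
}
```

Because the square root grows slowly, one huge block counts for less than several medium-length paragraphs, which favors pages with real prose over pages with a single long blob of text.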
