Configuration

Customize Lectito's extraction behavior with configuration options.

ReadabilityConfig

The ReadabilityConfig struct controls extraction parameters. Use the builder pattern:

use lectito_core::{Readability, ReadabilityConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ReadabilityConfig::builder()
        .min_score(25.0)
        .char_threshold(500)
        .nb_top_candidates(8)
        .preserve_images(true)
        .preserve_video_embeds(true)
        .build();

    // Create a reader that uses the custom configuration, then extract the article.
    let reader = Readability::with_config(config);
    let article = reader.parse("<html>...</html>")?;

    Ok(())
}

Readability Options

| Field | Default | Description |
|---|---|---|
| min_score | 20.0 | Minimum score required for extraction |
| char_threshold | 500 | Minimum character count for strong candidates |
| nb_top_candidates | 5 | Number of top candidates to keep during scoring |
| max_elems_to_parse | 0 | Maximum number of elements to score; 0 means unlimited |
| remove_unlikely | true | Remove obvious chrome before scoring |
| keep_classes | false | Preserve class attributes in output HTML |
| preserve_images | true | Keep images in extracted content |
| preserve_video_embeds | true | Keep supported video embeds |

Strict Extraction

To accept only pages that score well and contain substantial text, raise min_score and char_threshold above their defaults:

use lectito_core::ReadabilityConfig;

let config = ReadabilityConfig::builder()
    .min_score(30.0)
    .char_threshold(1000)
    .build();

Lenient Extraction

For short pages or difficult layouts, lower both thresholds and disable remove_unlikely:

use lectito_core::ReadabilityConfig;

let config = ReadabilityConfig::builder()
    .min_score(10.0)
    .char_threshold(200)
    .remove_unlikely(false)
    .build();
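
The strict and lenient settings above could also be chosen at runtime. The following is a minimal sketch, not part of lectito_core: the `Profile` enum, the `Thresholds` struct, and the `thresholds_for` function are hypothetical names, and the field types (`f64`, `usize`) are assumptions. It only carries the values shown in this section; the strict profile leaves `remove_unlikely` at its documented default of `true`.

```rust
/// Hypothetical extraction profiles; not part of lectito_core.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Profile {
    Strict,
    Lenient,
}

/// Hypothetical holder for the three settings the presets in this section touch.
/// Field types are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Thresholds {
    min_score: f64,
    char_threshold: usize,
    remove_unlikely: bool,
}

/// Returns the values used by the strict and lenient examples in this section.
fn thresholds_for(profile: Profile) -> Thresholds {
    match profile {
        // Strict: higher score and length requirements, default remove_unlikely.
        Profile::Strict => Thresholds {
            min_score: 30.0,
            char_threshold: 1000,
            remove_unlikely: true,
        },
        // Lenient: lower requirements, unlikely elements are kept for scoring.
        Profile::Lenient => Thresholds {
            min_score: 10.0,
            char_threshold: 200,
            remove_unlikely: false,
        },
    }
}

fn main() {
    for profile in [Profile::Strict, Profile::Lenient] {
        let t = thresholds_for(profile);
        println!(
            "{:?}: min_score={}, char_threshold={}, remove_unlikely={}",
            profile, t.min_score, t.char_threshold, t.remove_unlikely
        );
    }
}
```

With lectito_core available, the returned values would be passed to the builder's `min_score`, `char_threshold`, and `remove_unlikely` methods shown in the examples above.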

Text-Only Extraction

To drop images and video embeds from the extracted content:

use lectito_core::ReadabilityConfig;

let config = ReadabilityConfig::builder()
    .preserve_images(false)
    .preserve_video_embeds(false)
    .build();

FetchConfig

Configure HTTP fetching behavior:

use lectito_core::{fetch_and_parse_with_config, FetchConfig, ReadabilityConfig};
use std::collections::HashMap;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let fetch_config = FetchConfig {
        timeout: 60,
        user_agent: "MyBot/1.0".to_string(),
        headers: HashMap::new(),
    };

    let read_config = ReadabilityConfig::builder()
        .min_score(25.0)
        .build();

    let article = fetch_and_parse_with_config(
        "https://example.com/article",
        &read_config,
        &fetch_config,
    ).await?;

    Ok(())
}

Fetch Options

| Field | Type | Default | Description |
|---|---|---|---|
| timeout | u64 | 30 | Request timeout in seconds |
| user_agent | String | Browser-like Lectito UA | User-Agent header value |
| headers | HashMap<String, String> | empty | Extra request headers |
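
Because `headers` is a plain `HashMap<String, String>`, extra request headers can be assembled with the standard library alone. The sketch below assumes a hypothetical helper named `build_headers` (not part of lectito_core), and the header names and values are purely illustrative.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of lectito_core): turns name/value pairs
/// into the `HashMap<String, String>` shape used by the `headers` field.
fn build_headers(pairs: &[(&str, &str)]) -> HashMap<String, String> {
    pairs
        .iter()
        .map(|(name, value)| (name.to_string(), value.to_string()))
        .collect()
}

fn main() {
    // Illustrative headers only; choose whatever the target site requires.
    let headers = build_headers(&[
        ("Accept-Language", "en-US,en;q=0.9"),
        ("Referer", "https://example.com/"),
    ]);

    // The map could then be used as the `headers` field of a FetchConfig literal.
    for (name, value) in &headers {
        println!("{name}: {value}");
    }
}
```

If the same header name appears twice in the input, the later value replaces the earlier one, which is the standard behavior when collecting pairs into a `HashMap`.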

Default Values

ReadabilityConfig::default() returns the defaults listed under Readability Options:

use lectito_core::ReadabilityConfig;

let config = ReadabilityConfig::default();

assert_eq!(config.min_score, 20.0);
assert_eq!(config.char_threshold, 500);
assert_eq!(config.nb_top_candidates, 5);
assert_eq!(config.max_elems_to_parse, 0);
assert!(config.remove_unlikely);
assert!(!config.keep_classes);
assert!(config.preserve_images);
assert!(config.preserve_video_embeds);
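
When settings come from user input, the documented defaults make natural fallbacks. The following sketch uses only the standard library; the helper `parse_or_default` and the environment variable names `LECTITO_MIN_SCORE` and `LECTITO_CHAR_THRESHOLD` are hypothetical and not defined by lectito_core, and the target types (`f64`, `usize`) are assumptions.

```rust
use std::str::FromStr;

/// Hypothetical helper: parses an optional raw setting and falls back to
/// `default` when the value is missing or cannot be parsed.
fn parse_or_default<T: FromStr>(raw: Option<&str>, default: T) -> T {
    raw.and_then(|s| s.trim().parse::<T>().ok())
        .unwrap_or(default)
}

fn main() {
    // Hypothetical environment variable names; the fallbacks are the
    // documented defaults for min_score (20.0) and char_threshold (500).
    let min_score: f64 = parse_or_default(
        std::env::var("LECTITO_MIN_SCORE").ok().as_deref(),
        20.0,
    );
    let char_threshold: usize = parse_or_default(
        std::env::var("LECTITO_CHAR_THRESHOLD").ok().as_deref(),
        500,
    );

    println!("min_score={min_score}, char_threshold={char_threshold}");
}
```

The parsed values would then be handed to the builder's `min_score` and `char_threshold` methods.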

Site Configuration

For sites that require custom extraction rules, enable the siteconfig Cargo feature in Cargo.toml:

[dependencies]
lectito-core = { version = "0.1", features = ["siteconfig"] }

Site configuration applies per-site extraction rules using FTR-style rulesets and the ConfigLoader APIs.

Next Steps