Diagnostics

Use diagnostics to inspect extraction decisions.

Diagnostics are for development, fixture work, and bug reports. They explain which candidates were considered, which root was selected, and why an extraction was accepted or downgraded to a best attempt.

#![allow(unused)]
fn main() {
use lectito::{extract_with_diagnostics, ReadabilityOptions};

let report = extract_with_diagnostics(html, base_url, &ReadabilityOptions::default())?;
println!("{:?}", report.diagnostics.outcome);
}

ExtractionReport contains:

  • article: the extracted article, if found
  • diagnostics: details about attempts and candidate selection

Outcomes:

OutcomeMeaning
AcceptedAn attempt met char_threshold.
BestAttemptNo attempt met the threshold, but non-empty content was found.
NoContentNo useful content was found.

Each attempt records:

  • cleanup flags
  • candidate count
  • top candidates
  • entry points
  • selected root
  • cleanup counts
  • recovery counts
  • extracted text length

When a site profile or code extractor matches, diagnostics include site_rule. That record reports the matched profile or extractor, whether it was bundled, which roots were selected, how many removals ran, whether the result met char_threshold, and any fallback reason.

Start with outcome, selected_root, and text_len. If the selected root is wrong, inspect the candidate list. If the root is right but output is noisy, inspect cleanup counts and preserved classes.

CLI diagnostics:

lectito parse article.html --diagnostic-format pretty
lectito parse article.html --diagnostic-format json