Article

Article is the extraction result.

The struct is serializable and contains both content and metadata. The content fields are generated from the selected article root; metadata can come from document metadata, JSON-LD, Open Graph tags, or the extracted content itself.

#![allow(unused)]
fn main() {
pub struct Article {
    pub title: Option<String>,
    pub byline: Option<String>,
    pub dir: Option<String>,
    pub lang: Option<String>,
    pub content: String,
    pub markdown: String,
    pub text_content: String,
    pub length: usize,
    pub excerpt: Option<String>,
    pub site_name: Option<String>,
    pub published_time: Option<String>,
    pub image: Option<String>,
    pub domain: Option<String>,
    pub favicon: Option<String>,
}
}

Fields:

Field	Meaning
`title`	Best title from metadata or document content.
`byline`	Author/byline when detected.
`dir`	Text direction, such as `ltr` or `rtl`.
`lang`	Document language when detected.
`content`	Cleaned article HTML.
`markdown`	Markdown generated from `content`.
`text_content`	Plain text generated from `content`.
`length`	Character length of extracted text.
`excerpt`	Short summary or first useful paragraph.
`site_name`	Publisher or site name.
`published_time`	Publication timestamp when detected.
`image`	Lead image URL when detected.
`domain`	Source domain when available.
`favicon`	Favicon URL when detected.

content, markdown, and text_content are different views of the same extracted article. Prefer content when structure matters, markdown when the article will be displayed or edited as text, and text_content when indexing or summarizing.

Lectito.rs

Article