Article

Article is the extraction result.

The struct is serializable and contains both content and metadata. The content fields are generated from the selected article root; metadata can come from document metadata, JSON-LD, Open Graph tags, or the extracted content itself.

pub struct Article {
    pub title: Option<String>,
    pub byline: Option<String>,
    pub dir: Option<String>,
    pub lang: Option<String>,
    pub content: String,
    pub markdown: String,
    pub text_content: String,
    pub length: usize,
    pub excerpt: Option<String>,
    pub site_name: Option<String>,
    pub published_time: Option<String>,
    pub image: Option<String>,
    pub domain: Option<String>,
    pub favicon: Option<String>,
}

Fields:

| Field | Meaning |
| --- | --- |
| title | Best title from metadata or document content. |
| byline | Author/byline when detected. |
| dir | Text direction, such as ltr or rtl. |
| lang | Document language when detected. |
| content | Cleaned article HTML. |
| markdown | Markdown generated from content. |
| text_content | Plain text generated from content. |
| length | Character length of extracted text. |
| excerpt | Short summary or first useful paragraph. |
| site_name | Publisher or site name. |
| published_time | Publication timestamp when detected. |
| image | Lead image URL when detected. |
| domain | Source domain when available. |
| favicon | Favicon URL when detected. |
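
Most metadata fields are Option<String>, so callers usually want a fallback for the case where nothing was detected. The following is a minimal sketch of that pattern, not part of the library: ArticleMeta is a hypothetical stand-in that mirrors three fields of Article so the example compiles on its own, and the helper names display_title and is_rtl are assumptions made for illustration.

```rust
/// Hypothetical stand-in mirroring a few fields of `Article`,
/// so this example compiles without the crate.
#[derive(Debug, Default)]
pub struct ArticleMeta {
    pub title: Option<String>,
    pub site_name: Option<String>,
    pub dir: Option<String>,
}

/// Returns the title, falling back to the site name,
/// then to a fixed placeholder when neither was detected.
pub fn display_title(meta: &ArticleMeta) -> &str {
    meta.title
        .as_deref()
        .or(meta.site_name.as_deref())
        .unwrap_or("Untitled")
}

/// True only when a direction was detected and it is "rtl"
/// (compared ASCII case-insensitively).
pub fn is_rtl(meta: &ArticleMeta) -> bool {
    meta.dir
        .as_deref()
        .map_or(false, |d| d.eq_ignore_ascii_case("rtl"))
}
```

The same as_deref / or / unwrap_or chain should apply to any of the optional fields, such as byline or excerpt, if the real struct matches the definition shown above.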

content, markdown, and text_content are different views of the same extracted article. Prefer content when structure matters, markdown when the article will be displayed or edited as text, and text_content when indexing or summarizing.
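
The guidance above can be sketched as a small selector. This is an illustration under stated assumptions, not library API: ArticleViews is a hypothetical stand-in holding only the three content fields of Article, and the Purpose enum and select_view function are names invented for this example.

```rust
/// Hypothetical stand-in with only the three content views of `Article`.
pub struct ArticleViews {
    pub content: String,
    pub markdown: String,
    pub text_content: String,
}

/// What the caller intends to do with the article
/// (illustrative only, not a library type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Purpose {
    /// Structure matters, e.g. re-rendering the cleaned HTML.
    PreserveStructure,
    /// The article will be displayed or edited as text.
    DisplayOrEdit,
    /// The article will be indexed or summarized.
    IndexOrSummarize,
}

/// Picks the view that matches the stated purpose.
pub fn select_view(article: &ArticleViews, purpose: Purpose) -> &str {
    match purpose {
        Purpose::PreserveStructure => &article.content,
        Purpose::DisplayOrEdit => &article.markdown,
        Purpose::IndexOrSummarize => &article.text_content,
    }
}
```

Returning a borrowed &str avoids copying the article body; a caller that needs ownership can call to_owned() on the result.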