Introduction

Beacon is an experimental Python type checker and developer experience platform written in Rust. This documentation set describes the architecture, design decisions, and research that power the project. Whether you are contributing to the codebase, evaluating the language server, or exploring the type system, start here to orient yourself.

Core Capabilities

Beacon provides a complete LSP-based, type-safe development environment for Python:

Type System

  • Hindley-Milner type inference with automatic generalization
  • Type narrowing through pattern matching and control flow
  • Protocol satisfaction with variance checking
  • Gradual typing compatibility

Code Intelligence

  • Real-time diagnostics for syntax, semantic, and type errors
  • Hover tooltips with inferred types and builtin documentation
  • Smart completions using symbol table analysis
  • Go to definition and find all references
  • Workspace and document symbol search with fuzzy matching

Refactoring & Code Actions

  • Symbol renaming with workspace-wide validation
  • Quick fixes for common issues (unused imports, Optional types, pattern completions)
  • Protocol method implementation assistance
  • Type annotation insertion from inferred types

Editor Integration

  • VS Code and Zed extensions with full feature support
  • Compatible with any LSP client (Neovim, Helix, etc.)
  • Semantic token highlighting and inlay hints
  • Fast incremental analysis with multi-layer caching

What You'll Find

  • LSP Overview: A deep dive into our Language Server Protocol implementation, including its goals, building blocks, and feature set.
  • Type System Research: Summaries of the academic and practical references influencing Beacon’s approach to Hindley–Milner inference, gradual typing, and structural subtyping.
  • Contributor Guides (planned): Setup instructions, style guidelines, and workflows for building and testing Beacon.

Project Vision

Beacon aims to combine precise type checking with interactive tooling that stays responsive for everyday Python development. The project embraces:

  • Fast feedback loops enabled by incremental analysis.
  • Interoperability with modern editors via LSP.
  • A pragmatic blend of theoretical rigor and implementable engineering.

Getting Started

  1. Clone the repository and install Rust 1.70+ (stable).
  2. Run cargo check from the workspace root to verify the build.
  3. Launch the LSP server with cargo run -p beacon-lsp or integrate with an editor using the provided configuration (see the LSP chapter).
  4. Browse the documentation sidebar for in-depth topics.

Contributing

We welcome pull requests and discussions. To get involved:

  • Review open issues
  • Read the upcoming contributor guide (work in progress).
  • Join the conversation in our community channels (details to be added).

Beacon is evolving quickly; expect iteration, experimentation, and plenty of opportunities to help shape the future of type checking for Python.

Configuration

Beacon LSP can be configured through TOML files for standalone usage or through your editor's settings when using an extension.

Configuration Files

Beacon searches for configuration in the following order:

  1. beacon.toml in your workspace root
  2. [tool.beacon] section in pyproject.toml

If multiple configuration files are found, beacon.toml takes precedence.
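
The lookup order can be pictured with a small sketch (a hypothetical helper, not Beacon's actual loader), which stops at the first file found:

use std::path::{Path, PathBuf};

/// Hypothetical sketch of the search order described above: prefer
/// beacon.toml, otherwise fall back to pyproject.toml (whose [tool.beacon]
/// table would then be read).
fn find_config_file(workspace_root: &Path) -> Option<PathBuf> {
    let beacon_toml = workspace_root.join("beacon.toml");
    if beacon_toml.is_file() {
        return Some(beacon_toml); // beacon.toml wins when both files exist
    }
    let pyproject = workspace_root.join("pyproject.toml");
    pyproject.is_file().then_some(pyproject)
}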

TOML Structure

Beacon configuration uses TOML sections to organize related settings:

[type_checking]
mode = "balanced"

[python]
version = "3.12"
stub_paths = ["stubs", "typings"]

[workspace]
source_roots = ["src", "lib"]
exclude_patterns = ["**/venv/**"]

[inlay_hints]
enable = true
variable_types = true

[diagnostics]
unresolved_imports = "warning"
circular_imports = "warning"

[formatting]
enabled = true
line_length = 88
quote_style = "double"
trailing_commas = "multiline"

[advanced]
incremental = true
cache_size = 100

Configuration Options

Type Checking

Configure type checking behavior under the [type_checking] section.

mode

Type checking strictness mode. Controls how the type checker handles annotation mismatches and inference.

  • Type: string
  • Default: "balanced"
  • Values:
    • "strict": Annotation mismatches are hard errors with strict enforcement
    • "balanced": Annotation mismatches are diagnostics with quick fixes, but inference proceeds
    • "relaxed": Annotations supply bounds but can be overridden by inference
[type_checking]
mode = "balanced"

Python

Configure Python-specific settings under the [python] section.

version

Target Python version for feature support (e.g., pattern matching in 3.10+, PEP 695 syntax in 3.12+).

  • Type: string
  • Default: "3.12"
  • Values: "3.9", "3.10", "3.11", "3.12", "3.13"
[python]
version = "3.12"

stub_paths

Additional paths to search for .pyi stub files.

  • Type: array of strings
  • Default: ["stubs"]
[python]
stub_paths = ["stubs", "typings", "~/.local/share/python-stubs"]

Workspace

Configure workspace settings under the [workspace] section.

source_roots

Source roots for module resolution in addition to workspace root.

  • Type: array of strings
  • Default: []
[workspace]
source_roots = ["src", "lib"]

exclude_patterns

Glob patterns to exclude from workspace scanning.

  • Type: array of strings
  • Default: []
[workspace]
exclude_patterns = ["**/venv/**", "**/.venv/**", "**/node_modules/**"]

Inlay Hints

inlay_hints.enable

Master toggle for all inlay hints.

  • Type: boolean
  • Default: true
[inlay_hints]
enable = true

inlay_hints.variable_types

Show inlay hints for inferred variable types on assignments without explicit type annotations.

  • Type: boolean
  • Default: true
[inlay_hints]
variable_types = true

inlay_hints.function_return_types

Show inlay hints for inferred function return types on functions without explicit return type annotations.

  • Type: boolean
  • Default: true
[inlay_hints]
function_return_types = true

inlay_hints.parameter_names

Show inlay hints for parameter names in function calls to improve readability.

  • Type: boolean
  • Default: false
[inlay_hints]
parameter_names = false

Diagnostics

Configure diagnostic severity levels under the [diagnostics] section.

unresolved_imports

Diagnostic severity level for imports that cannot be resolved.

  • Type: string
  • Default: "warning"
  • Values: "error", "warning", "info"
[diagnostics]
unresolved_imports = "warning"

circular_imports

Diagnostic severity level for circular import dependencies.

  • Type: string
  • Default: "warning"
  • Values: "error", "warning", "info"
[diagnostics]
circular_imports = "warning"

Formatting

Configure code formatting behavior under the [formatting] section. Beacon provides PEP8-compliant formatting through the LSP.

formatting.enabled

Master toggle for code formatting.

  • Type: boolean
  • Default: true
[formatting]
enabled = true

formatting.line_length

Maximum line length before wrapping.

  • Type: integer
  • Default: 88 (Black-compatible)
  • Range: 20-200
[formatting]
line_length = 88

formatting.indent_size

Number of spaces per indentation level.

  • Type: integer
  • Default: 4
  • Range: 2-8
[formatting]
indent_size = 4

formatting.quote_style

String quote style preference.

  • Type: string
  • Default: "double"
  • Values:
    • "single": Use single quotes for strings
    • "double": Use double quotes for strings
    • "preserve": Keep existing quote style
[formatting]
quote_style = "double"

formatting.trailing_commas

Trailing comma behavior in multi-line structures.

  • Type: string
  • Default: "multiline"
  • Values:
    • "always": Add trailing commas to all multi-line structures
    • "multiline": Add trailing commas only to multi-line nested structures
    • "never": Never add trailing commas
[formatting]
trailing_commas = "multiline"

formatting.max_blank_lines

Maximum consecutive blank lines allowed.

  • Type: integer
  • Default: 2
  • Range: 0-5
[formatting]
max_blank_lines = 2

formatting.import_sorting

Import statement sorting style.

  • Type: string
  • Default: "pep8"
  • Values:
    • "pep8": stdlib, third-party, local
    • "isort": isort-compatible sorting
    • "off": Disable import sorting
[formatting]
import_sorting = "pep8"

formatting.compatibility_mode

Compatibility with other Python formatters.

  • Type: string
  • Default: "black"
  • Values:
    • "black": Black formatter compatibility (88 char line length)
    • "autopep8": autopep8 compatibility (79 char line length)
    • "pep8": Strict PEP8 (79 char line length)
[formatting]
compatibility_mode = "black"

formatting.use_tabs

Use tabs instead of spaces for indentation (not recommended).

  • Type: boolean
  • Default: false
[formatting]
use_tabs = false

formatting.normalize_docstring_quotes

Normalize quotes in docstrings to match quote_style.

  • Type: boolean
  • Default: true
[formatting]
normalize_docstring_quotes = true

formatting.spaces_around_operators

Add spaces around binary operators.

  • Type: boolean
  • Default: true
[formatting]
spaces_around_operators = true

formatting.blank_line_before_class

Add blank lines before class definitions.

  • Type: boolean
  • Default: true
[formatting]
blank_line_before_class = true

formatting.blank_line_before_function

Add blank lines before function definitions.

  • Type: boolean
  • Default: true
[formatting]
blank_line_before_function = true

Advanced Options

Configure advanced performance and analysis settings under the [advanced] section.

max_any_depth

Maximum depth for Any type propagation before elevating diagnostics. Higher values are more permissive.

  • Type: integer
  • Default: 3
  • Range: 0-10
[advanced]
max_any_depth = 3

incremental

Enable incremental type checking for faster re-analysis.

  • Type: boolean
  • Default: true
[advanced]
incremental = true

workspace_analysis

Enable workspace-wide analysis and cross-file type checking.

  • Type: boolean
  • Default: true
[advanced]
workspace_analysis = true

enable_caching

Enable multi-layer caching of parse trees, type inference results, and formatting outputs. Caching dramatically improves performance for incremental edits and repeated operations.

Beacon uses four cache layers:

  • TypeCache: Node-level type inference (capacity: 100)
  • ScopeCache: Scope-level analysis with content hashing (capacity: 200)
  • AnalysisCache: Document-level analysis artifacts (capacity: 50)
  • IntrospectionCache: Persistent Python introspection (capacity: 1000)

When enabled, Beacon automatically invalidates stale cache entries when documents change. Scope-level content hashing ensures only modified scopes are re-analyzed.

  • Type: boolean
  • Default: true
[advanced]
enable_caching = true

For technical details on cache architecture and invalidation strategies, see Caching.

cache_size

Maximum number of documents to cache in the document-level analysis cache. Higher values improve performance for large workspaces at the cost of memory usage.

  • Type: integer
  • Default: 100
  • Range: 0-1000
[advanced]
cache_size = 100

Example Configurations

Basic Configuration (beacon.toml)

[type_checking]
mode = "strict"

[python]
version = "3.12"

[diagnostics]
unresolved_imports = "error"
circular_imports = "warning"

Advanced Configuration (beacon.toml)

[type_checking]
mode = "balanced"

[python]
version = "3.13"
stub_paths = ["stubs", "typings"]

[workspace]
source_roots = ["src", "lib"]
exclude_patterns = ["**/venv/**", "**/.venv/**", "**/build/**"]

[inlay_hints]
enable = true
variable_types = true
function_return_types = true
parameter_names = false

[diagnostics]
unresolved_imports = "warning"
circular_imports = "info"

[formatting]
enabled = true
line_length = 100
indent_size = 4
quote_style = "double"
trailing_commas = "multiline"
import_sorting = "pep8"

[advanced]
max_any_depth = 5
incremental = true
workspace_analysis = true
enable_caching = true
cache_size = 200

Using pyproject.toml

[tool.beacon.type_checking]
mode = "strict"

[tool.beacon.python]
version = "3.12"
stub_paths = ["stubs", "typings"]

[tool.beacon.workspace]
source_roots = ["src"]
exclude_patterns = ["**/venv/**", "**/.venv/**"]

[tool.beacon.diagnostics]
unresolved_imports = "error"

[tool.beacon.formatting]
enabled = true
line_length = 88
quote_style = "double"
trailing_commas = "multiline"

Configuration Precedence

When using Beacon with an editor extension (e.g., VSCode), configuration is merged in the following order (later sources override earlier ones):

  1. Default values - Built-in defaults
  2. TOML file - beacon.toml or pyproject.toml
  3. Editor settings - VSCode settings, Zed settings, Neovim config, etc.

This allows you to set project-wide defaults in TOML while still being able to override specific settings through your editor.
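
A hedged sketch of that layering (the field and type names are illustrative, not Beacon's real settings structs): each later layer overrides only the fields it explicitly sets.

#[derive(Clone)]
struct Settings {
    mode: String,
    line_length: u32,
}

/// Partial settings from a single source; None means "not set by this source".
#[derive(Default)]
struct PartialSettings {
    mode: Option<String>,
    line_length: Option<u32>,
}

fn merge(base: Settings, overlay: &PartialSettings) -> Settings {
    Settings {
        mode: overlay.mode.clone().unwrap_or(base.mode),
        line_length: overlay.line_length.unwrap_or(base.line_length),
    }
}

// Defaults, then the TOML file, then editor settings; later sources win:
// let effective = merge(merge(defaults, &toml_file), &editor_settings);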

Beacon Language Server

Beacon's Language Server Protocol (LSP) implementation bridges the Rust-based analyzer with editors such as Zed, VSCode/VSCodium, Neovim, and Helix. This chapter documents the system from high-level goals to feature-by-feature behaviour.

LSP Capabilities Quick Reference

Beacon implements the following LSP features:

  • Diagnostics: Real-time syntax, semantic, and type error reporting
  • Hover: Context-sensitive type information and documentation
  • Completion: Symbol table-based completions
  • Navigation: Go to definition, find references, document highlights
  • Symbols: Document outline and workspace fuzzy search
  • Semantic tokens and inlay hints
  • Refactoring: Rename, code actions, quick fixes

See Feature Providers for detailed implementation.

Documentation Overview

Use the sidebar to jump into any topic, or start with the sections below:

  • Goals And Scope - what the server delivers today and what is intentionally out of scope.
  • Architecture Overview - how shared state, concurrency, and feature wiring are structured.
  • Document Pipeline - how file contents become parse trees, ASTs, and symbol tables.
  • Caching - multi-layer cache architecture and invalidation strategies for fast incremental updates.
  • Feature Providers - the capabilities exposed via LSP requests and notifications.
  • Request Lifecycles - end-to-end flows for initialization, diagnostics, completions, and more.
  • Workspace Services - cross-file features and emerging workspace indexing plans.
  • Testing Strategy - automated coverage for providers and backend flows.
  • Current Limitations - known gaps and trade-offs in the current implementation.
  • Next Steps - near-term improvements on the roadmap.

If you are new to the Language Server Protocol itself, read the primer in Learn → Language Server Protocol before diving into these implementation details.

Goals

Deliver an incremental, performant, pragmatic Hindley-Milner (HM) type inference and checking engine for Python that integrates with modern editor tooling via the Language Server Protocol (LSP). The system should support Python’s dynamic features thoughtfully, interoperate with typing hints, and scale to multi-file projects.

Why HM for Python?

HM type systems provide principled inference (no annotations required), compositional reasoning, strong guarantees, and fast unification-based algorithms (the Algorithm W family).

Challenges

  • Pervasive dynamism (monkey-patching, __getattr__, metaclasses, duck typing, runtime reflection)
  • Mixed nominal and structural patterns
  • Subtyping-like expectations (None, unions, protocols)
  • First-class classes and functions
  • Decorators
  • Generators
  • Async
  • Pattern matching (PEP 634)

Design

HM core + pragmatic extensions, with a gradual boundary to accommodate Python idioms and annotations:

  • HM for expressions and local bindings.
  • Controlled subtyping-like features via unions/optionals and protocols/structural constraints.
  • Annotation-aware: PEP 484/PEP 695 types are treated as constraints and hints.
  • Soundness modes: "strict", "balanced", "relaxed" (affecting the treatment of Any, unknown attributes, and dynamic imports).

               ┌──────────────────────────────────────────────────┐
               │                  LSP Frontend                    │
               │ (tower-lsp or custom) using lsp-types for models │
               └───────────────▲───────────────────────▲──────────┘
                               │                       │
                     Requests / Notifications    Diagnostics, hovers
                               │                       │
┌──────────────────────────────┼───────────────────────┼────────────────────────────┐
│                          Language Server Core                                     │
│  ┌───────────────────────┐  ┌──────────────────────┐  ┌────────────────────────┐  │
│  │   Document Manager    │  │    Project Graph     │  │     Incremental Index  │  │
│  │ (text, versions, TS   │  │ (imports, deps,      │  │ (symbols, stubs,       │  │
│  │  parse trees)         │  │  module cache)       │  │  types, caches)        │  │
│  └──────────▲────────────┘  └──────────▲───────────┘  └──────────▲─────────────┘  │
│             │                           │                          │              │
│     ┌───────┴────────┐        ┌─────────┴──────────┐      ┌────────┴────────────┐ │
│     │  Tree-sitter   │        │  Constraint Gen    │      │   Solver / Types    │ │
│     │  Parser (Py)   │        │ (walk TS AST,      │      │ (unification,       │ │
│     │  + lossless    │        │  produce HM +      │      │  polymorphism,      │ │
│     │  syntax facts) │        │  extensions)       │      │  row/structural)    │ │
│     └───────▲────────┘        └─────────▲──────────┘      └────────▲────────────┘ │
│             │                           │                          │              │
│             └─────── Source -> AST  ────┴── Constraints ───────────┘              │
└───────────────────────────────────────────────────────────────────────────────────┘

LSP Implementation Goals

Beacon's LSP focuses on delivering a fast, editor-friendly surface for the Beacon analyzer without overcommitting to unfinished infrastructure. The current goals fall into five themes.

Primary Goals

Immediate feedback: run parsing and type analysis on every edit so diagnostics stay in sync with the buffer.

Core navigation: support hover, go-to-definition, references, and symbol search for rapid code exploration.

Authoring assistance: provide completions, document symbols, inlay hints, and semantic tokens to guide editing.

Refactoring primitives: offer reliable rename support and lay the groundwork for richer code actions.

Modular design: isolate feature logic behind provider traits so contributors can evolve features independently.

Out-of-Scope (For Now)

  • Full workspace indexing: we limit operations to open documents until indexing and cache management mature.
  • Formatting and linting: formatting endpoints and lint integrations are planned but not part of the initial release.
  • Editor-specific UX: we stick to LSP-standard capabilities instead of bespoke VS Code UI components.

Architecture Overview

The language server lives in crates/server and centres on the Backend type, which implements tower_lsp::LanguageServer. The architecture is deliberately modular so feature work and analyzer development can proceed in parallel.

Core Components

  • Backend: receives every LSP request/notification and routes it to feature providers. It owns the shared state required by multiple features.
  • Client (tower_lsp::Client): handles outbound communication, including diagnostics, logs, and custom notifications.
  • DocumentManager: thread-safe cache of open documents. Each Document stores:
    • Source text (ropey::Rope for cheap edits).
    • Tree-sitter parse tree.
    • Beacon AST.
    • Symbol table produced by the name resolver.
  • Analyzer: the Beacon type checker wrapped in an Arc<RwLock<_>> because many features need mutable access to its caches.
  • Workspace: tracks the workspace root URI and will later manage module resolution and indexing.
  • Features: a simple struct that instantiates each provider with shared dependencies and exposes them to the backend.
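
Put together, the shared state might be sketched roughly as follows. This is illustrative only: the stand-in types, field names, and lock flavor are assumptions, not the definitions in crates/server.

use std::sync::Arc;
use tokio::sync::RwLock;

// Stand-ins for the components described above.
struct DocumentManager; // open documents: text, parse tree, AST, symbol table
struct Analyzer;        // Beacon type checker and its caches
struct Workspace;       // root URI; future module resolution and indexing
struct Features;        // one provider per LSP capability

struct Backend {
    client: tower_lsp::Client,       // outbound diagnostics, logs, messages
    documents: Arc<DocumentManager>, // shared, thread-safe document cache
    analyzer: Arc<RwLock<Analyzer>>, // several features need mutable cache access
    workspace: Arc<RwLock<Workspace>>,
    features: Features,
}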

Concurrency Model

tower_lsp::LspService drives the backend on the Tokio runtime.

Read-heavy operations borrow documents or analyzer state immutably; diagnostics and rename take write locks to update caches.

Documents store text in a ropey::Rope, so incremental edits only touch the modified spans.

Error Handling

Feature methods typically return Option<T>: None means the feature has no answer for the request rather than hard-failing.

When unrecoverable errors occur (e.g., document not found), providers log via the client instead of crashing the server process.
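
A minimal sketch of that convention (the provider, its fields, and the lookup method are hypothetical stand-ins, not the real signatures):

use std::collections::HashMap;

struct Document { inferred: HashMap<u32, String> }            // stand-in document state
struct HoverProvider { documents: HashMap<String, Document> } // stand-in provider

impl HoverProvider {
    /// None means "no answer for this request", not a hard failure.
    fn hover_text(&self, uri: &str, line: u32) -> Option<String> {
        let doc = match self.documents.get(uri) {
            Some(doc) => doc,
            None => {
                // The real server logs this via the LSP client (window/logMessage)
                // instead of crashing the process.
                eprintln!("document not found: {uri}");
                return None;
            }
        };
        doc.inferred.get(&line).map(|ty| format!("inferred type: {ty}"))
    }
}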

Extensibility

Adding a new LSP method involves creating a provider (or extending an existing one) and exposing it through the Features struct.

Because providers depend only on DocumentManager and optionally the analyzer, they are easy to test in isolation.

This architecture keeps protocol plumbing concentrated in the backend while feature logic stays modular and testable.

Document Pipeline

The document pipeline keeps Beacon’s view of each open file synchronized with the editor. DocumentManager orchestrates the lifecycle and ensures every feature works from the same parse tree, AST, and symbol table.

Lifecycle Events

  1. Open (textDocument/didOpen)
    • Create a Document with the initial text, version, and URI.
    • Parse immediately via LspParser to populate the parse tree, AST, and symbol table.
    • Insert the document into the manager’s map.
  2. Change (textDocument/didChange)
    • Apply full or incremental edits to the document’s rope.
    • Re-run the parser to refresh derived data.
    • Invalidate analyzer caches so diagnostics and semantic queries recompute with fresh information.
  3. Save (textDocument/didSave)
    • Trigger diagnostics for the new persisted content. Behaviour matches the change handler today.
  4. Close (textDocument/didClose)
    • Remove the document and send an empty diagnostics array to clear markers in the editor.

Data Stored per Document

Text: stored as a ropey::Rope for efficient splicing.

Parse tree: Tree-sitter syntax tree produced by the parser.

AST: Beacon’s simplified abstract syntax tree used by features and the analyzer.

Symbol table: scope-aware mapping created during name resolution.

Version: latest client-supplied document version, echoed back when publishing diagnostics.
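
In struct form, the per-document state described above might look roughly like this (a sketch; the field names and stand-in types are illustrative, not the actual Document in crates/server):

use ropey::Rope;

// Stand-ins for Beacon's own types.
struct Ast;
struct SymbolTable;

struct Document {
    text: Rope,              // cheap incremental edits
    tree: tree_sitter::Tree, // Tree-sitter syntax tree
    ast: Ast,                // Beacon's simplified AST
    symbols: SymbolTable,    // scope-aware name resolution results
    version: i32,            // latest client-supplied version, echoed in diagnostics
}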

Access Patterns

get_document: exposes an immutable snapshot to consumers like hover or completion.

get_document_mut: allows controlled mutation when necessary (rare in practice).

all_documents: lists URIs so workspace-level features can iterate through open files.

By centralizing parsing and symbol management, the pipeline guarantees consistent snapshots across diagnostics, navigation, and refactoring features.

Cache Architecture

Beacon uses a multi-layer caching system to minimize redundant analysis while maintaining correctness.

Cache Layers

The system provides four specialized cache layers, each optimized for different granularities:

TypeCache (Node-Level)

Caches inferred types for specific AST nodes. Each entry maps (uri, node_id, version) to a Type.

Capacity: 100 entries (default)

Eviction: LRU

Use case: Hover requests, completion suggestions, and other features that need type information for a specific node.

ScopeCache (Scope-Level)

Provides granular incremental re-analysis at scope level rather than document level. When only a single function changes in a large file, unchanged scopes retain their cached analysis results.

Cache key: (uri, scope_id, content_hash)

Content hashing: Uses DefaultHasher to compute a deterministic hash of the scope's source text. Different content produces different hashes, enabling precise change detection.

Cached data:

  • type_map: inferred types for nodes within the scope
  • position_map: mapping from source positions to node IDs
  • dependencies: scopes this scope depends on (parent, referenced scopes)

Capacity: 200 entries (default)

Eviction: LRU

Statistics: Tracks hits/misses for performance monitoring.

Use case: Type checking, diagnostics, and semantic analysis that can reuse results from unchanged scopes.

AnalysisCache (Document-Level)

Caches complete analysis artifacts per document version. Each entry maps (uri, version) to full analysis results including type maps, position maps, type errors, and static analysis findings.

Cached data:

  • Complete type maps
  • Position maps
  • Type errors
  • Static analysis results

Capacity: 50 entries (default)

Eviction: LRU

Version-based invalidation: New document versions automatically create new cache entries rather than invalidating existing ones.

Use case: Publishing diagnostics, workspace-wide queries, and features that need complete document analysis.

IntrospectionCache (Persistent)

Caches Python introspection results for external modules and the standard library. Persists to disk in .beacon-cache/introspection.json to survive server restarts.

Cached data:

  • Function signatures
  • Docstrings
  • Module metadata

Capacity: 1000 entries

Eviction: LRU (in-memory), write-through to disk

Use case: Hover information for stdlib and third-party modules, completion for imported symbols.
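
The different granularities are easiest to see in the shape of the cache keys. The following is a hedged sketch (plain maps with assumed ID types; the real caches add LRU bookkeeping and statistics on top):

use std::collections::HashMap;

type Uri = String;
struct Type;             // inferred type (stand-in)
struct ScopeAnalysis;    // type_map, position_map, dependencies
struct DocumentAnalysis; // full per-document analysis artifacts
struct Introspection;    // signature + docstring

type TypeCache = HashMap<(Uri, u64 /* node_id */, i32 /* version */), Type>;
type ScopeCache = HashMap<(Uri, u64 /* scope_id */, u64 /* content_hash */), ScopeAnalysis>;
type AnalysisCache = HashMap<(Uri, i32 /* version */), DocumentAnalysis>;
type IntrospectionCache = HashMap<String /* "module.symbol" */, Introspection>;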

Content Hashing Validation

ScopeCache uses content hashing to detect changes with high precision:

Hash computation:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Deterministic hash of a scope's source text, used as part of the cache key.
fn content_hash_of(source_content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    source_content.hash(&mut hasher);
    hasher.finish()
}

Properties:

  • Deterministic: same content always produces the same hash
  • Whitespace-sensitive: x = 1 and x=1 produce different hashes
  • Collision-resistant: sufficient for cache validation

Validation: Cache lookups compare the computed content hash against the cached key. Mismatches result in cache misses, forcing re-analysis of the modified scope.

Invalidation Strategies

Version-Based Invalidation

TypeCache checks document version on every access. If the document version differs from the cached entry's version, the entry is treated as stale.

AnalysisCache embeds version in the cache key, so new versions naturally create new entries without explicit invalidation.

Content-Based Invalidation

ScopeCache compares content hashes. When a scope's source changes:

  1. Compute new content hash from updated source
  2. Look up cache with new key
  3. Cache miss if hash differs
  4. Re-analyze and insert with new hash
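
That lookup-or-reanalyze step can be sketched as follows (analyze_scope and the map-based cache are stand-ins for the real ScopeCache):

use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

type Key = (String /* uri */, u64 /* scope_id */, u64 /* content_hash */);
struct ScopeAnalysis;

fn get_or_analyze<'c>(
    cache: &'c mut HashMap<Key, ScopeAnalysis>,
    uri: &str,
    scope_id: u64,
    source: &str,
) -> &'c ScopeAnalysis {
    // Step 1: hash the scope's current source text.
    let mut hasher = DefaultHasher::new();
    source.hash(&mut hasher);
    let key = (uri.to_string(), scope_id, hasher.finish());

    // Steps 2-4: a changed hash means a different key, hence a cache miss;
    // re-analyze and insert. An unchanged scope hits its existing entry.
    cache.entry(key).or_insert_with(|| analyze_scope(source))
}

fn analyze_scope(_source: &str) -> ScopeAnalysis {
    ScopeAnalysis // stand-in for the real scope analysis
}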

Explicit Invalidation

CacheManager provides methods to invalidate specific scopes or entire documents:

invalidate_document: Removes all cache entries for a URI across all layers.

invalidate_scope: Removes entries for a specific scope from ScopeCache.

invalidate_selective: Invalidates specific scopes and returns the set of affected URIs for cascade invalidation.

Cascade Invalidation

When a scope changes, dependent scopes may also need invalidation. ImportDependencyTracker maintains a dependency graph to determine which scopes reference the changed scope, enabling selective cascade invalidation without over-invalidating.

Cache Coordination

CacheManager unifies all cache layers and coordinates invalidation:

On document change:

  1. Identify changed scopes by comparing content hashes
  2. Invalidate changed scopes in ScopeCache
  3. Clear document-level entries in AnalysisCache for the affected URI
  4. Query dependency tracker to find dependent scopes
  5. Invalidate dependents selectively
  6. TypeCache entries naturally become stale via version mismatch

On document close:

  • Remove all cache entries for the URI
  • Persist IntrospectionCache to disk

Performance Characteristics

Cache hit rates directly impact analysis latency:

Cold cache (first analysis): Full analysis required for all scopes.

Warm cache, no changes: All scopes hit, near-instant response.

Warm cache, localized change: Only changed scopes and dependents miss, dramatic speedup for large files.

ScopeCache statistics provide hit rate monitoring:

// Inspect the ScopeCache hit/miss statistics exposed by the CacheManager.
let stats = cache_manager.scope_cache_stats();
println!("Hit rate: {:.2}%", stats.hit_rate);

Formatter Cache

The formatter uses a separate two-level cache optimized for formatting requests:

Short-Circuit Cache

Maps (source_hash, config_hash) to a unit value, effectively recording the set of inputs that are already formatted. This detects already-formatted code in O(1) time and avoids redundant formatting operations.

Result Cache

Maps (source_hash, config_hash, start_line, end_line) to formatted output. Reuses formatting results for identical source and configuration.

Capacity: 100 entries (default) per layer

Eviction: LRU

Use case: Format-on-save, range formatting, and editor-initiated format requests.
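
The short-circuit layer behaves like a set of (source_hash, config_hash) pairs. A hedged sketch (names are illustrative):

use std::collections::HashSet;

/// Pairs of (source_hash, config_hash) whose input is known to be already formatted.
struct ShortCircuitCache(HashSet<(u64, u64)>);

impl ShortCircuitCache {
    /// O(1): if the pair is present, formatting would be a no-op and can be skipped.
    fn is_already_formatted(&self, source_hash: u64, config_hash: u64) -> bool {
        self.0.contains(&(source_hash, config_hash))
    }

    /// Record a source/config pair whose formatted output equals its input.
    fn mark_formatted(&mut self, source_hash: u64, config_hash: u64) {
        self.0.insert((source_hash, config_hash));
    }
}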

Feature Providers

Each capability exposed by the language server lives in its own provider under crates/server/src/features. Providers share the DocumentManager and, when needed, the analyzer.

Diagnostics

DiagnosticProvider aggregates:

  • Parse errors emitted by the parser.
  • Unbound variable checks.
  • Type errors and warnings from the analyzer.
  • Additional semantic warnings (e.g., annotation mismatches).

Results are published with document versions to prevent stale diagnostics in the editor.

Hover

HoverProvider returns context-sensitive information for the symbol under the cursor—typically inferred types or documentation snippets. It reads the current AST and analyzer output to assemble Hover responses.

The hover system integrates with the builtin documentation and dunder metadata modules to provide rich information for Python's standard types and magic methods.

Completion

CompletionProvider uses symbol tables to surface in-scope identifiers. Trigger characters (currently ".") allow editors to request completions proactively.

Navigation

GotoDefinitionProvider locates definitions using symbol table lookups.

ReferencesProvider returns all occurrences of a symbol across open documents.

DocumentHighlightProvider highlights all occurrences of a symbol within a single file when the cursor is positioned on it. The provider walks the AST to identify and classify occurrences:

  • Variables: marked as READ or WRITE based on context (assignments are WRITE, usage is READ)
  • Function names: highlighted in both definitions and call sites
  • Function parameters: highlighted in both the parameter list and within the function body
  • Class members: highlighted across the class definition

Symbols

DocumentSymbolsProvider walks the AST to produce hierarchical outlines (classes, functions, variables).

WorkspaceSymbolsProvider scans all open documents, performing case-insensitive matching with fuzzy search scoring. It falls back to sensible defaults when nested symbols are missing from the symbol table. The provider supports lazy symbol resolution for LSP clients that request location details on-demand.

Semantic Enhancements

SemanticTokensProvider projects syntax nodes into semantic token types and modifiers, enabling advanced highlighting.

InlayHintsProvider emits type annotations or other inline hints derived from the analyzer.

Refactoring

RenameProvider validates proposed identifiers, gathers edits via both AST traversal and Tree-sitter scans, deduplicates overlapping ranges, and returns a WorkspaceEdit.

Code Actions

CodeActionsProvider provides quick fixes and refactoring actions:

Quick Fixes:

  • Removing unused variables and imports
  • Wrapping types with Optional for None-related type errors
  • Automatically adding from typing import Optional when needed
  • Adding missing pattern cases in match statements
  • Removing unreachable pattern cases
  • Implementing missing protocol methods for built-in protocols (Iterable, Iterator, Sized, Callable, Sequence, Mapping)

Refactorings:

  • Inserting type annotations from inferred types on variable assignments
  • Adding missing imports for undefined symbols (coming soon!)
  • Extracting code into a function or method (coming soon!)
  • Inlining variables (coming soon!)

Support Modules

The features system includes specialized support modules:

builtin_docs provides embedded documentation for Python built-in types (str, int, list, dict, etc.). Documentation is loaded from JSON at compile time and includes descriptions, common methods, and links to official Python documentation.

dunders supplies metadata and documentation for Python's magic methods (__init__, __str__, etc.) and builtin variables (__name__, __file__, etc.).

Adding new features typically means introducing a provider that consumes DocumentManager, optionally the analyzer, and wiring it through the Features struct so the backend can route requests.

Request Lifecycles

This section traces how the server handles key LSP interactions from start to finish.

Initialization

  1. initialize request
    • Captures the workspace root (root_uri) from the client.
    • Builds ServerCapabilities, advertising supported features: incremental sync, hover, completion, definitions, references, highlights, code actions, inlay hints, semantic tokens (full & range), document/workspace symbols, rename, and workspace symbol resolve.
    • Returns InitializeResult with optional ServerInfo.
  2. initialized notification
    • Currently logs an info message. Future work will kick off workspace scanning or indexing.

Text Synchronization & Diagnostics

didOpen → store the document, parse it, and call publish_diagnostics.

didChange → apply edits, reparse, invalidate analyzer caches, then re-run diagnostics.

didSave → trigger diagnostics again; behaviour matches the change handler.

didClose → remove the document and publish empty diagnostics to clear markers.

publish_diagnostics collects issues via DiagnosticProvider, tagging them with the current document version to avoid race conditions.

Hover, Completion, and Navigation

hover → query HoverProvider, which reads the AST and analyzer to produce Hover content.

completion → call CompletionProvider, returning a CompletionResponse (list or completion list).

gotoDefinition, typeDefinition, references, documentHighlight → use symbol table lookups to answer navigation requests.

These operations are pure reads when possible, avoiding locks beyond short-lived document snapshots.

Symbols

documentSymbol → returns either DocumentSymbol trees or SymbolInformation lists.

workspace/symbol → aggregates symbols from every open document, performing case-insensitive matching.

workspaceSymbol/resolve → currently a no-op passthrough.

Semantic Tokens & Inlay Hints

textDocument/semanticTokens/full and /range → run the semantic tokens provider to emit delta-encoded token sequences for supported types/modifiers.

textDocument/inlayHint → acquire a write lock on the analyzer and compute inline hints for the requested range.

Refactoring

textDocument/rename → validate the new identifier, locate the target symbol, collect edits (AST traversal + Tree-sitter identifiers), deduplicate, and return a WorkspaceEdit.

textDocument/codeAction → placeholder; currently returns an empty list until specific actions are implemented.

Shutdown

shutdown returns Ok(()), signalling graceful teardown.

exit follows to terminate the process. We do not persist state yet, so shutdown is effectively stateless.

Workspace Services

While most features operate on individual documents, Beacon’s language server already supports several cross-file capabilities and is laying groundwork for broader workspace awareness.

Workspace Symbols

Iterates over URIs retrieved from DocumentManager::all_documents.

For each document, fetches the AST and symbol table, then performs case-insensitive matching against the query string.

Returns SymbolInformation with ranges, optional container names, and deprecation tags (SymbolTag::DEPRECATED where applicable).

Falls back to reasonable defaults when nested symbols (e.g., class methods) are missing from the symbol table.

Document Symbols

Provides structured outlines per file, organising classes, functions, assignments, and nested items.

Editors use the resulting tree to populate outline panes, breadcrumbs, or navigation search.

Workspace State

  • The Workspace struct records the root_uri supplied during initialization.

Notifications and Logging

The backend emits window/logMessage notifications for status updates and window/showMessage for user-facing alerts.

Diagnostics are republished after changes so editors update their inline markers and problems panels.

Long-Term Plans

Implement persistent symbol indexing keyed by the workspace root.

Add background tasks that refresh indexes when files change on disk.

Support multi-root workspaces and remote filesystems where applicable.

Although the current implementation focuses on open buffers, the architecture is designed to scale to full-project workflows as these enhancements land.

PyDoc Retrieval

The language server enriches hover and completion items for third-party Python packages by executing a short-lived Python subprocess to read real docstrings and signatures from the user's environment.

Interpreter Discovery

find_python_interpreter in crates/server/src/interpreter.rs walks common virtual environment managers (Poetry, Pipenv, uv) before falling back to python on the PATH. Each probe shells out (poetry env info -p, pipenv --venv, uv python find) and returns the interpreter inside the virtual environment when successful. The search runs per workspace and only logs at debug level on success. Missing tools or failures are tolerated—only a final warn! is emitted if no interpreter can be located. Interpreter lookups currently rely on external commands and inherit their environment; this will eventually be an explicit path via LSP settings.
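
A simplified sketch of that probing strategy (the real find_python_interpreter adds logging, maps a virtual-environment directory to its interpreter binary, and handles platform differences):

use std::path::PathBuf;
use std::process::Command;

/// Try the environment managers in order, then fall back to python on PATH.
fn find_python_interpreter() -> Option<PathBuf> {
    let probes: [(&str, &[&str]); 3] = [
        ("poetry", &["env", "info", "-p"]),
        ("pipenv", &["--venv"]),
        ("uv", &["python", "find"]),
    ];
    for (program, args) in probes {
        // Missing tools and non-zero exits are tolerated; just keep probing.
        if let Ok(output) = Command::new(program).args(args).output() {
            if output.status.success() {
                let path = String::from_utf8_lossy(&output.stdout).trim().to_string();
                if !path.is_empty() {
                    return Some(PathBuf::from(path));
                }
            }
        }
    }
    // Final fallback: python on PATH, if it runs at all.
    let python_ok = Command::new("python")
        .arg("--version")
        .output()
        .map(|o| o.status.success())
        .unwrap_or(false);
    python_ok.then(|| PathBuf::from("python"))
}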

Introspection Flow

When a hover needs documentation for module.symbol, we call introspect in crates/server/src/introspection.rs with the discovered interpreter. introspect constructs a tiny Python script that imports the target module, fetches the attribute, and prints two sentinel sections: SIGSTART (signature) and DOCSTART (docstring). The async path spawns tokio::process::Command, while introspect_sync uses std::process::Command. Both share parsing logic via parse_introspection_output. The script uses inspect.signature and inspect.getdoc, so it respects docstring inheritance and returns cleaned whitespace. Failures to inspect still return whatever data is available.

Parsing and Error Handling

Results are parsed by scanning for the sentinel lines and trimming the sections, yielding an IntrospectionResult { signature, docstring }. Timeouts (3 seconds) protect the async path from hanging interpreters. Other errors—missing module, attribute, or import failure—come back as IntrospectionError::ExecutionFailed with the stderr payload for debugging. We log subprocess stderr on failure but avoid surfacing internal exceptions directly to the client.
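
A hedged sketch of the sentinel parsing (it assumes each sentinel line opens its section and the next sentinel, or end of output, closes it; the real parse_introspection_output also handles error payloads):

#[derive(Debug, Default)]
struct IntrospectionResult {
    signature: Option<String>,
    docstring: Option<String>,
}

#[derive(Clone, Copy)]
enum Section { None, Signature, Docstring }

fn parse_introspection_output(stdout: &str) -> IntrospectionResult {
    let mut result = IntrospectionResult::default();
    let mut section = Section::None;
    for line in stdout.lines() {
        match line.trim_end() {
            "SIGSTART" => section = Section::Signature,
            "DOCSTART" => section = Section::Docstring,
            text => {
                // Append the line to whichever section is currently open.
                let target = match section {
                    Section::Signature => &mut result.signature,
                    Section::Docstring => &mut result.docstring,
                    Section::None => continue,
                };
                let buf = target.get_or_insert_with(String::new);
                if !buf.is_empty() {
                    buf.push('\n');
                }
                buf.push_str(text);
            }
        }
    }
    result
}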

Testing Guarantees

Unit tests cover the parser, confirm the generated script embeds the sentinels, and run best-effort smoke tests against standard library symbols when a Python interpreter is available. Tests skip gracefully if Python cannot be located, keeping CI green on machines without Python.

Static Analyzer

Beacon's language server leans on a modular static-analysis stack housed in crates/server/src/analysis. The subsystem ingests a parsed document, infers types, builds control-flow graphs, runs pattern exhaustiveness checks, and produces diagnostics that drive editor features like hovers and squiggles.

Pipeline Overview

Analyzer::analyze is the high-level orchestration point:

  1. Grab a consistent AST + symbol table snapshot from the DocumentManager.
  2. Check analysis cache for a previously computed result at this document version.
  3. Extract scopes and check scope-level cache for incremental analysis opportunities.
  4. Walk the tree to emit lightweight constraints describing how expressions relate (equality, calls, attributes, protocols, patterns).
  5. Build a class registry containing metadata for all class definitions (fields, methods, protocols, inheritance).
  6. Invoke the shared beacon_core unifier to solve constraints, capturing any mismatches as TypeErrorInfo.
  7. Apply the resulting substitution to refine all inferred types in the type map.
  8. Build function-level control-flow graphs and run data-flow passes to uncover use-before-def, unreachable code, and unused symbols.
  9. Package the inputs, inferred data, and diagnostics into an AnalysisResult, caching at both scope and document level for quick repeat lookups.

The analyzer produces a type_map linking AST node IDs to inferred types and a position_map linking source positions to nodes, enabling hover and type-at-position queries.

Type Inference in Brief

type_env.rs supplies the Hindley–Milner style environment that powers constraint generation. It seeds built-in symbols, hydrates annotations, and hands out fresh type variables whenever the AST does not provide one. Each visit to a FunctionDef, assignment, call, or control-flow node updates the environment and records the relationships that must hold; the actual solving is deferred so the analyzer can collect all obligations before touching the unifier. This keeps the pass linear, side-effect free, and easy to extend with new AST constructs.

The constraint system supports multiple relationship types:

  • Equal: Type equality constraints (t1 ~ t2)
  • Call: Function application with argument and return types
  • HasAttr: Attribute access with method binding and inheritance resolution
  • Protocol: Structural conformance checks for both built-in protocols (Iterable, Iterator, Sequence, AsyncIterable, AsyncIterator, Awaitable) and user-defined Protocol classes
  • MatchPattern: Pattern matching with binding extraction
  • PatternExhaustive: Exhaustiveness checking for match statements
  • PatternReachable: Reachability checking to detect unreachable patterns

Once constraints reach solve_constraints, they are unified in order. Successful unifications compose into a substitution map, while failures persist with span metadata so editor clients can render precise diagnostics. The class registry enables attribute resolution with full inheritance support, overload resolution for methods decorated with @overload, and structural protocol checking for user-defined Protocol classes.
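
To make the solving step concrete, here is a deliberately tiny unifier over a toy type language. It is a hedged sketch of the general Algorithm W-style mechanics only; beacon_core's real representation and solver (unions, protocols, rows, span tracking) are richer than this:

use std::collections::HashMap;

// Toy type language: just enough structure to show substitution-building.
#[derive(Clone, Debug, PartialEq)]
enum Ty {
    Var(u32),              // type variable, e.g. 't0
    Con(&'static str),     // concrete type, e.g. "int", "str"
    Fun(Box<Ty>, Box<Ty>), // function type: argument -> return
}

type Subst = HashMap<u32, Ty>;

// Resolve a type under the substitution built so far, following variable chains.
fn apply(s: &Subst, t: &Ty) -> Ty {
    match t {
        Ty::Var(v) => match s.get(v) {
            Some(bound) => apply(s, bound),
            None => t.clone(),
        },
        Ty::Con(_) => t.clone(),
        Ty::Fun(a, r) => Ty::Fun(Box::new(apply(s, a)), Box::new(apply(s, r))),
    }
}

// Occurs check: refuses to bind a variable to a type containing itself (infinite type).
fn occurs(v: u32, t: &Ty) -> bool {
    match t {
        Ty::Var(w) => *w == v,
        Ty::Con(_) => false,
        Ty::Fun(a, r) => occurs(v, a) || occurs(v, r),
    }
}

// Unify two types, extending the substitution; in the real analyzer a failure
// becomes a TypeErrorInfo with span metadata rather than a plain String.
fn unify(t1: &Ty, t2: &Ty, s: &mut Subst) -> Result<(), String> {
    match (apply(s, t1), apply(s, t2)) {
        (Ty::Var(v), t) | (t, Ty::Var(v)) => {
            if t == Ty::Var(v) {
                Ok(())
            } else if occurs(v, &t) {
                Err("occurs check failed (infinite type)".to_string())
            } else {
                s.insert(v, t);
                Ok(())
            }
        }
        (Ty::Con(a), Ty::Con(b)) if a == b => Ok(()),
        (Ty::Fun(a1, r1), Ty::Fun(a2, r2)) => {
            unify(&a1, &a2, s)?;
            unify(&r1, &r2, s)
        }
        (a, b) => Err(format!("type mismatch: {a:?} vs {b:?}")),
    }
}

fn main() {
    // Constraint from calling the identity function on an int literal:
    // ('t0 -> 't0) ~ (int -> 't1); solving binds both variables to int.
    let mut subst = Subst::new();
    let lhs = Ty::Fun(Box::new(Ty::Var(0)), Box::new(Ty::Var(0)));
    let rhs = Ty::Fun(Box::new(Ty::Con("int")), Box::new(Ty::Var(1)));
    unify(&lhs, &rhs, &mut subst).expect("satisfiable constraints");
    assert_eq!(apply(&subst, &Ty::Var(1)), Ty::Con("int"));
}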

Control & Data Flow

cfg.rs and data_flow.rs provide the structural analyses that complement pure typing:

  • The CFG builder splits a function body into BasicBlocks linked by typed edges (normal flow, branch outcomes, loop exits, exception edges, etc.), mirroring Python semantics closely enough for downstream passes to reason about reachability.
  • The data-flow analyzer consumes that graph plus the original AST slice to flag common hygiene issues: variables read before assignment, code that cannot execute, and symbols that never get used. Results surface through DataFlowResult and end up in the final AnalysisResult.

This layered approach lets the LSP report both type-level and flow-level problems in a single request, keeping feedback tight while avoiding duplicate walks of the AST.

Class Metadata & Method Resolution

The class_metadata module tracks comprehensive information about class definitions:

  • Fields: Inferred from assignments in __init__ and class body
  • Methods: Including support for overload sets via @overload decorator
  • Special methods: __init__ and __new__ signatures for constructor checking
  • Decorators: @property, @classmethod, @staticmethod tracking
  • Protocols: Marks classes inheriting from typing.Protocol for structural conformance checking
  • Inheritance: Base class tracking with method resolution order for attribute lookup

Method types can be either single signatures or overload sets. When resolving a method call, the analyzer attempts to match argument types against overload signatures before falling back to the implementation signature.

Pattern Matching Support

The pattern and exhaustiveness modules provide comprehensive pattern matching analysis:

  • Type checking for all pattern forms (literal, capture, wildcard, sequence, mapping, class, OR, AS)
  • Exhaustiveness checking to ensure match statements cover all cases
  • Reachability checking to detect unreachable patterns subsumed by earlier cases
  • Binding extraction to track variables introduced by patterns

This enables diagnostics like PM001 (non-exhaustive match) and PM002 (unreachable pattern).

Linting & Additional Diagnostics

The linter and rules modules implement static checks beyond type correctness. Many BEA-series diagnostic codes are implemented, with others awaiting parser or symbol table enhancements. See the table of linting rules for details.

Utilities

Beyond inference and CFG analysis, the module exposes helpers for locating unbound identifiers, invalidating cached results when documents change, and bridging between symbol-table scopes and LSP positions.

Beacon Linter

The Beacon Rule Engine is a modular static analysis system powering diagnostics in Beacon.

At its foundation, it is a pure-Rust reimplementation of PyFlakes.

Suppressing Warnings

Individual linter warnings can be suppressed using inline comments:

import os  # noqa: BEA015  # Suppress unused import warning
x = undefined  # noqa  # Suppress all warnings on this line

See Suppressions for complete documentation on suppression comments.


Legend: ⚠ = Warning ✕ = Error ⓘ = Info

Code | Name / RuleKind | Category | Description
BEA001 | UndefinedName | Naming | Variable or function used before being defined.
BEA002 | DuplicateArgument | Functions | Duplicate parameter names in a function definition.
BEA003 | ReturnOutsideFunction | Flow | return statement outside of a function or method body.
BEA004 | YieldOutsideFunction | Flow | yield or yield from used outside a function context.
BEA005 | BreakOutsideLoop | Flow | break used outside a for/while loop.
BEA006 | ContinueOutsideLoop | Flow | continue used outside a for/while loop.
BEA007 | DefaultExceptNotLast | Exception | A bare except: is not the final exception handler in a try block.
BEA008 | RaiseNotImplemented | Semantics | Using raise NotImplemented instead of raise NotImplementedError.
BEA009 | TwoStarredExpressions | Syntax | Two or more * unpacking expressions in assignment.
BEA010 | TooManyExpressionsInStarredAssignment | Syntax | Too many expressions when unpacking into a starred target.
BEA011 | IfTuple | Logic | A tuple literal used as an if condition — always True.
BEA012 | AssertTuple | Logic | Assertion always true due to tuple literal.
BEA013 | FStringMissingPlaceholders | Strings | f-string declared but contains no {} placeholders.
BEA014 | TStringMissingPlaceholders | Strings | t-string declared but contains no placeholders.
BEA015 | UnusedImport | Symbols | Import is never used within the file.
BEA016 | UnusedVariable | Symbols | Local variable assigned but never used.
BEA017 | UnusedAnnotation | Symbols | Annotated variable never referenced.
BEA018 | RedefinedWhileUnused | Naming | Variable redefined before original was used.
BEA019 | ImportShadowedByLoopVar | Scope | Import name shadowed by a loop variable.
BEA020 | ImportStarNotPermitted | Imports | from module import * used inside a function or class.
BEA021 | ImportStarUsed | Imports | import * prevents detection of undefined names.
BEA022 | UnusedIndirectAssignment | Naming | Global or nonlocal declared but never reassigned.
BEA023 | ForwardAnnotationSyntaxError | Typing | Syntax error in forward type annotation.
BEA024 | MultiValueRepeatedKeyLiteral | Dict | Dictionary literal repeats key with different values.
BEA025 | PercentFormatInvalidFormat | Strings | Invalid % format string.
BEA026 | IsLiteral | Logic | Comparing constants with is or is not instead of ==/!=.
BEA027 | DefaultExceptNotLast | Exception | Bare except: must appear last.
BEA028 | UnreachableCode | Flow | Code after a return, raise, or break is never executed.
BEA029 | RedundantPass | Cleanup | pass used in a block that already has content.
BEA030 | EmptyExcept | Exception | except: with no handling code (silent failure).

Rules

BEA001

Example

print(foo) before foo is defined.

Fix

Define the variable before use or fix the typo.

BEA002

Example

def f(x, x):
    pass

Fix

Rename one of the parameters.

BEA003

Example

Top-level return 5 in a module.

Fix

Remove or move inside a function.

BEA004

Example

yield x at module scope.

Fix

Wrap in a generator function.

BEA005

Example

break in global scope or in a function without loop.

Fix

Remove or restructure the code to include a loop.

BEA006

Example

continue in a function with no loop.

Fix

Remove or replace with control flow logic.

BEA007

Example

except: followed by except ValueError:

Fix

Move the except: block to the end of the try.

BEA008

Example

raise NotImplemented

Fix

Replace with raise NotImplementedError.

BEA009

Example

a, *b, *c = d

Fix

Only one starred target is allowed.

BEA010

Example

a, b, c, d = (1, 2, 3)

Fix

Adjust unpacking count.

BEA011

Example

if (x,):
    ...

Fix

Remove accidental comma or rewrite condition.

BEA012

Example

assert (x, y)

Fix

Remove parentheses or fix expression.

BEA013

Example

f"Hello world"

Fix

Remove the f prefix if unnecessary.

BEA014

Example

t"foo"

Fix

Remove the t prefix if unused.

BEA015

Example

import os not referenced.

Fix

Remove the unused import.

BEA016

Example

x = 10 never referenced again.

Fix

Remove assignment or prefix with _ if intentional.

BEA017

Example

x: int declared but unused.

Fix

Remove or use variable.

BEA018

Example

x = 1; x = 2 without reading x.

Fix

Remove unused definition.

BEA019

Example

import os
for os in range(3):
    ...

Fix

Rename loop variable.

BEA020

Example

def f():
    from math import *

Fix

Move import to module level.

BEA021

Example

from os import *

Fix

Replace with explicit imports.

BEA022

Example

global foo never used.

Fix

Remove redundant declaration.

BEA023

Example

def f() -> "List[int": ...

Fix

Fix or quote properly.

BEA024

Example

{'a': 1, 'a': 2}

Fix

Merge or remove duplicate keys.

BEA025

Example

"%q" % 3

Fix

Correct format specifier.

BEA026

Example

x is 5

Fix

Use ==/!=.

BEA027

Example

As above.

Fix

Reorder exception handlers.

BEA028

Example

return 5; print("unreachable")

Fix

Remove or refactor code.

BEA029

Example

def f():
    pass
    return 1

Fix

Remove redundant pass.

BEA030

Example

try:
    ...
except:
    pass

Fix

Handle exception or remove block.

Planned

Name | Kind | Category | Rationale
Mutable Default Argument | MutableDefaultArgument | Semantic | Detect functions that use a mutable object (e.g., list, dict, set) as a default argument.
Return in Finally | ReturnInFinally | Flow | Catch a return, break, or continue inside a finally block: this often suppresses the original exception and leads to subtle bugs.
For-Else Without Break | ForElseWithoutBreak | Flow | The for ... else construct where the loop body never executes a break is confusing and often misused.
Wrong Exception Caught | BroadExceptionCaught | Exception | Catching overly broad exceptions (e.g., except Exception: or except:) instead of specific types can hide bugs; this expands the existing empty-except check to overly broad catching.
Inconsistent Return Types | InconsistentReturnTypes | Function | A function that returns different types on different paths (e.g., int in one branch, None in another) can cause bugs in consuming code, especially if not annotated.
Index / Key Errors Likely | UnsafeIndexOrKeyAccess | Data | Detect patterns likely to raise IndexError or KeyError, e.g., accessing a list/dict without checking length/keys, especially inside loops.
Unused Coroutine / Async Function | UnusedCoroutine | Symbol | In async code: an async def function is defined but neither awaited nor returned anywhere — likely a bug.
Resource Leak / Unclosed Descriptor | UnclosedResource | Symbol | Detect a file or network resource opened (e.g., open(...)) without being closed or managed via a context manager (with).
Logging Format String Errors | LoggingFormatError | String | Using % or f-string formatting incorrectly in logging calls (e.g., placeholder count mismatches) can cause runtime exceptions or silent failures.
Comparison to None Using == / != | NoneComparison | Logic | Discourage == None or != None in favor of is None / is not None.

Beacon Diagnostic Codes

Beacon’s Diagnostic provider combines parser feedback, Hindley–Milner type errors, annotation coverage checks, control/data-flow analysis, and workspace import resolution into a single stream of LSP diagnostics.

This guide lists every diagnostic code emitted by that pipeline so you can interpret squiggles quickly and trace them back to the subsystem described in Type Checking, Static Analyzer, and Type Checking Modes.

LSP severity for imports (circular vs. unresolved) remains configurable under [diagnostics] as documented in Configuration. To temporarily suppress any diagnostic, use the mechanisms described in Suppressions.


Legend:

  • ⚠ = Warning
  • ✕ = Error
  • ⓘ = Info/Hints

Note that per-mode rows show the icons used in strict / balanced / relaxed order.

Code | Name | Level | Category | Description
ANY001 | UnsafeAnyUsage | | Type Safety | Deep inference found an Any value, reducing type safety.
ANN001 | AnnotationMismatch | ✕ ⚠ ⓘ | Annotations | Declared annotation disagrees with the inferred type.
ANN002 | MissingVariableAnnotation | ✕ ⚠ | Annotations | Assignment lacks an annotation in strict/balanced modes.
ANN003 | ParameterAnnotationMismatch | ✕ ⚠ ⓘ | Annotations | Parameter annotation conflicts with inferred usage.
ANN004 | MissingParameterAnnotation | ✕ ⚠ | Annotations | Parameter missing annotation when inference is precise.
ANN005 | ReturnAnnotationMismatch | ✕ ⚠ ⓘ | Annotations | Function return annotation disagrees with inference.
ANN006 | MissingReturnAnnotation | ✕ ⚠ | Annotations | Function lacks return annotation when inference is concrete.
ANN007 | ImplicitAnyParameter | | Annotations | Strict mode forbids implicit Any on parameters.
ANN008 | ImplicitAnyReturn | | Annotations | Strict mode forbids implicit Any return types.
ANN009 | MissingClassAttributeAnnotation | | Annotations | Strict mode requires explicit annotations on class attributes.
ANN010 | BareExceptClause | | Annotations | Strict mode forbids bare except: clauses without exception types.
ANN011 | ParameterImplicitAny | | Annotations | Balanced mode warns when parameter type resolves to implicit Any.
ANN012 | ReturnImplicitAny | | Annotations | Balanced mode warns when return type resolves to implicit Any.
DUNDER_INFO | EntryPointGuard | | Dunder Patterns | Highlights if __name__ == "__main__": guard blocks.
DUNDER001 | MagicMethodOutOfScope | | Dunder Patterns | Magic methods defined outside a class.
HM001 | TypeMismatch | | Type System | Hindley–Milner could not unify two types.
HM002 | OccursCheckFailed | | Type System | Recursive type variable detected (infinite type).
HM003 | UndefinedTypeVar | | Type System | Referenced type variable was never declared.
HM004 | KindMismatch | | Type System | Wrong number of type arguments supplied to a generic.
HM005 | InfiniteType | | Type System | Inference produced a non-terminating type (self-referential).
HM006 | ProtocolNotSatisfied | | Type System | Value fails to implement the required protocol methods.
HM007 | AttributeNotFound | | Attributes | Attribute or method does not exist on the receiver type.
HM008 | ArgumentCountMismatch | | Type System | Call site passes too many or too few arguments.
HM009 | ArgumentTypeMismatch | | Type System | Argument type incompatible with the parameter type.
HM010 | PatternTypeMismatch | | Pattern Typing | Match/case pattern cannot match the subject type.
HM011 | KeywordArgumentError | | Type System | Unknown or duplicate keyword arguments in a call.
HM012 | GenericTypeError | | Type System | Catch-all Hindley–Milner error (value restriction, etc.).
HM013 | PatternStructureMismatch | | Pattern Typing | Pattern shape (mapping, class, sequence) differs from subject.
HM014 | VarianceError | | Variance | Invariant/covariant/contravariant constraint violated.
MODE_INFO | TypeCheckingMode | | Mode | Reminder showing which type-checking mode produced diagnostics.
PM001 | PatternNonExhaustive | | Patterns | Match statement fails to cover every possible case.
PM002 | PatternUnreachable | | Patterns | Later pattern is shadowed by an earlier one.
circular-import | CircularImport | ✕ ⚠ ⓘ | Imports | Module participates in an import cycle (severity comes from config).
missing-module | MissingModule | | Imports | Referenced module is absent from the workspace/stubs.
shadowed-variable | ShadowedVariable | | Scope | Inner scope reuses a name that already exists in an outer scope.
undefined-variable | UndefinedVariable | | Name Resolution | Name used before being defined anywhere.
unresolved-import | UnresolvedImport | ✕ ⚠ ⓘ | Imports | Import target cannot be resolved (severity configurable).
unreachable-code | UnreachableCode | | Data Flow | Code after return, raise, or break never executes.
unused-variable | UnusedVariable | | Data Flow | Variable assigned but never read.
use-before-def | UseBeforeDef | | Data Flow | Variable read before it is assigned in the current scope.

Diagnostics by Category

Type Safety Diagnostics

Diagnostics in this category highlight when inference collapses to Any and reduces overall type safety.

ANY001 – UnsafeAnyUsage

Example

from typing import Any

payload: Any = fetch_config()
payload["timeout"]  # ANY001 – inference lost precision once `Any` appeared

Guidance

Beacon warns when unchecked Any values flow through the type map (see Special Types). Replace Any with a precise annotation, cast the value after runtime checks, or refactor APIs so that callers receive concrete types.
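
For example, one way to restore precision (a sketch reusing the fetch_config call from the example above) is to validate the value at runtime and then cast it to a concrete type:

from typing import cast

raw = fetch_config()                 # still Any at this point
assert isinstance(raw, dict)         # runtime check before trusting the shape
payload = cast(dict[str, int], raw)
print(payload["timeout"])            # precise: int, no ANY001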

Annotations Diagnostics

Annotation diagnostics cover mismatches between declared types and inferred usage along with mode-specific requirements for annotations.

ANN001 – AnnotationMismatch

Example

value: int = "stale"  # Annotated as int, inferred as str

Guidance

Annotation/inference mismatches inherit their severity from the active mode (strict → error, balanced → warning, relaxed → hint). Align the annotation with real usage, or change the code so the inferred type matches. See Type Checking and Type Checking Modes for the inference rules.

ANN002 – MissingVariableAnnotation

Example

# beacon: mode=strict
profile = load_profile()  # Missing annotation (ANN002)

Guidance

Strict/balanced modes expect assignments with concrete inferred types to be annotated. Add the appropriate annotation (profile: Profile = load_profile()), or downgrade the file to relaxed mode if intentional (see Type Checking Modes).

ANN003 – ParameterAnnotationMismatch

Example

def greet(name: str) -> str:
    return name + 1  # name inferred as int due to arithmetic

Guidance

Parameter annotations must agree with how the function body uses the value. Update the annotation or refactor the body to respect it. Details about how inference follows parameter usage live in Type Checking.

ANN004 – MissingParameterAnnotation

Example

def send_email(address):
    ...

Balanced mode infers address: str (or similar) and emits ANN004; strict mode reports ANN007 for the missing annotation instead.

Guidance

Add explicit parameter annotations whenever inference is concrete: def send_email(address: str) -> None:. Relaxed mode skips this check entirely (see Type Checking Modes).

ANN005 – ReturnAnnotationMismatch

Example

def parity(flag: bool) -> bool:
    return "odd"  # Return annotation mismatch

Guidance

Ensure return annotations reflect every path. Either return the annotated type or adjust the annotation. See Type Checking for how Beacon treats return types.

ANN006 – MissingReturnAnnotation

Example

def total(values):
    return sum(values)

Balanced mode infers a concrete return type (e.g., int), requires -> int, and emits ANN006; strict mode reports ANN008 instead.

Guidance

Add return annotations when inference is precise and not Any/None: def total(values: list[int]) -> int:. Relaxed mode suppresses this requirement (see Type Checking Modes).

ANN007 – ImplicitAnyParameter

Example

# beacon: mode=strict
def transform(data):
    return data.strip()

Guidance

Strict mode disallows implicit Any on parameters even when inference could deduce a type. Add annotations for every parameter (data: str). Balanced/relaxed modes emit ANN004 instead or skip the check entirely. Review Type Checking Modes for severity rules.

ANN008 – ImplicitAnyReturn

Example

# beacon: mode=strict
def make_id():
    return uuid.uuid4().hex  # Implicit Any return type

Guidance

Strict mode requires explicit return annotations on every function. Provide the exact type (-> str) or relax the file mode if you intentionally rely on inference. See Type Checking Modes for override syntax.

ANN009 – MissingClassAttributeAnnotation

Example

# beacon: mode=strict
class Configuration:
    host = "localhost"  # Missing type annotation
    port: int = 8080  # OK: Has annotation

Guidance

Strict mode requires explicit type annotations on all class attributes. Add the annotation (host: str = "localhost") or use balanced/relaxed mode if gradual typing is preferred. Note that instance attributes (assigned in __init__ or other methods) are not subject to this check—only class-level attributes defined directly in the class body. See Type Checking Modes for mode configuration.

ANN010 – BareExceptClause

Example

# beacon: mode=strict
def process_data():
    try:
        result = risky_operation()
    except:  # ANN010: Bare except clause not allowed
        handle_error()

Guidance

Strict mode requires specific exception types in except clauses to prevent catching system exceptions like KeyboardInterrupt or SystemExit unintentionally. Replace bare except: with specific exception types:

# Good: Specific exception type
except ValueError:
    ...

# Good: Multiple exception types
except (ValueError, TypeError):
    ...

# Good: Catch most exceptions but not system ones
except Exception:
    ...

Balanced and relaxed modes allow bare except clauses for gradual adoption. See Type Checking Modes for mode configuration.

ANN011 – ParameterImplicitAny

Example

# beacon: mode=balanced
def process_unknown(data, options):
    return data  # ANN011: 'data' and 'options' have implicit Any type

Guidance

Balanced mode distinguishes between concrete inferred types (which trigger ANN004 with type suggestions) and implicit Any (which triggers ANN011). When type inference cannot determine a concrete type due to insufficient context, parameters are finalized as Any and this warning is emitted.

Add type annotations to clarify the intended types:

# Good: Explicit annotations remove ambiguity
def process_unknown(data: dict[str, Any], options: dict[str, str]) -> dict[str, Any]:
    return data

This diagnostic helps identify truly ambiguous cases where annotations provide the most value. Strict mode reports all missing parameter annotations as ANN007 errors instead. See Type Checking Modes for inference behavior.

ANN012 – ReturnImplicitAny

Example

# beacon: mode=balanced
def handle_dynamic(value):
    print(value)  # ANN012: Return type is implicit Any

Guidance

When a function's return type cannot be inferred to a concrete type, balanced mode warns with ANN012. This differs from ANN006, which fires when inference determines a concrete type but the annotation is missing.

Add an explicit return type annotation:

# Good: Explicit return type
def handle_dynamic(value: Any) -> None:
    print(value)

For functions with implicit Any returns, consider whether:

  • The return type should be None (procedures)
  • You need to add annotations to parameters to enable better inference
  • The function genuinely needs -> Any due to dynamic behavior

Strict mode reports all missing return annotations as ANN008 errors instead. See Type Checking Modes for the distinction between concrete inference and implicit Any.

Dunder Patterns Diagnostics

These diagnostics call out entry-point guards and misplaced dunder (magic) methods so that symbol metadata stays consistent with Python semantics.

DUNDER_INFO – EntryPointGuard

Example

if __name__ == "__main__":
    run_cli()

Guidance

This informational hint makes entry-point guards easier to spot. No action needed. The behavior is described in Semantic Enhancements.

DUNDER001 – MagicMethodOutOfScope

Example

def __str__():
    return "oops"  # Should live inside a class

Guidance

Define magic methods inside a class body (for example, class Foo: with def __str__(self) -> str: ... indented beneath it). This keeps symbol metadata consistent with Python semantics. See Semantic Enhancements for background on how Beacon tracks dunders.

Type System Diagnostics

Type system diagnostics originate from Hindley–Milner inference, call checking, and generic validation.

HM001 – TypeMismatch

Example

def add(count: int) -> int:
    return count + "!"  # int vs. str cannot unify

Guidance

Beacon’s Hindley–Milner engine reports HM001 when two types cannot unify (see Subtyping vs Unification). Convert or narrow the values so the operands share a compatible type.

HM002 – OccursCheckFailed

Example

def self_apply(f):
    return f(f)  # Requires f: T -> T, but T would have to contain itself

Guidance

Occurs-check failures indicate an infinite recursive type. Refactor so values are not applied to themselves without a wrapper type, or introduce generics that break the cycle. See Type Checking for how recursive types are limited.

HM003 – UndefinedTypeVar

Example

def use_unknown(x: U) -> U:  # U was never declared via TypeVar
    return x

Guidance

Declare every type variable with TypeVar before referencing it: U = TypeVar("U"). The generics workflow is covered in Type Checking.
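
A corrected version of the example above, declaring the type variable before it is referenced:

from typing import TypeVar

U = TypeVar("U")

def use_unknown(x: U) -> U:
    return x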

HM004 – KindMismatch

Example

ids: dict[str] = {}  # dict expects two type arguments

Guidance

Provide the correct number of arguments for each generic (dict[str, int]). Beacon enforces kind arity to avoid ambiguous instantiations. See Type Checking.

HM005 – InfiniteType

Example

def paradox(x):
    return x(paradox)  # Leads to an infinite type when inferred

Guidance

Infinite type errors usually stem from higher-order functions that apply un-annotated callables to themselves. Add annotations to break the cycle or restructure the algorithm so a value is not required to contain itself.

HM006 – ProtocolNotSatisfied

Example

from typing import Iterable

def consume(xs: Iterable[str]) -> None:
    for item in xs:
        print(item.upper())

consume(10)  # int does not satisfy Iterable[str]

Guidance

Ensure call arguments implement the required protocol slots or convert them first (wrap values in iterables, implement __iter__, etc.). Protocol behavior is described in Type Checking.
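
Continuing the consume example above, either convert the value and wrap it in an iterable of strings, or pass any other iterable of str (a minimal sketch):

consume([str(10)])           # wrap the converted value in a list
consume(("a", "b", "c"))     # any iterable of str satisfies the protocol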

HM008 – ArgumentCountMismatch

Example

def pair(a: int, b: int) -> None:
    ...

pair(1)  # Missing second positional argument

Guidance

Match the declared arity (positional + keyword-only + variadic). Add or remove arguments, or update the function signature. This follows the call constraint rules in Type Checking.

HM009 – ArgumentTypeMismatch

Example

def square(x: int) -> int:
    return x * x

square("ten")  # Argument type mismatch

Guidance

Convert arguments to the expected type or adjust the signature to accept a broader type. Beacon pinpoints the offending parameter in the diagnostic.

HM011 – KeywordArgumentError

Example

def connect(host: str, *, ssl: bool) -> None:
    ...

connect("db", secure=True)  # Unknown keyword `secure`

Guidance

Use valid keyword names, avoid duplicates, and respect positional-only/keyword-only markers. Adjust the call site or function signature accordingly.

HM012 – GenericTypeError

Example

def capture() -> int:
    cache = []
    def inner():
        cache.append(inner)
        return inner(cache)  # Triggers a generic HM012 error about unsafe recursion

Guidance

HM012 is a catch-all for rare Hindley–Milner failures (value restriction violations, unsupported constructs). Inspect the message for context, add annotations to guide inference, or refactor towards supported patterns. See Type Checking.

Attributes Diagnostics

Attribute diagnostics explain when a receiver type does not define the attribute being accessed.

HM007 – AttributeNotFound

Example

count = 10
count.splitlines()  # Attribute does not exist on int

Guidance

The analyzer could not find the attribute on the receiver type. Narrow the type, convert the value, or fix typos. Beacon adds contextual hints (e.g., “splitlines is a string method”). See Type Checking for attribute resolution notes.
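
A sketch of the two remedies from the guidance, narrowing a union-typed value or converting it first:

def head_line(value: int | str) -> str:
    if isinstance(value, str):
        return value.splitlines()[0]   # narrowed to str, so the attribute resolves
    return str(value)                  # convert before treating it as text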

Pattern Typing Diagnostics

Pattern typing diagnostics focus on structural mismatches that arise during match statement analysis.

HM010 – PatternTypeMismatch

Example

def parse(match_obj):
    match match_obj:
        case (x, y):  # HM010 if match_obj is inferred as str
            ...

Guidance

Ensure match subjects and patterns agree (use tuples with tuple subjects, mappings with dicts, etc.). Pattern typing is detailed in Pattern Matching Support.

HM013 – PatternStructureMismatch

Example

def report(event):
    match event:
        case {"kind": kind, "meta": {"user": user}}:
            ...

If event is inferred as a tuple or a class instance, the mapping pattern's structure does not match the subject.

Guidance

Use patterns whose structure matches the subject (mappings for dicts, class patterns for dataclasses, etc.). Details live in Pattern Matching Support.
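
For instance, if the subject is a dataclass rather than a dict, a class pattern matches its structure (a sketch with a hypothetical Event dataclass):

from dataclasses import dataclass

@dataclass
class Event:
    kind: str
    user: str

def report(event: Event) -> None:
    match event:
        case Event(kind=kind, user=user):   # class pattern matches the dataclass subject
            print(kind, user)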

Variance Diagnostics

Variance diagnostics describe when mutable containers or position constraints break covariance/contravariance rules.

HM014 – VarianceError

Example

pets: list[str] = ["dog", "cat"]
objects: list[object] = pets  # HM014: cannot assign list[str] to list[object]; list is invariant

Guidance

Respect variance constraints. Mutable containers are invariant, so consider using immutable collections (tuple[str, ...]) or widening the source type. The diagnostic message includes targeted advice per position (in/out). See Type Checking.
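
Both remedies from the guidance, sketched below: switch to an immutable tuple, or accept a read-only Sequence so covariance applies.

from collections.abc import Sequence

pets: tuple[str, ...] = ("dog", "cat")
objects: tuple[object, ...] = pets           # OK: tuple is covariant in its element type

def show(items: Sequence[object]) -> None:   # Sequence is read-only and covariant
    for item in items:
        print(item)

show(["dog", "cat"])                         # list[str] accepted where Sequence[object] is expected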

Mode Diagnostics

Mode diagnostics are informational hints emitted when Beacon reports which type-checking mode produced a set of issues.

MODE_INFO – TypeCheckingMode

Example

Type checking mode: balanced (workspace default) - ...

Guidance

Beacon appends this hint whenever diagnostics appear so you know whether strict/balanced/relaxed rules applied. Use # beacon: mode=strict (etc.) to override as described in Type Checking Modes.

Patterns Diagnostics

Pattern diagnostics describe exhaustiveness and reachability issues detected in structural pattern matching.

PM001 – PatternNonExhaustive

Example

def handle(flag: bool) -> str:
    match flag:
        case True:
            return "y"

The missing False case triggers PM001.

Guidance

Add the missing cases (case False: or case _:). Exhaustiveness checking is covered in Pattern Matching Support.

PM002 – PatternUnreachable

Example

match value:
    case _:
        return 0
    case 1:
        return value  # Unreachable after wildcard case

Guidance

Reorder or delete subsumed patterns so every case is reachable. See Pattern Matching Support.

Imports Diagnostics

Import diagnostics enumerate issues with module resolution, missing files, and configurable severities for unresolved imports.

circular-import – CircularImport

Example

# module_a.py
from module_b import helper

# module_b.py
import module_a  # completes a cycle

Guidance

Break the cycle by moving shared code into a third module, deferring imports inside functions, or rethinking module boundaries. Severity comes from [diagnostics.circular_imports] in Configuration.
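
A minimal sketch of the deferred-import remedy, keeping the module names from the example above:

# module_b.py
def helper() -> None:
    import module_a   # deferred: resolved at call time, so module load no longer cycles
    ...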

missing-module – MissingModule

Example

import backend.plugins.payments  # File or package missing from workspace/stubs

Guidance

Add the module to the workspace, fix typos, or adjust your import path. Beacon reports missing modules as errors because runtime execution would fail immediately.

unresolved-import – UnresolvedImport

Example

from services import codecs  # services module exists, codecs submodule does not

Guidance

Fix the module path, add missing files, or install the dependency. Severity is controlled by [diagnostics.unresolved_imports] in Configuration.

Scope Diagnostics

Scope diagnostics call out variable shadowing inside nested scopes.

shadowed-variable – ShadowedVariable

Example

token = "outer"

def handler():
    token = "inner"  # Shadows outer variable

Guidance

Rename inner variables or move logic closer to usage to avoid surprising shadowing. The static analyzer describes its scope walk in Control & Data Flow.

Name Resolution Diagnostics

Name resolution diagnostics highlight names that were never defined anywhere in the file or workspace.

undefined-variable – UndefinedVariable

Example

print(total)  # `total` never defined

Guidance

Define the name, import it, or limit the scope where it’s used. Unlike use-before-def, this check runs at the file level via Analyzer::find_unbound_variables (see Static Analyzer).

Data Flow Diagnostics

Data-flow diagnostics track unreachable blocks and variable usage ordering issues.

unreachable-code – UnreachableCode

Example

def foo():
    return 42
    print("never runs")  # Unreachable

Guidance

Remove or refactor unreachable statements. Diagnostics carry the UNNECESSARY tag so editors can gray out the code. Pipeline details sit in Control & Data Flow.

unused-variable – UnusedVariable

Example

def process():
    result = compute()  # Never read later

Guidance

Use the variable, prefix with _ to mark as intentionally unused, or delete it. See Control & Data Flow for how Beacon tracks reads/writes.

use-before-def – UseBeforeDef

Example

def build():
    print(total)
    total = 10  # total read before assignment in this scope

Guidance

Reorder statements so assignments precede reads, or mark outer-scope variables as nonlocal/global when appropriate. Data-flow analysis is described in Control & Data Flow.
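
Two possible fixes for the example above: reorder so the assignment precedes the read, or declare the name global when the outer binding is intended.

def build() -> None:
    total = 10
    print(total)      # assignment now precedes the read

total = 0

def bump() -> None:
    global total      # rebinds the module-level name instead of creating a new local
    total += 1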

Type Checking

Beacon provides powerful static type checking for Python code, combining the rigor of Hindley-Milner type inference with the flexibility needed for Python's dynamic features.

Suppressing Type Errors

Type checking errors can be suppressed using inline comments:

x: int = "string"  # type: ignore
value: str = 42  # type: ignore[assignment]  # Suppress specific error type

See Suppressions for complete documentation on suppression comments.

Type System Philosophy

Beacon's type checker is designed with a core principle: context-aware strictness. It maintains strong type safety for genuinely unsafe operations while being permissive for common, safe Python patterns.

Design Goals

  1. High Signal-to-Noise Ratio: Report errors that matter, not false positives from valid Python code
  2. Catch Real Bugs: Focus on type mismatches that lead to runtime errors
  3. Support Gradual Typing: Work seamlessly with both annotated and unannotated code
  4. Python-First Semantics: Understand Python idioms rather than forcing ML-style patterns

Union and Optional Types

How Union Types Work

Union types represent values that can be one of several types. Beacon treats union types using subtyping semantics rather than strict structural equality.

# This is valid - None is a member of Optional[int]
def get_value() -> int | None:
    return None  # No error

# Union members work naturally
x: int | str = 42  # int is a subtype of int | str
y: int | str = "hello"  # str is a subtype of int | str

Optional Types

Optional[T] is syntactic sugar for Union[T, None]. Beacon understands that None is a valid value for Optional types without requiring explicit checks:

from typing import Optional

def process(value: Optional[str]) -> None:
    # Assigning None to Optional is always valid
    result: Optional[str] = None  # No error

Type Narrowing

While union types are permissive for assignment, accessing attributes or calling methods requires narrowing:

def process(value: int | None) -> int:
    # Error: None doesn't have __add__
    return value + 1

def process_safe(value: int | None) -> int:
    if value is None:
        return 0
    # value is narrowed to int here
    return value + 1  # OK

Beacon provides several narrowing mechanisms:

  1. None Checks: if x is None / if x is not None
  2. isinstance() Guards: if isinstance(x, int)
  3. Truthiness: if x narrows away None and falsy values
  4. Type Guards: User-defined type guard functions
  5. Match Statements: Pattern matching with exhaustiveness checking
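
The None-check form appears above; a brief sketch of the truthiness and match-statement mechanisms:

def first_word(text: str | None) -> str:
    if text:                    # truthiness narrows away None (and the empty string)
        return text.split()[0]  # text: str here
    return ""

def describe(value: int | str) -> str:
    match value:
        case int():
            return "number"     # value: int in this arm
        case str():
            return "text"       # value: str in this arm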

Subtyping vs Unification

Beacon's type checker uses two complementary mechanisms:

Unification (Strict)

Used for non-union types. Requires structural equality:

x: int = 42
y: str = x  # Error: cannot unify int with str

Subtyping (Flexible)

Used when union types are involved. Checks semantic compatibility:

x: int = 42
y: int | str = x  # OK: int <: int | str

z: int | str | None = None  # OK: None <: int | str | None

This hybrid approach provides:

  • Strictness where it matters: Direct type mismatches are caught
  • Flexibility for unions: Common patterns like Optional work naturally

Type Inference

Beacon infers types even without annotations:

def add(x, y):
    return x + y
# Inferred type: (int, int) -> int or (str, str) -> str
# (overloaded based on usage)

numbers = [1, 2, 3]
# Inferred type: list[int]

Value Restriction

Beacon applies the value restriction to prevent unsafe generalization:

empty_list = []  # Type: list[Never] - cannot generalize
# Must provide type hint for empty collections:
numbers: list[int] = []  # Type: list[int]

Special Types

Any

Any is the escape hatch for truly dynamic code. It unifies with all types without errors:

from typing import Any

def dynamic_operation(x: Any) -> Any:
    return x.anything()  # No type checking

Use Any sparingly - it disables type checking for that value.

Never

Never represents impossible values or code paths that never return:

from typing import Never

def unreachable() -> Never:
    raise RuntimeError("Never returns")

def example(x: int) -> int:
    if x < 0:
        unreachable()
    return x  # Type checker knows we only reach here if x >= 0

Top (⊤)

Top is the supertype of all types. It appears in generic bounds and protocol definitions but is rarely used directly.

Flow-Sensitive Type Narrowing

Beacon tracks type information through control flow:

def process(x: int | str | None) -> int:
    if x is None:
        return 0
    # x: int | str here

    if isinstance(x, int):
        return x
    # x: str here

    return len(x)  # OK: x is definitely str

This works with:

  • If statements
  • While loops
  • Match statements
  • Try-except blocks
  • Boolean operators (and, or)
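
For example, boolean operators and while loops narrow in the same way (a brief sketch):

def shout(text: str | None) -> str:
    # `and` narrows: the right operand only evaluates when text is truthy (a str)
    return (text and text.upper()) or ""

def drain(queue: list[int] | None) -> int:
    total = 0
    while queue:                # narrows away None (and stops on the empty list)
        total += queue.pop()
    return total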

Exhaustiveness Checking

Match statements and if-elif chains are checked for exhaustiveness:

def handle(x: bool) -> str:
    match x:
        case True:
            return "yes"
        case False:
            return "no"
    # OK: all cases covered

def incomplete(x: int | str) -> str:
    if isinstance(x, int):
        return "number"
    # Warning: str case not handled

Generic Types

Beacon supports generic types with type parameters:

from typing import TypeVar, Generic

T = TypeVar('T')

class Box(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value

    def get(self) -> T:
        return self.value

# Type inference works:
int_box = Box(42)  # Box[int]
str_box = Box("hello")  # Box[str]

Protocols

Beacon supports structural typing through protocols:

from typing import Protocol

class Drawable(Protocol):
    def draw(self) -> None: ...

def render(obj: Drawable) -> None:
    obj.draw()  # OK if obj has draw() method

class Circle:
    def draw(self) -> None:
        print("drawing circle")

render(Circle())  # OK: Circle satisfies Drawable protocol

Common Patterns

Optional Chaining

from typing import Optional

def get_name(user: Optional[dict]) -> Optional[str]:
    if user is None:
        return None
    return user.get("name")  # Type checker knows user is dict

Union Type Discrimination

def process(value: int | list[int]) -> int:
    if isinstance(value, int):
        return value
    return sum(value)  # value is list[int] here

Type Guard Functions

from typing import TypeGuard

def is_str_list(val: list) -> TypeGuard[list[str]]:
    return all(isinstance(x, str) for x in val)

def process(items: list[int | str]) -> None:
    if is_str_list(items):
        # items: list[str] here
        print(",".join(items))  # OK

Error Messages

Beacon provides context-aware error messages:

  • String/Int Mixing: Suggests explicit conversion
  • None Errors: Explains Optional types and None checks
  • Union Errors: Shows which union branches failed and why
  • Collection Mismatches: Identifies list vs dict vs tuple confusion

Error messages focus on actionable fixes rather than type theory jargon.

Configuration

Type checking strictness can be controlled via beacon.toml:

[analysis]
# Warn when Any is used (default: false)
warn-on-any = true

# Strict mode: disallow implicit Any (default: false)
strict = false

# Report unused variables (default: true)
unused-variables = true

Best Practices

  1. Use Optional for nullable values: Optional[T] is clearer than T | None for function signatures
  2. Narrow before use: Check for None before accessing attributes
  3. Leverage type guards: Create reusable type narrowing functions
  4. Avoid Any: Use Union types or Protocol types for flexibility
  5. Add type hints to empty collections: Help inference with list[int]() instead of []
  6. Trust the type checker: If it says a path is unreachable, it probably is

When Type Checking Fails You

Sometimes the type checker can't infer what you know is true. Use these escape hatches:

from typing import cast, Any

# cast: Assert a type without runtime check
value = cast(int, get_dynamic_value())

# Any: Disable type checking
dynamic: Any = get_unknown_type()

# Type ignore comment (use sparingly)
result = complex_operation()  # type: ignore

# Assert narrowing
x: int | None = get_value()
assert x is not None
# x: int here (type checker understands assert)

Use these sparingly and document why the type checker needs help.

Type Checking Modes

Beacon supports three type checking modes that let you balance type safety with development flexibility: strict, balanced, and relaxed.

Configuration

Set the mode in your beacon.toml (see the configuration documentation for more information):

[type_checking]
mode = "balanced"  # or "strict" or "relaxed"

Override the mode for specific files using a comment directive at the top of the file:

# beacon: mode=strict

def calculate(x: int, y: int) -> int:
    return x + y

Mode Comparison

| Feature | Strict | Balanced | Relaxed |
|---|---|---|---|
| Annotation mismatches | Error | Warning | Hint |
| Missing annotations (inferred) | Error | Warning | Silent |
| Implicit Any | Error | Warning | Silent |
| Bare except clauses | Error | Allowed | Allowed |
| Class attribute annotations | Required | Optional | Optional |

Strict Mode

Enforces complete type annotation coverage; inference is never accepted as a substitute for explicit annotations. All function parameters and return types must be annotated.

Characteristics:

  • All annotation mismatches are errors
  • All function parameters must have explicit type annotations (ANN007)
  • All function return types must have explicit type annotations (ANN008)
  • All class attributes must have explicit type annotations (ANN009)
  • Bare except: clauses are forbidden (ANN010)
  • Missing annotations are treated as implicit Any, which is forbidden in strict mode
  • Type inference is not allowed as a substitute for explicit annotations
  • Best for greenfield projects, type-safe libraries, and critical components

Example:

# beacon: mode=strict

# ✓ Valid - fully annotated
def process(data: list[int]) -> int:
    total: int = sum(data)
    return total

# ✗ Error - missing return type annotation (ANN008)
# Even though the return type could be inferred as int
def calculate(x: int, y: int):
    return x + y

# ✗ Error - parameter 'first' missing annotation (ANN007)
# Strict mode requires explicit annotations, no inference
def format_name(first, last: str) -> str:
    return f"{first} {last}"

# ✗ Error - both parameters and return type missing (ANN007, ANN008)
# Strict mode requires all annotations, even when types could be inferred
def add(x, y):
    return x + y

# Class attributes also require explicit annotations in strict mode
class Config:
    # ✗ Error - class attribute missing annotation (ANN009)
    host = "localhost"

    # ✓ Valid - annotated class attribute
    port: int = 8080

# Exception handling requires specific exception types
def process() -> int:
    try:
        return risky_operation()
    except:  # ✗ Error - bare except not allowed (ANN010)
        return -1

def safe_process() -> int:
    try:
        return risky_operation()
    except (ValueError, TypeError):  # ✓ Valid - specific exception types
        return -1

Balanced Mode

Provides helpful warnings while allowing gradual type annotation adoption. Distinguishes between concrete inferred types and implicit Any to guide annotation efforts.

Characteristics:

  • Annotation mismatches are warnings (not errors)
  • Missing annotations with concrete inferred types trigger warnings showing the inferred type
  • Implicit Any types (unresolvable inference) trigger warnings to identify ambiguous cases
  • Allows mixing annotated and unannotated code (gradual typing)
  • Ideal for incrementally adding types to existing projects

Example:

# beacon: mode=balanced

# ✓ No warnings - fully annotated
def process(data: list[int]) -> int:
    return sum(data)

# ⚠ Warning ANN006 - missing return type annotation (inferred as int)
# Suggestion includes the inferred type to guide annotation
def calculate(x: int, y: int):
    return x + y

# ⚠ Warning ANN004 - parameter 'first' missing annotation (inferred as str)
# Type can be inferred from usage context
def format_name(first, last: str) -> str:
    return f"{first} {last}"

# ⚠ Warning ANN011 - parameter 'data' has implicit Any type
# ⚠ Warning ANN012 - return type is implicit Any
# Type inference couldn't determine concrete types
def process_unknown(data, options):
    return data

# ⚠ Warning ANN011 - parameter 'b' has implicit Any type
# Gradual typing: warns only on unannotated parameter
def mixed_params(a: int, b, c: int) -> int:
    return a + b + c

Relaxed Mode

Minimally intrusive type checking focused on explicit mismatches.

Characteristics:

  • Only explicit annotation mismatches produce hints
  • Missing annotations are silent
  • Maximum flexibility for exploration and legacy code
  • Useful for initial type system adoption

Example:

# beacon: mode=relaxed

# ✓ No diagnostics
def process(data):
    return sum(data)

# ℹ Hint only - annotation doesn't match inference (ANN001)
def calculate(x: int, y: int) -> str:  # Returns int, not str
    return x + y

# ✓ No diagnostics - missing annotations are allowed
def format_name(first, last):
    return f"{first} {last}"

Annotation Coverage Diagnostics

Beacon validates type annotations against inferred types and reports missing annotations based on the active mode.

Diagnostic Codes

| Code | Description | Strict | Balanced | Relaxed |
|---|---|---|---|---|
| ANN001 | Annotation mismatch on assignments | Error | Warning | Hint |
| ANN002 | Missing annotation on assignments | Error | Warning | Silent |
| ANN003 | Parameter annotation mismatch | Error | Warning | Hint |
| ANN004 | Missing parameter annotation (inferred type is concrete) | - | Warning | Silent |
| ANN005 | Return type annotation mismatch | Error | Warning | Hint |
| ANN006 | Missing return type annotation (inferred type is concrete) | - | Warning | Silent |
| ANN007 | Parameter missing annotation (strict mode) | Error | - | - |
| ANN008 | Return type missing annotation (strict mode) | Error | - | - |
| ANN009 | Class attribute missing annotation | Error | - | - |
| ANN010 | Bare except clause without exception type | Error | - | - |
| ANN011 | Parameter has implicit Any type | - | Warning | - |
| ANN012 | Return type has implicit Any type | - | Warning | - |

Strict Mode: All missing parameter and return type annotations trigger ANN007/ANN008 errors respectively. Class attributes without annotations trigger ANN009 errors. Bare except clauses trigger ANN010 errors.

Balanced Mode: Distinguishes between concrete inferred types and implicit Any:

  • ANN004/ANN006: Missing annotations where type inference determined a concrete type (warns with suggested type)
  • ANN011/ANN012: Missing annotations where type inference resulted in implicit Any (warns about ambiguity)

See the complete diagnostic codes documentation for more information.

Type Inference and Implicit Any

After constraint solving, Beacon finalizes any unresolved type variables as Any, enabling balanced mode to distinguish between:

  1. Concrete inferred types: Type inference successfully determined a specific type (int, str, list, etc.)
  2. Implicit Any: Type inference couldn't resolve to a concrete type due to insufficient context
  3. Active type variables: Still in the inference process (no diagnostic yet)

Diagnostic behavior:

  • Strict mode: All missing annotations are errors (ANN007/ANN008), regardless of inference
  • Balanced mode: Warns on both concrete inferred types (ANN004/ANN006) and implicit Any (ANN011/ANN012)
  • Relaxed mode: Silent on missing annotations, only hints on explicit mismatches
  • NoneType returns: No diagnostic for procedures with implicit None return (void functions)

Example:

# beacon: mode=balanced

# ⚠ Warning ANN004/ANN006 - concrete inferred type (int)
# Suggestion shows the inferred type
def add(x: int, y: int):
    return x + y  # Inferred as int

# ⚠ Warning ANN011/ANN012 - implicit Any
# Type inference couldn't determine concrete type
def process(data):
    return transform(data)  # Unknown transform behavior

# ✓ No diagnostic - procedure with None return
def log_message(msg: str):
    print(msg)  # Inferred as None, no warning needed

Logging and Observability

Beacon uses structured logging via the tracing ecosystem to provide comprehensive observability for both development and production environments.

Architecture

The logging infrastructure is built on three components:

  1. Core Logging Module (beacon-core::logging) - Centralized configuration and initialization
  2. LSP Server Instrumentation - Protocol events, file analysis, and type inference logging
  3. CLI Logs Command - Real-time log viewing and filtering

Local Development

Running with Logs

Start the LSP server with full tracing enabled:

RUST_LOG=trace cargo run --bin beacon-lsp

Logs are written to two destinations:

  • File: logs/lsp.log (daily rotating, persistent)
  • stderr: Immediate console output during development

Log Levels

Set the log level using the RUST_LOG environment variable:

# Only errors and panics
RUST_LOG=error cargo run --bin beacon-lsp

# Warnings and errors
RUST_LOG=warn cargo run --bin beacon-lsp

# High-level events (default for releases)
RUST_LOG=info cargo run --bin beacon-lsp

# Detailed operation logs (recommended for development)
RUST_LOG=debug cargo run --bin beacon-lsp

# Full verbosity including protocol messages
RUST_LOG=trace cargo run --bin beacon-lsp

Module-Specific Filtering

Target specific modules for detailed logging:

# Debug level for LSP, trace for analysis
RUST_LOG=beacon_lsp=debug,beacon_lsp::analysis=trace cargo run --bin beacon-lsp

# Trace constraint generation only
RUST_LOG=beacon_constraint=trace cargo run --bin beacon-lsp

Watching Logs

CLI Logs Command

View logs in real-time using the debug logs command:

# Show all current logs
beacon debug logs

# Follow mode - continuously watch for new entries
beacon debug logs --follow

# Filter by pattern (regex supported)
beacon debug logs --follow --filter "ERROR|WARN"

# Filter to specific module
beacon debug logs --follow --filter "analysis"

# Use custom log file
beacon debug logs --follow --path /custom/path/to/log.txt

Log Output Format

Logs are colorized by level for easy scanning:

  • ERROR - Bright red
  • WARN - Yellow
  • INFO - White
  • DEBUG - Cyan
  • TRACE - Dimmed

Example output:

2025-11-08T12:15:42Z [INFO] beacon_lsp: Starting Beacon LSP server version="0.1.0"
2025-11-08T12:15:43Z [DEBUG] beacon_lsp::backend: Received initialize request root_uri=Some(file:///workspace)
2025-11-08T12:15:44Z [INFO] beacon_lsp::analysis: Starting analysis uri="file:///workspace/main.py"
2025-11-08T12:15:44Z [DEBUG] beacon_lsp::analysis: Generating constraints uri="file:///workspace/main.py"
2025-11-08T12:15:45Z [INFO] beacon_lsp::analysis: Analysis completed uri="file:///workspace/main.py" duration_ms=142 type_count=87 error_count=0

What Gets Logged

Protocol Events

All LSP protocol requests and notifications are logged at appropriate levels:

  • initialize, shutdown - INFO
  • textDocument/didOpen, didChange, didClose - INFO
  • workspace/didChangeConfiguration - INFO
  • Diagnostics publishing - DEBUG
  • Feature requests (hover, completion, etc.) - TRACE

File Analysis

The analysis pipeline logs key stages:

  1. Analysis Start (INFO) - URI, timestamp
  2. Document Retrieval (DEBUG) - Version, source length, scope count
  3. Cache Hit/Miss (DEBUG) - Whether cached results are used
  4. Constraint Generation (DEBUG) - Number of constraints generated
  5. Solver Execution (DEBUG) - Constraint count, solver invocation
  6. Solver Completion (INFO) - Type error count
  7. Analysis Completion (INFO) - Duration, type count, error count

Example analysis sequence:

INFO  Starting analysis uri="file:///app/main.py"
DEBUG Retrieved document data uri="file:///app/main.py" version=3 source_length=1247 scopes=8
DEBUG Generating constraints uri="file:///app/main.py"
DEBUG Constraints generated, starting solver uri="file:///app/main.py" constraint_count=142
INFO  Constraint solving completed uri="file:///app/main.py" type_error_count=0
INFO  Analysis completed uri="file:///app/main.py" version=3 duration_ms=89 type_count=142 error_count=0

Type Inference

Constraint generation and solving steps are logged:

  • Constraint count per file
  • Solver initialization
  • Type errors encountered
  • Substitution application

Workspace Operations

Multi-file workspace operations:

  • Workspace indexing start/completion
  • Dependency updates
  • Module invalidation
  • Re-analysis of affected modules

Configuration Changes

Configuration loading and hot-reload:

  • Config file discovery
  • Validation warnings
  • Runtime updates
  • Mode changes (strict/balanced/relaxed)

Error Handling

Panic Logging

Panics are automatically captured and logged before termination:

PANIC at src/analysis/mod.rs:245:12: Type variable unification failed unexpectedly

The panic hook logs:

  • Panic message
  • File location (file:line:column)
  • Payload details

Error Propagation

Errors are logged at appropriate points with context:

ERROR Failed to open document uri="file:///invalid.py" error="File not found"
ERROR Failed to update document uri="file:///app.py" error="Invalid UTF-8 in document"

Environment Variables

RUST_LOG

Controls log level filtering using the env_filter syntax:

# Global level
RUST_LOG=debug

# Per-module levels
RUST_LOG=info,beacon_lsp::analysis=trace

# Multiple modules
RUST_LOG=beacon_lsp=debug,beacon_constraint=trace,tower_lsp=warn

LSP_LOG_PATH

Override the default log file location:

LSP_LOG_PATH=/var/log/beacon/custom.log cargo run --bin beacon-lsp

Default: logs/lsp.log

Output Formats

Text (Default)

Human-readable format with timestamps, levels, and targets:

2025-11-08T12:15:42.123Z INFO beacon_lsp::backend: Server initialized version="0.1.0"

JSON

Structured JSON output for machine parsing (configurable via LogFormat::Json):

{
  "timestamp": "2025-11-08T12:15:42.123Z",
  "level": "INFO",
  "target": "beacon_lsp::backend",
  "message": "Server initialized",
  "version": "0.1.0"
}

Release

In production, logging defaults to WARNING level:

  • Minimal performance overhead
  • Only errors and warnings logged
  • File rotation prevents unbounded growth
  • Daily rotation with date-stamped files

Log Rotation

Logs automatically rotate daily:

  • Current log: logs/lsp.log
  • Rotated logs: logs/lsp.log.2025-11-07, logs/lsp.log.2025-11-06, etc.

Coverage Gaps

The following areas have limited or no logging coverage (documented in roadmap):

  • Detailed symbol resolution tracing
  • Per-module diagnostic generation
  • Configuration hot-reload events (currently limited)
  • WebSocket/TCP transport logging (when using --tcp)
  • Stub cache operations and introspection queries

Testing Strategy

Beacon’s LSP crate includes both unit tests and async integration tests to ensure feature behavior remains stable as the analyzer evolves.

Provider Unit Tests

Each feature module embeds targeted tests that construct in-memory documents via DocumentManager::new().

Common scenarios include rename edits across nested scopes, workspace symbol searches, and diagnostic generation for simple errors.

Because providers operate on real ASTs and symbol tables, these tests exercise production logic without needing a running language server.

Backend Integration Tests

Async tests spin up an in-process tower_lsp::LspService<Backend> to simulate client interactions.

They call methods like initialize, did_open, did_change, hover, and completion, asserting that responses match expectations and no panics occur.

This pattern verifies protocol wiring, capability registration, and shared state management without external tooling.

Command-line Checks

cargo check and cargo check --tests are run frequently for quick feedback.

cargo fmt --check enforces formatting consistency across Rust code.

Documentation changes are validated with mdbook build docs to catch broken links or syntax errors.

Current Limitations

The Beacon language server already covers core workflows but still has notable constraints. Understanding these limitations helps set expectations for contributors and users.

Open-Document Focus

Most features only inspect documents currently open in the editor.

Closed files are invisible until workspace indexing is implemented, so cross-project references or renames may miss targets.

Analyzer Coupling

Rename and references rely on a mix of AST traversal and simple heuristics; deep semantic queries across modules are not yet available.

Analyzer caches are invalidated wholesale after edits. Incremental typing work is on the roadmap but not implemented.

Performance

Tree-sitter reparses the entire document per change. While acceptable for small files, large modules may benefit from incremental parsing.

Workspace symbol searches iterate synchronously over all open documents, which can lag in large sessions.

Feature Gaps

Code actions support basic quick fixes (removing unused variables/imports, wrapping types with Optional) but many advanced refactorings remain unimplemented.

Formatting endpoints (textDocument/formatting, etc.) are unimplemented.

Configuration (Config) is still a stub and does not honor user settings.

Tooling Ergonomics

Error messages from the analyzer can be terse; improving diagnostics and logs is part of future work.

There is no persistence of analysis results across sessions, so large projects require recomputation on startup.

Next Steps

The following projects are planned to evolve Beacon’s language server from a solid MVP into a full-featured development companion.

Analyzer Integration

Tighten the connection between the LSP and analyzer so rename, references, and completions can operate across modules.

Cache analyzer results to avoid repeated full reanalysis after every edit.

Surface richer hover information (e.g., inferred types with provenance, docstrings).

Workspace Indexing

Build a background indexer that scans the workspace root, populating symbol data for unopened files.

Add file watchers to refresh indexes when on-disk files change outside the editor.

Support multi-root workspaces and remote development scenarios.

Tooling Enhancements

Implement formatting (textDocument/formatting, rangeFormatting) and integrate with Beacon's formatting rules.

Expand code actions beyond the current quick fixes (remove unused, wrap with Optional) to include:

  • Insert type annotations from inference
  • Add missing imports for undefined symbols
  • Implement missing protocol methods
  • Extract to function/method refactorings
  • Inline variable refactorings

Extend semantic tokens with modifier support (documentation, deprecated symbols) and align with editor theming.

Performance & Reliability

Adopt Tree-sitter’s incremental parsing to reduce reparse costs for large files.

Improve logging and telemetry so users can diagnose performance issues or protocol errors.

Harden handling of unexpected client input, ensuring the server degrades gracefully.

Documentation & Ecosystem

Publish editor-specific setup guides (VS Code, Neovim, Helix, Zed) alongside troubleshooting tips.

Automate documentation deployment (see deploy-docs workflow) and version docs with releases.

Encourage community extensions by documenting provider APIs and expected invariants.

Development Quick Start

Installation

Build from source:

cargo build --release

The CLI will be available at target/release/beacon.

Install system-wide:

cargo install --path crates/cli

This installs the beacon binary to ~/.cargo/bin.

Type Checking

Check Python files for type errors using Hindley-Milner inference:

# Check a file
beacon typecheck example.py

# Check with JSON output for CI
beacon typecheck --format json example.py

# Check from stdin
cat example.py | beacon typecheck

Language Server

Install beacon-lsp system-wide:

cargo install --path crates/server

This installs the beacon-lsp binary to ~/.cargo/bin, making it available in your PATH.

Start the LSP server for editor integration:

beacon-lsp

Or use the CLI:

beacon lsp

For debugging, start with file logging:

beacon lsp --log-file /tmp/beacon.log

Debug Tools

Debug builds include additional tools for inspecting the type system:

# Build in debug mode
cargo build

# View tree-sitter CST
target/debug/beacon debug tree example.py

# Show AST with inferred types
target/debug/beacon debug ast example.py

# Display generated constraints
target/debug/beacon debug constraints example.py

# Show unification results
target/debug/beacon debug unify example.py

Note: Debug commands are only available in debug builds (compiled with cargo build), not in release builds.

Full documentation: CLI Tools

Editor Extensions

Beacon supports VS Code, Zed, and Neovim through the Language Server Protocol.

See Editor Extensions Documentation for setup instructions.

Project Structure

.
├─ crates/
│  ├─ cli/              # `beacon-cli` entry point with clap
│  ├─ server/           # `beacon-lsp` LSP server (tower-lsp or raw) using lsp-types
│  ├─ core/             # `beacon-core` type definitions, solver, unifier
│  ├─ constraints/      # `beacon-constraint` constraint generation
│  └─ parser/           # `beacon-parser` tree-sitter Python adapter
└─ pkg/               # Editor extensions & plugins

Typeshed Integration

Beacon integrates Python standard library type stubs from the official python/typeshed repository. The stubs are embedded at build time using a git submodule, providing version-controlled, reproducible type information for the analyzer.

Current Version

Typeshed stubs are tracked as a git submodule at typeshed/. To check the current version:

cd typeshed
git log -1 --format='%H %ci %s'

The submodule points to stormlightlabs/typeshed-stdlib-mirror, which provides a flattened mirror of typeshed's stdlib and _typeshed directories.

Stub Lookup Architecture

Beacon uses a layered stub resolution system with the following priority order:

  1. Manual stubs - Configured via config.stub_paths (highest priority)
  2. Stub packages - Directories matching *-stubs pattern
  3. Inline stubs - .pyi files located alongside .py files
  4. Typeshed stubs - Embedded stdlib stubs (fallback)

Builtins are loaded upfront during initialization. Other modules are loaded on-demand during constraint generation when imports are encountered.

Updating Typeshed

The typeshed submodule can be updated to pull in newer stub definitions from upstream.

Check Available Updates

View recent commits from python/typeshed:

cd typeshed
./scripts/metadata.sh --limit 10

Options for filtering commits:

  • --since DATE - Show commits after date (YYYY-MM-DD)
  • --until DATE - Show commits before date (YYYY-MM-DD)
  • --author NAME - Filter by GitHub username or email
  • --grep PATTERN - Search commit messages
  • --sha-only - Output only commit SHAs

Update to Specific Version

Fetch stubs from a specific python/typeshed commit:

cd typeshed
./scripts/fetch.sh <commit-sha>

This fetches the specified commit, flattens the stdlib and _typeshed directories into stubs/, and creates COMMIT.txt with metadata.

Update to Latest

Fetch and commit the latest typeshed version:

cd typeshed
./scripts/metadata.sh --limit 1 --sha-only | xargs ./scripts/fetch.sh
./scripts/commit.sh
cd ..
git add typeshed
git commit -m "chore: update typeshed submodule"

Manual Commit

After fetching stubs, commit changes manually:

cd typeshed
git add stubs COMMIT.txt
git commit -m "Bump typeshed stdlib to <commit-sha>"
git push
cd ..
git add typeshed
git commit -m "chore: update typeshed submodule"

Build Integration

Typeshed stubs are embedded into the Beacon binary at compile time. The build process:

  1. Reads stub files from typeshed/stubs/ directory
  2. Embeds them using Rust's include_str! macro
  3. Makes stubs available via get_embedded_stub(module_name) API

No runtime network access or file system dependency is required for stdlib type information.

Custom Beacon Stubs

Beacon-specific stubs that extend or override standard library behavior are kept in crates/server/stubs/:

  • capabilities_support.pyi - Beacon-specific protocol definitions

These stubs have higher priority than embedded typeshed stubs due to the layered lookup system.

Testing

Stub resolution is tested through:

  • Unit tests - Verify layered stub lookup and module resolution
  • Integration tests - Validate stdlib type checking with typeshed stubs
  • Analyzer tests - Check method resolution through inheritance chains

Test fixtures that require stub files are located in crates/server/tests/fixtures/.

Benchmarking

Beacon uses Criterion for performance benchmarking across critical paths.

Running Benchmarks

Execute all benchmarks:

cargo bench

Run specific benchmark suite:

cargo bench --bench type_inference
cargo bench --bench parser_benchmark
cargo bench --bench lsp_handlers

Criterion generates HTML reports in target/criterion/ with detailed statistics, plots, and regression detection.

Benchmark Suites

Type Inference (beacon-core)

Located at crates/core/benches/type_inference.rs, this suite tests the core type system:

  • Simple Unification: Concrete types and type variables
  • Generic Types: Lists, dictionaries with varying complexity
  • Function Types: Simple, multi-argument, and generic functions
  • Nested Types: Lists nested to varying depths
  • Substitution: Composition and application with different sizes

Parser Performance (beacon-server)

Located at crates/server/benches/parser_benchmark.rs, tests parse operations:

  • File Size Scaling: Small, medium, and large Python files
  • AST Construction: Parse tree to AST transformation
  • Incremental Reparsing: Performance of incremental updates
  • Symbol Tables: Generation across different file sizes

LSP Handlers (beacon-server)

Located at crates/server/benches/lsp_handlers.rs, measures LSP operation latency:

  • Hover: Variable and function hover info generation
  • Completion: Attribute completion and in-body completion
  • Go to Definition: Navigation for variables, functions, and classes
  • Combined Operations: Parse + hover to measure end-to-end cost

Adding Benchmarks

Create new benchmark files in crates/{crate}/benches/ and register in Cargo.toml:

[[bench]]
name = "my_benchmark"
harness = false

Use Criterion's parametric benchmarks to test performance across different input sizes or scenarios.

Interpreting Results

Criterion provides:

  • Throughput and iteration time statistics
  • Confidence intervals
  • Regression detection against previous runs
  • Visual plots in HTML reports

Monitor the benchmark reports to catch performance regressions during development.

Tracing & Observability

Beacon uses the tracing ecosystem for structured logging and instrumentation.

Log Levels

  • error: Critical failures that prevent functionality
  • warn: Recoverable issues requiring attention
  • info: High-level lifecycle events and milestones
  • debug: Detailed operational information
  • trace: Fine-grained execution details

Enabling Logs

Set RUST_LOG environment variable:

# All debug logs
RUST_LOG=debug beacon lsp

# Module-specific logging
RUST_LOG=beacon_lsp=debug,beacon_core=info beacon lsp

# Trace-level for specific module
RUST_LOG=beacon_lsp::analysis=trace beacon lsp

Instrumentation Points

LSP Lifecycle

Key events in backend.rs:

  • Server initialization and configuration loading
  • Document lifecycle (open, change, close)
  • Shutdown requests

Document Processing

In document.rs:

  • Mode directive detection from file comments
  • Parse and reparse operations

Analysis Pipeline

Stages in analysis/mod.rs:

  • Constraint generation
  • Type inference execution
  • Analysis result caching

Diagnostic Generation

Each diagnostic phase in features/diagnostics.rs:

  • Parse errors
  • Linter diagnostics
  • Type checking errors
  • Unsafe Any warnings
  • Import validation
  • Variance checking

Caching

Cache operations across cache.rs:

  • Hit/miss tracking for type, introspection, and analysis caches
  • Invalidation events
  • Import dependency tracking

Workspace Operations

In workspace.rs:

  • File discovery and indexing
  • Stub loading from typeshed and configured paths
  • Dependency graph construction

Handler Instrumentation

LSP request handlers use #[tracing::instrument] macro for automatic span creation:

#[tracing::instrument(skip(self), level = "debug")]
async fn completion(&self, params: CompletionParams) -> Result<Option<CompletionResponse>> {
    // Automatically logs entry/exit with debug level
}

Typechecker

The typechecker implements a Hindley-Milner type inference engine with extensions for Python's type system. It performs constraint-based type inference with gradual typing support through the Any type.

How It Works

The typechecker operates in five phases:

  1. Parse source code into an Abstract Syntax Tree
  2. Resolve symbols and build scopes
  3. Generate type constraints by walking the AST
  4. Solve constraints using unification
  5. Apply final type substitutions

The core algorithm uses Robinson's unification with an occurs check, extended with Python-specific features like union types, protocols, and row-polymorphic records.

        ┌─────────────┐
        │Source Code  │
        └──────┬──────┘
               │
               ▼
        ┌─────────────┐
        │   Parser    │
        └──────┬──────┘
               │
               ▼
        ┌─────────────────────┐
        │ AST + Symbol Table  │
        └──────────┬──────────┘
                   │
                   ▼
        ┌──────────────────────┐
        │ Constraint Generator │
        └──────────┬───────────┘
                   │
                   ▼
        ┌──────────────────────┐
        │   Constraint Set     │
        └──────────┬───────────┘
                   │
                   ▼
        ┌──────────────────────┐
        │  Constraint Solver   │
        └──────────┬───────────┘
                   │
                   ▼
        ┌──────────────────────┐
        │     Unifier          │
        └──────────┬───────────┘
                   │
                   ▼
        ┌──────────────────────┐
        │  Type Substitutions  │
        └──────────┬───────────┘
                   │
                   ▼
        ┌──────────────────────┐
        │     Type Map         │
        └──────────┬───────────┘
                   │
                   ▼
        ┌──────────────────────┐
        │     Type Errors      │
        └──────────────────────┘

Type System

The type system supports:

  • Type variables with variance annotations
  • Type constructors with kind checking
  • Function types with keyword arguments
  • Union types for Python's | operator
  • Row-polymorphic records for structural typing
  • Protocol types for structural subtyping
  • Three special types: Any (gradual typing), Top (universal supertype), Never (bottom type)

Constraint Generation

The constraint generator walks the AST and produces constraints:

  • Equal(T1, T2) - types must be identical
  • HasAttr(T, name, AttrType) - attribute access
  • Call(FuncType, Args, Result) - function calls
  • Protocol(T, ProtocolName, Impl) - protocol conformance
  • MatchPattern(T, Pattern, Bindings) - pattern matching
  • Narrowing(var, predicate, T) - flow-sensitive typing
  • Join(var, types, T) - control flow merge points

                            ┌──────────┐
                            │ AST Node │
                            └─────┬────┘
                                  │
                                  ▼
                            ┌───────────┐
                            │ Node Type │
                            └─────┬─────┘
              ┌─────────────┬─────┼───────┬────────────┐
              │             │     │       │            │
      Variable│        Call │     │ Attr  │  If/Match  │
              │             │     │       │            │
              ▼             ▼     ▼       ▼            ▼
       ┌────────────┐ ┌──────────────┐  ┌──────────────┐
       │Lookup Type │ │ Call         │  │ HasAttr      │
       │            │ │ Constraint   │  │ Constraint   │
       └──────┬─────┘ └──────┬───────┘  └──────┬───────┘
              │              │                 │
              │              │                 │          ┌──────────────┐
              │              │                 │          │ Narrowing    │
              │              │                 │          │ Constraint   │
              │              │                 │          └──────┬───────┘
              │              │                 │                 │
              └──────────────┴─────────────────┴─────────────────┘
                                          │
                                          ▼
                                   ┌──────────────┐
                                   │Constraint Set│
                                   └──────────────┘
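
As a rough illustration of how these constraint kinds map onto source code (the type-variable names below are purely illustrative, not Beacon's internal output):

def shout(msg):
    return msg.upper()        # HasAttr(T_msg, "upper", T_attr) and Call(T_attr, [], T_ret)

text = shout("hi")            # Call(T_shout, [str], T_text)

if text:
    result = text             # Narrowing on the truthiness of `text`
else:
    result = "fallback"
# Join(result, {T_text, str}, T_result) at the merge point after the if/else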

Constraint Solving

The solver processes constraints in order, applying unification:

  1. Process equality constraints via unification
  2. Resolve attribute access using type structure
  3. Check function call compatibility
  4. Verify protocol conformance via structural matching
  5. Handle pattern matching exhaustiveness
  6. Apply type narrowing in control flow
  7. Join types at control flow merge points

The unifier maintains a substitution map from type variables to concrete types, applying the occurs check to prevent infinite types.
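
The production unifier is written in Rust (see Key Files below); the following is only a minimal Python sketch of Robinson-style unification with an occurs check, assuming a toy representation where type variables are strings beginning with "$" and constructed types are tuples:

def is_var(t):
    """Type variables are represented as strings starting with '$'."""
    return isinstance(t, str) and t.startswith("$")

def resolve(t, subst):
    """Follow substitution chains until t is no longer a bound type variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    """Occurs check: does `var` appear inside `t`? Prevents infinite types."""
    t = resolve(t, subst)
    if t == var:
        return True
    if isinstance(t, tuple):  # constructed types: ("list", elem), ("fn", arg, ret), ...
        return any(occurs(var, part, subst) for part in t[1:])
    return False

def unify(a, b, subst):
    """Return an extended substitution making a and b equal, or raise TypeError."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        if occurs(a, b, subst):
            raise TypeError(f"occurs check failed: {a} in {b!r}")
        return {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a!r} with {b!r}")

# Unifying list[$T] with list[int] binds $T to int
subst = unify(("list", "$T"), ("list", "int"), {})
assert resolve("$T", subst) == "int"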

Limitations

Value restriction prevents generalization of mutable references, which can be overly conservative for some Python patterns.

Protocol checking handles basic structural conformance but doesn't fully support complex inheritance hierarchies with method overriding.

Type narrowing in conditionals provides basic flow-sensitive typing but lacks sophisticated constraint propagation for complex boolean expressions.

Performance degrades on files exceeding 5000 lines, though scope-level caching mitigates this for incremental edits.

The gradual typing Any type bypasses type checking, which can hide errors when overused.
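
For instance, in this hypothetical snippet the checker stays silent even though the call fails at runtime:

from typing import Any

def load_config() -> Any:
    return {"debug": True}

config = load_config()
config.enable()   # AttributeError at runtime, but no static diagnostic: Any silences the check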

Key Files

crates/
├── core/
│   ├── types.rs        # Type system implementation
│   ├── unify.rs        # Unification algorithm
│   └── subst.rs        # Type substitution
├── constraints/
│   └── solver.rs       # Constraint solver
└── analyzer/
    └── walker/mod.rs   # Constraint generation

Static Analyzer

The static analyzer performs control flow and data flow analysis to detect code quality issues beyond type errors. It builds a Control Flow Graph and runs analyses to find unreachable code, use-before-definition errors, and unused variables.

How It Works

The analyzer operates in three phases:

  1. Build Control Flow Graph from the AST
  2. Perform data flow analyses on the CFG
  3. Generate diagnostics from analysis results

The CFG captures all possible execution paths through a function, including normal flow, exception handling, loops, and early returns.

                                    ┌─────────────────────┐
                                    │        AST          │
                                    └──────────┬──────────┘
                                               │
                                               ▼
                                    ┌─────────────────────┐
                                    │    CFG Builder      │
                                    └──────────┬──────────┘
                                               │
                                               ▼
                                    ┌─────────────────────┐
                                    │ Control Flow Graph  │
                                    └──────────┬──────────┘
                                               │
                                               ▼
                                    ┌─────────────────────┐
                                    │ Data Flow Analyzer  │
                                    └──────────┬──────────┘
                                               │
                                               ▼
                        ┌──────────────────────┴──────────────────────┐
                        │          Analysis Type                      │
                        └─┬─────────────────┬──────────────────────┬──┘
                          │                 │                      │
              ┌───────────▼──────┐  ┌───────▼────────┐  ┌──────────▼──────────┐
              │   Reachability   │  │    Use-Def     │  │     Liveness        │
              └───────────┬──────┘  └───────┬────────┘  └─────────┬───────────┘
                          │                 │                     │
                          │    ┌────────────▼─────────┐           │
                          └────►      Diagnostics     ◄───────────┘
                               └──────────────────────┘

Control Flow Graph

Each function is converted into a CFG with basic blocks and edges:

  • Basic blocks contain sequential statements
  • Edges represent control flow with kinds: Normal, True, False, Exception, Break, Continue, Finally
  • Entry and exit blocks mark function boundaries
  • Loop headers have back edges for iteration
                    ┌─────────────────┐
                    │  Entry Block    │
                    └────────┬────────┘
                             │
                             ▼
                    ┌────────────────┐
                ┌───┤  If Statement  ├───┐
                │   └────────────────┘   │
           True │                        │ False
                ▼                        ▼
         ┌────────────┐           ┌────────────┐
         │ Then Block │           │ Else Block │
         └──────┬─────┘           └──────┬─────┘
                │                        │
                └────────┬───────────────┘
                         ▼
                  ┌─────────────┐
                  │ Merge Block │
                  └──────┬──────┘
                         │
                         ▼
                  ┌─────────────┐
              ┌───┤ While Loop  ├──┐
              │   └─────────────┘  │
         True │          ▲         │ False
              │          │         │
              ▼          │         ▼
         ┌─────────┐     │    ┌──────────┐
         │Loop Body├─────┘    │Exit Block│
         │         ├──────────►          │
         └─────────┘  Break   └──────────┘
     Normal│     │Continue
           └─────┘

CFG Construction

The builder walks the AST recursively:

  • If/Elif/Else creates test blocks with True/False edges to branches, merging at a common successor
  • For/While loops create headers with back edges from the body and break edges to exit
  • Try/Except creates exception edges from try blocks to each handler
  • Return/Raise statements jump to the function exit, passing through finally blocks if present
  • Break/Continue statements jump to the appropriate loop boundary

Context tracking maintains loop depth and finally block stacks to generate correct edges.
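
A hypothetical function annotated with the edges the builder would create:

def first_positive(items):
    for item in items:            # loop header: True edge into the body, False edge past the loop
        try:
            value = int(item)     # exception edge from this block to the handler below
        except ValueError:
            continue              # edge back to the loop header
        if value > 0:
            return value          # edge to the function exit (through any finally blocks)
    return 0                      # normal edge to the function exit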

Data Flow Analyses

Use-Before-Definition detection performs forward analysis, tracking which variables are defined at each point. Any use of an undefined variable generates a diagnostic.

Unreachable code detection marks blocks reachable from the entry via depth-first search. Unreachable blocks produce warnings.

Unused variable detection tracks variable definitions and uses. Variables defined but never read generate warnings.

Hoisting analysis collects function and class definitions that are available before their textual position, matching Python's hoisting semantics.
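
For example, this hypothetical function would trigger the first three analyses:

def summarize(flag):
    if flag:
        return "yes"
        print("after return")     # unreachable code
    total = 0                      # unused variable: assigned but never read
    return label                   # use before definition: `label` is never assigned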

Limitations

CFG construction is function-scoped only. Module-level control flow graphs are not yet implemented, limiting whole-program analyses.

Exception flow is simplified and doesn't track exception types through the CFG. All exception handlers are treated uniformly.

Generator functions and async/await constructs have limited support. Yield points and async boundaries don't create proper CFG edges.

Class method CFGs don't track inheritance or method resolution order, which limits cross-method data flow analysis.

Key Files

crates/
├── analyzer/src/
│   ├── cfg.rs           # Control Flow Graph construction
│   └── data_flow.rs     # Data flow analyses
└── server/src/
    └── analysis/mod.rs  # Analysis orchestration

Formatter

The formatter provides PEP8-compliant code formatting with configurable style options. It transforms Python source code into a consistent style while preserving comments and semantic meaning.

How It Works

The formatter operates in four phases:

  1. Parse source into AST and extract comments
  2. Sort imports by category
  3. Generate token stream from AST
  4. Apply formatting rules and write output

The formatter uses tree-sitter for parsing and comment extraction, ensuring accurate preservation of all source elements.

┌─────────────┐   ┌────────┐   ┌────────────────┐   ┌───────────────┐
│Source Code  ├──►│ Parser ├──►│ AST + Comments ├──►│ Import Sorter │
└─────────────┘   └────────┘   └────────────────┘   └───────┬───────┘
                                                             │
                                                             ▼
┌─────────────────┐   ┌──────────────┐   ┌──────────────────────────┐
│Formatted Output │◄──┤Formatting    │◄──┤Token Stream Generator    │
│                 │   │Writer        │   │                          │
└─────────────────┘   └──────────────┘   └────────┬─────────────────┘
                                                   │
                                                   ▼
                                          ┌─────────────────┐
                                          │  Token Stream   │
                                          └─────────────────┘

Import Sorting

Imports are categorized and sorted:

  1. Future imports from __future__
  2. Standard library imports
  3. Third-party package imports
  4. Local project imports

Within each category, imports are sorted alphabetically. This matches the style of Black and isort.

                        ┌─────────────┐
                        │All Imports  │
                        └──────┬──────┘
                               │
                               ▼
                        ┌─────────────┐
                        │ Categorize  │
                        └──────┬──────┘
              ┌────────────────┼────────────────┐
              │                │                │
              ▼                ▼                ▼                ▼
         ┌────────┐       ┌────────┐      ┌────────────┐   ┌───────┐
         │ Future │       │ Stdlib │      │Third-Party │   │ Local │
         └────┬───┘       └────┬───┘      └──────┬─────┘   └───┬───┘
              │                │                 │             │
              └────────────────┴─────────────────┴─────────────┘
                                      │
                                      ▼
                           ┌────────────────────┐
                           │Sort Alphabetically │
                           └──────────┬─────────┘
                                      │
                                      ▼
                           ┌────────────────────┐
                           │Formatted Imports   │
                           └────────────────────┘
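
A hypothetical before/after example (module names are illustrative):

# Before
from __future__ import annotations
import requests
import os
from . import helpers
import sys

# After
from __future__ import annotations

import os
import sys

import requests

from . import helpers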

Token Stream Generation

The AST is converted to a stream of tokens:

  • Keywords: def, class, if, for, etc.
  • Identifiers: variable and function names
  • Literals: strings, numbers, booleans
  • Operators: +, -, ==, and, etc.
  • Delimiters: (, ), [, ], :, ,
  • Whitespace: newlines, indents, dedents

Comments are attached to appropriate tokens based on their position in the source.

Formatting Rules

The writer applies formatting rules:

  • Indentation uses 4 spaces by default, configurable
  • Line length defaults to 88 characters, configurable
  • Two blank lines separate top-level definitions
  • One blank line separates method definitions
  • String quotes normalize to double quotes
  • Trailing commas added in multi-line structures
  • Operators surrounded by single spaces
  • No spaces inside brackets/parentheses
                          ┌──────────────┐
                          │ Token Stream │
                          └──────┬───────┘
                                 │
                                 ▼
                          ┌──────────────┐
                          │  Token Type  │
                          └──────┬───────┘
              ┌──────────────────┼──────────────────┬─────────────┐
              │                  │                  │             │
        Indent│            Newline│           String│   Delimiter │
              │                  │                  │             │
              ▼                  ▼                  ▼             ▼
      ┌───────────────┐  ┌──────────────┐  ┌──────────────┐  ┌─────────────┐
      │Apply Indent   │  │Check Line    │  │Normalize     │  │Add/Remove   │
      │Width          │  │Length        │  │Quotes        │  │Spaces       │
      └───────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬──────┘
              │                 │                 │                 │
              └─────────────────┴─────────────────┴─────────────────┘
                                        │
                                        ▼
                                ┌───────────────┐
                                │Output Buffer  │
                                └───────┬───────┘
                                        │
                                        ▼
                                ┌───────────────┐
                                │Formatted Text │
                                └───────────────┘
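
Putting several of these rules together, a hypothetical fragment would be rewritten roughly as follows:

# Before
def totals( xs ):
  return {'a':sum(xs),'b' :len( xs )}

# After
def totals(xs):
    return {"a": sum(xs), "b": len(xs)}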

Caching

The formatter uses two cache layers:

Short-circuit cache checks if the source is already formatted by hashing the output. If the hash matches, formatting is skipped entirely.

Result cache stores formatted output keyed by source hash, configuration, and range. This accelerates repeated formatting of the same code.

Both caches use LRU eviction with configurable size limits.

Configuration

Formatting behavior is controlled via beacon.toml or pyproject.toml:

[tool.beacon.formatting]
line_length = 88
indent_size = 4
normalize_quotes = true
trailing_commas = true

Suppression comments disable formatting:

# beacon: fmt: off
ugly_code = {"a":1,"b":2}
# beacon: fmt: on

Limitations

Comment preservation is best-effort. Complex nested structures with interleaved comments may lose some comments or place them incorrectly.

Line length is a soft limit. Some constructs like long string literals or deeply nested expressions may exceed the configured limit.

Format-on-type is basic and only reformats the current statement plus surrounding context. It doesn't perform whole-file formatting.

The formatter doesn't understand semantic equivalence, so it may format code in ways that change behavior for dynamic features like globals() manipulation.

Key Files

crates/server/src/formatting/
├── mod.rs          # Main formatter
├── token_stream.rs # Token generation
├── writer.rs       # Output writer
├── rules.rs        # Formatting rules
├── import.rs       # Import sorting
└── cache.rs        # Result caching

Linter

The linter performs static code quality checks through 30 rules covering imports, control flow, naming, style, and type usage. It detects common Python mistakes and enforces best practices.

How It Works

The linter operates in three phases:

  1. Walk the AST tracking context (function depth, loop depth, imports, etc.)
  2. Check symbol table for unused imports and undefined names
  3. Filter diagnostics by suppression comments

Each rule is identified by a code from BEA001 to BEA030.

              ┌──────────────────────┐
              │ AST + Symbol Table   │
              └──────────┬───────────┘
                         │
                         ▼
              ┌──────────────────────┐
              │ Linter Context Init  │
              └──────────┬───────────┘
                         │
                         ▼
              ┌──────────────────────┐
              │    AST Visitor       │
              └──────────┬───────────┘
                         │
                         ▼
              ┌──────────────────────┐
              │     Node Type        │
              └─┬──────┬────┬────┬──┘
                │      │    │    │
    ┌───────────┘      │    │    └───────────┐
    │                  │    │                │
    │ Import      Loop │    │ Function  Name │
    │                  │    │                │
    ▼                  ▼    ▼                ▼
┌─────────┐    ┌─────────────┐    ┌─────────────┐
│Check    │    │Check Loop   │    │Check Name   │
│Import   │    │Rules        │    │Rules        │
│Rules    │    ├─────────────┤    └──────┬──────┘
└────┬────┘    │Check        │           │
     │         │Function     │           │
     │         │Rules        │           │
     │         └──────┬──────┘           │
     │                │                  │
     └────────────────┴──────────────────┘
                      │
                      ▼
           ┌──────────────────┐
           │   Diagnostics    │
           └──────────┬───────┘
                      │
                      ▼
           ┌──────────────────┐
           │Suppression Filter│
           └──────────┬───────┘
                      │
                      ▼
           ┌──────────────────┐
           │Final Diagnostics │
           └──────────────────┘

Rule Categories

Import rules check for star imports, unused imports, and duplicate imports.

Control flow rules detect break/continue outside loops, return/yield outside functions, and unreachable code after control flow statements.

Name rules find undefined variables, duplicate arguments, improper global/nonlocal usage, and shadowing of builtins.

Style rules flag redundant pass statements, assert on tuples, percent format issues, forward annotations, bare except clauses, and identity comparisons with literals.

Type rules validate except handler types, detect constant conditionals, check for duplicate branches, find loop variable overwrites, and verify dataclass and protocol patterns.

                              ┌───────┐
                              │ Rules │
                              └───┬───┘
                  ┌───────────────┼──────────────────┬─────────────┐
                  │               │                  │             │
                  ▼               ▼                  ▼             ▼
         ┌────────────┐  ┌──────────────┐  ┌────────────┐  ┌──────────┐
         │  Import    │  │Control Flow  │  │   Names    │  │  Style   │
         │BEA001-003  │  │ BEA004-008   │  │BEA009-014  │  │BEA015-020│
         └────────────┘  └──────────────┘  └────────────┘  └────┬─────┘
                                                                 │
                                                                 ▼
                                                          ┌────────────┐
                                                          │    Type    │
                                                          │ BEA021-027 │
                                                          └────────────┘
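
A hypothetical module that would trip rules from several of these categories:

from sys import *                 # import rules: star import

def report(values):
    return len(values)
    print("unreachable")          # control flow rules: code after return

def shadow(list):                 # name rules: parameter shadows the builtin `list`
    try:
        return list[0]
    except:                       # style rules: bare except clause
        return None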

Context Tracking

The linter maintains context while walking the AST:

Function depth tracks nesting level to detect return/yield outside functions. Loop depth tracks loop nesting to validate break/continue. Class depth tracks class nesting for dataclass and protocol checks.

Import tracking records all imported names to detect unused imports. Loop variable tracking identifies which variables are bound by loop iteration. Global and nonlocal declaration tracking validates proper usage.

Assigned variable tracking finds variables written to, enabling unused variable detection. Dataclass and protocol scope tracking identifies decorated classes for specialized rules.

Function Context:

    ┌────────────────┐
    │ Enter Function │
    └───────┬────────┘
            │
            ▼
    ┌──────────────────────┐
    │Increment Function    │
    │Depth                 │
    └───────┬──────────────┘
            │
            ▼
    ┌──────────────────────┐
    │Visit Function Body   │
    └───────┬──────────────┘
            │
            ├──────────────────────────┐
            │                          │
            ▼                          ▼
    ┌──────────────┐         ┌───────────────────┐
    │Found Return? │         │Decrement Function │
    └───────┬──────┘         │Depth              │
            │                └───────────────────┘
    ┌───────┴────────┐
    │                │
    ▼                ▼
┌────────┐    ┌─────────────────────────┐
│OK      │    │Error: Return Outside    │
│(Depth  │    │Function (Depth = 0)     │
│> 0)    │    │                         │
└────────┘    └─────────────────────────┘

Loop Context:

    ┌────────────────┐
    │  Enter Loop    │
    └───────┬────────┘
            │
            ▼
    ┌──────────────────────┐
    │Increment Loop Depth  │
    └───────┬──────────────┘
            │
            ▼
    ┌──────────────────────┐
    │Visit Loop Body       │
    └───────┬──────────────┘
            │
            ├──────────────────────────┐
            │                          │
            ▼                          ▼
    ┌──────────────┐         ┌───────────────────┐
    │Found Break?  │         │Decrement Loop     │
    └───────┬──────┘         │Depth              │
            │                └───────────────────┘
    ┌───────┴────────┐
    │                │
    ▼                ▼
┌────────┐    ┌─────────────────────────┐
│OK      │    │Error: Break Outside     │
│(Depth  │    │Loop (Depth = 0)         │
│> 0)    │    │                         │
└────────┘    └─────────────────────────┘

Suppression

Rules can be suppressed with comments:

# beacon: ignore[BEA001]  # Suppress specific rule
from module import *

# beacon: ignore  # Suppress all rules on next line
undefined_variable

The suppression map tracks which lines have suppressions and which rules are disabled.

Symbol Table Integration

After AST traversal, the linter checks the symbol table:

Unused imports are detected by finding symbols marked as imported but never referenced. Undefined names are found by checking all name references against the symbol table. Shadowing detection compares local names against Python builtins.

Limitations

Some rules use pattern matching on decorators and don't verify actual inheritance or runtime behavior. For example, dataclass detection looks for the decorator but doesn't confirm the class truly uses the dataclass module.

Constant evaluation is limited to simple literal expressions. Complex constant folding involving functions or dynamic attributes is not supported.

Control flow analysis for unreachable code is basic and may miss some cases that require interprocedural analysis.

Exception type checking in except handlers uses name-based heuristics and doesn't perform full type analysis.

The linter doesn't track data flow across statements, so it may miss patterns like conditional initialization followed by usage.

Key Files

crates/analyzer/src/
├── linter.rs     # Main linter implementation
├── rules.rs      # Rule definitions
└── const_eval.rs # Constant evaluation

LSP Implementation

The LSP server orchestrates all analysis components and exposes them through the Language Server Protocol. It provides 15+ features including diagnostics, hover, completion, goto definition, and formatting.

How It Works

The server operates as a JSON-RPC service:

  1. Initialize server with client capabilities
  2. Manage document lifecycle (open, change, close)
  3. Respond to feature requests (hover, completion, etc.)
  4. Publish diagnostics when documents change
  5. Index workspace for cross-file features

The backend uses tower-lsp for protocol handling and implements the LanguageServer trait.

                        ┌───────────────┐
                        │ Editor Client │
                        └──────┬────────┘
                               │ JSON-RPC
                               ▼
                        ┌─────────────┐
                        │ LSP Backend │
                        └──────┬──────┘
                   ┌───────────┼───────────┬────────────┐
                   │           │           │            │
                   ▼           ▼           ▼            ▼
         ┌─────────────┐ ┌─────────┐ ┌──────────┐ ┌─────────────────┐
         │  Document   │ │Analyzer │ │Workspace │ │Feature Providers│
         │  Manager    │ └────┬────┘ └────┬─────┘ └────┬──────┬─────┘
         └──────┬──────┘      │           │            │      │
                │             │           │            │      │
      ┌─────────┼─────────┐   │           │            │      │
      │         │         │   │           │            │      │
      ▼         ▼         ▼   ▼           ▼            ▼      ▼
┌──────────┐ ┌──────────┐ ┌────────┐ ┌───────────┐ ┌──────────────────────┐
│didOpen   │ │didChange │ │didClose│ │Diagnostics│ │Hover, Completion,    │
│          │ │          │ │        │ │           │ │Definition, +13 More  │
└──────────┘ └──────────┘ └────────┘ └───────────┘ └──────────────────────┘

Document Management

The document manager tracks all open files:

When a document opens, the manager stores the URI, version, and text. When a document changes, it applies incremental or full text updates. When a document closes, it removes the document from tracking.

Each document is parsed into an AST and symbol table on demand. Parse results are cached until the document changes.

    ┌─────────┐     ┌────────────────┐     ┌───────┐     ┌───────────┐
    │ didOpen ├────►│ Store Document ├────►│ Parse ├────►│ Cache AST │
    └─────────┘     └────────────────┘     └───────┘     └───────────┘
                                              ▲
                                              │
    ┌───────────┐   ┌───────────────┐     ┌───┴────────────────┐
    │ didChange ├──►│ Apply Changes ├────►│  Invalidate Cache  │
    └───────────┘   └───────────────┘     └────────────────────┘

    ┌──────────┐    ┌─────────────────┐   ┌─────────────┐
    │ didClose ├───►│ Remove Document ├──►│ Clear Cache │
    └──────────┘    └─────────────────┘   └─────────────┘

Analysis Orchestration

The analyzer coordinates all analysis phases:

  1. Retrieve cached results if available
  2. Parse source into AST
  3. Generate and solve type constraints
  4. Build CFG and run data flow analysis
  5. Run linter rules
  6. Cache results by URI and version
  7. Return combined diagnostics

Caching occurs at multiple levels. Full document analysis is cached by URI and version. Individual function scope analysis is cached by content hash for incremental updates.

                        ┌─────────────────┐
                        │ Document Change │
                        └────────┬────────┘
                                 │
                                 ▼
                        ┌────────────────┐
                        │  Cache Hit?    │
                        └───┬────────┬───┘
                   Yes      │        │      No
                   ┌────────┘        └────────┐
                   │                          │
                   ▼                          ▼
         ┌──────────────────┐        ┌──────────────┐
         │  Return Cached   │        │  Parse AST   │
         │  Result          │        └──────┬───────┘
         └──────────────────┘               │
                                            ▼
                                   ┌──────────────────────┐
                                   │ Generate Constraints │
                                   └──────────┬───────────┘
                                              │
                                              ▼
                                   ┌──────────────────────┐
                                   │  Solve Constraints   │
                                   └──────────┬───────────┘
                                              │
                                              ▼
                                   ┌──────────────────────┐
                                   │     Build CFG        │
                                   └──────────┬───────────┘
                                              │
                                              ▼
                                   ┌──────────────────────┐
                                   │ Data Flow Analysis   │
                                   └──────────┬───────────┘
                                              │
                                              ▼
                                   ┌──────────────────────┐
                                   │    Run Linter        │
                                   └──────────┬───────────┘
                                              │
                                              ▼
                                   ┌──────────────────────┐
                                   │ Combine Diagnostics  │
                                   └──────────┬───────────┘
                                              │
                                              ▼
                                   ┌──────────────────────┐
                                   │   Cache Result       │
                                   └──────────┬───────────┘
                                              │
                                              ▼
                                   ┌──────────────────────┐
                                   │ Publish Diagnostics  │
                                   └──────────────────────┘

Feature Providers

Each LSP feature is implemented by a dedicated provider:

Diagnostics provider combines parse errors, type errors, linter warnings, and static analysis results into a unified diagnostic list.

Hover provider looks up the symbol at the cursor position in the type map and formats type information for display.

Completion provider searches the symbol table and workspace index for completions, filtering by prefix and ranking by relevance.

Goto definition provider resolves the symbol to its definition location using the symbol table and workspace index.

References provider finds all uses of a symbol across the workspace.

Rename provider validates the rename and computes workspace edits.

Code actions provider offers quick fixes for diagnostics like adding imports or suppressing linter rules.

Semantic tokens provider generates syntax highlighting tokens from the AST.

Inlay hints provider shows inferred types inline in the editor.

Signature help provider displays function parameter information during calls.

Document symbols provider generates an outline tree for the file.

Workspace symbols provider searches all symbols across the workspace.

Folding range provider computes collapsible regions for functions, classes, and imports.

Formatting providers apply code formatting to documents or ranges.

                          ┌─────────────┐
                          │ LSP Request │
                          └──────┬──────┘
                                 │
                                 ▼
                          ┌──────────────┐
                          │ Request Type │
                          └──────┬───────┘
                  ┌──────────────┼──────────────┬─────────────┐
                  │              │              │             │
         Hover    │   Completion │  Definition  │  References │  Formatting
                  │              │              │             │
                  ▼              ▼              ▼             ▼             ▼
         ┌────────────┐  ┌──────────────┐  ┌────────────┐  ┌────────────┐  ┌─────────────┐
         │Type Lookup │  │Symbol Search │  │Symbol      │  │Usage Search│  │Format Code  │
         │            │  │              │  │Resolution  │  │            │  │             │
         └──────┬─────┘  └──────┬───────┘  └──────┬─────┘  └──────┬─────┘  └──────┬──────┘
                │               │                 │               │               │
                └───────────────┴─────────────────┴───────────────┴───────────────┘
                                                  │
                                                  ▼
                                           ┌─────────────┐
                                           │  Response   │
                                           └─────────────┘

Workspace Indexing

The workspace maintains a global index:

On initialization, the server indexes all Python files in the workspace. The index maps module names to file paths and tracks imported symbols. Import resolution uses the index to find definitions in other files.

The module resolver handles Python's import semantics, searching sys.path and resolving relative imports based on package structure.

Configuration

Server behavior is controlled via configuration files:

[tool.beacon.type_checking]
mode = "balanced"  # strict/balanced/relaxed
unsafe_any_depth = 2

[tool.beacon.linting]
enabled = true

[tool.beacon.formatting]
line_length = 88
indent_size = 4

Configuration can be specified in beacon.toml, pyproject.toml, or sent via LSP workspace/didChangeConfiguration.

Limitations

Workspace indexing is synchronous during initialization, which can cause delays on large projects with thousands of Python files.

Multi-root workspaces are not supported. Only a single workspace root is handled.

Some features degrade on files exceeding 10,000 lines of code due to parse and analysis time.

Configuration changes require server restart for some settings. Dynamic reconfiguration is not fully implemented.

Cross-file analysis is limited to import resolution and symbol lookup. Whole-program type inference is not performed.

Memory usage grows with workspace size as the index and caches expand. No automatic memory management or eviction exists for the workspace index.

Key Files

crates/server/src/
├── backend.rs              # Main LSP backend
├── lib.rs                  # Server entry point
└── features/
    ├── diagnostics.rs      # Diagnostic generation
    ├── hover.rs            # Hover information
    ├── completion/         # Auto-completion
    ├── goto_definition.rs  # Jump to definition
    ├── references.rs       # Find references
    ├── rename.rs           # Rename symbol
    ├── code_actions.rs     # Quick fixes
    ├── semantic_tokens.rs  # Syntax highlighting
    └── formatting.rs       # Code formatting

Editor Extensions

Beacon provides language server integration for multiple editors through the Language Server Protocol (LSP). All extensions communicate with the same beacon-lsp server, ensuring feature parity across editors.

Supported Editors

VS Code

Full-featured extension with settings UI and marketplace distribution (planned).

VS Code Extension Documentation

Zed

Native WebAssembly extension with TOML-based configuration.

Zed Extension Documentation

Neovim

Native LSP client integration using built-in LSP support.

Neovim Setup (see below)

Other LSP-Compatible Editors

Beacon works with any editor supporting the Language Server Protocol. See Manual Setup for configuration.

Installation

Prerequisites

All editors require beacon-lsp to be installed and available in your PATH:

# Install from source
cargo install --path crates/server

# Verify installation
which beacon-lsp

Ensure ~/.cargo/bin is in your PATH:

# Add to ~/.zshrc or ~/.bashrc
export PATH="$HOME/.cargo/bin:$PATH"

Neovim Integration

Neovim has built-in LSP support starting from version 0.5.0. Beacon integrates seamlessly with Neovim's native LSP client.

Requirements

  • Neovim ≥ 0.8.0 (recommended 0.10.0+)
  • beacon-lsp installed and in PATH
  • nvim-lspconfig plugin (optional but recommended)

Setup with nvim-lspconfig

Using nvim-lspconfig:

-- ~/.config/nvim/lua/plugins/lsp.lua or init.lua

local lspconfig = require('lspconfig')
local configs = require('lspconfig.configs')

-- Register beacon-lsp if not already registered
if not configs.beacon then
  configs.beacon = {
    default_config = {
      cmd = { 'beacon-lsp' },
      filetypes = { 'python' },
      root_dir = function(fname)
        return lspconfig.util.root_pattern(
          'beacon.toml',
          'pyproject.toml',
          '.git'
        )(fname) or lspconfig.util.path.dirname(fname)
      end,
      settings = {},
      init_options = {
        typeChecking = {
          mode = 'balanced', -- 'strict', 'balanced', or 'relaxed'
        },
        python = {
          version = '3.12',
          stubPaths = { 'stubs', 'typings' },
        },
        workspace = {
          sourceRoots = {},
          excludePatterns = { '**/venv/**', '**/.venv/**' },
        },
        inlayHints = {
          enable = true,
          variableTypes = true,
          functionReturnTypes = true,
          parameterNames = false,
        },
        diagnostics = {
          unresolvedImports = 'warning',
          circularImports = 'warning',
        },
        advanced = {
          incremental = true,
          workspaceAnalysis = true,
          enableCaching = true,
          cacheSize = 100,
        },
      },
    },
  }
end

-- Setup beacon-lsp
lspconfig.beacon.setup({
  on_attach = function(client, bufnr)
    -- Enable completion
    vim.api.nvim_buf_set_option(bufnr, 'omnifunc', 'v:lua.vim.lsp.omnifunc')

    -- Keybindings
    local opts = { noremap = true, silent = true, buffer = bufnr }
    vim.keymap.set('n', 'gD', vim.lsp.buf.declaration, opts)
    vim.keymap.set('n', 'gd', vim.lsp.buf.definition, opts)
    vim.keymap.set('n', 'K', vim.lsp.buf.hover, opts)
    vim.keymap.set('n', 'gi', vim.lsp.buf.implementation, opts)
    vim.keymap.set('n', '<C-k>', vim.lsp.buf.signature_help, opts)
    vim.keymap.set('n', '<space>wa', vim.lsp.buf.add_workspace_folder, opts)
    vim.keymap.set('n', '<space>wr', vim.lsp.buf.remove_workspace_folder, opts)
    vim.keymap.set('n', '<space>wl', function()
      print(vim.inspect(vim.lsp.buf.list_workspace_folders()))
    end, opts)
    vim.keymap.set('n', '<space>D', vim.lsp.buf.type_definition, opts)
    vim.keymap.set('n', '<space>rn', vim.lsp.buf.rename, opts)
    vim.keymap.set({ 'n', 'v' }, '<space>ca', vim.lsp.buf.code_action, opts)
    vim.keymap.set('n', 'gr', vim.lsp.buf.references, opts)
    vim.keymap.set('n', '<space>f', function()
      vim.lsp.buf.format({ async = true })
    end, opts)

    -- Enable inlay hints (Neovim 0.10+)
    if client.server_capabilities.inlayHintProvider then
      vim.lsp.inlay_hint.enable(true, { bufnr = bufnr })
    end
  end,
  capabilities = require('cmp_nvim_lsp').default_capabilities(),
})

Manual Setup (Without nvim-lspconfig)

For minimal configuration without plugins:

-- ~/.config/nvim/init.lua

vim.api.nvim_create_autocmd('FileType', {
  pattern = 'python',
  callback = function()
    vim.lsp.start({
      name = 'beacon-lsp',
      cmd = { 'beacon-lsp' },
      root_dir = vim.fs.dirname(
        vim.fs.find({ 'beacon.toml', 'pyproject.toml', '.git' }, {
          upward = true,
        })[1]
      ),
      settings = {
        typeChecking = { mode = 'balanced' },
        python = { version = '3.12' },
        inlayHints = { enable = true },
      },
    })
  end,
})

-- Keybindings
vim.api.nvim_create_autocmd('LspAttach', {
  callback = function(args)
    local opts = { buffer = args.buf }
    vim.keymap.set('n', 'gd', vim.lsp.buf.definition, opts)
    vim.keymap.set('n', 'K', vim.lsp.buf.hover, opts)
    vim.keymap.set('n', 'gr', vim.lsp.buf.references, opts)
    vim.keymap.set('n', '<space>rn', vim.lsp.buf.rename, opts)
    vim.keymap.set('n', '<space>ca', vim.lsp.buf.code_action, opts)
  end,
})

LazyVim Setup

For LazyVim users:

-- ~/.config/nvim/lua/plugins/beacon.lua

return {
  {
    'neovim/nvim-lspconfig',
    opts = {
      servers = {
        beacon = {
          cmd = { 'beacon-lsp' },
          filetypes = { 'python' },
          root_dir = function(fname)
            local util = require('lspconfig.util')
            return util.root_pattern('beacon.toml', 'pyproject.toml', '.git')(fname)
          end,
          settings = {
            typeChecking = { mode = 'balanced' },
            python = { version = '3.12' },
            inlayHints = {
              enable = true,
              variableTypes = true,
              functionReturnTypes = true,
            },
          },
        },
      },
    },
  },
}

Kickstart.nvim Setup

For kickstart.nvim users:

-- Add to your init.lua servers table

local servers = {
  -- ... other servers
  beacon = {
    cmd = { 'beacon-lsp' },
    filetypes = { 'python' },
    settings = {
      typeChecking = { mode = 'balanced' },
      python = { version = '3.12' },
    },
  },
}

Completion Support

Beacon works with popular completion plugins:

nvim-cmp

-- ~/.config/nvim/lua/plugins/completion.lua

local cmp = require('cmp')
local lspconfig = require('lspconfig')

lspconfig.beacon.setup({
  capabilities = require('cmp_nvim_lsp').default_capabilities(),
})

cmp.setup({
  sources = {
    { name = 'nvim_lsp' },
    { name = 'buffer' },
    { name = 'path' },
  },
})

coq_nvim

local coq = require('coq')
lspconfig.beacon.setup(coq.lsp_ensure_capabilities())

Diagnostics Configuration

Customize diagnostic display:

-- Configure diagnostics display
vim.diagnostic.config({
  virtual_text = {
    prefix = '●',
    source = 'beacon',
  },
  signs = true,
  underline = true,
  update_in_insert = false,
  severity_sort = true,
})

-- Custom diagnostic signs
local signs = { Error = '✘', Warn = '⚠', Hint = '󰌶', Info = 'ℹ' }
for type, icon in pairs(signs) do
  local hl = 'DiagnosticSign' .. type
  vim.fn.sign_define(hl, { text = icon, texthl = hl, numhl = hl })
end

Inlay Hints

Enable inlay hints (Neovim 0.10+):

-- Enable inlay hints globally
vim.lsp.inlay_hint.enable(true)

-- Toggle inlay hints with a keybinding
vim.keymap.set('n', '<leader>th', function()
  vim.lsp.inlay_hint.enable(not vim.lsp.inlay_hint.is_enabled())
end, { desc = 'Toggle Inlay Hints' })

Workspace Configuration

Override settings per project using beacon.toml in your project root.

See Configuration Documentation for complete details on all available options and TOML structure.

Manual Setup

For editors not listed above, configure your LSP client to:

  1. Command: beacon-lsp
  2. File Types: python
  3. Root Patterns: beacon.toml, pyproject.toml, .git
  4. Communication: stdio (stdin/stdout)

Example Configuration

{
    "command": "beacon-lsp",
    "filetypes": ["python"],
    "rootPatterns": ["beacon.toml", "pyproject.toml", ".git"],
    "settings": {
        "typeChecking": { "mode": "balanced" },
        "python": { "version": "3.12" }
    }
}

Feature Comparison

Feature               VS Code    Zed    Neovim        Other
Diagnostics           ✓          ✓      ✓             ✓
Hover                 ✓          ✓      ✓             ✓
Completions           ✓          ✓      ✓             ✓
Go to Definition      ✓          ✓      ✓             ✓
Find References       ✓          ✓      ✓             ✓
Document Symbols      ✓          ✓      ✓             ✓
Workspace Symbols     ✓          ✓      ✓             ✓
Semantic Tokens       ✓          ✓      ✓             ✓
Inlay Hints           ✓          ✓      ✓ (0.10+)     ✓
Code Actions          ✓          ✓      ✓             ✓
Rename                ✓          ✓      ✓             ✓
Folding Ranges        ✓          ✓      ✓             ✓
Document Highlight    ✓          ✓      ✓             ✓
Signature Help        ✓          ✓      ✓             ✓
Settings UI           ✓          -      -             -
Marketplace           Planned    -      -             -
All editors share the same language server, ensuring consistent behavior and feature parity.

Configuration

See Configuration Documentation for complete details.

Resources

VS Code Extension

The Beacon VS Code extension (pkg/vscode/) pairs the Rust language server with the VS Code UI. It activates automatically for Python files and forwards editor requests to the Beacon LSP binary.

Feature Highlights

  • On-type diagnostics for syntax and type errors
  • Hover tooltips with type information
  • Go to definition & find references
  • Document and workspace symbols
  • Semantic tokens for enhanced highlighting
  • Identifier completions and inlay hints
  • (Scaffolded) code actions for quick fixes

These capabilities mirror the features exposed by the Rust server in crates/server.

Repository Layout

pkg/vscode/
├── client/                 # TypeScript client that binds to VS Code APIs
│   ├── src/extension.ts    # Extension entry point; starts the LSP client
│   └── src/test/           # End-to-end tests using the VS Code test runner
├── package.json            # Extension manifest (activation, contributions)
├── tsconfig.json           # TypeScript project references
├── eslint.config.js        # Lint configuration
└── dprint.json             # Formatting config for client sources

The client launches the Beacon server binary from target/debug/beacon-lsp (or target/release/beacon-lsp if present). Ensure one of these binaries exists before activating the extension.

Prerequisites

  • Rust toolchain (stable) with cargo available in PATH
  • Node.js 18+ (aligned with current VS Code requirements)
  • pnpm for dependency management; install globally with npm install -g pnpm
  • VS Code ≥ 1.100 (see package.json engines field)
  • (Optional) vsce or ovsx for packaging/publishing

Installing Dependencies

From the repository root:

pnpm install

This installs dependencies for all packages, including the VS Code extension.

Building The Extension Client

The extension compiles TypeScript into client/out/:

pnpm --filter beacon-lsp compile

For iterative development, run:

pnpm --filter beacon-lsp watch

This keeps the TypeScript project in watch mode so recompiles happen automatically after you edit client files.

Building The Beacon LSP Server

The client resolves the server binary relative to the repository root:

target/debug/beacon-lsp    (default)
target/release/beacon-lsp  (used if available)

Build the server before launching the extension:

cargo build -p beacon-lsp              # debug binary
# or
cargo build -p beacon-lsp --release    # release binary

Running In VS Code

  1. Open pkg/vscode in VS Code.
  2. Select the Run and Debug panel and choose the Beacon LSP launch configuration (provided in .vscode/launch.json).
  3. Press F5 to start the Extension Development Host.
  4. In the new window, open a Python file (the repository’s samples/ directory is a good starting point).

The launch configuration compiles the TypeScript client and relies on the previously built Rust binary. In debug mode, RUST_LOG=beacon_lsp=debug is set automatically so server logs appear in the “Beacon LSP” output channel.

Configuration

The extension provides extensive configuration options accessible through VS Code settings. All settings are under the beacon.* namespace and can be configured per-workspace or globally.

Type Checking

  • beacon.typeChecking.mode (string, default "balanced") - Type checking strictness: "strict", "balanced", or "relaxed"

Inlay Hints

  • beacon.inlayHints.enable (boolean, default true) - Enable inlay hints for type information
  • beacon.inlayHints.variableTypes (boolean, default true) - Show inlay hints for inferred variable types
  • beacon.inlayHints.functionReturnTypes (boolean, default true) - Show inlay hints for inferred function return types
  • beacon.inlayHints.parameterNames (boolean, default false) - Show inlay hints for parameter names in calls

Python Settings

  • beacon.python.version (string, default "3.12") - Target Python version: "3.9", "3.10", "3.11", "3.12", "3.13"
  • beacon.python.interpreterPath (string, default "") - Path to Python interpreter for runtime introspection
  • beacon.python.stubPaths (string[], default ["stubs"]) - Additional paths to search for .pyi stub files

Workspace Settings

  • beacon.workspace.sourceRoots (string[], default []) - Source roots for module resolution (in addition to workspace root)
  • beacon.workspace.excludePatterns (string[], default []) - Patterns to exclude from workspace scanning (e.g., venv/, .venv/)

Diagnostics

  • beacon.diagnostics.unresolvedImports (string, default "warning") - Severity for unresolved imports: "error", "warning", "info"
  • beacon.diagnostics.circularImports (string, default "warning") - Severity for circular imports: "error", "warning", "info"

Advanced

  • beacon.advanced.maxAnyDepth (number, default 3) - Maximum depth for Any type propagation (0-10)
  • beacon.advanced.incremental (boolean, default true) - Enable incremental type checking
  • beacon.advanced.workspaceAnalysis (boolean, default true) - Enable workspace-wide analysis
  • beacon.advanced.enableCaching (boolean, default true) - Enable caching of parse trees and type inference results
  • beacon.advanced.cacheSize (number, default 100) - Maximum number of documents to cache (0-1000)

Debugging

  • beacon.trace.server (string, default "off") - JSON-RPC tracing: "off", "messages", or "verbose"

Enable messages or verbose while debugging protocol issues; traces are written to the "Beacon LSP" output channel.

Example Configuration

Add these settings to your .vscode/settings.json:

{
  "beacon.typeChecking.mode": "strict",
  "beacon.python.version": "3.12",
  "beacon.python.stubPaths": ["stubs", "typings"],
  "beacon.workspace.sourceRoots": ["src", "lib"],
  "beacon.workspace.excludePatterns": [
      "**/venv/**",
      "**/.venv/**",
      "**/build/**"
  ],
  "beacon.inlayHints.enable": true,
  "beacon.inlayHints.variableTypes": true,
  "beacon.inlayHints.functionReturnTypes": true,
  "beacon.diagnostics.unresolvedImports": "error",
  "beacon.diagnostics.circularImports": "warning"
}

Configuration Precedence

Beacon merges configuration from multiple sources:

  1. Default values - Built-in defaults
  2. TOML file - beacon.toml or pyproject.toml in workspace root
  3. VS Code settings - User/workspace settings (highest precedence)

See Configuration for details on TOML configuration files.

Packaging & Publishing

  1. Ensure the client is built (pnpm --filter beacon-lsp compile) and the server release binary exists (cargo build -p beacon-lsp --release).
  2. From pkg/vscode, run vsce package (or ovsx package) to produce a .vsix.
  3. Publish the package with vsce publish or ovsx publish once authenticated.

The generated .vsix expects the server binary to be shipped alongside the extension or obtainable on the user’s machine. Adjust extension.ts if you plan to bundle the binary differently.

Zed Extension

The Beacon Zed extension (pkg/zed/) integrates the Beacon language server with Zed editor. It activates automatically for Python files and provides Hindley-Milner type checking alongside standard LSP features.

Feature Highlights

  • On-type diagnostics for syntax and type errors
  • Hover tooltips with type information
  • Go to definition & find references
  • Document and workspace symbols
  • Semantic tokens for enhanced highlighting
  • Identifier completions and inlay hints
  • Code actions for quick fixes and refactoring

These capabilities mirror the features exposed by the Rust server in crates/server.

Repository Layout

pkg/zed/
├── src/
│   └── lib.rs       # Extension implementation
├── Cargo.toml       # Rust project manifest
├── extension.toml   # Zed extension metadata
└── README.md        # Installation instructions

The extension is compiled to WebAssembly (wasm32-wasip1) and communicates with the beacon-lsp binary via the Language Server Protocol.

Prerequisites

  • Rust toolchain (stable) with cargo available in PATH
  • wasm32-wasip1 target for Rust (install with rustup target add wasm32-wasip1)
  • beacon-lsp binary installed and available in PATH
  • Zed editor installed

Installing beacon-lsp

The extension requires beacon-lsp to be available in your system PATH:

# From the repository root
cargo install --path crates/server

This installs the beacon-lsp binary to ~/.cargo/bin. Ensure ~/.cargo/bin is in your PATH.

Verify installation:

which beacon-lsp
# Should output: /Users/<username>/.cargo/bin/beacon-lsp

Building The Extension

The extension must be compiled to WebAssembly:

cd pkg/zed
cargo build --target wasm32-wasip1 --release

The compiled extension will be at:

target/wasm32-wasip1/release/beacon_zed.wasm

Installing The Extension

Development Installation

For local development and testing:

  1. Build the extension (see above)

  2. Create a symlink to the extension directory in Zed's extensions folder:

    # macOS
    mkdir -p ~/.config/zed/extensions
    ln -s /path/to/beacon/pkg/zed ~/.config/zed/extensions/beacon
    
  3. Restart Zed or reload the window

  4. Open a Python file to activate the extension

Distribution Installation

To distribute the extension, package it following Zed's extension installation guide.

The extension expects beacon-lsp to be available in the user's PATH. Users should install it via:

cargo install beacon-lsp

Extension Implementation

The extension implements the zed::Extension trait with the following key components:

Language Server Command

Returns the command to launch beacon-lsp:

fn language_server_command(
    &mut self,
    _: &zed::LanguageServerId,
    worktree: &zed::Worktree,
) -> zed::Result<zed::Command> {
    let command = worktree
        .which("beacon-lsp")
        .ok_or_else(|| "beacon-lsp not found in PATH")?;

    Ok(zed::Command {
        command,
        args: vec![],
        env: vec![("RUST_LOG".to_string(), "info".to_string())],
    })
}

Environment Variables

The extension sets RUST_LOG=info to configure logging. Logs are written to stderr and can be viewed in Zed's log panel.

Arguments

beacon-lsp doesn't require command-line arguments as it communicates via stdin/stdout.

Configuration

See Configuration for details.

Development Workflow

Making Changes

  1. Edit the extension source in pkg/zed/src/lib.rs

  2. Rebuild the extension:

    cargo build --target wasm32-wasip1 --release
    
  3. Restart Zed to load the updated extension

Debugging

Enable detailed logging:

RUST_LOG=beacon_lsp=debug zed

Or set the environment variable in your shell before launching Zed. Logs appear in:

  • macOS: ~/Library/Logs/Zed/Zed.log
  • Linux: ~/.local/share/zed/logs/Zed.log

Testing Changes

  1. Build the language server with your changes:

    cargo build -p beacon-lsp
    cargo install --path crates/server
    
  2. Rebuild the extension if needed

  3. Open a Python project in Zed

  4. Test LSP features:

    • Hover over variables to see type information
    • Use Cmd+Click (macOS) or Ctrl+Click (Linux) for go-to-definition
    • Check the Problems panel for diagnostics
    • Trigger completions with Ctrl+Space

Comparison with VS Code Extension

Feature            Zed                    VS Code
Installation       Manual build + PATH    Marketplace (planned)
Configuration      TOML files             VS Code settings UI
Debugging          Log files              Output panel
Language Server    Shared (beacon-lsp)    Shared (beacon-lsp)
Features           Full LSP support       Full LSP support
Platform           macOS, Linux           macOS, Linux, Windows

Both extensions use the same beacon-lsp server, so feature parity is guaranteed.

Resources

Formatting Overview

Beacon provides built-in PEP8-compliant Python code formatting capabilities through its language server. The formatter is designed to produce consistent, readable code while respecting configuration preferences.

Design Principles

The formatter follows these core principles:

PEP8 Compliance: Adheres to Python Enhancement Proposal 8 style guidelines by default, with configurable options for compatibility with Black and autopep8.

AST-Based: Operates on the abstract syntax tree rather than raw text, ensuring formatting preserves semantic meaning and handles edge cases correctly.

Configurable: Supports workspace and project-level configuration through beacon.toml or pyproject.toml files.

Incremental: Caches already-formatted sources and formatting results to minimize redundant processing.

Formatting Pipeline

The formatter operates in four stages:

  1. Parsing: Source code is parsed into an AST using the Beacon parser
  2. Token Generation: AST nodes are converted into a stream of formatting tokens
  3. Rule Application: Formatting rules are applied based on context and configuration
  4. Output Generation: Formatted code is written with proper whitespace and indentation

Key Features

Whitespace and Indentation

  • Normalizes indentation to 4 spaces (configurable)
  • Removes trailing whitespace
  • Manages blank lines between definitions and statements
  • Controls whitespace around operators, commas, and colons

See Whitespace for detailed formatting rules.

Line Length Management

  • Enforces maximum line length (default: 88 characters, matching Black)
  • Smart line breaking at appropriate boundaries
  • Handles multi-byte Unicode characters correctly
  • Preserves user line breaks when under the limit

See Print Width for line length handling.

Structural Formatting

  • Function call and definition parameter wrapping
  • Collection literal formatting (lists, dicts, sets, tuples)
  • Binary expression breaking
  • Import statement organization and sorting

See Structure and Imports for structural rules.

Suppression Comments

The formatter respects suppression directives:

  • # fmt: skip - Skip formatting for a single line
  • # fmt: off / # fmt: on - Disable formatting for regions

See Suppressions for complete documentation on formatter, linter, and type checker suppressions.
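
For example, a hand-aligned region can be left untouched (illustrative snippet):

aligned = {'x':  1, 'y': 10}  # fmt: skip

# fmt: off
LOOKUP = {  'a': 1,
            'b': 2 }
# fmt: on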

Optimizations

The formatter includes intelligent caching to minimize formatting overhead:

Short-Circuit Cache

The formatter maintains a hash-based cache of already-formatted sources. When formatting is requested:

  1. Source content and configuration are hashed
  2. Cache is checked for this hash combination
  3. If found, formatting is skipped entirely (O(1) operation)
  4. Source is returned unchanged

Incremental Formatting

Formatting results are cached based on:

  • Source content hash
  • Configuration hash
  • Line range

When formatting the same source multiple times (e.g., during editing), cached results are reused if:

  • Source hasn't changed
  • Configuration remains the same
  • Same range is being formatted

The cache uses LRU (Least Recently Used) eviction with configurable size limits to prevent unbounded memory growth.
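
To make the keying and eviction concrete, here is a small Python sketch of an LRU formatting cache keyed by source hash, configuration hash, and line range. It is illustrative only; Beacon's actual cache lives in the Rust server, and the names used here are assumptions.

import hashlib
from collections import OrderedDict

class FormatCache:
    """Sketch of an LRU cache keyed by (source hash, config hash, line range)."""

    def __init__(self, max_entries=100):
        self.max_entries = max_entries
        self.entries = OrderedDict()

    @staticmethod
    def _key(source, config, line_range):
        source_hash = hashlib.sha256(source.encode()).hexdigest()
        config_hash = hashlib.sha256(config.encode()).hexdigest()
        return (source_hash, config_hash, line_range)

    def get(self, source, config, line_range):
        key = self._key(source, config, line_range)
        if key in self.entries:
            self.entries.move_to_end(key)          # mark as most recently used
            return self.entries[key]
        return None                                # miss: caller formats and calls put()

    def put(self, source, config, line_range, formatted):
        key = self._key(source, config, line_range)
        self.entries[key] = formatted
        self.entries.move_to_end(key)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)       # evict the least recently used entry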

Cache Configuration

Caching behavior can be controlled through configuration:

[formatting]
cacheEnabled = true        # Enable result caching (default: true)
cacheMaxEntries = 100      # Maximum cache entries (default: 100)

Disabling the cache may be useful in scenarios where:

  • Memory constraints are tight
  • Source changes very frequently
  • Deterministic performance is required

Configuration

Formatting behavior is controlled through settings:

[formatting]
enabled = true
lineLength = 88
indentSize = 4
quoteStyle = "double"
trailingCommas = "multiline"
maxBlankLines = 2
importSorting = "pep8"
compatibilityMode = "black"
cacheEnabled = true
cacheMaxEntries = 100

See the Configuration documentation for complete details.

LSP Integration

The formatter integrates with the Language Server Protocol through:

  • textDocument/formatting: Format entire document
  • textDocument/rangeFormatting: Format selected range
  • textDocument/willSaveWaitUntil: Format on save

Compatibility

The formatter provides compatibility modes for popular formatters:

  • Black: 88-character line length, minimal configuration
  • autopep8: 79-character line length, conservative formatting
  • PEP8: Strict adherence to style guide recommendations

Whitespace and Indentation

This document describes Beacon's whitespace and indentation formatting rules.

Indentation

Beacon normalizes indentation according to PEP8 guidelines.

Indent Size

Default indentation is 4 spaces per level:

def example():
    if condition:
        do_something()

Configure indentation via formatting.indentSize:

[formatting]
indentSize = 2  # Use 2 spaces

Tabs vs Spaces

Spaces are strongly recommended and used by default. Tabs can be enabled but are not PEP8-compliant:

[formatting]
useTabs = true  # Not recommended

Trailing Whitespace

All trailing whitespace is removed from lines:

# Before
def foo():
    return 42

# After
def foo():
    return 42

This applies to all lines, including blank lines.

Blank Lines

Beacon manages blank lines according to PEP8 conventions.

Top-Level Definitions

Two blank lines separate top-level class and function definitions:

def first_function():
    pass


def second_function():
    pass


class MyClass:
    pass

Configure via formatting.blankLineBeforeClass and formatting.blankLineBeforeFunction:

[formatting]
blankLineBeforeClass = true
blankLineBeforeFunction = true

Method Definitions

One blank line separates methods within a class:

class Example:
    def first_method(self):
        pass

    def second_method(self):
        pass

Maximum Consecutive Blank Lines

By default, at most 2 consecutive blank lines are allowed:

# Before
def foo():
    pass




def bar():
    pass

# After
def foo():
    pass


def bar():
    pass

Configure via formatting.maxBlankLines:

[formatting]
maxBlankLines = 1  # Allow only 1 blank line

Operators

Whitespace around operators depends on the operator type.

Binary Operators

Single space on both sides of binary operators when formatting.spacesAroundOperators is enabled (default):

# Arithmetic
result = x + y
quotient = a / b
power = base ** exponent

# Comparison
if value == expected:
    pass

# Logical
condition = flag and other_flag

Unary Operators

No space between unary operator and operand:

negative = -value
inverted = ~bits
boolean = not flag

Assignment Operators

Single space around assignment operators:

x = 10
count += 1
value *= 2

Delimiters

Parentheses, Brackets, Braces

No whitespace immediately inside delimiters:

# Correct
function(arg1, arg2)
items = [1, 2, 3]
mapping = {'key': 'value'}

# Incorrect
function( arg1, arg2 )
items = [ 1, 2, 3 ]
mapping = { 'key': 'value' }

Commas

No space before comma, single space after:

# Correct
items = [1, 2, 3]
function(a, b, c)

# Incorrect
items = [1 ,2 ,3]
function(a,b,c)

Colons

In dictionaries, no space before the colon and a single space after; in slices, no space on either side of the colon:

# Dictionary
mapping = {'key': 'value', 'other': 'data'}

# Slice
subset = items[start:end]
every_other = items[::2]

In function annotations, no space before colon, single space after:

def greet(name: str) -> str:
    return f"Hello, {name}"

In class inheritance and control flow, no space before colon:

class Child(Parent):
    pass

if condition:
    pass

Comments

Inline Comments

Inline comments have two spaces before the hash and one space after:

x = x + 1  # Increment

Block Comments

Block comments start at the beginning of a line or at the current indentation level:

# This is a block comment
# spanning multiple lines
def function():
    # Indented block comment
    pass

Configuration Summary

Related configuration options:

[formatting]
indentSize = 4                      # Spaces per indent level
useTabs = false                     # Use spaces, not tabs
maxBlankLines = 2                   # Maximum consecutive blank lines
spacesAroundOperators = true        # Add spaces around binary operators
blankLineBeforeClass = true         # 2 blank lines before top-level classes
blankLineBeforeFunction = true      # 2 blank lines before top-level functions

Line Length and Wrapping

Beacon enforces configurable line length limits and provides smart line breaking for long statements.

Line Length

The default maximum line length is 88 characters, matching Black's default.

Configuration

Set line length via formatting.lineLength:

[formatting]
lineLength = 88  # Black default

# Or for strict PEP8
lineLength = 79

Unicode Width Calculation

Line length is calculated using Unicode display width, not byte count.

# This emoji counts as 2 characters wide
message = "Status: ✅"
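
As a rough illustration, display width can be approximated in Python with the unicodedata module; Beacon's own width tables may differ in detail, so treat this as a sketch rather than the exact rule.

import unicodedata

def display_width(line):
    """Approximate display width: East Asian wide/fullwidth characters count as 2 columns."""
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in line)

print(display_width("Status: OK"))   # plain ASCII: one column per character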

Line Breaking

When a line exceeds the configured length, Beacon breaks it at the following boundaries:

Break Points

Lines can break at these locations:

Commas: Highest priority break point

# Before
result = function(very_long_arg1, very_long_arg2, very_long_arg3, very_long_arg4)

# After
result = function(
    very_long_arg1,
    very_long_arg2,
    very_long_arg3,
    very_long_arg4
)

Binary Operators: Secondary break point

# Before
total = first_value + second_value + third_value + fourth_value

# After (when nested)
total = (
    first_value
    + second_value
    + third_value
    + fourth_value
)

Opening Brackets: When deeply nested

# Multiple levels of nesting
data = {
    'key': [
        item1,
        item2
    ]
}

Preserving User Breaks

If your manually inserted line breaks keep the code under the limit, they are preserved:

# This will not be reformatted if under line limit
result = function(
    arg1, arg2
)

Wrapping Strategies

Different constructs use different wrapping strategies.

Function Calls

Function calls use one of three strategies based on argument width:

Horizontal: All arguments on one line when they fit

result = function(arg1, arg2, arg3)

Vertical: One argument per line when arguments are long

result = function(
    very_long_argument_name_1,
    very_long_argument_name_2,
    very_long_argument_name_3
)

Mixed: Multiple arguments per line for medium-length arguments

result = function(
    arg1, arg2,
    arg3, arg4,
    arg5
)

Function Definitions

Function parameters wrap similarly to function calls:

def long_function_name(
    parameter1: str,
    parameter2: int,
    parameter3: bool = False
) -> None:
    pass

Hanging Indents

Parameters can align with the opening delimiter:

result = function(argument1,
                  argument2,
                  argument3)

Or use a consistent indent level:

result = function(
    argument1,
    argument2,
    argument3
)

Beacon prefers consistent indent levels for clarity.

Collection Literals

Collections wrap to vertical layout when they exceed line length:

Lists and Tuples:

items = [
    'first',
    'second',
    'third'
]

Dictionaries:

mapping = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 'value3'
}

Sets:

unique = {
    item1,
    item2,
    item3
}

Binary Expressions

Long binary expressions break before operators:

result = (
    condition1
    and condition2
    or condition3
)

Parenthesized Continuations

Python's implicit line continuation inside parentheses is preferred over backslashes:

# Preferred
total = (
    first_value
    + second_value
    + third_value
)

# Avoid
total = first_value \
    + second_value \
    + third_value

Trailing Commas

Beacon adds trailing commas in multi-line structures when formatting.trailingCommas is set appropriately:

[formatting]
trailingCommas = "multiline"  # Add in multi-line structures (default)
# trailingCommas = "always"   # Always add
# trailingCommas = "never"    # Never add

With multiline setting:

items = [
    'first',
    'second',
    'third',  # Trailing comma added
]

Benefits of trailing commas:

  • Cleaner diffs when adding/removing items
  • Prevents forgetting commas when reordering
  • Consistent formatting

Context-Aware Breaking

Breaking decisions consider context:

Inside Strings and Comments: Never break

# This string won't be broken even if it's very long
message = "This is a very long string that exceeds the line length limit"

Nested Constructs: Allow breaking at higher nesting levels

result = outer(
    inner(
        arg1,
        arg2
    )
)

Statement Boundaries: Prefer breaking between statements

# Break between statements
first = calculate_first()
second = calculate_second()

# Rather than within a statement
first = calculate_first(); second = calculate_second()

Configuration Summary

Related configuration options:

[formatting]
lineLength = 88                     # Maximum line length
trailingCommas = "multiline"        # Trailing comma strategy
compatibilityMode = "black"         # Affects wrapping decisions

String and Comment Formatting

Beacon's formatter provides intelligent string quote normalization and comment formatting while preserving special directives and avoiding unnecessary escaping.

String Quote Normalization

The formatter can normalize string quotes according to your preferred style:

Quote Styles

  • Double quotes (default) - Converts strings to use "
  • Single quotes - Converts strings to use '
  • Preserve - Keeps original quote style

Smart Escaping Avoidance

The formatter intelligently avoids quote normalization when it would introduce escaping:

# Configuration: quote_style = "double"

# Would require escaping, so preserved
'He said "hello" to me'

# No quotes inside, normalized
'simple string' → "simple string"
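
The decision can be sketched as a small helper that re-quotes a simple, unprefixed string literal only when no escaping would be introduced. This is a simplified model; the real formatter works on tokens rather than raw literal text.

def normalize_quotes(literal, preferred='"'):
    """Re-quote a simple string literal unless that would require escaping."""
    current = literal[0]
    body = literal[1:-1]
    if current == preferred:
        return literal                  # already in the preferred style
    if preferred in body:
        return literal                  # would need escaping, so preserve original quotes
    return preferred + body + preferred

print(normalize_quotes("'simple string'"))            # -> "simple string"
print(normalize_quotes('\'He said "hello" to me\''))  # unchanged: contains double quotes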

Prefixed Strings

String prefixes (r, f, rf, etc.) are preserved during normalization:

# Configuration: quote_style = "double"
r'raw string' → r"raw string"
f'formatted {x}' → f"formatted {x}"
rf'raw formatted' → rf"raw formatted"

Docstring Formatting

Triple-quoted strings (docstrings) receive special handling:

Quote Normalization

Docstrings are normalized to the configured quote style unless they contain the target quote sequence:

# Configuration: quote_style = "double"
'''Single quoted docstring''' → """Single quoted docstring"""

# Contains target quotes, preserved
'''String with """quotes""" inside'''

Indentation

Multi-line docstrings maintain consistent indentation:

def example():
    """
    This is a docstring with
    properly normalized indentation
    across all lines
    """

Comment Formatting

Comments are formatted for consistency while preserving special directives.

Standard Comments

Regular comments are formatted with a single space after the #:

#comment → # comment
#  multiple   spaces → # multiple   spaces

Inline Comments

Inline comments (on the same line as code) are preceded by two spaces:

x = 1  # inline comment

Special Directives

Tool-specific comments are preserved exactly as written:

  • # type: ignore - Type checking suppressions
  • # noqa - Linting suppressions
  • # pylint:, # mypy:, # flake8: - Tool-specific directives
  • # fmt: off/on, # black: - Formatter control

x = very_long_line()  # type: ignore  # Preserved exactly

Block Comments

Multi-line block comments at module level may be surrounded by blank lines for better separation.

Configuration

String and comment formatting respects these settings:

  • beacon.formatting.quoteStyle - Quote normalization style (default: "double")
  • beacon.formatting.normalizeDocstringQuotes - Apply quote normalization to docstrings (default: true)

Examples

Before Formatting

def greet(name):
    '''Say hello'''  #function docstring
    message='Hello, ' + name  #create greeting
    return message

After Formatting

def greet(name):
    """Say hello"""  # function docstring
    message = "Hello, " + name  # create greeting
    return message

Import Formatting

Beacon's formatter provides PEP8-compliant import sorting and formatting with intelligent grouping and deduplication.

Import Groups

Imports are automatically organized into three groups following PEP8 style:

  1. Standard library imports - Python's built-in modules (os, sys, json, etc.)
  2. Third-party imports - External packages (numpy, django, requests, etc.)
  3. Local imports - Relative imports from your project (., .., .models, etc.)

Each group is separated by a blank line for clarity.

Sorting Within Groups

Within each group, imports are sorted alphabetically by module name:

  • Simple import statements are sorted before from imports
  • Multiple names in from imports are alphabetically sorted
  • Duplicate imports are automatically removed

Multi-line Imports

When from imports exceed the configured line length, they are automatically wrapped:

# Short enough for one line
from os import environ, path

# Exceeds line length - uses multi-line format
from collections import (
    Counter,
    OrderedDict,
    defaultdict,
    namedtuple,
)

Standard Library Detection

Beacon includes a comprehensive list of Python standard library modules for accurate categorization. Third-party packages are automatically identified when they don't match known stdlib modules.
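
The same classification can be sketched in a few lines of Python using sys.stdlib_module_names (Python 3.10+); Beacon itself is written in Rust and bundles its own module table, so this is only an illustration of the idea.

import sys

def classify_import(module):
    """Place a module name into one of the three PEP8 import groups."""
    if module.startswith("."):
        return "local"                              # relative import
    if module.split(".")[0] in sys.stdlib_module_names:
        return "stdlib"
    return "third-party"

print(classify_import("os"))        # stdlib
print(classify_import("numpy"))     # third-party
print(classify_import(".models"))   # local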

Configuration

Import formatting respects these configuration options:

  • beacon.formatting.lineLength - Controls when to wrap multi-line imports
  • beacon.formatting.importSorting - Set to pep8 for standard sorting (default)

Example

Input:

from numpy import array
import sys
from .models import User
import os
from django.db import models

Output:

import os
import sys

from django.db import models
from numpy import array

from .models import User

Structural Formatting

Structural formatting rules control the layout of Python constructs beyond basic whitespace and indentation.

Trailing Commas

Trailing commas in multi-line structures improve git diffs and make adding items easier.

Configuration

Controlled by beacon.formatting.trailingCommas:

  • always: Add trailing commas to all multi-line structures
  • multiline: Add trailing commas only to multi-line nested structures (default)
  • never: Never add trailing commas

Behavior

The formatter determines whether to add a trailing comma based on:

  1. The trailing comma configuration setting
  2. Whether the structure spans multiple lines
  3. The nesting depth of the structure

For multiline mode, trailing commas are added when both conditions are met:

  • The structure is multi-line
  • The structure is nested (inside parentheses, brackets, or braces)
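
Written as a small predicate, the decision looks roughly like the sketch below; the parameter names are illustrative rather than Beacon's actual API.

def should_add_trailing_comma(mode, is_multiline, is_nested):
    """Apply the trailing-comma policy: 'never', 'always' (multi-line only), or 'multiline'."""
    if mode == "never":
        return False
    if mode == "always":
        return is_multiline
    return is_multiline and is_nested   # "multiline": both conditions must hold

print(should_add_trailing_comma("multiline", True, True))    # True
print(should_add_trailing_comma("multiline", False, True))   # False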

Examples

# multiline mode
items = [
    "first",
    "second",
    "third",  # trailing comma added (nested and multiline)
]

func(
    arg1,
    arg2,  # trailing comma added
)

# Top-level, single-line: no trailing comma
top_level = ["a", "b", "c"]

Dictionary Formatting

Dictionary formatting includes key-value spacing and multi-line alignment.

Value Indentation

For multi-line dictionaries, the formatter calculates appropriate indentation:

  • Nested dictionaries: Use base indentation + 1 level
  • Inline dictionaries: Align values with key width + 2 spaces

# Nested multi-line
config = {
    "key": "value",
    "nested": {
        "inner": "data",
    },
}

# Inline alignment
options = {"short": "val", "longer_key": "val"}

Comprehensions

List, dict, set, and generator comprehensions are formatted based on length.

Wrapping Strategy

The formatter chooses between horizontal and vertical layout:

  • Horizontal: Entire comprehension fits on one line
  • Vertical: Comprehension exceeds available line space

# Horizontal (fits on one line)
squares = [x**2 for x in range(10)]

# Vertical (too long)
result = [
    transform(item)
    for item in collection
    if predicate(item)
]

Lambda Expressions

Lambda expressions wrap to multiple lines when they exceed the line length limit.

Wrapping Decision

Determined by: current_column + lambda_width > line_length

# Short lambda: stays on one line
square = lambda x: x**2

# Long lambda: may need refactoring to def
complex = lambda x, y, z: (
    some_complex_calculation(x, y, z)
)

Decorators

Decorators are formatted with one decorator per line and proper spacing.

Rules

  1. Each decorator on its own line
  2. Decorators aligned at the same indentation as the function/class
  3. @ symbol normalized (added if missing)
  4. No blank lines between consecutive decorators

@property
@lru_cache(maxsize=128)
def expensive_computation(self):
    return result

Class Definitions

Class definitions follow PEP 8 spacing conventions.

Blank Lines

Controlled by beacon.formatting.blankLineBeforeClass:

  • Top-level classes: 2 blank lines before (when enabled)
  • Nested classes: 1 blank line before

# Module-level


class TopLevelClass:
    pass


class AnotherTopLevelClass:

    class NestedClass:
        pass

Function Definitions

Function definitions use similar spacing rules to classes.

Blank Lines

Controlled by beacon.formatting.blankLineBeforeFunction:

  • Top-level functions: 2 blank lines before (when enabled)
  • Methods: 1 blank line before

# Module-level


def top_level_function():
    pass


class MyClass:

    def method_one(self):
        pass

    def method_two(self):
        pass

Type Annotations

Type annotation spacing follows PEP 8 guidelines.

Spacing Rules

The formatter computes a (before, after) spacing pair for annotation colons:

  • Variable annotations: No space before colon, one space after
  • Return annotations: Space before and after ->

# Variable annotations
name: str = "value"
count: int = 42

# Function annotations
def greet(name: str, age: int) -> str:
    return f"Hello {name}"

Implementation

Structural formatting rules are implemented in FormattingRules:

  • should_add_trailing_comma(): Determines trailing comma insertion
  • format_decorator(): Normalizes decorator syntax
  • type_annotation_spacing(): Returns spacing tuple
  • should_wrap_lambda(): Decides lambda wrapping
  • dict_value_indent(): Calculates dictionary value indentation
  • comprehension_wrapping_strategy(): Returns wrapping strategy
  • blank_lines_for_class(): Returns required blank lines for classes
  • blank_lines_for_function(): Returns required blank lines for functions

All rules are context-aware and respect the current indentation level and nesting depth.

Suppression/Ignore Comments

Beacon supports suppression/ignore comments to selectively disable the formatter, linter, and type checker on specific lines or regions of code.

Formatter Suppressions

Control when the formatter should skip code sections.

Single Line: # fmt: skip

Skip formatting for a single statement or line.

# This line will be formatted normally
x = 1

# This line preserves exact spacing
y=2+3  # fmt: skip

# Back to normal formatting
z = 4

Region: # fmt: off / # fmt: on

Disable formatting for entire code blocks.

# Normal formatting applies here
formatted_dict = {"key": "value"}

# fmt: off
unformatted_dict={"key":"value","no":"spaces"}
complex_expression=1+2+3+4+5+6+7+8+9
# fmt: on

# Back to formatted code
back_to_normal = {"properly": "formatted"}

  • Multiple # fmt: off/# fmt: on pairs allowed in the same file
  • Unclosed # fmt: off preserves formatting to end of file
  • The directive lines themselves are preserved as-is

Alignment Preservation

Common use case: preserving column alignment in matrices or tables.

# Normal list formatting
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]

# fmt: off
# Preserve column alignment
aligned_matrix = [
    [1,    2,    3],
    [100,  200,  300],
    [10,   20,   30],
]
# fmt: on

Linter Suppressions

Suppress specific linter warnings or all warnings on a line.

Suppress All Warnings: # noqa

Disable all linter checks for a line.

x = 1  # noqa

Suppress Specific Rules: # noqa: CODE

Disable specific linter rules by code.

# Suppress unused import warning
import os  # noqa: BEA015

# Suppress unused variable warning
result = expensive_computation()  # noqa: BEA016

# Suppress multiple specific rules
break  # noqa: BEA005, BEA010

Multiple Rules

Separate multiple rule codes with commas:

# Suppress both undefined name and unused variable
x = undefined_variable  # noqa: BEA001, BEA016

Case Insensitive

Rule codes are case-insensitive:

x = 1  # noqa: bea016  # Same as BEA016
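
A rough sketch of how such a directive can be parsed, including the case-insensitive codes, is shown below; Beacon's real parsing happens in the Rust linter, so the regular expression here is an assumption.

import re

NOQA_RE = re.compile(r"#\s*noqa(?::\s*(?P<codes>[A-Za-z0-9, ]+))?", re.IGNORECASE)

def suppressed_codes(line):
    """Return None (no directive), an empty set (blanket noqa), or the uppercased codes."""
    match = NOQA_RE.search(line)
    if match is None:
        return None
    codes = match.group("codes")
    if codes is None:
        return set()                     # bare "# noqa": suppress every rule on the line
    return {code.strip().upper() for code in codes.split(",") if code.strip()}

print(suppressed_codes("x = 1  # noqa"))                 # set()
print(suppressed_codes("import os  # noqa: bea015"))     # {'BEA015'}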

Type Checker Suppressions

Suppress type checking errors.

Suppress All Type Errors: # type: ignore

Disable all type checking for a line.

x: int = "string"  # type: ignore

Suppress Specific Error: # type: ignore[code]

Disable specific type error categories.

# Suppress only assignment type errors
value: str = 42  # type: ignore[assignment]

# Suppress multiple error types
result: int = some_function()  # type: ignore[assignment, call-arg]

Common type error codes:

  • assignment - Type mismatch in assignment
  • arg-type - Incorrect argument type
  • return-value - Return type mismatch
  • call-arg - Function call argument errors
  • attr-defined - Attribute not defined

Combining Suppression/Ignore Comments

Multiple suppression types can be used on the same line, in any order:

# Suppress both type checker and linter
x: int = "string"  # type: ignore  # noqa: BEA016

# Formatter skip with linter suppression
y=2+3  # fmt: skip  # noqa: BEA020
z = value  # noqa: BEA001  # type: ignore
# Same as:
z = value  # type: ignore  # noqa: BEA001

Quick Reference

Comment               | Scope        | Applies To               | Example
--------------------- | ------------ | ------------------------ | ----------------------------------------
# fmt: skip           | Single line  | Formatter                | x=1 # fmt: skip
# fmt: off            | Start region | Formatter                | See examples above
# fmt: on              | End region   | Formatter                | See examples above
# noqa                | Single line  | All linter rules         | x=1 # noqa
# noqa: CODE          | Single line  | Specific linter rule(s)  | import os # noqa: BEA015
# type: ignore        | Single line  | All type errors          | x: int = "s" # type: ignore
# type: ignore[code]  | Single line  | Specific type error(s)   | x: int = "s" # type: ignore[assignment]

See Also

CLI Overview

The Beacon CLI provides command-line tools for parsing, type checking, and analyzing Python code using Hindley-Milner type inference.

Available Commands

Core Commands

  • parse - Parse Python files and display the AST
  • highlight - Syntax highlighting with optional colors
  • check - Validate Python syntax for parse errors
  • resolve - Analyze name resolution and display symbol tables
  • format - Run the Beacon formatter without starting the language server

Static Analysis

  • analyze - Run static analysis on Python code (linting and data flow)
  • lint - Run linter on Python code

Type Checking

  • typecheck - Perform Hindley-Milner type inference and report type errors

Language Server

  • lsp - Start the Beacon Language Server Protocol server

Debug Tools (Debug Builds Only)

  • debug tree - Display tree-sitter CST structure
  • debug ast - Show AST with inferred types
  • debug constraints - Display generated type constraints
  • debug unify - Show unification trace

Installation

Build from source:

cargo build --release

The binary will be available at target/release/beacon-cli.

Basic Usage

All commands accept either a file path or read from stdin:

# From file
beacon-cli typecheck example.py

# From stdin
cat example.py | beacon-cli typecheck

Getting Help

For detailed help on any command:

beacon-cli help <command>

For the complete list of options:

beacon-cli --help

Static Analysis

The analyze command runs static analysis on Python code, including linting and data flow analysis.

Targets

File Analysis

Analyze an entire file:

beacon analyze file ./src/myapp/core.py

Function Analysis

Analyze a specific function in a file:

beacon analyze function ./src/myapp/core.py:process_data

Class Analysis

Analyze a specific class in a file:

beacon analyze class ./src/myapp/models.py:User

Package Analysis (TODO)

Analyze an entire package (a directory with __init__.py):

beacon analyze package ./src/myapp

Project Analysis (TODO)

Analyze an entire project (workspace with multiple packages):

beacon analyze project .

Options

Output Format

Control the output format:

# Human-readable output (default)
beacon analyze file main.py --format human

# JSON output for machine processing
beacon analyze file main.py --format json

# Compact single-line format (file:line:col)
beacon analyze file main.py --format compact

Analysis Filters

Run specific analyses:

# Only run linter
beacon analyze file main.py --lint-only

# Only run data flow analysis
beacon analyze file main.py --dataflow-only

Visualization

Show additional information:

# Show control flow graph visualization (TODO)
beacon analyze file main.py --show-cfg

# Show inferred types (TODO)
beacon analyze file main.py --show-types

Examples

Analyze a Complete File

# calculator.py
import os

def greet(name):
    return f'Hello {name}'

def unused_function():
    x = 1
    x = 2
    return x

class Calculator:
    def add(self, a, b):
        return a + b

$ beacon analyze file calculator.py
✗ 2 issues found in calculator.py

▸ calculator.py:1:1 [BEA015]
  'os' imported but never used
  1 import os
    ^

▸ calculator.py:8:5 [BEA018]
  'x' is redefined before being used
  8     x = 2
        ^

Analyze a Specific Function

$ beacon analyze function calculator.py:greet
✗ 1 issues found in calculator.py

▸ calculator.py:1:1 [BEA015]
  'os' imported but never used
  1 import os
    ^

Analyze a Specific Class

$ beacon analyze class calculator.py:Calculator
✗ 1 issues found in calculator.py

▸ calculator.py:1:1 [BEA015]
  'os' imported but never used
  1 import os
    ^

Lint-Only Mode

Run only linting without data flow analysis:

beacon analyze file main.py --lint-only

JSON Output

Machine-readable output for tooling integration:

beacon analyze class models.py:User --format json

Linting

The lint command runs the Beacon linter on Python code to detect common coding issues, style violations, and potential bugs.

Usage

beacon lint [OPTIONS] [PATHS]...

Accepts:

  • Single file: beacon lint file.py
  • Multiple files: beacon lint file1.py file2.py file3.py
  • Directory: beacon lint src/ (recursively finds all .py files)
  • Stdin: beacon lint (reads from stdin)

Examples

Detecting Unused Imports and Variable Redefinition

# test.py
import os

def greet(name):
    return f'Hello {name}'

def unused_function():
    x = 1
    x = 2  # Redefined before being used
    return x

beacon lint test.py

Output:

✗ 2 issues found in test.py

▸ test.py:1:1 [BEA015]
  'os' imported but never used
  1 import os
    ^

▸ test.py:8:5 [BEA018]
  'x' is redefined before being used
  8     x = 2
        ^

Clean Code - No Issues

# clean.py
def add(x, y):
    return x + y

result = add(1, 2)
print(result)

$ beacon lint clean.py
✓ No issues found

Output Formats

Human-Readable (Default)

Shows issues with context and line numbers (default format):

$ beacon lint test.py
✗ 2 issues found in test.py

▸ test.py:1:1 [BEA015]
  'os' imported but never used
  1 import os
    ^

▸ test.py:8:5 [BEA018]
  'x' is redefined before being used
  8     x = 2
        ^

JSON Format

Machine-readable output for CI/CD integration:

beacon lint test.py --format json

Output:

[
  {
    "rule": "UnusedImport",
    "message": "'os' imported but never used",
    "filename": "test.py",
    "line": 1,
    "col": 1
  },
  {
    "rule": "RedefinedWhileUnused",
    "message": "'x' is redefined before being used",
    "filename": "test.py",
    "line": 8,
    "col": 5
  }
]

Compact Format

Single-line format compatible with many editors:

$ beacon lint test.py --format compact
test.py:1:1: [BEA015] 'os' imported but never used
test.py:8:5: [BEA018] 'x' is redefined before being used

Lint Rules

The linter implements PyFlakes-style rules (BEA001-BEA030):

  • Undefined variables
  • Unused imports and variables
  • Syntax errors in specific contexts
  • Potential bugs (assert on tuple, is vs ==, etc.)
  • Code style issues

For a complete list of rules, see the Lint Rules documentation.

Multiple Files and Directories

Lint all files in a directory

beacon lint src/

Lint multiple specific files

beacon lint src/main.py src/utils.py tests/test_main.py

Lint for CI with JSON output

beacon lint --format json src/ > lint-results.json
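
In a CI job, that JSON file can be consumed with a few lines of Python; the field names below follow the JSON example shown earlier in this chapter.

import json
import sys

with open("lint-results.json") as fh:
    issues = json.load(fh)

for issue in issues:
    print(f'{issue["filename"]}:{issue["line"]}:{issue["col"]}: [{issue["rule"]}] {issue["message"]}')

sys.exit(1 if issues else 0)   # fail the build when any issue was reported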

Directory Traversal

When a directory is provided, the command:

  • Recursively discovers all .py files
  • Respects .gitignore rules
  • Excludes common patterns: __pycache__/, *.pyc, .pytest_cache/, .mypy_cache/, .ruff_cache/, venv/, .venv/, env/, .env/

Exit Codes

  • 0 - No issues found
  • 1 - Issues found

This makes it easy to use in CI/CD pipelines:

beacon lint src/ || exit 1

Notes

The linter does not fix issues automatically (yet). It only reports them.

Type Checking

The typecheck command performs Hindley-Milner type inference on Python code and reports type errors.

Usage

beacon typecheck [OPTIONS] [PATHS]...

Accepts:

  • Single file: beacon typecheck file.py
  • Multiple files: beacon typecheck file1.py file2.py file3.py
  • Directory: beacon typecheck src/ (recursively finds all .py files)
  • Stdin: beacon typecheck (reads from stdin)

Options

  • -f, --format <FORMAT> - Output format (human, json, compact) [default: human]

Output Formats

Human (Default)

Human-readable output with context and visual pointers:

$ beacon typecheck example.py
Found 1 type error(s):

Error 1: Cannot unify types: Int ~ Str (line 3, col 5)
  --> example.py:3:5
   |
 3 | z = x + y
   |     ^

JSON

Machine-readable JSON format for tooling integration:

$ beacon typecheck --format json example.py
{
  "errors": [
    {
      "error": "Cannot unify types: Int ~ Str",
      "line": 3,
      "col": 5,
      "end_line": null,
      "end_col": null
    }
  ],
  "error_count": 1
}

Compact

Single-line format compatible with editor quickfix lists:

$ beacon typecheck --format compact example.py
example.py:3:5: Cannot unify types: Int ~ Str

Examples

Check a single file

beacon typecheck src/main.py

Check multiple files

beacon typecheck src/main.py src/utils.py tests/test_main.py

Check all files in a directory

beacon typecheck src/

Check with JSON output for CI

beacon typecheck --format json src/ > type-errors.json

Check from stdin

cat src/main.py | beacon typecheck

Exit Codes

  • 0 - No type errors found
  • 1 - Type errors found or analysis failed

Directory Traversal

When a directory is provided, the command:

  • Recursively discovers all .py files
  • Respects .gitignore rules
  • Excludes common patterns: __pycache__/, *.pyc, .pytest_cache/, .mypy_cache/, .ruff_cache/, venv/, .venv/, env/, .env/

Language Server

The lsp command starts the Beacon Language Server Protocol server for editor integration.

Usage

beacon-cli lsp [OPTIONS]

Options

  • --tcp <PORT> - Use TCP on the specified port (TODO: not yet implemented)
  • --log-file <PATH> - Write logs to the specified file

Communication Modes

stdio (Default)

The default mode uses standard input/output for LSP communication. This is the standard mode for editor integration:

beacon-cli lsp

Editors spawn the LSP server and communicate via pipes. This is automatically configured by editor plugins.

TCP Mode (TODO)

TCP mode allows remote LSP connections and easier debugging:

beacon-cli lsp --tcp 9257

Logging

stderr (Default)

By default, logs are written to stderr:

beacon-cli lsp 2> lsp.log

File Logging

Use the --log-file option to write logs to a specific file:

beacon-cli lsp --log-file /tmp/beacon-lsp.log

The log file is created if it doesn't exist and appended to if it does.

Environment Variables

Control log level via the RUST_LOG environment variable:

# Info level (default)
RUST_LOG=info beacon-cli lsp

# Debug level for verbose logging
RUST_LOG=debug beacon-cli lsp

# Trace level for very verbose logging
RUST_LOG=trace beacon-cli lsp

Editor Integration

VS Code

The Beacon VS Code extension automatically spawns the LSP server. No manual configuration needed.

Neovim

Configure nvim-lspconfig:

require'lspconfig'.beacon.setup{
  cmd = { "beacon-cli", "lsp" },
  filetypes = { "python" },
  root_dir = function(fname)
    return vim.fn.getcwd()
  end,
}

Emacs (lsp-mode)

Add to your configuration:

(add-to-list 'lsp-language-id-configuration '(python-mode . "python"))
(lsp-register-client
 (make-lsp-client :new-connection (lsp-stdio-connection '("beacon-cli" "lsp"))
                  :major-modes '(python-mode)
                  :server-id 'beacon))

LSP Features

The Beacon LSP server provides:

  • Full type inference (Hindley-Milner)
  • Hover information with inferred types
  • Go to definition
  • Find references
  • Document/workspace symbols
  • Semantic tokens
  • Inlay hints (type annotations)
  • Code actions
  • Diagnostics (type errors)
  • Auto-completion

See the LSP documentation for detailed feature descriptions.

Formatter CLI

The format command exposes Beacon's Python formatter without having to spin up the language server. It is helpful for debugging formatter behaviour (for example, while comparing samples/capabilities_support.py against the generated samples/capabilities_support_formatted.py).

Usage

beacon format [OPTIONS] [PATHS]...

Accepts:

  • Single file: beacon format file.py
  • Multiple files: beacon format file1.py file2.py file3.py
  • Directory: beacon format src/ (recursively finds all .py files)
  • Stdin: beacon format (reads from stdin)

Options

Flag            | Description
--------------- | -------------------------------------------------------------------------------
--write         | Overwrite files with formatted output.
--check         | Exit with a non-zero status if formatting would change the input.
--output <PATH> | Write formatted output to a different file (only works with single file input).

--write conflicts with both --check and --output to prevent accidental combinations.

Examples

Format a single file and display to terminal

beacon format samples/capabilities_support.py

Format file in-place

beacon format samples/capabilities_support.py --write

Format all files in a directory

beacon format src/ --write

Format multiple specific files

beacon format src/main.py src/utils.py tests/test_main.py --write

Check formatting in CI

beacon format src/ --check

Write formatted output to a different file

beacon format samples/capabilities_support.py --output samples/capabilities_support_formatted.py

Directory Traversal

When a directory is provided, the command:

  • Recursively discovers all .py files
  • Respects .gitignore rules
  • Excludes common patterns: __pycache__/, *.pyc, .pytest_cache/, .mypy_cache/, .ruff_cache/, venv/, .venv/, env/, .env/

Suppression Comments

The formatter respects suppression directives in your code:

# Skip formatting for a single line
x=1+2  # fmt: skip

# Skip formatting for a region
# fmt: off
unformatted=code
# fmt: on

See Formatter Suppressions for complete documentation.

Exit Codes

  • 0 - All files are formatted correctly (or formatting succeeded)
  • 1 - Formatting would change files (with --check) or formatting failed

Debug Tools

Debug commands provide low-level inspection of Beacon's parsing and type inference internals. These tools are only available in debug builds.

Availability

Debug commands are compiled only in debug builds:

# Build in debug mode (includes debug commands)
cargo build

# Build in release mode (excludes debug commands)
cargo build --release

Commands

Tree-sitter CST

Display the concrete syntax tree from tree-sitter:

beacon-cli debug tree [OPTIONS] [FILE]

Options:

  • --json - Output in JSON format

Example output (default S-expression style):

Tree-sitter CST:
(module [0, 0] - [3, 0]
  (expression_statement [0, 0] - [0, 6]
    (assignment
      left: (identifier [0, 0] - [0, 1])
      right: (integer [0, 4] - [0, 6]))))

JSON output:

beacon-cli debug tree --json example.py

AST with Types

Show Beacon's AST with inferred types:

beacon-cli debug ast [OPTIONS] [FILE]

Options:

  • --format <FORMAT> - Output format (tree, json) [default: tree]

Example:

$ beacon-cli debug ast example.py
AST with inferred types:

Type mappings: 15 nodes
Position mappings: 12 positions

Type errors: 0

Node types:
  Node 1: Int
  Node 2: (Int, Int) -> Int
  Node 3: Int
  ...

Constraints

Display generated type constraints:

$ beacon-cli debug constraints [FILE]

Generated 23 constraints:

▸  Equal (12 instances)
  1. Equal(τ1, Int)
  2. Equal(τ2, (Int, Int) -> Int)
  3. Equal(τ3, Int)
  ... and 9 more

▸  Call (5 instances)
  1. Call(τ2, [τ1, τ1], {}, τ4)
  2. Call(print, [τ4], {}, τ5)
  ... and 3 more

▸  HasAttr (6 instances)
  1. HasAttr(τ6, "append", τ7)
  2. HasAttr(τ6, "extend", τ8)
  ... and 4 more

Unification

Show unification trace (TODO):

beacon-cli debug unify [FILE]

Diagnostics

Run comprehensive diagnostics (parse errors, lint issues, type errors, static analysis) on Python files:

beacon debug diagnostics [OPTIONS] <PATHS>...

Accepts:

  • Single file: beacon debug diagnostics file.py
  • Multiple files: beacon debug diagnostics file1.py file2.py file3.py
  • Directory: beacon debug diagnostics src/ (recursively finds all .py files)

Options:

  • -f, --format <FORMAT> - Output format (human, json, compact) [default: human]

Example output (human format):

$ beacon debug diagnostics src/

⚡ Running comprehensive diagnostics on 5 file(s)...

✓ 0 Parse Errors

✗ 3 Lint Issues
  ▸ src/main.py:5:1 [BEA015] 'os' imported but never used
    5 import os
      ~
  ▸ src/utils.py:10:5 [BEA018] 'x' is redefined before being used
    10     x = 2
           ~
  ▸ src/helper.py:3:1 [BEA015] 'sys' imported but never used
    3 import sys
      ~

✗ 2 Type Errors
  ▸ src/main.py:12:9 Cannot unify types: Int ~ Str
    12     z = x + y
               ~
  ▸ src/utils.py:20:5 Undefined type variable: τ5
    20     result = unknown_func()
           ~

Summary: 5 total issue(s) found

JSON output:

beacon debug diagnostics --format json src/ > diagnostics.json

Compact output (for editor integration):

beacon debug diagnostics --format compact src/
src/main.py:5:1: [BEA015] 'os' imported but never used
src/utils.py:10:5: [BEA018] 'x' is redefined before being used
src/main.py:12:9: [TYPE] Cannot unify types: Int ~ Str

Research

Reading List

Theory

Hindley–Milner Type Inference

  1. Principal Type-Schemes for Functional Programs - https://doi.org/10.1145/582153.582176
  2. Types and Programming Languages (2002), ch. 22-24
  3. Implementing a Hindley–Milner Type Inference - https://smunix.github.io/dev.stephendiehl.com/fun/006_hindley_milner.html
  4. Typing Haskell in Haskell - https://web.cecs.pdx.edu/~mpj/pubs/thih.html
  5. "Typed Racket: Gradual Typing for Dynamic Languages"
  6. TypeScript Specification - 2–4 (structural subtyping)
  7. PEP 544 - Protocols: Structural subtyping in Python

Implementation-Level Concepts

  1. Tree-sitter docs: https://tree-sitter.github.io/tree-sitter/
  2. "Rust for Rustaceans"
  3. The Rustonomicon - 3 (Type Layout & Lifetimes)
  4. https://jeremymikkola.com/posts/2019_01_01_type_inference_intro.html
  5. MyPy design docs: https://mypy.readthedocs.io/en/stable/internal.html
  6. PyRight internals (analyzer.py)
  7. Expert F# 5.0 (Ch. 9–10).
  8. TypeScript Compiler (specifically checker.ts)

Hindley–Milner Type Systems

Hindley–Milner (HM) is the classical polymorphic type system that powers languages such as ML, OCaml, and early versions of Haskell. It strikes a balance between expressiveness (parametric polymorphism) and tractable, annotation-free type inference.

Overview

Parametric polymorphism: functions can operate uniformly over many types without runtime overhead1.

Type inference: the compiler deduces the most general (principal) type scheme for each expression1.

Declarative typing judgment: The typing judgment \(\Gamma \vdash e : \sigma\) relates a context \( \Gamma \), an expression \( e \), and a type scheme \( \sigma \).

The result is a system where generic programs remain statically typed without drowning the developer in annotations.

Core Concepts

Why HM?

The simply typed \(\lambda\)-calculus cannot express parametric polymorphism, and fully polymorphic calculi such as System F require explicit annotations. HM extends the calculus with let-polymorphism and carefully restricted generalization so that inference stays decidable and efficient.

Monotypes vs Polytypes

Monotypes (\(\tau\)): concrete types such as \(\alpha\), \(\text{Int} \to \text{Bool}\), or constructor applications \(C\,\tau_1\cdots\tau_n\)2.

Polytypes / type schemes (\(\sigma\)): quantifications over monotypes, e.g. \(\forall \alpha.\,\alpha \to \alpha\).

Principal type: every well-typed expression has a unique (up to renaming) most general type scheme from which all other valid typings can be instantiated1.

Generalization and Instantiation

Generalization: close a monotype over the free type variables not present in the environment.

Instantiation: specialise a polytype by substituting quantified variables with fresh monotype variables.

Let-Polymorphism

Only let-bound definitions are generalized. Lambda parameters remain monomorphic in HM; this restriction is critical to keep inference decidable1.

Formal Skeleton

Syntax

e ::= x
    | λ x. e
    | e₁ e₂
    | let x = e₁ in e₂

The associated type grammar and typing environments are:

\[ \begin{aligned} \tau &::= \alpha \mid C(\tau_1,\dots,\tau_n) \mid \tau \to \tau \\ \sigma &::= \tau \mid \forall \alpha.\,\sigma \\ \Gamma &::= \emptyset \mid \Gamma, x : \sigma \end{aligned} \]

Typing Rules

Typing judgments take the form \(\Gamma \vdash e : \sigma\). Core rules include:

\[ \frac{x : \sigma \in \Gamma}{\Gamma \vdash x : \sigma} \quad\text{(Var)} \]

\[ \frac{\Gamma, x : \tau \vdash e : \tau'}{\Gamma \vdash \lambda x.\,e : \tau \to \tau'} \quad\text{(Abs)} \]

\[ \frac{\Gamma \vdash e_0 : \tau \to \tau' \qquad \Gamma \vdash e_1 : \tau}{\Gamma \vdash e_0\,e_1 : \tau'} \quad\text{(App)} \]

\[ \frac{\Gamma \vdash e_0 : \sigma \qquad \Gamma, x : \sigma \vdash e_1 : \tau}{\Gamma \vdash \text{let } x = e_0 \text{ in } e_1 : \tau} \quad\text{(Let)} \]

\[ \frac{\Gamma \vdash e : \sigma' \qquad \sigma' \sqsubseteq \sigma}{\Gamma \vdash e : \sigma} \quad\text{(Inst)} \]

\[ \frac{\Gamma \vdash e : \sigma \qquad \alpha \notin \mathrm{free}(\Gamma)}{\Gamma \vdash e : \forall \alpha.\,\sigma} \quad\text{(Gen)} \]

Here \(\sigma' \sqsubseteq \sigma\) means that \(\sigma'\) is an instance of \(\sigma\) (obtained by instantiating quantified variables)1.

Algorithm W (Inference Sketch)

Algorithm W is the archetypal inference engine for HM3.

  1. Annotate sub-expressions with fresh type variables.
  2. Collect constraints when traversing the AST (especially from applications).
  3. Unify constraints to solve for unknown types.
  4. Generalize at each let by quantifying over variables not free in the environment.
  5. Return the principal type scheme produced by the substitutions.

Typical programs are handled in near-linear time, although the theoretical worst case is higher1.
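
To make the unification step concrete, here is a minimal first-order unification sketch in Python over the monotype grammar above (type variables plus constructor applications). It is purely illustrative and is not Beacon's inference engine.

from dataclasses import dataclass

@dataclass(frozen=True)
class TVar:
    name: str

@dataclass(frozen=True)
class TCon:
    name: str
    args: tuple = ()

def resolve(t, subst):
    """Follow substitution links until reaching an unbound variable or a constructor."""
    while isinstance(t, TVar) and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    t = resolve(t, subst)
    if t == var:
        return True
    return isinstance(t, TCon) and any(occurs(var, a, subst) for a in t.args)

def unify(a, b, subst):
    """Extend subst so that a and b become equal, or raise TypeError on a mismatch."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, TVar):
        if occurs(a, b, subst):
            raise TypeError("occurs check failed")
        return {**subst, a: b}
    if isinstance(b, TVar):
        return unify(b, a, subst)
    if a.name != b.name or len(a.args) != len(b.args):
        raise TypeError(f"cannot unify {a} with {b}")
    for x, y in zip(a.args, b.args):
        subst = unify(x, y, subst)
    return subst

def arrow(domain, codomain):
    return TCon("->", (domain, codomain))

# Unifying  a -> Int  with  Bool -> b  yields {a: Bool, b: Int}
print(unify(arrow(TVar("a"), TCon("Int")), arrow(TCon("Bool"), TVar("b")), {}))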

Strengths and Limitations

Strengths

Minimal annotations with strong static guarantees.

Principled parametric polymorphism with predictable runtime behaviour.

A deterministic, well-understood inference algorithm.

Limitations

No native subtyping; adding it naively renders inference undecidable1.

Higher-rank polymorphism (e.g., passing polymorphic functions as arguments) requires extensions that typically sacrifice automatic inference.

Recursive bindings and mutation demand additional care to avoid unsound generalization.

Extensions: Type Classes

Many ML-derived languages extend HM with type classes to model constrained polymorphism4. Type classes capture ad-hoc behavior (equality, ordering, pretty-printing) without abandoning the core inference model.

Motivation

Developers often need functions that work only for types supporting specific operations (equality, ordering, etc.).

Type classes describe those obligations once and then allow generic code to depend on them declaratively.

Integration with HM

A type class \(C\) packages a set of operations. A type \(T\) becomes an instance of \(C\) by providing implementations.

Type schemes gain constraint contexts, e.g. \(\forall a.\,(\text{Eq}\ a) \Rightarrow a \to a\), read as “for all \(a\) that implement Eq, this function maps \(a\) to \(a\)”.

Environments track both type bindings and accumulated constraints, written informally as \(\Gamma \vdash e : \sigma \mid \Delta\).

During generalization, constraints that do not mention the generalized variables can be abstracted over; during instantiation, remaining constraints must be satisfied (dictionary passing, instance resolution, etc.).

Type classes preserve type safety while keeping user code concise, but introduce design questions about coherence (no conflicting instances), instance search termination, and tooling ergonomics.

Extensions: Higher-Rank Types

Higher-rank polymorphism allows universal quantifiers to appear inside function arguments, enabling functions that consume polymorphic functions5.

HM is rank-1: all \(\forall\) quantifiers appear at the outermost level.

Why Higher Rank?

Certain abstractions require accepting polymorphic functions as arguments, e.g.

applyTwice :: (forall a. a -> a) -> Int -> Int
applyTwice f x = f (f x)

HM cannot express this because the quantifier lives to the left of an arrow. Extending to rank-2 (or higher) types unlocks APIs like runST :: ∀a.(∀s. ST s a) -> a6.

Typing Considerations

The grammar generalizes to allow quantified types within arrow positions; checking such programs typically relies on bidirectional type checking7.

Full type inference for arbitrary rank is undecidable; practical compilers require annotations or rely on heuristics8.

Despite the cost, higher-rank types enable powerful encapsulation patterns and stronger invariants.

Design Trade-offs

Pros: Expressiveness for APIs manipulating polymorphic functions; better information hiding (e.g., ST).

Cons: Additional annotations, more complex error messages, heavier implementation burden.

Further Reading

Implementing HM (Stimsina)

Parametricity and type classes (Well-Typed)

Language Server Protocol

Why LSP Exists

Before LSP, editor integrations for language tooling (completion, diagnostics, refactors) were bespoke. Every compiler or analyzer needed plug-ins for VS Code, Vim, IntelliJ, Sublime, etc., and each editor duplicated work to support many languages. This matrix of per-language, per-editor plug-ins slowed innovation and made advanced tooling inaccessible outside first-party IDEs.

The Language Server Protocol, initiated by Microsoft for VS Code and now maintained as an open specification, removes this coupling. It defines a JSON-RPC protocol so a single language server can speak to any compliant editor. Editors implement the client half once and gain tooling support for every language that implements the server half.

Problems It Solves

  • Shared investment: Language teams implement the protocol once instead of maintaining multiple editor-specific plug-ins.
  • Editor freedom: Developers choose tools without sacrificing language-aware features.
  • Feature parity: Diagnostics, go-to-definition, workspace symbols, rename, and more behave consistently across environments.
  • Incremental updates: The protocol is designed for streaming updates as the user types, enabling responsive experiences.

How LSP Works

  1. Transport: Client and server communicate over stdin/stdout pipes, TCP, or WebSockets. Messages use JSON-RPC 2.0 framed with Content-Length headers.
  2. Initialization: Client sends initialize with capabilities and workspace metadata. Server responds with supported features (ServerCapabilities). A follow-up initialized notification signals readiness.
  3. Document Synchronization: The client streams document lifecycle notifications (didOpen, didChange, didSave, didClose) so the server maintains up-to-date views of open files.
  4. Feature Requests: Once documents are synchronized, the client issues requests such as:
    • textDocument/completion for completion items.
    • textDocument/hover for inline info.
    • textDocument/definition and textDocument/references for navigation.
    • textDocument/documentSymbol and workspace/symbol for structure searches.
    • textDocument/codeAction, textDocument/rename, textDocument/semanticTokens, and more.
  5. Responses and Notifications: Servers send responses with payloads defined in the protocol. They can also push diagnostics (textDocument/publishDiagnostics) or log messages asynchronously.
  6. Shutdown: Clients request graceful shutdown via shutdown followed by exit.

The protocol evolves through versioned specifications (currently 3.x). Beacon targets the subset required for an ergonomic Python workflow, while keeping the implementation modular so new methods can be added as needed.
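
To illustrate the framing from step 1, the snippet below builds a minimal initialize request the way an LSP client would; the capability payload is deliberately tiny and does not reflect what any particular editor actually sends.

import json

def frame(message):
    """Wrap a JSON-RPC message with the Content-Length header required by LSP."""
    body = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body

initialize = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "processId": None,
        "rootUri": "file:///path/to/workspace",
        "capabilities": {},   # real clients advertise far more here
    },
}

print(frame(initialize).decode("utf-8"))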

Tree-sitter

This document contains notes I've compiled based on learnings about tree-sitter.

Tree-sitter is both a parser-generator tool and an incremental parsing library1. It’s optimized for embedding in editors and tooling (rather than being only a compiler backend parser). It supports many languages, with language-specific grammars2.

From the official site:

Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited.

What problems it solves

Here are its key value-propositions and the issues it addresses:

Better than regex/highlight hacks

Traditional editors often use regular expressions or ad-hoc syntax rules for things like syntax highlighting, folding, code navigation. These approaches tend to fail with complex nested constructs or incomplete code (common in live editing). Tree-sitter uses a proper parse tree (Concrete Syntax Tree) rather than purely regex heuristics, giving more accurate structure.

Incremental parsing / live editing

In an editor context, users are typing and modifying files constantly. Re-parsing the entire file on every keystroke is expensive and slow. Tree-sitter supports incremental parsing, meaning it updates only the changed portion of the tree rather than rebuilding everything. This means edits are reflected quickly and the tree remains coherent, which enables features like structured selection, live syntax highlighting, etc.

Unified API / language-agnostic tooling

Because each language has a Tree-sitter grammar, you can build tooling (highlighting, navigation, refactoring) in a language-agnostic way: query the tree, capture nodes of interest, etc. This reduces duplication of effort: editor vendors don’t have to write custom parsing logic per language to support advanced features.

Error-tolerant parsing for editing

Since code is often incomplete/invalid in the middle of editing, a robust parser needs to recover gracefully. Tree-sitter is designed to continue to provide a usable tree under such conditions so editors can rely on the tree structure even when the file is only partially valid.

Enables richer editor tooling

Because you have a full tree, you can support advanced features: structural selection (e.g., select "function" or "if block"), code folding by AST node, refactorings, cross-language injections (e.g., embedded languages). For example, using queries you can capture specific nodes in the tree and apply tooling logic.

Internals

Grammar / Parser Generation

For each language you want support for, you write a grammar file, typically grammar.js (or some variant) describing the language’s syntax in a DSL provided by Tree-sitter. Example: You describe rules like sum: ..., product: ..., define precedence, associativity (via helpers like prec.left, prec.right). You then run the Tree-sitter CLI (or build process) to generate a parser.c file (and possibly scanner.c) that formalizes the grammar into C code. That generated parser becomes the actual runtime component for that language.

Lexer/Tokenization

The generated parser includes a lexer (scanner) component that tokenizes the source code (turning characters into tokens). In some languages, you may supply a custom external scanner to handle tricky lexing cases (e.g., indent-based blocks, embedded languages) via scanner.c.

Parser Engine (GLR / LR)

The core algorithm is a generalized LR (GLR) parser. GLR means it can handle grammars with some ambiguity and still produce valid parse trees. In simple terms, the parser uses a parse table (states × tokens) to decide shift/reduce actions. The grammar defines precedence/associativity to resolve ambiguities. In addition to traditional LR parsing, Tree-sitter is optimized for incremental operation (see next).

Tree Representation & Node Structure

After parsing, you obtain a Concrete Syntax Tree (CST), which is a graph of nodes representing lexical tokens and syntactic constructs. Nodes carry source-range information (start and end positions). Nodes can be named or anonymous, and grammar rules prefixed with an underscore are hidden from the final tree to keep it cleaner.

Incremental Parsing

A key feature: when the source text changes (e.g., editing in an editor), Tree-sitter avoids re-parsing the whole file. Instead it reuses existing subtrees for unchanged regions and re-parses only the changed region plus a small margin around it.

  1. Editor notifies parser of changes (range of changed characters, old/new text)
  2. Parser identifies which nodes’ source ranges are invalidated
  3. It re-parses the minimal region and re-connects to reused nodes outside that region
  4. It produces an updated tree with source ranges corrected.

Querying & Tree Walk / API

Once you have a tree, you can run queries (S-expression style) to find sets of nodes matching patterns. For example, capture all if_statement nodes or function declarations. The API (C API, plus language bindings) allows you to walk nodes, inspect children, get start/end positions, text, etc3. The query system is powerful: you can specify patterns, nested structures, predicates (e.g., #eq? @attr_name "class").
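
As a concrete example, the Python bindings expose this same query API. The sketch below assumes the pre-0.22 py-tree-sitter interface and a locally checked-out tree-sitter-python grammar, so the setup paths are illustrative.

from tree_sitter import Language, Parser

# Build a shared library from a grammar checkout sitting next to this script.
Language.build_library("build/languages.so", ["tree-sitter-python"])
PY_LANGUAGE = Language("build/languages.so", "python")

parser = Parser()
parser.set_language(PY_LANGUAGE)
tree = parser.parse(b"def add(x, y):\n    return x + y\n")

# S-expression query: capture every function name in the file.
query = PY_LANGUAGE.query("(function_definition name: (identifier) @fn)")
for node, capture_name in query.captures(tree.root_node):
    print(capture_name, node.text.decode())   # prints: fn add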

Embedding / Use in Editors & Tools

Tree-sitter is designed to be embedded: the parsing library is written in C, and there are bindings in many languages (Rust, JS, Python, etc.)2. Editor plugins (for example nvim‑treesitter for Neovim) use Tree-sitter for syntax highlighting, structural editing, text-objects.

1

https://tree-sitter.github.io/ "Tree-sitter: Introduction"

PEP8

Philosophy

Purpose

Provide coding conventions for the Python standard library, to enhance readability and consistency1.

Underlying principle

"Code is read much more often than it is written."

Consistency matters

Consistency within a function or module matters most, then consistency within a project, then consistency with the style guide itself.

Exceptions permitted

When strictly following the guideline reduces clarity or conflicts with surrounding code.

Encoding

Use UTF-8 encoding for source files (in the core distribution).

Avoid non-ASCII identifiers in standard library modules; if used, limit noisy Unicode characters.

Layout

Indentation

Use 4 spaces per indentation level. Tabs are strongly discouraged. Never mix tabs and spaces.

Line Length

Preferred maximum: 79 characters for code.

For long blocks of text (comments/docstrings): ~72 characters.

Blank Lines and Vertical Whitespace

Insert blank lines to separate top-level functions and classes, and within classes to separate method groups.

Avoid extraneous blank lines within code structure.

Imports

Imports at top of file, after module docstring and before module globals/constants.

Group imports in the following order:

  1. Standard library imports
  2. Related third-party imports
  3. Local application/library-specific imports

Insert a blank line between each group.

Absolute imports preferred; explicit relative imports acceptable for intra-package use.

Wildcard imports (from module import *) should be avoided except in rare cases (e.g., to publish a public API).

Whitespace

Avoid extra spaces in the following contexts:

  • Immediately inside parentheses, brackets or braces.
  • Between a trailing comma and a closing bracket.
  • Before a comma, semicolon, or colon.
  • More than one space around an assignment (or other) operator in order to align it with another statement; such alignment is discouraged.

Usage

# Correct:
spam(ham[1], {eggs: 2})

# Avoid:
spam( ham[ 1 ], { eggs: 2 } )

Comments

Good comments improve readability; they explain why, not how.

Use full sentences, capitalize first word, leave a space after the #.

Inline comments should be used sparingly and separated by at least two spaces from the statement.

Block comments should align with code indentation and be separated by blank lines where appropriate.

Docstrings

Use triple-quoted strings for modules, functions, classes.

The first line should be a short summary; following lines provide more detail if necessary.

For conventions specific to docstrings see PEP 257 – Docstring Conventions.
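
An illustrative docstring following these conventions (the function itself is hypothetical):

def normalize(path: str) -> str:
    """Return an absolute, symlink-free version of path.

    The user directory and environment variables are expanded before
    resolving, so "~/project" and "$HOME/project" normalize identically.
    """
    ...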

Naming Conventions

| Kind | Convention |
| --- | --- |
| Modules | Short, lowercase names; underscores may be used |
| Packages | All lowercase, preferably without underscores |
| Classes | CapWords (CamelCase) |
| Exceptions | Typically CapWords |
| Functions and methods | lowercase_with_underscores (snake_case) |
| Variables | lowercase_with_underscores |
| Constants | UPPERCASE_WITH_UNDERSCORES |
| Private identifiers | One leading underscore (_private); name mangling via __double_leading_underscore |
| Type variables (generics) | CapWords |

Avoid single character names like l, O, I (they are easily confused with 1 and 0).

Recommendations

Avoid pointless object wrappers, redundant code; prefer simple, explicit approaches. This matches the ethos "explicit is better than implicit" from The Zen of Python.

When designing interfaces, make them difficult to misuse, so that programming errors are less likely.

Avoid using mutable default arguments in functions.

In comparisons to singletons (e.g., None), use is or is not rather than equality operators.
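
A brief sketch of the last two recommendations:

# Avoid: the default list is created once and shared across all calls.
def append_bad(item, bucket=[]):
    bucket.append(item)
    return bucket

# Prefer: use None as the default and compare with "is", not "==".
def append_good(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket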

Exceptions to the Rules

The style guide states that while adherence is recommended, there are legitimate cases for deviation.

Reasons to deviate:

  • Strict adherence would reduce readability in context.
  • Code must remain consistent with surrounding non-PEP8 code (especially legacy).
  • The code predates the rule and rewriting it isn’t justified.

Tooling

Tools exist to help enforce or auto-format code to PEP 8 style (e.g., linters, auto-formatters).

Using such tools helps maintain style consistency especially on teams or open-source projects.

Summary

  1. Readability and consistency are the primary goals.
  2. Follow conventions: 4 spaces, line length ~79 chars, snake_case for functions/variables, CapWords for classes, uppercase for constants.
  3. Imports at top, grouped logically.
  4. Whitespace matters—used meaningfully, not decoratively.
  5. Use comments and docstrings effectively: explain why, not how.
  6. Be pragmatic: if strictly following every rule makes things worse, depart in favour of clarity.
  7. Use automation tools to assist but don’t treat the guide as dogma—interpret intelligently.
1. https://peps.python.org/pep-0008/ "PEP 8 – Style Guide for Python Code"

Type Hints (PEP 484) & Generics in Standard Collections (PEP 585)

PEP 484 - Type Hints

Overview

PEP 484 introduced a standardized system for adding type hints to Python code. Its goal was not to enforce static typing at runtime but to establish a formal syntax for optional type checking via external tools like mypy, pytype and later Pyright1. This marked a pivotal moment for Python’s type ecosystem — bridging the gap between dynamic and statically analyzable Python. It defined the foundations of the typing module and introduced the concept of gradual typing, where type hints coexist with dynamic typing1.

Concepts

Gradual Typing

Type annotations are optional, enabling progressive adoption without breaking existing code.

Type System Syntax

Function signatures, variables, and class members can be annotated using syntax such as def greet(name: str) -> str1.

typing module

Adds classes like List, Dict, Tuple, Optional, Union, Any, Callable1.

Type Checkers

External tools (e.g., mypy) use these annotations for static analysis, error detection, and IDE autocompletion.

Runtime Neutrality

Annotations are stored in __annotations__ and ignored by Python itself; type enforcement is delegated to external tools1.
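
A minimal sketch tying these concepts together (the function names are illustrative):

from typing import Optional

def greet(name: str, excited: bool = False) -> str:
    suffix: str = "!" if excited else "."
    return f"Hello, {name}{suffix}"

def find_user(uid: int) -> Optional[str]:
    return None  # no runtime enforcement; a type checker flags misuse statically

# Annotations are ordinary data stored on the function object:
print(greet.__annotations__)
# {'name': <class 'str'>, 'excited': <class 'bool'>, 'return': <class 'str'>}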

Motivation

Before PEP 484, large Python projects (e.g., Dropbox, Google) developed internal type systems to manage complexity. PEP 484 unified these under a common specification inspired by mypy and by research in gradual typing1.

Impact

  • Established a shared foundation for static analysis across the ecosystem.
  • Enabled downstream standards like PEP 561 (distributable type stubs), PEP 563 (deferred evaluation of annotations), and PEP 604/649 (modernized syntax and semantics).

PEP 585 - Type Hinting Generics in Standard Collections

Overview

PEP 585 streamlined the use of generics by allowing the built-in collection types (e.g., list, dict, set) to be used directly as generic types, replacing typing.List, typing.Dict, etc2. For example, code such as:

from typing import List
def f(x: List[int]) -> None: ...

can now be written as:

def f(x: list[int]) -> None: ...

Motivation

PEP 484’s design relied on importing type aliases from the typing module. This indirection created redundancy, confusion, and runtime overhead. By 2020, with from __future__ import annotations and runtime type information improvements, it became viable to use built-ins directly2.

Core Changes

  • Built-in classes (list, dict, tuple, set, etc.) now support subscripting ([]) at runtime, as sketched below.
  • A new types.GenericAlias class is introduced internally to represent these parameterized generics2.
  • Backwards compatibility preserved — typing.List and others remain but are considered deprecated3.
  • Simplified syntax aligns Python with other typed languages’ ergonomics.
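
The first two changes can be observed directly at the REPL (a minimal sketch; the | union syntax on the last line is from PEP 604 and additionally requires Python 3.10+):

# Built-in collections are subscriptable at runtime (Python 3.9+).
alias = list[int]
print(alias)             # list[int]
print(type(alias))       # <class 'types.GenericAlias'>
print(alias.__origin__)  # <class 'list'>
print(alias.__args__)    # (<class 'int'>,)

def head(values: dict[str, list[int]]) -> list[int] | None:
    return next(iter(values.values()), None)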

Impact

  1. Improved readability and ergonomics: Encourages list[int] over List[int].
  2. Reduces the mental split between runtime and static type worlds.
  3. Opens the door for the removal of redundant wrappers in future releases.

Summary

Together, PEP 484 and PEP 585 represent Python’s maturing type system:

  • PEP 484 built the scaffolding by defining syntax, semantics, and conventions.
  • PEP 585 modernized it by integrating type information natively into Python’s core language model. This reflects a shift from externalized static typing toward first-class optional typing. It preserves Python’s philosophy of flexibility while offering stronger correctness guarantees for large-scale codebases.
1. https://peps.python.org/pep-0484/ "PEP 484 – Type Hints"

2. https://peps.python.org/pep-0585/ "PEP 585 – Type Hinting Generics In Standard Collections"

3. https://docs.python.org/3/library/typing.html "typing — Support for type hints"

Distributing and Packaging Python Type Information (.pyi/stubs)

Abstract

PEP 561 establishes a standardized method for distributing and packaging type information in Python. It builds upon PEP 484, addressing how type information can be shipped both inline with a package and in separate stub files, so that stubs can be discovered, packaged, and used by type checkers across environments.

This allows:

  • Package maintainers to declare their code as typed,
  • Third parties to publish independent stub packages, and
  • Type checkers to resolve imports consistently across mixed environments.

Background

Prior to PEP 561:

  • There was no consistent way to distribute typing information with Python packages.
  • Stub files had to be manually placed in MYPYPATH or equivalent.
  • Community stubs were collected centrally in Typeshed, which became a scalability bottleneck.

The goals are:

  1. To use existing packaging infrastructure (distutils/setuptools).
  2. To provide clear markers for type-aware packages.
  3. To define resolution rules so that tools like mypy, pyright, or pylance can locate and prioritize type information uniformly.

PEP 561 recognizes three models: inline-typed, stub-typed, and third-party stub-only packages.

Packaging Type Information

Inline

Inline-typed packages must include a marker file named py.typed inside the package root.

Example setup:

from setuptools import setup

setup(
    name="foopkg",
    packages=["foopkg"],
    # ship the py.typed marker file alongside the package's modules
    package_data={"foopkg": ["py.typed"]},
)

This file signals to type checkers that the package and all its submodules are typed. For namespace packages (PEP 420), the marker should be placed in submodules to avoid conflicts.
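
A typical source layout for such an inline-typed package might look like this (package and module names are illustrative):

foopkg/
├── __init__.py
├── core.py
└── py.typed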

Stub-Only

  • Stub-only packages contain .pyi files without any runtime code.
  • Naming convention: foopkg-stubs provides types for foopkg.
  • py.typed is not required for these packages.
  • Version compatibility should be expressed in dependencies (e.g. via install_requires).

Example layout:

shapes-stubs/
└── polygons/
    ├── pentagon/__init__.pyi
    └── hexagon/__init__.pyi

Partial Stubs

Partial stubs (stubs that cover only part of a library) must include the line partial\n inside py.typed.

These instruct type checkers to:

  • Merge the stub directory with the runtime or typeshed directory.
  • Continue searching through later steps in the resolution order.

Module Resolution Order

Type checkers must resolve type information using the following ordered search path:

| Priority | Source | Description |
| --- | --- | --- |
| 1 | Manual stubs / MYPYPATH | User-specified paths override everything else. |
| 2 | User code | The project’s own files. |
| 3 | Stub packages (*-stubs) | Distributed stubs take precedence over inline types. |
| 4 | py.typed packages | Inline or bundled types inside installed packages. |
| 5 | Typeshed | Fallback for the stdlib and untyped third-party libraries. |

If a stub-only namespace package lacks a desired module, type checkers continue searching through the inline and typeshed steps. When checking against another Python version, the checker must look up that version’s site-packages path.

Conventions

Library Interface

When py.typed is present:

  • All .py and .pyi files are considered importable.
  • Files beginning with _ are private.
  • Public symbols are controlled via __all__.

Valid __all__ idioms include:

__all__ = ['a', 'b']
__all__ += submodule.__all__
__all__.extend(['c', 'd'])

These restrictions allow static determination of public exports by type checkers.

Imports and Re-Exports

Certain import forms signal that an imported symbol should be re-exported as part of the module’s public interface:

import X as X            # re-export X
from Y import X as X     # re-export X
from Y import *          # re-exports __all__ or all public symbols

All other imports are private by default.

Implementation and Tooling

  • mypy implements full PEP 561 resolution, allowing users to inspect installed package metadata (py.typed, stub presence, etc.).

  • Tools like pyright, pylance, and Pytype adopt the same ordering and conventions.

  • Example repositories demonstrating these packaging models are available.

This design remains fully backward compatible, requiring no changes to Python’s runtime or packaging systems.

Structural Pattern Matching in Python

Structural Pattern Matching extends if/elif logic with declarative, data-shape-based matching. It allows code to deconstruct complex data structures and branch based on both type and content. Unlike switch in other languages, pattern matching inspects structure and value, not just equality1.

match command:
    case ("move", x, y):
        handle_move(x, y)
    case ("stop",):
        handle_stop()
    case _:
        print("Unknown command")

Syntax

Basic

match subject:
    case pattern_1 if guard_1:
        ...
    case pattern_2 if guard_2:
        ...
    case _:
        ...

  • The subject is evaluated once.
  • Each case pattern is tested in order.
  • The first pattern that matches (and whose optional if guard succeeds) executes.
  • The _ pattern matches anything (a wildcard).

Pattern Types

Literal

Match exact constants or values:

case 0 | 1 | 2:
    ...
case "quit":
    ...

Multiple literals can be combined with | (OR patterns).

Capture

Assign matched values to variables:

case ("move", x, y):
    handle_move(x, y)  # x and y are bound by the pattern

⚠️ Bare names in patterns always bind; they never compare. To compare against an existing value, use a dotted value pattern (e.g., case Color.RED:) or a guard:

case Point(x, y) if x == origin.x:
    ...

Sequence

Match list or tuple structure:

case [x, y, z]:
    ...
case [first, *rest]:
    ...

Mapping

Match dictionaries:

case {"type": "point", "x": x, "y": y}:
    ...

Keys are matched literally; missing keys cause no match.

Class

Deconstruct class instances via their attributes or positional parameters:

case Point(x, y):
    ...

This uses the class’s __match_args__ attribute to define positional fields.

Example:

class Point:
    __match_args__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y

OR

Combine multiple alternatives:

case "quit" | "exit":
    ...

AS

Bind the entire match while destructuring:

case [x, y] as pair:
    ...

Wildcard

The _ pattern matches anything and never binds.

Guards (if clauses)

Optional if conditions refine matches:

match point:
    case Point(x, y) if x == y:
        print("on diagonal")

Guards are evaluated after successful structural match and can use bound names.

Semantics

| Concept | Behavior |
| --- | --- |
| Evaluation | The subject is evaluated once; patterns are checked in order |
| Binding | A successful match creates new local bindings |
| Failure | A non-matching case falls through to the next pattern |
| Exhaustiveness | No implicit else; include case _: for completeness |
| Guards | Boolean expressions that may use pattern-bound variables |

Examples2

Algebraic Data Types (ADTs)

Pattern matching elegantly models variant data:

class Node: pass
class Leaf(Node): pass

class Branch(Node):
    __match_args__ = ("left", "right")
    def __init__(self, left, right):
        self.left, self.right = left, right

def depth(tree):
    match tree:
        case Leaf():
            return 1
        case Branch(l, r):
            return 1 + max(depth(l), depth(r))

Command Parsing

import sys

def process(cmd):
    match cmd.split():
        case ["load", filename]:
            load_file(filename)  # load_file is defined elsewhere
        case ["quit" | "exit"]:
            sys.exit()
        case _:
            print("Unknown command")

HTTP-like Routing

match (method, path):
    case ("GET", "/"):
        return homepage()
    case ("GET", "/users"):
        return list_users()
    case ("POST", "/users"):
        return create_user(data)  # request body (data) obtained elsewhere

Design3

Goals

  • Provide clarity and conciseness for branching on structured data.
  • Support static analysis: patterns are explicit and compositional.
  • Encourage declarative code, replacing complex if ladders.

Why Not Switch?

  • Structural, not value-only: matches shape, type, and contents.
  • Integrates with Python’s dynamic typing and destructuring capabilities.

Why Not Functions?

While if statements or dispatch tables can emulate simple branching, pattern matching better communicates intent and is easier to read and verify.

Spec

| Category | Rule |
| --- | --- |
| Subject types | Any object, including sequences, mappings, and class instances |
| Match protocol | Class patterns check __match_args__ and the corresponding attributes |
| Sequence match | Subject must be a collections.abc.Sequence (str, bytes, and bytearray are excluded) |
| Mapping match | Subject must be a collections.abc.Mapping; extra keys are ignored |
| Pattern scope | Bound names behave like ordinary assignments and remain visible after the match statement (see the example below) |
| Evaluation order | Top-to-bottom across cases, left-to-right within a pattern |
| Errors | SyntaxError for invalid pattern constructs |
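
A quick sketch of the pattern-scope rule: names bound by a pattern are ordinary local variables and outlive the match statement.

point = (3, 4)

match point:
    case (x, y):
        pass

print(x, y)  # 3 4: the bindings persist after the match block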

Pitfalls

  1. Shadowing: Every bare name in a pattern binds; it does not compare:

    color = "red"
    match color:
        case color:  # always matches and binds new variable!
            ...
    

    Use constants or enums instead:

    match color:
        case "red": ...
    
  2. Misusing guards: Guards run after the structural match succeeds, not during it; expensive or side-effecting expressions inside guards are discouraged.

  3. Over-matching: A sequence pattern must match the subject's length exactly unless a *rest element is used.

Tooling

  • Linters: flake8, ruff, and pyright support pattern syntax.
  • Static analyzers: Type checkers can verify exhaustive matches on enums and dataclasses.
  • Refactoring tools: can replace nested if trees with match statements.

Usage Patterns

| Use Case | Pattern Example |
| --- | --- |
| Enum dispatch | case Status.OK: |
| Dataclasses | case Point(x, y): |
| Command tuples | case ("move", x, y): |
| JSON-like dicts | case {"user": name, "id": uid}: |
| Error handling | case {"error": msg} if "fatal" in msg: |

Backwards Compatibility and Evolution

  • Introduced in Python 3.10.
  • Future extensions may include:
    • Better exhaustiveness checking
    • Improved IDE refactoring tools
    • Expanded type integration for dataclasses and typing constructs

Backward-incompatible syntax changes are unlikely; the match semantics are stable.

Summary

Pattern matching provides:

  • Declarative branching over structured data
  • Readable syntax for destructuring and filtering
  • Powerful composition of match conditions and guards

It is not a replacement for if statements. It is a new control structure for expressing shape-based logic cleanly and expressively.

Contributor Testing Guide

The repository contains several layers of tests that keep the formatter, language server, and static analysis features aligned. Run the suites below before submitting formatter or LSP changes.

Formatter

  • cargo test --package beacon-lsp --test formatting_regression_tests
    End-to-end regression coverage for real-world Python snippets. Recent additions include:
    • test_typevar_assignments for covariant bounds and keyword spacing.
    • test_walrus_operator_patterns and test_generators_and_yield for modern syntax.
    • test_data_science_method_calls and test_complex_lambda_and_functional for keyword-heavy method chains and nested lambda expressions.
  • cargo test --package beacon-lsp --test formatting_tests
    Unit tests exercising individual formatting rules (whitespace, imports, docstrings, range formatting, etc.).
  • cargo test --package beacon-lsp --test lsp_formatting_integration_tests
    Validates document and range formatting via the LSP pipeline.

When debugging a regression, you can run an individual test (for example cargo test --package beacon-lsp --test formatting_regression_tests test_type_annotations_basic) to inspect the formatter output.

Language Server & Analysis

  • cargo test --package beacon-lsp
    Runs all LSP providers, static analysis (CFG, data-flow, lint rules), and supporting infrastructure. This is the canonical smoke test before opening a pull request.
  • cargo test --workspace
    Optional full sweep that also executes the parser, core utilities, and the CLI in addition to the language server crate.

Guidelines

  1. Prefer targeted regression tests for every formatting or analysis bug fix.
  2. Keep tests deterministic: avoid timing assumptions or filesystem-global state.
  3. If a test case documents a known gap, annotate it with #[ignore] and file an issue so it can be tracked explicitly.
  4. Mention the exact command you executed when reporting failures; most suites now emit additional context to help diagnose spacing or tokenization issues.

Running the formatter suites plus cargo test --package beacon-lsp provides confidence that contributor changes respect the documented behavior across the entire toolchain.