Introduction
Beacon is an experimental Python type checker and developer experience platform written in Rust. This documentation set describes the architecture, design decisions, and research that power the project. Whether you are contributing to the codebase, evaluating the language server, or exploring the type system, start here to orient yourself.
Core Capabilities
Beacon provides a complete LSP-based, type-safe development environment for Python:
Type System
- Hindley-Milner type inference with automatic generalization
- Type narrowing through pattern matching and control flow
- Protocol satisfaction with variance checking
- Gradual typing compatibility
Code Intelligence
- Real-time diagnostics for syntax, semantic, and type errors
- Hover tooltips with inferred types and builtin documentation
- Smart completions using symbol table analysis
- Go to definition and find all references
- Workspace and document symbol search with fuzzy matching
Refactoring & Code Actions
- Symbol renaming with workspace-wide validation
- Quick fixes for common issues (unused imports, Optional types, pattern completions)
- Protocol method implementation assistance
- Type annotation insertion from inferred types
Editor Integration
- VS Code and Zed extensions with full feature support
- Compatible with any LSP client (Neovim, Helix, etc.)
- Semantic token highlighting and inlay hints
- Fast incremental analysis with multi-layer caching
What You'll Find
- LSP Overview: A deep dive into our Language Server Protocol implementation, including its goals, building blocks, and feature set.
- Type System Research: Summaries of the academic and practical references influencing Beacon’s approach to Hindley–Milner inference, gradual typing, and structural subtyping.
- Contributor Guides (planned): Setup instructions, style guidelines, and workflows for building and testing Beacon.
Project Vision
Beacon aims to combine precise type checking with interactive tooling that stays responsive for everyday Python development. The project embraces:
- Fast feedback loops enabled by incremental analysis.
- Interoperability with modern editors via LSP.
- A pragmatic blend of theoretical rigor and implementable engineering.
Getting Started
- Clone the repository and install Rust 1.70+ (stable).
- Run `cargo check` from the workspace root to verify the build.
- Launch the LSP server with `cargo run -p beacon-lsp` or integrate with an editor using the provided configuration (see the LSP chapter).
- Browse the documentation sidebar for in-depth topics.
Contributing
We welcome pull requests and discussions. To get involved:
- Review open issues
- Read the upcoming contributor guide (work in progress).
- Join the conversation in our community channels (details to be added).
Beacon is evolving quickly; expect iteration, experimentation, and plenty of opportunities to help shape the future of type checking for Python.
Configuration
Beacon LSP can be configured through TOML files for standalone usage or through your editor's settings when using an extension.
Configuration Files
Beacon searches for configuration in the following order:
1. `beacon.toml` in your workspace root
2. A `[tool.beacon]` section in `pyproject.toml`
If multiple configuration files are found, `beacon.toml` takes precedence.
TOML Structure
Beacon configuration uses TOML sections to organize related settings:
[type_checking]
mode = "balanced"
[python]
version = "3.12"
stub_paths = ["stubs", "typings"]
[workspace]
source_roots = ["src", "lib"]
exclude_patterns = ["**/venv/**"]
[inlay_hints]
enable = true
variable_types = true
[diagnostics]
unresolved_imports = "warning"
circular_imports = "warning"
[formatting]
enabled = true
line_length = 88
quote_style = "double"
trailing_commas = "multiline"
[advanced]
incremental = true
cache_size = 100
Configuration Options
Type Checking
Configure type checking behavior under the [type_checking] section.
mode
Type checking strictness mode. Controls how the type checker handles annotation mismatches and inference.
- Type: `string`
- Default: `"balanced"`
- Values:
  - `"strict"`: Annotation mismatches are hard errors with strict enforcement
  - `"balanced"`: Annotation mismatches are diagnostics with quick fixes, but inference proceeds
  - `"relaxed"`: Annotations supply bounds but can be overridden by inference
[type_checking]
mode = "balanced"
Python
Configure Python-specific settings under the [python] section.
version
Target Python version for feature support (e.g., pattern matching in 3.10+, PEP 695 syntax in 3.12+).
- Type: `string`
- Default: `"3.12"`
- Values: `"3.9"`, `"3.10"`, `"3.11"`, `"3.12"`, `"3.13"`
[python]
version = "3.12"
stub_paths
Additional paths to search for .pyi stub files.
- Type: `array of strings`
- Default: `["stubs"]`
[python]
stub_paths = ["stubs", "typings", "~/.local/share/python-stubs"]
Workspace
Configure workspace settings under the [workspace] section.
source_roots
Source roots for module resolution in addition to workspace root.
- Type: `array of strings`
- Default: `[]`
[workspace]
source_roots = ["src", "lib"]
exclude_patterns
Glob patterns to exclude from workspace scanning.
- Type: `array of strings`
- Default: `[]`
[workspace]
exclude_patterns = ["**/venv/**", "**/.venv/**", "**/node_modules/**"]
Inlay Hints
inlay_hints.enable
Master toggle for all inlay hints.
- Type: `boolean`
- Default: `true`
[inlay_hints]
enable = true
inlay_hints.variable_types
Show inlay hints for inferred variable types on assignments without explicit type annotations.
- Type: `boolean`
- Default: `true`
[inlay_hints]
variable_types = true
inlay_hints.function_return_types
Show inlay hints for inferred function return types on functions without explicit return type annotations.
- Type: `boolean`
- Default: `true`
[inlay_hints]
function_return_types = true
inlay_hints.parameter_names
Show inlay hints for parameter names in function calls to improve readability.
- Type: `boolean`
- Default: `false`
[inlay_hints]
parameter_names = false
Diagnostics
Configure diagnostic severity levels under the [diagnostics] section.
unresolved_imports
Diagnostic severity level for imports that cannot be resolved.
- Type: `string`
- Default: `"warning"`
- Values: `"error"`, `"warning"`, `"info"`
[diagnostics]
unresolved_imports = "warning"
circular_imports
Diagnostic severity level for circular import dependencies.
- Type: `string`
- Default: `"warning"`
- Values: `"error"`, `"warning"`, `"info"`
[diagnostics]
circular_imports = "warning"
Formatting
Configure code formatting behavior under the [formatting] section. Beacon provides PEP8-compliant formatting through the LSP.
formatting.enabled
Master toggle for code formatting.
- Type: `boolean`
- Default: `true`
[formatting]
enabled = true
formatting.line_length
Maximum line length before wrapping.
- Type: `integer`
- Default: `88` (Black-compatible)
- Range: 20-200
[formatting]
line_length = 88
formatting.indent_size
Number of spaces per indentation level.
- Type: `integer`
- Default: `4`
- Range: 2-8
[formatting]
indent_size = 4
formatting.quote_style
String quote style preference.
- Type: `string`
- Default: `"double"`
- Values:
  - `"single"`: Use single quotes for strings
  - `"double"`: Use double quotes for strings
  - `"preserve"`: Keep existing quote style
[formatting]
quote_style = "double"
formatting.trailing_commas
Trailing comma behavior in multi-line structures.
- Type: `string`
- Default: `"multiline"`
- Values:
  - `"always"`: Add trailing commas to all multi-line structures
  - `"multiline"`: Add trailing commas only to multi-line nested structures
  - `"never"`: Never add trailing commas
[formatting]
trailing_commas = "multiline"
formatting.max_blank_lines
Maximum consecutive blank lines allowed.
- Type: `integer`
- Default: `2`
- Range: 0-5
[formatting]
max_blank_lines = 2
formatting.import_sorting
Import statement sorting style.
- Type: `string`
- Default: `"pep8"`
- Values:
  - `"pep8"`: stdlib, third-party, local
  - `"isort"`: isort-compatible sorting
  - `"off"`: Disable import sorting
[formatting]
import_sorting = "pep8"
formatting.compatibility_mode
Compatibility with other Python formatters.
- Type: `string`
- Default: `"black"`
- Values:
  - `"black"`: Black formatter compatibility (88 char line length)
  - `"autopep8"`: autopep8 compatibility (79 char line length)
  - `"pep8"`: Strict PEP8 (79 char line length)
[formatting]
compatibility_mode = "black"
formatting.use_tabs
Use tabs instead of spaces for indentation (not recommended).
- Type: `boolean`
- Default: `false`
[formatting]
use_tabs = false
formatting.normalize_docstring_quotes
Normalize quotes in docstrings to match quote_style.
- Type: `boolean`
- Default: `true`
[formatting]
normalize_docstring_quotes = true
formatting.spaces_around_operators
Add spaces around binary operators.
- Type: `boolean`
- Default: `true`
[formatting]
spaces_around_operators = true
formatting.blank_line_before_class
Add blank lines before class definitions.
- Type: `boolean`
- Default: `true`
[formatting]
blank_line_before_class = true
formatting.blank_line_before_function
Add blank lines before function definitions.
- Type: `boolean`
- Default: `true`
[formatting]
blank_line_before_function = true
Advanced Options
Configure advanced performance and analysis settings under the [advanced] section.
max_any_depth
Maximum depth for Any type propagation before elevating diagnostics. Higher values are more permissive.
- Type: `integer`
- Default: `3`
- Range: 0-10
[advanced]
max_any_depth = 3
incremental
Enable incremental type checking for faster re-analysis.
- Type: `boolean`
- Default: `true`
[advanced]
incremental = true
workspace_analysis
Enable workspace-wide analysis and cross-file type checking.
- Type: `boolean`
- Default: `true`
[advanced]
workspace_analysis = true
enable_caching
Enable multi-layer caching of parse trees, type inference results, and formatting outputs. Caching dramatically improves performance for incremental edits and repeated operations.
Beacon uses four cache layers:
- TypeCache: Node-level type inference (capacity: 100)
- ScopeCache: Scope-level analysis with content hashing (capacity: 200)
- AnalysisCache: Document-level analysis artifacts (capacity: 50)
- IntrospectionCache: Persistent Python introspection (capacity: 1000)
When enabled, Beacon automatically invalidates stale cache entries when documents change. Scope-level content hashing ensures only modified scopes are re-analyzed.
- Type: `boolean`
- Default: `true`
[advanced]
enable_caching = true
For technical details on cache architecture and invalidation strategies, see Caching.
cache_size
Maximum number of documents to cache in the document-level analysis cache. Higher values improve performance for large workspaces at the cost of memory usage.
- Type: `integer`
- Default: `100`
- Range: 0-1000
[advanced]
cache_size = 100
Example Configurations
Basic Configuration (beacon.toml)
[type_checking]
mode = "strict"
[python]
version = "3.12"
[diagnostics]
unresolved_imports = "error"
circular_imports = "warning"
Advanced Configuration (beacon.toml)
[type_checking]
mode = "balanced"
[python]
version = "3.13"
stub_paths = ["stubs", "typings"]
[workspace]
source_roots = ["src", "lib"]
exclude_patterns = ["**/venv/**", "**/.venv/**", "**/build/**"]
[inlay_hints]
enable = true
variable_types = true
function_return_types = true
parameter_names = false
[diagnostics]
unresolved_imports = "warning"
circular_imports = "info"
[formatting]
enabled = true
line_length = 100
indent_size = 4
quote_style = "double"
trailing_commas = "multiline"
import_sorting = "pep8"
[advanced]
max_any_depth = 5
incremental = true
workspace_analysis = true
enable_caching = true
cache_size = 200
Using pyproject.toml
[tool.beacon.type_checking]
mode = "strict"
[tool.beacon.python]
version = "3.12"
stub_paths = ["stubs", "typings"]
[tool.beacon.workspace]
source_roots = ["src"]
exclude_patterns = ["**/venv/**", "**/.venv/**"]
[tool.beacon.diagnostics]
unresolved_imports = "error"
[tool.beacon.formatting]
enabled = true
line_length = 88
quote_style = "double"
trailing_commas = "multiline"
Configuration Precedence
When using Beacon with an editor extension (e.g., VSCode), configuration is merged in the following order (later sources override earlier ones):
1. Default values - built-in defaults
2. TOML file - `beacon.toml` or `pyproject.toml`
3. Editor settings - VSCode settings, Zed settings, Neovim config, etc.
This allows you to set project-wide defaults in TOML while still being able to override specific settings through your editor.
Beacon Language Server
Beacon's Language Server Protocol (LSP) implementation bridges the Rust-based analyzer with editors such as Zed, VSCode/VSCodium, Neovim, and Helix. This chapter documents the system from high-level goals to feature-by-feature behaviour.
LSP Capabilities Quick Reference
Beacon implements the following LSP features:
- Diagnostics: Real-time syntax, semantic, and type error reporting
- Hover: Context-sensitive type information and documentation
- Completion: Symbol table-based completions
- Navigation: Go to definition, find references, document highlights
- Symbols: Document outline and workspace fuzzy search
- Semantic tokens and inlay hints
- Refactoring: Rename, code actions, quick fixes
See Feature Providers for detailed implementation.
Documentation Overview
Use the sidebar to jump into any topic, or start with the sections below:
- Goals And Scope - what the server delivers today and what is intentionally out of scope.
- Architecture Overview - how shared state, concurrency, and feature wiring are structured.
- Document Pipeline - how file contents become parse trees, ASTs, and symbol tables.
- Caching - multi-layer cache architecture and invalidation strategies for fast incremental updates.
- Feature Providers - the capabilities exposed via LSP requests and notifications.
- Request Lifecycles - end-to-end flows for initialization, diagnostics, completions, and more.
- Workspace Services - cross-file features and emerging workspace indexing plans.
- Testing Strategy - automated coverage for providers and backend flows.
- Current Limitations - known gaps and trade-offs in the current implementation.
- Next Steps - near-term improvements on the roadmap.
If you are new to the Language Server Protocol itself, read the primer in Learn → Language Server Protocol before diving into these implementation details.
Goals
Deliver an incremental, performant, pragmatic Hindley-Milner (HM) type inference and checking engine for Python that integrates with modern editor tooling via the Language Server Protocol (LSP).
The system should support Python’s dynamic features thoughtfully, interoperate with typing hints, and scale to multi-file projects.
Why HM for Python?
HM type systems provide principled inference (no annotations required), compositional reasoning, strong guarantees, and fast unification-based algorithms (the Algorithm W family).
Challenges
- Pervasive dynamism (monkey-patching, `__getattr__`, metaclasses, duck typing, runtime reflection)
- Mixed nominal and structural patterns
- Subtyping-like expectations (`None`, unions, protocols)
- First-class classes and functions
- Decorators
- Generators
- Async
- Pattern matching (PEP 634)
Design
HM core + pragmatic extensions, with a gradual boundary to accommodate Python idioms and annotations:
- HM for expressions and local bindings.
- Controlled subtyping-like features via unions/optionals and protocols/structural constraints.
- Annotation-aware: treat PEP 484/PEP 695 types as constraints and hints.
- Soundness modes: `"strict"`, `"balanced"`, `"relaxed"` (affecting treatment of `Any`, unknown attributes, dynamic imports).
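As a taste of what the HM core buys for unannotated Python, here is a sketch with the inferred types written as comments (illustrative, not Beacon's actual output format):

```python
def identity(x):
    return x            # inferred and generalized: identity : a -> a

def twice(f, x):
    return f(f(x))      # inferred: twice : (a -> a, a) -> a

n = identity(42)        # instantiated at int, so n : int
s = identity("hi")      # the same polymorphic function reused at str
```

The diagram below shows where this inference engine sits inside the server.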
```text
┌──────────────────────────────────────────────────┐
│ LSP Frontend │
│ (tower-lsp or custom) using lsp-types for models │
└───────────────▲───────────────────────▲──────────┘
│ │
Requests / Notifications Diagnostics, hovers
│ │
┌──────────────────────────────┼───────────────────────┼────────────────────────────┐
│ Language Server Core │
│ ┌───────────────────────┐ ┌──────────────────────┐ ┌────────────────────────┐ │
│ │ Document Manager │ │ Project Graph │ │ Incremental Index │ │
│ │ (text, versions, TS │ │ (imports, deps, │ │ (symbols, stubs, │ │
│ │ parse trees) │ │ module cache) │ │ types, caches) │ │
│ └──────────▲────────────┘ └──────────▲───────────┘ └──────────▲─────────────┘ │
│ │ │ │ │
│ ┌───────┴────────┐ ┌─────────┴──────────┐ ┌────────┴────────────┐ │
│ │ Tree-sitter │ │ Constraint Gen │ │ Solver / Types │ │
│ │ Parser (Py) │ │ (walk TS AST, │ │ (unification, │ │
│ │ + lossless │ │ produce HM + │ │ polymorphism, │ │
│ │ syntax facts) │ │ extensions) │ │ row/structural) │ │
│ └───────▲────────┘ └─────────▲──────────┘ └────────▲────────────┘ │
│ │ │ │ │
│ └─────── Source -> AST ────┴── Constraints ───────────┘ │
└───────────────────────────────────────────────────────────────────────────────────┘
```
LSP Implementation Goals
Beacon's LSP focuses on delivering a fast, editor-friendly surface for the Beacon analyzer without overcommitting to unfinished infrastructure. The current goals fall into five themes.
Primary Goals
Immediate feedback: run parsing and type analysis on every edit so diagnostics stay in sync with the buffer.
Core navigation: support hover, go-to-definition, references, and symbol search for rapid code exploration.
Authoring assistance: provide completions, document symbols, inlay hints, and semantic tokens to guide editing.
Refactoring primitives: offer reliable rename support and lay the groundwork for richer code actions.
Modular design: isolate feature logic behind provider traits so contributors can evolve features independently.
Out-of-Scope (For Now)
- Full workspace indexing: we limit operations to open documents until indexing and cache management mature.
- Formatting and linting: formatting endpoints and lint integrations are planned but not part of the initial release.
- Editor-specific UX: we stick to LSP-standard capabilities instead of bespoke VS Code UI components.
Architecture Overview
The language server lives in crates/server and centres on the Backend type, which implements tower_lsp::LanguageServer. The architecture is deliberately modular so feature work and analyzer development can proceed in parallel.
Core Components
- Backend: receives every LSP request/notification and routes it to feature providers. It owns the shared state required by multiple features.
- Client (`tower_lsp::Client`): handles outbound communication, including diagnostics, logs, and custom notifications.
- DocumentManager: thread-safe cache of open documents. Each `Document` stores:
  - Source text (`ropey::Rope` for cheap edits).
  - Tree-sitter parse tree.
  - Beacon AST.
  - Symbol table produced by the name resolver.
- Analyzer: the Beacon type checker wrapped in an `Arc<RwLock<_>>` because many features need mutable access to its caches.
- Workspace: tracks the workspace root URI and will later manage module resolution and indexing.
- Features: a simple struct that instantiates each provider with shared dependencies and exposes them to the backend.
Concurrency Model
tower_lsp::LspService drives the backend on the Tokio runtime.
Read-heavy operations borrow documents or analyzer state immutably; diagnostics and rename take write locks to update caches.
Documents store text in a ropey::Rope, so incremental edits only touch the modified spans.
Error Handling
Feature methods typically return Option<T>: None means the feature has no answer for the request rather than hard-failing.
When unrecoverable errors occur (e.g., document not found), providers log via the client instead of crashing the server process.
Extensibility
Adding a new LSP method involves creating a provider (or extending an existing one) and exposing it through the Features struct.
Because providers depend only on DocumentManager and optionally the analyzer, they are easy to test in isolation.
This architecture keeps protocol plumbing concentrated in the backend while feature logic stays modular and testable.
Document Pipeline
The document pipeline keeps Beacon’s view of each open file synchronized with the editor. DocumentManager orchestrates the lifecycle and ensures every feature works from the same parse tree, AST, and symbol table.
Lifecycle Events
- Open (`textDocument/didOpen`)
  - Create a `Document` with the initial text, version, and URI.
  - Parse immediately via `LspParser` to populate the parse tree, AST, and symbol table.
  - Insert the document into the manager’s map.
- Change (`textDocument/didChange`)
  - Apply full or incremental edits to the document’s rope.
  - Re-run the parser to refresh derived data.
  - Invalidate analyzer caches so diagnostics and semantic queries recompute with fresh information.
- Save (`textDocument/didSave`)
  - Trigger diagnostics for the new persisted content. Behaviour matches the change handler today.
- Close (`textDocument/didClose`)
  - Remove the document and send an empty diagnostics array to clear markers in the editor.
Data Stored per Document
Text: stored as a ropey::Rope for efficient splicing.
Parse tree: Tree-sitter syntax tree produced by the parser.
AST: Beacon’s simplified abstract syntax tree used by features and the analyzer.
Symbol table: scope-aware mapping created during name resolution.
Version: latest client-supplied document version, echoed back when publishing diagnostics.
Access Patterns
get_document: exposes an immutable snapshot to consumers like hover or completion.
get_document_mut: allows controlled mutation when necessary (rare in practice).
all_documents: lists URIs so workspace-level features can iterate through open files.
By centralizing parsing and symbol management, the pipeline guarantees consistent snapshots across diagnostics, navigation, and refactoring features.
Cache Architecture
Beacon uses a multi-layer caching system to minimize redundant analysis while maintaining correctness.
Cache Layers
The system provides four specialized cache layers, each optimized for different granularities:
TypeCache (Node-Level)
Caches inferred types for specific AST nodes. Each entry maps (uri, node_id, version) to a Type.
Capacity: 100 entries (default)
Eviction: LRU
Use case: Hover requests, completion suggestions, and other features that need type information for a specific node.
ScopeCache (Scope-Level)
Provides granular incremental re-analysis at scope level rather than document level. When only a single function changes in a large file, unchanged scopes retain their cached analysis results.
Cache key: (uri, scope_id, content_hash)
Content hashing: Uses DefaultHasher to compute a deterministic hash of the scope's source text. Different content produces different hashes, enabling precise change detection.
Cached data:
- `type_map`: inferred types for nodes within the scope
- `position_map`: mapping from source positions to node IDs
- `dependencies`: scopes this scope depends on (parent, referenced scopes)
Capacity: 200 entries (default)
Eviction: LRU
Statistics: Tracks hits/misses for performance monitoring.
Use case: Type checking, diagnostics, and semantic analysis that can reuse results from unchanged scopes.
AnalysisCache (Document-Level)
Caches complete analysis artifacts per document version. Each entry maps (uri, version) to full analysis results including type maps, position maps, type errors, and static analysis findings.
Cached data:
- Complete type maps
- Position maps
- Type errors
- Static analysis results
Capacity: 50 entries (default)
Eviction: LRU
Version-based invalidation: New document versions automatically create new cache entries rather than invalidating existing ones.
Use case: Publishing diagnostics, workspace-wide queries, and features that need complete document analysis.
IntrospectionCache (Persistent)
Caches Python introspection results for external modules and the standard library. Persists to disk in .beacon-cache/introspection.json to survive server restarts.
Cached data:
- Function signatures
- Docstrings
- Module metadata
Capacity: 1000 entries
Eviction: LRU (in-memory), write-through to disk
Use case: Hover information for stdlib and third-party modules, completion for imported symbols.
Content Hashing Validation
ScopeCache uses content hashing to detect changes with high precision:
Hash computation:
```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash the scope's source text; the result keys the scope cache entry
let mut hasher = DefaultHasher::new();
source_content.hash(&mut hasher);
let content_hash = hasher.finish();
```
Properties:
- Deterministic: same content always produces the same hash
- Whitespace-sensitive: `x = 1` and `x=1` produce different hashes
- Collision-resistant: sufficient for cache validation
Validation: Cache lookups compare the computed content hash against the cached key. Mismatches result in cache misses, forcing re-analysis of the modified scope.
Invalidation Strategies
Version-Based Invalidation
TypeCache checks document version on every access. If the document version differs from the cached entry's version, the entry is treated as stale.
AnalysisCache embeds version in the cache key, so new versions naturally create new entries without explicit invalidation.
Content-Based Invalidation
ScopeCache compares content hashes. When a scope's source changes:
- Compute new content hash from updated source
- Look up cache with new key
- Cache miss if hash differs
- Re-analyze and insert with new hash
Explicit Invalidation
CacheManager provides methods to invalidate specific scopes or entire documents:
invalidate_document: Removes all cache entries for a URI across all layers.
invalidate_scope: Removes entries for a specific scope from ScopeCache.
invalidate_selective: Invalidates specific scopes and returns the set of affected URIs for cascade invalidation.
Cascade Invalidation
When a scope changes, dependent scopes may also need invalidation. ImportDependencyTracker maintains a dependency graph to determine which scopes reference the changed scope, enabling selective cascade invalidation without over-invalidating.
Cache Coordination
CacheManager unifies all cache layers and coordinates invalidation:
On document change:
- Identify changed scopes by comparing content hashes
- Invalidate changed scopes in ScopeCache
- Clear document-level entries in AnalysisCache for the affected URI
- Query dependency tracker to find dependent scopes
- Invalidate dependents selectively
- TypeCache entries naturally become stale via version mismatch
On document close:
- Remove all cache entries for the URI
- Persist IntrospectionCache to disk
Performance Characteristics
Cache hit rates directly impact analysis latency:
Cold cache (first analysis): Full analysis required for all scopes.
Warm cache, no changes: All scopes hit, near-instant response.
Warm cache, localized change: Only changed scopes and dependents miss, dramatic speedup for large files.
ScopeCache statistics provide hit rate monitoring:
```rust
let stats = cache_manager.scope_cache_stats();
println!("Hit rate: {:.2}%", stats.hit_rate);
```
Formatter Cache
The formatter uses a separate two-level cache optimized for formatting requests:
Short-Circuit Cache
Maps (source_hash, config_hash) to unit. Detects already-formatted code in O(1) time, avoiding redundant formatting operations.
Result Cache
Maps (source_hash, config_hash, start_line, end_line) to formatted output. Reuses formatting results for identical source and configuration.
Capacity: 100 entries (default) per layer
Eviction: LRU
Use case: Format-on-save, range formatting, and editor-initiated format requests.
Feature Providers
Each capability exposed by the language server lives in its own provider under crates/server/src/features.
Providers share the DocumentManager and, when needed, the analyzer.
Diagnostics
DiagnosticProvider aggregates:
- Parse errors emitted by the parser.
- Unbound variable checks.
- Type errors and warnings from the analyzer.
- Additional semantic warnings (e.g., annotation mismatches).
Results are published with document versions to prevent stale diagnostics in the editor.
Hover
HoverProvider returns context-sensitive information for the symbol under the cursor—typically inferred types or documentation snippets.
It reads the current AST and analyzer output to assemble Hover responses.
The hover system integrates with the builtin documentation and dunder metadata modules to provide rich information for Python's standard types and magic methods.
Completion
CompletionProvider uses symbol tables to surface in-scope identifiers. Trigger characters (currently ".") allow editors to request completions proactively.
Navigation
GotoDefinitionProvider locates definitions using symbol table lookups.
ReferencesProvider returns all occurrences of a symbol across open documents.
DocumentHighlightProvider highlights all occurrences of a symbol within a single file when the cursor is positioned on it.
The provider walks the AST to identify and classify occurrences:
- Variables: marked as READ or WRITE based on context (assignments are WRITE, usage is READ)
- Function names: highlighted in both definitions and call sites
- Function parameters: highlighted in both the parameter list and within the function body
- Class members: highlighted across the class definition
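For example, the provider would classify the occurrences of `total` in this snippet as shown in the comments (illustrative):

```python
def add_up(values):
    total = 0              # WRITE: assignment target
    for v in values:
        total = total + v  # WRITE on the left, READ on the right
    return total           # READ: usage
```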
Symbols
DocumentSymbolsProvider walks the AST to produce hierarchical outlines (classes, functions, variables).
WorkspaceSymbolsProvider scans all open documents, performing case-insensitive matching with fuzzy search scoring.
It falls back to sensible defaults when nested symbols are missing from the symbol table.
The provider supports lazy symbol resolution for LSP clients that request location details on-demand.
Semantic Enhancements
SemanticTokensProvider projects syntax nodes into semantic token types and modifiers, enabling advanced highlighting.
InlayHintsProvider emits type annotations or other inline hints derived from the analyzer.
Refactoring
RenameProvider validates proposed identifiers, gathers edits via both AST traversal and Tree-sitter scans, deduplicates overlapping ranges, and returns a WorkspaceEdit.
Code Actions
CodeActionsProvider provides quick fixes and refactoring actions:
Quick Fixes:
- Removing unused variables and imports
- Wrapping types with `Optional` for None-related type errors
- Automatically adding `from typing import Optional` when needed
- Adding missing pattern cases in match statements
- Removing unreachable pattern cases
- Implementing missing protocol methods for built-in protocols (Iterable, Iterator, Sized, Callable, Sequence, Mapping)
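As an illustration, the `Optional` quick fix described above rewrites a None-related type error like this (hypothetical before/after):

```python
from typing import Optional  # added automatically by the quick fix when missing

# Before: None conflicts with the declared type
timeout: int = None            # type error: None is not an int

# After the quick fix wraps the annotation
timeout: Optional[int] = None  # OK
```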
Refactorings:
- Inserting type annotations from inferred types on variable assignments
- Adding missing imports for undefined symbols (coming soon!)
- Extract to function/method refactorings (coming soon!)
- Inline variable refactorings (coming soon!)
Support Modules
The features system includes specialized support modules:
builtin_docs provides embedded documentation for Python built-in types (str, int, list, dict, etc.).
Documentation is loaded from JSON at compile time and includes descriptions, common methods, and links to official Python documentation.
dunders supplies metadata and documentation for Python's magic methods (__init__, __str__, etc.) and builtin variables (__name__, __file__, etc.).
Adding new features typically means introducing a provider that consumes DocumentManager, optionally the analyzer, and wiring it through the Features struct so the backend can route requests.
Request Lifecycles
This section traces how the server handles key LSP interactions from start to finish.
Initialization
- `initialize` request
  - Captures the workspace root (`root_uri`) from the client.
  - Builds `ServerCapabilities`, advertising supported features: incremental sync, hover, completion, definitions, references, highlights, code actions, inlay hints, semantic tokens (full & range), document/workspace symbols, rename, and workspace symbol resolve.
  - Returns `InitializeResult` with optional `ServerInfo`.
- `initialized` notification
  - Currently logs an info message. Future work will kick off workspace scanning or indexing.
Text Synchronization & Diagnostics
didOpen → store the document, parse it, and call publish_diagnostics.
didChange → apply edits, reparse, invalidate analyzer caches, then re-run diagnostics.
didSave → trigger diagnostics again; behaviour matches the change handler.
didClose → remove the document and publish empty diagnostics to clear markers.
publish_diagnostics collects issues via DiagnosticProvider, tagging them with the current document version to avoid race conditions.
Hover, Completion, and Navigation
hover → query HoverProvider, which reads the AST and analyzer to produce Hover content.
completion → call CompletionProvider, returning a CompletionResponse (list or completion list).
gotoDefinition, typeDefinition, references, documentHighlight → use symbol table lookups to answer navigation requests.
These operations are pure reads when possible, avoiding locks beyond short-lived document snapshots.
Symbols
documentSymbol → returns either DocumentSymbol trees or SymbolInformation lists.
workspace/symbol → aggregates symbols from every open document, performing case-insensitive matching.
workspaceSymbol/resolve → currently a no-op passthrough.
Semantic Tokens & Inlay Hints
textDocument/semanticTokens/full and /range → run the semantic tokens provider to emit delta-encoded token sequences for supported types/modifiers.
textDocument/inlayHint → acquire a write lock on the analyzer and compute inline hints for the requested range.
Refactoring
textDocument/rename → validate the new identifier, locate the target symbol, collect edits (AST traversal + Tree-sitter identifiers), deduplicate, and return a WorkspaceEdit.
textDocument/codeAction → routes to CodeActionsProvider, which returns quick fixes and refactorings for the requested range (see Feature Providers).
Shutdown
shutdown returns Ok(()), signalling graceful teardown.
exit follows to terminate the process. We do not persist state yet, so shutdown is effectively stateless.
Workspace Services
While most features operate on individual documents, Beacon’s language server already supports several cross-file capabilities and is laying groundwork for broader workspace awareness.
Workspace Symbols
Iterates over URIs retrieved from DocumentManager::all_documents.
For each document, fetches the AST and symbol table, then performs case-insensitive matching against the query string.
Returns SymbolInformation with ranges, optional container names, and deprecation tags (SymbolTag::DEPRECATED where applicable).
Falls back to reasonable defaults when nested symbols (e.g., class methods) are missing from the symbol table.
Document Symbols
Provides structured outlines per file, organising classes, functions, assignments, and nested items.
Editors use the resulting tree to populate outline panes, breadcrumbs, or navigation search.
Workspace State
- The `Workspace` struct records the `root_uri` supplied during initialization.
Notifications and Logging
The backend emits window/logMessage notifications for status updates and window/showMessage for user-facing alerts.
Diagnostics are republished after changes so editors update their inline markers and problems panels.
Long-Term Plans
Implement persistent symbol indexing keyed by the workspace root.
Add background tasks that refresh indexes when files change on disk.
Support multi-root workspaces and remote filesystems where applicable.
Although the current implementation focuses on open buffers, the architecture is designed to scale to full-project workflows as these enhancements land.
PyDoc Retrieval
The language server enriches hover and completion items for third-party Python packages by executing a short-lived Python subprocess to read real docstrings and signatures from the user's environment.
Interpreter Discovery
find_python_interpreter in crates/server/src/interpreter.rs walks common virtual environment managers (Poetry, Pipenv, uv) before falling back to python on the PATH.
Each probe shells out (poetry env info -p, pipenv --venv, uv python find) and returns the interpreter inside the virtual environment when successful.
The search runs per workspace and only logs at debug level on success. Missing tools or failures are tolerated—only a final warn! is emitted if no interpreter can be located.
Interpreter lookups currently rely on external commands and inherit their environment; this will eventually be an explicit path via LSP settings.
Introspection Flow
When a hover needs documentation for module.symbol, we call introspect in crates/server/src/introspection.rs with the discovered interpreter.
introspect constructs a tiny Python script that imports the target module, fetches the attribute, and prints two sentinel sections: SIGSTART (signature) and DOCSTART (docstring).
The async path spawns tokio::process::Command, while introspect_sync uses std::process::Command.
Both share parsing logic via parse_introspection_output.
The script uses inspect.signature and inspect.getdoc, so it respects docstring inheritance and returns cleaned whitespace.
Failures to inspect still return whatever data is available.
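A minimal sketch of the kind of script `introspect` generates (illustrative; the real script is assembled in Rust, and `json.dumps` stands in for an arbitrary `module.symbol` target):

```python
import importlib
import inspect

module = importlib.import_module("json")  # target module
obj = getattr(module, "dumps")            # target symbol

print("SIGSTART")
try:
    print(inspect.signature(obj))
except (ValueError, TypeError):
    pass  # some builtins expose no signature; emit what we can

print("DOCSTART")
doc = inspect.getdoc(obj)  # honours docstring inheritance, cleans whitespace
if doc:
    print(doc)
```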
Parsing and Error Handling
Results are parsed by scanning for the sentinel lines and trimming the sections, yielding an IntrospectionResult { signature, docstring }.
Timeouts (3 seconds) protect the async path from hanging interpreters.
Other errors—missing module, attribute, or import failure—come back as IntrospectionError::ExecutionFailed with the stderr payload for debugging.
We log subprocess stderr on failure but avoid surfacing internal exceptions directly to the client.
Testing Guarantees
Unit tests cover the parser, confirm the generated script embeds the sentinels, and run best-effort smoke tests against standard library symbols when a Python interpreter is available. Tests skip gracefully if Python cannot be located, keeping CI green on machines without Python.
Static Analyzer
Beacon's language server leans on a modular static-analysis stack housed in crates/server/src/analysis.
The subsystem ingests a parsed document, infers types, builds control-flow graphs, runs pattern exhaustiveness checks, and produces diagnostics that drive editor features like hovers and squiggles.
Pipeline Overview
Analyzer::analyze is the high-level orchestration point:
- Grab a consistent AST + symbol table snapshot from the `DocumentManager`.
- Check the analysis cache for a previously computed result at this document version.
- Extract scopes and check scope-level cache for incremental analysis opportunities.
- Walk the tree to emit lightweight constraints describing how expressions relate (equality, calls, attributes, protocols, patterns).
- Build a class registry containing metadata for all class definitions (fields, methods, protocols, inheritance).
- Invoke the shared `beacon_core` unifier to solve constraints, capturing any mismatches as `TypeErrorInfo`.
- Build function-level control-flow graphs and run data-flow passes to uncover use-before-def, unreachable code, and unused symbols.
- Package the inputs, inferred data, and diagnostics into an `AnalysisResult`, caching at both scope and document level for quick repeat lookups.
The analyzer produces a type_map linking AST node IDs to inferred types and a position_map linking source positions to nodes, enabling hover and type-at-position queries.
Type Inference in Brief
type_env.rs supplies the Hindley–Milner style environment that powers constraint generation.
It seeds built-in symbols, hydrates annotations, and hands out fresh type variables whenever the AST does not provide one.
Each visit to a FunctionDef, assignment, call, or control-flow node updates the environment and records the relationships that must hold; the actual solving is deferred so the analyzer can collect all obligations before touching the unifier.
This keeps the pass linear, side-effect free, and easy to extend with new AST constructs.
The constraint system supports multiple relationship types:
- Equal: Type equality constraints (t1 ~ t2)
- Call: Function application with argument and return types
- HasAttr: Attribute access with method binding and inheritance resolution
- Protocol: Structural conformance checks for both built-in protocols (Iterable, Iterator, Sequence, AsyncIterable, AsyncIterator, Awaitable) and user-defined Protocol classes
- MatchPattern: Pattern matching with binding extraction
- PatternExhaustive: Exhaustiveness checking for match statements
- PatternReachable: Reachability checking to detect unreachable patterns
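For instance, a short function and match statement can give rise to several of these constraint kinds at once; the comments name the constraints each line would plausibly emit (illustrative):

```python
from typing import Iterable

def total(xs: Iterable[int]) -> int:
    acc = 0            # Equal: type(acc) ~ int
    for x in xs:       # Protocol: xs must satisfy Iterable
        acc = acc + x  # Call + HasAttr: resolves int.__add__
    return acc         # Equal: return type ~ int

match [1, 2, 3]:
    case [head, *tail]:  # MatchPattern: binds head, tail
        print(head, tail)
    case []:             # PatternExhaustive: all list shapes covered
        print("empty")
```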
Once constraints reach solve_constraints, they are unified in order. Successful unifications compose into a substitution map, while failures persist with span metadata so editor clients can render precise diagnostics.
The class registry enables attribute resolution with full inheritance support, overload resolution for methods decorated with @overload, and structural protocol checking for user-defined Protocol classes.
Control & Data Flow
cfg.rs and data_flow.rs provide the structural analyses that complement pure typing:
- The CFG builder splits a function body into `BasicBlock`s linked by typed edges (normal flow, branch outcomes, loop exits, exception edges, etc.), mirroring Python semantics closely enough for downstream passes to reason about reachability.
- The data-flow analyzer consumes that graph plus the original AST slice to flag common hygiene issues: variables read before assignment, code that cannot execute, and symbols that never get used. Results surface through `DataFlowResult` and end up in the final `AnalysisResult`.
This layered approach lets the LSP report both type-level and flow-level problems in a single request, keeping feedback tight while avoiding duplicate walks of the AST.
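A snippet like the following would trip all three hygiene checks in a single pass (illustrative):

```python
def process(items):
    if not items:
        return None
        print("skipped")    # unreachable code: follows a return

    total = count + 1       # use-before-def: count is read before assignment
    count = 0

    leftovers = len(items)  # unused symbol: assigned but never read
    return total
```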
Class Metadata & Method Resolution
The class_metadata module tracks comprehensive information about class definitions:
- Fields: Inferred from assignments in `__init__` and the class body
- Methods: Including support for overload sets via the `@overload` decorator
- Special methods: `__init__` and `__new__` signatures for constructor checking
- Decorators: `@property`, `@classmethod`, and `@staticmethod` tracking
- Protocols: Marks classes inheriting from `typing.Protocol` for structural conformance checking
- Inheritance: Base class tracking with method resolution order for attribute lookup
Method types can be either single signatures or overload sets. When resolving a method call, the analyzer attempts to match argument types against overload signatures before falling back to the implementation signature.
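A sketch of that resolution order using `typing.overload` (`Parser` and `Ast` are hypothetical names):

```python
from typing import overload

class Ast: ...

class Parser:
    @overload
    def parse(self, source: str) -> Ast: ...
    @overload
    def parse(self, source: bytes) -> Ast: ...

    def parse(self, source):  # implementation signature (the fallback)
        return Ast()

p = Parser()
p.parse("x = 1")    # matched against the str overload first
p.parse(b"x = 1")   # matched against the bytes overload
```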
Pattern Matching Support
The pattern and exhaustiveness modules provide comprehensive pattern matching analysis:
- Type checking for all pattern forms (literal, capture, wildcard, sequence, mapping, class, OR, AS)
- Exhaustiveness checking to ensure match statements cover all cases
- Reachability checking to detect unreachable patterns subsumed by earlier cases
- Binding extraction to track variables introduced by patterns
This enables diagnostics like PM001 (non-exhaustive match) and PM002 (unreachable pattern).
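For example, both diagnostics fire on snippets like these (illustrative):

```python
from enum import Enum

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

def name(c: Color) -> str:
    match c:        # PM001: non-exhaustive, Color.BLUE is never matched
        case Color.RED:
            return "red"
        case Color.GREEN:
            return "green"

def describe(x: int) -> str:
    match x:
        case _:
            return "anything"
        case 0:     # PM002: unreachable, subsumed by the wildcard above
            return "zero"
```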
Linting & Additional Diagnostics
The linter and rules modules implement static checks beyond type correctness.
Many BEA-series diagnostic codes are implemented, with others awaiting parser or symbol table enhancements.
See the table of linting rules for details.
Utilities
Beyond inference and CFG analysis, the module exposes helpers for locating unbound identifiers, invalidating cached results when documents change, and bridging between symbol-table scopes and LSP positions.
Beacon Linter
The Beacon Rule Engine is a modular static analysis system powering diagnostics in Beacon.
At its core, it is a pure-Rust reimplementation of PyFlakes.
Suppressing Warnings
Individual linter warnings can be suppressed using inline comments:
```python
import os  # noqa: BEA015  # Suppress unused import warning
x = undefined  # noqa  # Suppress all warnings on this line
```
See Suppressions for complete documentation on suppression comments.
Legend: ⚠ = Warning ✕ = Error ⓘ = Info
| Code | Name / RuleKind | Level | Category | Description |
|---|---|---|---|---|
| BEA001 | UndefinedName | ✕ | Naming | Variable or function used before being defined. |
| BEA002 | DuplicateArgument | ✕ | Functions | Duplicate parameter names in a function definition. |
| BEA003 | ReturnOutsideFunction | ✕ | Flow | return statement outside of a function or method body. |
| BEA004 | YieldOutsideFunction | ✕ | Flow | yield or yield from used outside a function context. |
| BEA005 | BreakOutsideLoop | ✕ | Flow | break used outside a for/while loop. |
| BEA006 | ContinueOutsideLoop | ✕ | Flow | continue used outside a for/while loop. |
| BEA007 | DefaultExceptNotLast | ⚠ | Exception | A bare except: is not the final exception handler in a try block. |
| BEA008 | RaiseNotImplemented | ⚠ | Semantics | Using raise NotImplemented instead of raise NotImplementedError. |
| BEA009 | TwoStarredExpressions | ✕ | Syntax | Two or more * unpacking expressions in assignment. |
| BEA010 | TooManyExpressionsInStarredAssignment | ✕ | Syntax | Too many expressions when unpacking into a starred target. |
| BEA011 | IfTuple | ⚠ | Logic | A tuple literal used as an if condition — always True. |
| BEA012 | AssertTuple | ⚠ | Logic | Assertion always true due to tuple literal. |
| BEA013 | FStringMissingPlaceholders | ⚠ | Strings | f-string declared but contains no {} placeholders. |
| BEA014 | TStringMissingPlaceholders | ⚠ | Strings | t-string declared but contains no placeholders. |
| BEA015 | UnusedImport | ⚠ | Symbols | Import is never used within the file. |
| BEA016 | UnusedVariable | ⚠ | Symbols | Local variable assigned but never used. |
| BEA017 | UnusedAnnotation | ⚠ | Symbols | Annotated variable never referenced. |
| BEA018 | RedefinedWhileUnused | ⚠ | Naming | Variable redefined before original was used. |
| BEA019 | ImportShadowedByLoopVar | ⚠ | Scope | Import name shadowed by a loop variable. |
| BEA020 | ImportStarNotPermitted | ✕ | Imports | from module import * used inside a function or class. |
| BEA021 | ImportStarUsed | ⚠ | Imports | import * prevents detection of undefined names. |
| BEA022 | UnusedIndirectAssignment | ⚠ | Naming | Global or nonlocal declared but never reassigned. |
| BEA023 | ForwardAnnotationSyntaxError | ✕ | Typing | Syntax error in forward type annotation. |
| BEA024 | MultiValueRepeatedKeyLiteral | ⚠ | Dict | Dictionary literal repeats key with different values. |
| BEA025 | PercentFormatInvalidFormat | ⚠ | Strings | Invalid % format string. |
| BEA026 | IsLiteral | ⚠ | Logic | Comparing constants with is or is not instead of ==/!=. |
| BEA027 | DefaultExceptNotLast | ⚠ | Exception | Bare except: must appear last. |
| BEA028 | UnreachableCode | ⚠ | Flow | Code after a return, raise, or break is never executed. |
| BEA029 | RedundantPass | ⓘ | Cleanup | pass used in a block that already has content. |
| BEA030 | EmptyExcept | ⚠ | Exception | except: with no handling code (silent failure). |
Rules
BEA001
Example
print(foo) before foo is defined.
Fix
Define the variable before use or fix the typo.
BEA002
Example
```python
def f(x, x):
    pass
```
Fix
Rename one of the parameters.
BEA003
Example
Top-level return 5 in a module.
Fix
Remove or move inside a function.
BEA004
Example
yield x at module scope.
Fix
Wrap in a generator function.
BEA005
Example
break in global scope or in a function without loop.
Fix
Remove or restructure the code to include a loop.
BEA006
Example
continue in a function with no loop.
Fix
Remove or replace with control flow logic.
BEA007
Example
except: followed by except ValueError:
Fix
Move the except: block to the end of the try.
BEA008
Example
raise NotImplemented
Fix
Replace with raise NotImplementedError.
BEA009
Example
a, *b, *c = d
Fix
Only one starred target is allowed.
BEA010
Example
a, b, c, d = (1, 2, 3)
Fix
Adjust unpacking count.
BEA011
Example
```python
if (x,):
    ...
```
Fix
Remove accidental comma or rewrite condition.
BEA012
Example
assert (x, y)
Fix
Remove parentheses or fix expression.
BEA013
Example
f"Hello world"
Fix
Remove the f prefix if unnecessary.
BEA014
Example
t"foo"
Fix
Remove the t prefix if unused.
BEA015
Example
import os not referenced.
Fix
Remove the unused import.
BEA016
Example
x = 10 never referenced again.
Fix
Remove assignment or prefix with _ if intentional.
BEA017
Example
x: int declared but unused.
Fix
Remove or use variable.
BEA018
Example
x = 1; x = 2 without reading x.
Fix
Remove unused definition.
BEA019
Example
```python
import os
for os in range(3):
    ...
```
Fix
Rename loop variable.
BEA020
Example
```python
def f():
    from math import *
```
Fix
Move import to module level.
BEA021
Example
from os import *
Fix
Replace with explicit imports.
BEA022
Example
global foo never used.
Fix
Remove redundant declaration.
BEA023
Example
def f() -> "List[int": ...
Fix
Fix or quote properly.
BEA024
Example
{'a': 1, 'a': 2}
Fix
Merge or remove duplicate keys.
BEA025
Example
"%q" % 3
Fix
Correct format specifier.
BEA026
Example
x is 5
Fix
Use ==/!=.
BEA027
Example
A bare except: placed before other handlers, as in BEA007.
Fix
Reorder exception handlers.
BEA028
Example
return 5; print("unreachable")
Fix
Remove or refactor code.
BEA029
Example
```python
def f():
    pass
    return 1
```
Fix
Remove redundant pass.
BEA030
Example
```python
try:
    ...
except:
    pass
```
Fix
Handle exception or remove block.
Planned
| Name | Kind | Category | Severity | Rationale |
|---|---|---|---|---|
| Mutable Default Argument | MutableDefaultArgument | Semantic | ✕ | Detect functions that use a mutable object (e.g., list, dict, set) as a default argument. |
| Return in Finally | ReturnInFinally | Flow | ✕ | Catch a return, break, or continue inside a finally block: this often suppresses the original exception and leads to subtle bugs. |
| For-Else Without Break | ForElseWithoutBreak | Flow | ⚠ | A for ... else whose loop body contains no break is confusing and often mis-used: the else block always executes, so the construct signals muddled logic. |
| Wrong Exception Caught | BroadExceptionCaught | Exception | ⚠ | Catching overly broad exceptions (e.g., except Exception: or bare except:) instead of specific types can hide bugs. This extends the empty-except check (BEA030) to overly broad handlers. |
| Inconsistent Return Types | InconsistentReturnTypes | Function | ⚠ | A function that returns different types on different paths (e.g., return int in one branch, return None in another) may lead to consuming code bugs especially if not annotated. |
| Index / Key Errors Likely | UnsafeIndexOrKeyAccess | Data | ⚠ | Detect patterns that likely lead to IndexError or KeyError, e.g., accessing list/dict without checking length/keys, especially inside loops. |
| Unused Coroutine / Async Function | UnusedCoroutine | Symbol | ⚠ | In async code: an async def function is defined but never awaited or returned anywhere — likely a bug. |
| Resource Leak / Unclosed Descriptor | UnclosedResource | Symbol | ⚠ | Detect file or network resource opened (e.g., open(...)) without being closed or managed via context manager (with). |
| Logging Format String Errors | LoggingFormatError | String | ⚠ | Using % or f-string incorrectly in logging calls (e.g., logging format mismatches number of placeholders) can cause runtime exceptions or silent failures. |
| Comparison to None Using == / != | NoneComparison | Logic | ⚠ | Discourage == None or != None in favor of is None / is not None. |
Beacon Diagnostic Codes
Beacon’s Diagnostic provider combines parser feedback, Hindley–Milner type errors, annotation coverage checks, control/data-flow analysis, and workspace import resolution into a single stream of LSP diagnostics.
This guide lists every diagnostic code emitted by that pipeline so you can interpret squiggles quickly and trace them back to the subsystem described in Type Checking, Static Analyzer, and Type Checking Modes.
LSP severity for imports (circular vs. unresolved) remains configurable under [diagnostics] as documented in Configuration.
To temporarily suppress any diagnostic, use the mechanisms described in Suppressions.
Legend:
- ⚠ = Warning
- ✕ = Error
- ⓘ = Info/Hints
Note that per-mode rows show the icon used in strict / balanced / relaxed order
| Code | Name | Level | Category | Description |
|---|---|---|---|---|
| ANY001 | UnsafeAnyUsage | ⚠ | Type Safety | Deep inference found an Any value, reducing type safety. |
| ANN001 | AnnotationMismatch | ✕ ⚠ ⓘ | Annotations | Declared annotation disagrees with the inferred type. |
| ANN002 | MissingVariableAnnotation | ✕ ⚠ | Annotations | Assignment lacks an annotation in strict/balanced modes. |
| ANN003 | ParameterAnnotationMismatch | ✕ ⚠ ⓘ | Annotations | Parameter annotation conflicts with inferred usage. |
| ANN004 | MissingParameterAnnotation | ✕ ⚠ | Annotations | Parameter missing annotation when inference is precise. |
| ANN005 | ReturnAnnotationMismatch | ✕ ⚠ ⓘ | Annotations | Function return annotation disagrees with inference. |
| ANN006 | MissingReturnAnnotation | ✕ ⚠ | Annotations | Function lacks return annotation when inference is concrete. |
| ANN007 | ImplicitAnyParameter | ✕ | Annotations | Strict mode forbids implicit Any on parameters. |
| ANN008 | ImplicitAnyReturn | ✕ | Annotations | Strict mode forbids implicit Any return types. |
| ANN009 | MissingClassAttributeAnnotation | ✕ | Annotations | Strict mode requires explicit annotations on class attributes. |
| ANN010 | BareExceptClause | ✕ | Annotations | Strict mode forbids bare except: clauses without exception types. |
| ANN011 | ParameterImplicitAny | ⚠ | Annotations | Balanced mode warns when parameter type resolves to implicit Any. |
| ANN012 | ReturnImplicitAny | ⚠ | Annotations | Balanced mode warns when return type resolves to implicit Any. |
| DUNDER_INFO | EntryPointGuard | ⓘ | Dunder Patterns | Highlights if __name__ == "__main__": guard blocks. |
| DUNDER001 | MagicMethodOutOfScope | ⚠ | Dunder Patterns | Magic methods defined outside a class. |
| HM001 | TypeMismatch | ✕ | Type System | Hindley–Milner could not unify two types. |
| HM002 | OccursCheckFailed | ✕ | Type System | Recursive type variable detected (infinite type). |
| HM003 | UndefinedTypeVar | ✕ | Type System | Referenced type variable was never declared. |
| HM004 | KindMismatch | ✕ | Type System | Wrong number of type arguments supplied to a generic. |
| HM005 | InfiniteType | ✕ | Type System | Inference produced a non-terminating type (self-referential). |
| HM006 | ProtocolNotSatisfied | ✕ | Type System | Value fails to implement the required protocol methods. |
| HM007 | AttributeNotFound | ✕ | Attributes | Attribute or method does not exist on the receiver type. |
| HM008 | ArgumentCountMismatch | ✕ | Type System | Call site passes too many or too few arguments. |
| HM009 | ArgumentTypeMismatch | ✕ | Type System | Argument type incompatible with the parameter type. |
| HM010 | PatternTypeMismatch | ✕ | Pattern Typing | Match/case pattern cannot match the subject type. |
| HM011 | KeywordArgumentError | ✕ | Type System | Unknown or duplicate keyword arguments in a call. |
| HM012 | GenericTypeError | ✕ | Type System | Catch-all Hindley–Milner error (value restriction, etc.). |
| HM013 | PatternStructureMismatch | ✕ | Pattern Typing | Pattern shape (mapping, class, sequence) differs from subject. |
| HM014 | VarianceError | ✕ | Variance | Invariant/covariant/contravariant constraint violated. |
| MODE_INFO | TypeCheckingMode | ⓘ | Mode | Reminder showing which type-checking mode produced diagnostics. |
| PM001 | PatternNonExhaustive | ✕ | Patterns | Match statement fails to cover every possible case. |
| PM002 | PatternUnreachable | ✕ | Patterns | Later pattern is shadowed by an earlier one. |
| circular-import | CircularImport | ✕ ⚠ ⓘ | Imports | Module participates in an import cycle (severity comes from config). |
| missing-module | MissingModule | ✕ | Imports | Referenced module is absent from the workspace/stubs. |
| shadowed-variable | ShadowedVariable | ⚠ | Scope | Inner scope reuses a name that already exists in an outer scope. |
| undefined-variable | UndefinedVariable | ✕ | Name Resolution | Name used before being defined anywhere. |
| unresolved-import | UnresolvedImport | ✕ ⚠ ⓘ | Imports | Import target cannot be resolved (severity configurable). |
| unreachable-code | UnreachableCode | ⚠ | Data Flow | Code after return, raise, or break never executes. |
| unused-variable | UnusedVariable | ⓘ | Data Flow | Variable assigned but never read. |
| use-before-def | UseBeforeDef | ✕ | Data Flow | Variable read before it is assigned in the current scope. |
Diagnostics by Category
- Type Safety Diagnostics
- Annotation Diagnostics
- Dunder Pattern Diagnostics
- Type System Diagnostics
- Attribute Diagnostics
- Pattern Typing Diagnostics
- Variance Diagnostics
- Mode Diagnostics
- Pattern Diagnostics
- Import Diagnostics
- Scope Diagnostics
- Name Resolution Diagnostics
- Data-Flow Diagnostics
Type Safety Diagnostics
Diagnostics in this category highlight when inference collapses to Any and reduces overall type safety.
ANY001 – UnsafeAnyUsage
Example
```python
from typing import Any

payload: Any = fetch_config()
payload["timeout"]  # ANY001 – inference lost precision once `Any` appeared
```
Guidance
Beacon warns when unchecked Any values flow through the type map (see Special Types).
Replace Any with a precise annotation, cast the value after runtime checks, or refactor APIs so that callers receive concrete types.
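For instance, a minimal sketch of the check-then-cast approach; Config and fetch_config are illustrative names, not Beacon APIs:
from typing import Any, TypedDict, cast

class Config(TypedDict):
    timeout: int

def fetch_config() -> Any: # Stand-in for the untyped API in the example above
    return {"timeout": 30}

raw = fetch_config()
if isinstance(raw, dict) and isinstance(raw.get("timeout"), int):
    config = cast(Config, raw) # Precision restored after runtime validation
    print(config["timeout"]) # No ANY001: the value is no longer Any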
Annotation Diagnostics
Annotation diagnostics cover mismatches between declared types and inferred usage along with mode-specific requirements for annotations.
ANN001 – AnnotationMismatch
Example
value: int = "stale" # Annotated as int, inferred as str
Guidance
Annotation/inference mismatches inherit their severity from the active mode (strict → error, balanced → warning, relaxed → hint). Align the annotation with real usage, or change the code so the inferred type matches. See Type Checking and Type Checking Modes for the inference rules.
ANN002 – MissingVariableAnnotation
Example
# beacon: mode=strict
profile = load_profile() # Missing annotation (ANN002)
Guidance
Strict/balanced modes expect assignments with concrete inferred types to be annotated.
Add the appropriate annotation (profile: Profile = load_profile()), or downgrade the file to relaxed mode if intentional (see Type Checking Modes).
ANN003 – ParameterAnnotationMismatch
Example
def greet(name: str) -> str:
    return name + 1 # name inferred as int due to arithmetic
Guidance
Parameter annotations must agree with how the function body uses the value. Update the annotation or refactor the body to respect it. Details about how inference follows parameter usage live in Type Checking.
ANN004 – MissingParameterAnnotation
Example
def send_email(address):
    ...
Balanced/strict modes infer address: str (or similar) and emit ANN004.
Guidance
Add explicit parameter annotations whenever inference is concrete: def send_email(address: str) -> None:.
Relaxed mode skips this check entirely (see Type Checking Modes).
ANN005 – ReturnAnnotationMismatch
Example
def parity(flag: bool) -> bool:
return "odd" # Return annotation mismatch
Guidance
Ensure return annotations reflect every path. Either return the annotated type or adjust the annotation. See Type Checking for how Beacon treats return types.
ANN006 – MissingReturnAnnotation
Example
def total(values):
    return sum(values)
Balanced/strict modes infer a concrete return type (e.g., int) and require -> int.
Guidance
Add return annotations when inference is precise and not Any/None: def total(values: list[int]) -> int:.
Relaxed mode suppresses this requirement (see Type Checking Modes).
ANN007 – ImplicitAnyParameter
Example
# beacon: mode=strict
def transform(data):
    return data.strip()
Guidance
Strict mode disallows implicit Any on parameters even when inference could deduce a type.
Add annotations for every parameter (data: str). Balanced/relaxed modes emit ANN004 instead or skip the check entirely.
Review Type Checking Modes for severity rules.
ANN008 – ImplicitAnyReturn
Example
# beacon: mode=strict
import uuid
def make_id():
    return uuid.uuid4().hex # Implicit Any return type
Guidance
Strict mode requires explicit return annotations on every function.
Provide the exact type (-> str) or relax the file mode if you intentionally rely on inference.
See Type Checking Modes for override syntax.
ANN009 – MissingClassAttributeAnnotation
Example
# beacon: mode=strict
class Configuration:
    host = "localhost" # Missing type annotation
    port: int = 8080 # OK: Has annotation
Guidance
Strict mode requires explicit type annotations on all class attributes.
Add the annotation (host: str = "localhost") or use balanced/relaxed mode if gradual typing is preferred.
Note that instance attributes (assigned in __init__ or other methods) are not subject to this check—only class-level attributes defined directly in the class body.
See Type Checking Modes for mode configuration.
ANN010 – BareExceptClause
Example
# beacon: mode=strict
def process_data():
    try:
        result = risky_operation()
    except: # ANN010: Bare except clause not allowed
        handle_error()
Guidance
Strict mode requires specific exception types in except clauses to prevent catching system exceptions like KeyboardInterrupt or SystemExit unintentionally.
Replace bare except: with specific exception types:
# Good: Specific exception type
except ValueError:
    ...
# Good: Multiple exception types
except (ValueError, TypeError):
    ...
# Good: Catch most exceptions but not system ones
except Exception:
    ...
Balanced and relaxed modes allow bare except clauses for gradual adoption. See Type Checking Modes for mode configuration.
ANN011 – ParameterImplicitAny
Example
# beacon: mode=balanced
def process_unknown(data, options):
    return data # ANN011: 'data' and 'options' have implicit Any type
Guidance
Balanced mode distinguishes between concrete inferred types (which trigger ANN004 with type suggestions) and implicit Any (which triggers ANN011).
When type inference cannot determine a concrete type due to insufficient context, parameters are finalized as Any and this warning is emitted.
Add type annotations to clarify the intended types:
# Good: Explicit annotations remove ambiguity
from typing import Any
def process_unknown(data: dict[str, Any], options: dict[str, str]) -> dict[str, Any]:
    return data
This diagnostic helps identify truly ambiguous cases where annotations provide the most value. Strict mode reports all missing parameter annotations as ANN007 errors instead. See Type Checking Modes for inference behavior.
ANN012 – ReturnImplicitAny
Example
# beacon: mode=balanced
def handle_dynamic(value):
    print(value) # ANN012: Return type is implicit Any
Guidance
When a function's return type cannot be inferred to a concrete type, balanced mode warns with ANN012. This differs from ANN006, which fires when inference determines a concrete type but the annotation is missing.
Add an explicit return type annotation:
# Good: Explicit return type
from typing import Any
def handle_dynamic(value: Any) -> None:
    print(value)
For functions with implicit Any returns, consider whether:
- The return type should be None (procedures)
- You need to add annotations to parameters to enable better inference
- The function genuinely needs -> Any due to dynamic behavior
Strict mode reports all missing return annotations as ANN008 errors instead. See Type Checking Modes for the distinction between concrete inference and implicit Any.
Dunder Pattern Diagnostics
Beacon provides special-case guidance for dunder definitions and entry-point guards to keep symbol metadata accurate.
DUNDER_INFO – EntryPointGuard
Example
if __name__ == "__main__":
    run_cli()
Guidance
This informational hint makes entry-point guards easier to spot. No action needed. The behavior is described in Semantic Enhancements.
DUNDER001 – MagicMethodOutOfScope
Example
def __str__():
return "oops" # Should live inside a class
Guidance
Define magic methods inside a class body, e.g. def __str__(self) -> str: ... within class Foo. This keeps symbol metadata consistent with Python semantics.
See Semantic Enhancements for background on how Beacon tracks dunders.
Type System Diagnostics
Type system diagnostics originate from Hindley–Milner inference, call checking, and generic validation.
HM001 – TypeMismatch
Example
def add(flag: bool) -> int:
    return flag + "!" # int vs. str cannot unify
Guidance
Beacon’s Hindley–Milner engine reports HM001 when two types cannot unify (see Subtyping vs Unification). Convert or narrow the values so the operands share a compatible type.
HM002 – OccursCheckFailed
Example
def self_apply(f):
    return f(f) # Requires f: T -> T, but T would have to contain itself
Guidance
Occurs-check failures indicate an infinite recursive type. Refactor so values are not applied to themselves without a wrapper type, or introduce generics that break the cycle. See Type Checking for how recursive types are limited.
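One possible wrapper, sketched here on the assumption that an explicit recursive class type is acceptable to the checker (the Rec name is illustrative, not a Beacon API):
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

class Rec(Generic[T]):
    # The wrapper gives the self-referential function a nameable, finite type
    def __init__(self, f: Callable[["Rec[T]"], T]) -> None:
        self.f = f

def self_apply(r: Rec[T]) -> T:
    return r.f(r) # No occurs-check failure: r is Rec[T], not an infinite T -> T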
HM003 – UndefinedTypeVar
Example
def use_unknown(x: U) -> U: # U was never declared via TypeVar
    return x
Guidance
Declare every type variable with TypeVar before referencing it: U = TypeVar("U").
The generics workflow is covered in Type Checking.
HM004 – KindMismatch
Example
ids: dict[str] = {} # dict expects two type arguments
Guidance
Provide the correct number of arguments for each generic (dict[str, int]).
Beacon enforces kind arity to avoid ambiguous instantiations.
See Type Checking.
HM005 – InfiniteType
Example
def paradox(x):
    return x(paradox) # Leads to an infinite type when inferred
Guidance
Infinite type errors usually stem from higher-order functions that apply un-annotated callables to themselves. Add annotations to break the cycle or restructure the algorithm so a value is not required to contain itself.
HM006 – ProtocolNotSatisfied
Example
from typing import Iterable
def consume(xs: Iterable[str]) -> None:
    for item in xs:
        print(item.upper())
consume(10) # int does not satisfy Iterable[str]
Guidance
Ensure call arguments implement the required protocol slots or convert them first (wrap values in iterables, implement __iter__, etc.).
Protocol behavior is described in Type Checking.
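For example, two hedged fixes for the snippet above: convert the argument, or implement __iter__ on a custom class (Tags is a hypothetical class for illustration):
from typing import Iterable, Iterator

def consume(xs: Iterable[str]) -> None:
    for item in xs:
        print(item.upper())

class Tags:
    def __iter__(self) -> Iterator[str]: # Structurally satisfies Iterable[str]
        yield "alpha"
        yield "beta"

consume([str(10)]) # Fixed: wrap the int in an iterable of strings
consume(Tags()) # Fixed: Tags implements the protocol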
HM008 – ArgumentCountMismatch
Example
def pair(a: int, b: int) -> None:
    ...
pair(1) # Missing second positional argument
Guidance
Match the declared arity (positional + keyword-only + variadic). Add or remove arguments, or update the function signature. This follows the call constraint rules in Type Checking.
HM009 – ArgumentTypeMismatch
Example
def square(x: int) -> int:
    return x * x
square("ten") # Argument type mismatch
Guidance
Convert arguments to the expected type or adjust the signature to accept a broader type. Beacon pinpoints the offending parameter in the diagnostic.
HM011 – KeywordArgumentError
Example
def connect(host: str, *, ssl: bool) -> None:
    ...
connect("db", secure=True) # Unknown keyword `secure`
Guidance
Use valid keyword names, avoid duplicates, and respect positional-only/keyword-only markers. Adjust the call site or function signature accordingly.
HM012 – GenericTypeError
Example
def capture() -> int:
    cache = []
    def inner():
        cache.append(inner)
    return inner(cache) # Triggers a generic HM012 error about unsafe recursion
Guidance
HM012 is a catch-all for rare Hindley–Milner failures (value restriction violations, unsupported constructs). Inspect the message for context, add annotations to guide inference, or refactor towards supported patterns. See Type Checking.
Attribute Diagnostics
Attribute diagnostics explain when a receiver type does not define the attribute being accessed.
HM007 – AttributeNotFound
Example
count = 10
count.splitlines() # Attribute does not exist on int
Guidance
The analyzer could not find the attribute on the receiver type. Narrow the type, convert the value, or fix typos. Beacon adds contextual hints (e.g., “splitlines is a string method”). See Type Checking for attribute resolution notes.
Pattern Typing Diagnostics
Pattern typing diagnostics focus on structural mismatches that arise during match statement analysis.
HM010 – PatternTypeMismatch
Example
def parse(match_obj):
    match match_obj:
        case (x, y): # HM010 if match_obj is inferred as str
            ...
Guidance
Ensure match subjects and patterns agree (use tuples with tuple subjects, mappings with dicts, etc.). Pattern typing is detailed in Pattern Matching Support.
HM013 – PatternStructureMismatch
Example
def report(event):
    match event:
        case {"kind": kind, "meta": {"user": user}}:
            ...
If event is inferred as a tuple or class instance, the mapping pattern's structure cannot match.
Guidance
Use patterns whose structure matches the subject (mappings for dicts, class patterns for dataclasses, etc.). Details live in Pattern Matching Support.
Variance Diagnostics
Variance diagnostics describe when mutable containers or position constraints break covariance/contravariance rules.
HM014 – VarianceError
Example
pets: list[str] = ["dog", "cat"] # list is invariant
objects: list[object] = pets # HM014: cannot assign list[str] to list[object]
Guidance
Respect variance constraints.
Mutable containers are invariant, so consider using immutable collections (tuple[str, ...]) or widening the source type.
The diagnostic message includes targeted advice per position (in/out).
See Type Checking.
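A sketch of the immutable-collection fix; tuples are covariant in their element type, so the widening that list forbids is safe here:
pets: tuple[str, ...] = ("dog", "cat")
objects: tuple[object, ...] = pets # OK: tuple is covariant, no HM014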
Mode Diagnostics
Mode diagnostics are informational hints emitted when Beacon reports which type-checking mode produced a set of issues.
MODE_INFO – TypeCheckingMode
Example
Type checking mode: balanced (workspace default) - ...
Guidance
Beacon appends this hint whenever diagnostics appear so you know whether strict/balanced/relaxed rules applied.
Use # beacon: mode=strict (etc.) to override as described in Type Checking Modes.
Pattern Diagnostics
Pattern diagnostics describe exhaustiveness and reachability issues detected in structural pattern matching.
PM001 – PatternNonExhaustive
Example
def handle(flag: bool) -> str:
    match flag:
        case True:
            return "y"
The missing False case triggers PM001.
Guidance
Add the missing cases (case False: or case _:).
Exhaustiveness checking is covered in Pattern Matching Support.
PM002 – PatternUnreachable
Example
def classify(value):
    match value:
        case _:
            return 0
        case 1:
            return value # Unreachable after wildcard case
Guidance
Reorder or delete subsumed patterns so every case is reachable. See Pattern Matching Support.
Import Diagnostics
Import diagnostics enumerate issues with module resolution, missing files, and configurable severities for unresolved imports.
circular-import – CircularImport
Example
# module_a.py
from module_b import helper
# module_b.py
from module_a import setup # completes a cycle
Guidance
Break the cycle by moving shared code into a third module, deferring imports inside functions, or rethinking module boundaries.
Severity comes from [diagnostics.circular_imports] in Configuration.
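For instance, a sketch of the deferred-import fix, assuming setup lives in module_a:
# module_b.py
def helper() -> None:
    from module_a import setup # Deferred: resolved at call time, not import time
    setup()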
missing-module – MissingModule
Example
import backend.plugins.payments # File or package missing from workspace/stubs
Guidance
Add the module to the workspace, fix typos, or adjust your import path. Beacon reports missing modules as errors because runtime execution would fail immediately.
unresolved-import – UnresolvedImport
Example
from services import codecs # services module exists, codecs submodule does not
Guidance
Fix the module path, add missing files, or install the dependency.
Severity is controlled by [diagnostics.unresolved_imports] in Configuration.
Scope Diagnostics
Scope diagnostics call out variable shadowing inside nested scopes.
shadowed-variable – ShadowedVariable
Example
token = "outer"
def handler():
token = "inner" # Shadows outer variable
Guidance
Rename inner variables or move logic closer to usage to avoid surprising shadowing. The static analyzer describes its scope walk in Control & Data Flow.
Name Resolution Diagnostics
Name resolution diagnostics highlight names that were never defined anywhere in the file or workspace.
undefined-variable – UndefinedVariable
Example
print(total) # `total` never defined
Guidance
Define the name, import it, or limit the scope where it’s used.
Unlike use-before-def, this check runs at the file level via Analyzer::find_unbound_variables (see Static Analyzer).
Data-Flow Diagnostics
Data-flow diagnostics track unreachable blocks and variable usage ordering issues.
unreachable-code – UnreachableCode
Example
def foo():
    return 42
    print("never runs") # Unreachable
Guidance
Remove or refactor unreachable statements.
Diagnostics carry the UNNECESSARY tag so editors can gray out the code.
Pipeline details sit in Control & Data Flow.
unused-variable – UnusedVariable
Example
def process():
    result = compute() # Never read later
Guidance
Use the variable, prefix with _ to mark as intentionally unused, or delete it.
See Control & Data Flow for how Beacon tracks reads/writes.
use-before-def – UseBeforeDef
Example
def build():
    print(total)
    total = 10 # total read before assignment in this scope
Guidance
Reorder statements so assignments precede reads, or mark outer-scope variables as nonlocal/global when appropriate.
Data-flow analysis is described in Control & Data Flow.
Type Checking
Beacon provides powerful static type checking for Python code, combining the rigor of Hindley-Milner type inference with the flexibility needed for Python's dynamic features.
Suppressing Type Errors
Type checking errors can be suppressed using inline comments:
x: int = "string" # type: ignore
value: str = 42 # type: ignore[assignment] # Suppress specific error type
See Suppressions for complete documentation on suppression comments.
Type System Philosophy
Beacon's type checker is designed with a core principle: context-aware strictness. It maintains strong type safety for genuinely unsafe operations while being permissive for common, safe Python patterns.
Design Goals
- High Signal-to-Noise Ratio: Report errors that matter, not false positives from valid Python code
- Catch Real Bugs: Focus on type mismatches that lead to runtime errors
- Support Gradual Typing: Work seamlessly with both annotated and unannotated code
- Python-First Semantics: Understand Python idioms rather than forcing ML-style patterns
Union and Optional Types
How Union Types Work
Union types represent values that can be one of several types. Beacon treats union types using subtyping semantics rather than strict structural equality.
# This is valid - None is a member of Optional[int]
def get_value() -> int | None:
    return None # No error
# Union members work naturally
x: int | str = 42 # int is a subtype of int | str
y: int | str = "hello" # str is a subtype of int | str
Optional Types
Optional[T] is syntactic sugar for Union[T, None]. Beacon understands that None is a valid value for Optional types without requiring explicit checks:
from typing import Optional
def process(value: Optional[str]) -> None:
    # Assigning None to Optional is always valid
    result: Optional[str] = None # No error
Type Narrowing
While union types are permissive for assignment, accessing attributes or calling methods requires narrowing:
def process(value: int | None) -> int:
    # Error: None doesn't have __add__
    return value + 1
def process_safe(value: int | None) -> int:
    if value is None:
        return 0
    # value is narrowed to int here
    return value + 1 # OK
Beacon provides several narrowing mechanisms:
- None Checks: if x is None / if x is not None
- isinstance() Guards: if isinstance(x, int)
- Truthiness: if x narrows away None and falsy values (see the sketch after this list)
- Type Guards: User-defined type guard functions
- Match Statements: Pattern matching with exhaustiveness checking
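As a quick sketch of the truthiness mechanism (the other mechanisms appear in examples throughout this chapter):
def shout(message: str | None) -> str:
    if message: # Truthiness narrows away None (and the empty string)
        return message.upper()
    return ""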
Subtyping vs Unification
Beacon's type checker uses two complementary mechanisms:
Unification (Strict)
Used for non-union types. Requires structural equality:
x: int = 42
y: str = x # Error: cannot unify int with str
Subtyping (Flexible)
Used when union types are involved. Checks semantic compatibility:
x: int = 42
y: int | str = x # OK: int <: int | str
z: int | str | None = None # OK: None <: int | str | None
This hybrid approach provides:
- Strictness where it matters: Direct type mismatches are caught
- Flexibility for unions: Common patterns like Optional work naturally
Type Inference
Beacon infers types even without annotations:
def add(x, y):
    return x + y
# Inferred type: (int, int) -> int or (str, str) -> str
# (overloaded based on usage)
numbers = [1, 2, 3]
# Inferred type: list[int]
Value Restriction
Beacon applies the value restriction to prevent unsafe generalization:
empty_list = [] # Type: list[Never] - cannot generalize
# Must provide type hint for empty collections:
numbers: list[int] = [] # Type: list[int]
Special Types
Any
Any is the escape hatch for truly dynamic code. It unifies with all types without errors:
from typing import Any
def dynamic_operation(x: Any) -> Any:
    return x.anything() # No type checking
Use Any sparingly - it disables type checking for that value.
Never
Never represents impossible values or code paths that never return:
from typing import Never
def unreachable() -> Never:
    raise RuntimeError("Never returns")
def example(x: int) -> int:
    if x < 0:
        unreachable()
    return x # Type checker knows we only reach here if x >= 0
Top (⊤)
Top is the supertype of all types. It appears in generic bounds and protocol definitions but is rarely used directly.
Flow-Sensitive Type Narrowing
Beacon tracks type information through control flow:
def process(x: int | str | None) -> int:
    if x is None:
        return 0
    # x: int | str here
    if isinstance(x, int):
        return x
    # x: str here
    return len(x) # OK: x is definitely str
This works with:
- If statements
- While loops
- Match statements
- Try-except blocks
- Boolean operators (and, or)
Exhaustiveness Checking
Match statements and if-elif chains are checked for exhaustiveness:
def handle(x: bool) -> str:
    match x:
        case True:
            return "yes"
        case False:
            return "no"
# OK: all cases covered
def incomplete(x: int | str) -> str:
    if isinstance(x, int):
        return "number"
# Warning: str case not handled
Generic Types
Beacon supports generic types with type parameters:
from typing import TypeVar, Generic
T = TypeVar('T')
class Box(Generic[T]):
    def __init__(self, value: T) -> None:
        self.value = value
    def get(self) -> T:
        return self.value
# Type inference works:
int_box = Box(42) # Box[int]
str_box = Box("hello") # Box[str]
Protocols
Beacon supports structural typing through protocols:
from typing import Protocol
class Drawable(Protocol):
    def draw(self) -> None: ...
def render(obj: Drawable) -> None:
    obj.draw() # OK if obj has draw() method
class Circle:
    def draw(self) -> None:
        print("drawing circle")
render(Circle()) # OK: Circle satisfies Drawable protocol
Common Patterns
Optional Chaining
from typing import Optional
def get_name(user: Optional[dict]) -> Optional[str]:
    if user is None:
        return None
    return user.get("name") # Type checker knows user is dict
Union Type Discrimination
def process(value: int | list[int]) -> int:
    if isinstance(value, int):
        return value
    return sum(value) # value is list[int] here
Type Guard Functions
from typing import TypeGuard
def is_str_list(val: list) -> TypeGuard[list[str]]:
    return all(isinstance(x, str) for x in val)
def process(items: list[int | str]) -> None:
    if is_str_list(items):
        # items: list[str] here
        print(",".join(items)) # OK
Error Messages
Beacon provides context-aware error messages:
- String/Int Mixing: Suggests explicit conversion
- None Errors: Explains Optional types and None checks
- Union Errors: Shows which union branches failed and why
- Collection Mismatches: Identifies list vs dict vs tuple confusion
Error messages focus on actionable fixes rather than type theory jargon.
Configuration
Type checking strictness can be controlled via beacon.toml:
[analysis]
# Warn when Any is used (default: false)
warn-on-any = true
# Strict mode: disallow implicit Any (default: false)
strict = false
# Report unused variables (default: true)
unused-variables = true
Best Practices
- Use Optional for nullable values: Optional[T] is clearer than T | None for function signatures
- Narrow before use: Check for None before accessing attributes
- Leverage type guards: Create reusable type narrowing functions
- Avoid Any: Use Union types or Protocol types for flexibility
- Add type hints to empty collections: Help inference with list[int]() instead of []
- Trust the type checker: If it says a path is unreachable, it probably is
When Type Checking Fails You
Sometimes the type checker can't infer what you know is true. Use these escape hatches:
from typing import cast, Any
# cast: Assert a type without runtime check
value = cast(int, get_dynamic_value())
# Any: Disable type checking
dynamic: Any = get_unknown_type()
# Type ignore comment (use sparingly)
result = complex_operation() # type: ignore
# Assert narrowing
x: int | None = get_value()
assert x is not None
# x: int here (type checker understands assert)
Use these sparingly and document why the type checker needs help.
Type Checking Modes
Beacon supports three type checking modes that let you balance type safety with development flexibility: strict, balanced, and relaxed.
Configuration
Set the mode in your beacon.toml (see the configuration documentation for more information):
[type_checking]
mode = "balanced" # or "strict" or "relaxed"
Override the mode for specific files using a comment directive at the top of the file:
# beacon: mode=strict
def calculate(x: int, y: int) -> int:
    return x + y
Mode Comparison
| Feature | Strict | Balanced | Relaxed |
|---|---|---|---|
| Annotation mismatches | Error | Warning | Hint |
| Missing annotations (inferred) | Error | Warning | Silent |
| Implicit Any | Error | Warning | Silent |
| Bare except clauses | Error | Allowed | Allowed |
| Class attribute annotations | Required | Optional | Optional |
Strict Mode
Enforces complete type annotation coverage; type inference is never accepted as a substitute for explicit annotations. All function parameters and return types must be annotated.
Characteristics:
- All annotation mismatches are errors
- All function parameters must have explicit type annotations (ANN007)
- All function return types must have explicit type annotations (ANN008)
- All class attributes must have explicit type annotations (ANN009)
- Bare except: clauses are forbidden (ANN010)
- Missing annotations are treated as implicit Any, which is forbidden in strict mode
- Type inference is not allowed as a substitute for explicit annotations
- Best for greenfield projects, type-safe libraries, and critical components
Example:
# beacon: mode=strict
# ✓ Valid - fully annotated
def process(data: list[int]) -> int:
    total: int = sum(data)
    return total
# ✗ Error - missing return type annotation (ANN008)
# Even though the return type could be inferred as int
def calculate(x: int, y: int):
    return x + y
# ✗ Error - parameter 'first' missing annotation (ANN007)
# Strict mode requires explicit annotations, no inference
def format_name(first, last: str) -> str:
    return f"{first} {last}"
# ✗ Error - both parameters and return type missing (ANN007, ANN008)
# Strict mode requires all annotations, even when types could be inferred
def add(x, y):
    return x + y
# Class attributes also require explicit annotations in strict mode
class Config:
    # ✗ Error - class attribute missing annotation (ANN009)
    host = "localhost"
    # ✓ Valid - annotated class attribute
    port: int = 8080
# Exception handling requires specific exception types
def process() -> int:
    try:
        return risky_operation()
    except: # ✗ Error - bare except not allowed (ANN010)
        return -1
def safe_process() -> int:
    try:
        return risky_operation()
    except (ValueError, TypeError): # ✓ Valid - specific exception types
        return -1
Balanced Mode
Provides helpful warnings while allowing gradual type annotation adoption. Distinguishes between concrete inferred types and implicit Any to guide annotation efforts.
Characteristics:
- Annotation mismatches are warnings (not errors)
- Missing annotations with concrete inferred types trigger warnings showing the inferred type
- Implicit Any types (unresolvable inference) trigger warnings to identify ambiguous cases
- Allows mixing annotated and unannotated code (gradual typing)
- Ideal for incrementally adding types to existing projects
Example:
# beacon: mode=balanced
# ✓ No warnings - fully annotated
def process(data: list[int]) -> int:
    return sum(data)
# ⚠ Warning ANN006 - missing return type annotation (inferred as int)
# Suggestion includes the inferred type to guide annotation
def calculate(x: int, y: int):
    return x + y
# ⚠ Warning ANN004 - parameter 'first' missing annotation (inferred as str)
# Type can be inferred from usage context
def format_name(first, last: str) -> str:
    return f"{first} {last}"
# ⚠ Warning ANN011 - parameter 'data' has implicit Any type
# ⚠ Warning ANN012 - return type is implicit Any
# Type inference couldn't determine concrete types
def process_unknown(data, options):
    return data
# ⚠ Warning ANN011 - parameter 'b' has implicit Any type
# Gradual typing: warns only on unannotated parameter
def mixed_params(a: int, b, c: int) -> int:
    return a + b + c
Relaxed Mode
Minimally intrusive type checking focused on explicit mismatches.
Characteristics:
- Only explicit annotation mismatches produce hints
- Missing annotations are silent
- Maximum flexibility for exploration and legacy code
- Useful for initial type system adoption
Example:
# beacon: mode=relaxed
# ✓ No diagnostics
def process(data):
    return sum(data)
# ℹ Hint only - annotation doesn't match inference (ANN001)
def calculate(x: int, y: int) -> str: # Returns int, not str
    return x + y
# ✓ No diagnostics - missing annotations are allowed
def format_name(first, last):
    return f"{first} {last}"
Annotation Coverage Diagnostics
Beacon validates type annotations against inferred types and reports missing annotations based on the active mode.
Diagnostic Codes
| Code | Description | Strict | Balanced | Relaxed |
|---|---|---|---|---|
| ANN001 | Annotation mismatch on assignments | Error | Warning | Hint |
| ANN002 | Missing annotation on assignments | Error | Warning | Silent |
| ANN003 | Parameter annotation mismatch | Error | Warning | Hint |
| ANN004 | Missing parameter annotation (inferred type is concrete) | - | Warning | Silent |
| ANN005 | Return type annotation mismatch | Error | Warning | Hint |
| ANN006 | Missing return type annotation (inferred type is concrete) | - | Warning | Silent |
| ANN007 | Parameter missing annotation (strict mode) | Error | - | - |
| ANN008 | Return type missing annotation (strict mode) | Error | - | - |
| ANN009 | Class attribute missing annotation | Error | - | - |
| ANN010 | Bare except clause without exception type | Error | - | - |
| ANN011 | Parameter has implicit Any type | - | Warning | - |
| ANN012 | Return type has implicit Any type | - | Warning | - |
Strict Mode: All missing parameter and return type annotations trigger ANN007/ANN008 errors respectively. Class attributes without annotations trigger ANN009 errors. Bare except clauses trigger ANN010 errors.
Balanced Mode: Distinguishes between concrete inferred types and implicit Any:
- ANN004/ANN006: Missing annotations where type inference determined a concrete type (warns with suggested type)
- ANN011/ANN012: Missing annotations where type inference resulted in implicit Any (warns about ambiguity)
See the complete diagnostic codes documentation for more information.
Type Inference and Implicit Any
After constraint solving, Beacon finalizes any unresolved type variables as Any, enabling balanced mode to distinguish between:
- Concrete inferred types: Type inference successfully determined a specific type (int, str, list, etc.)
- Implicit Any: Type inference couldn't resolve to a concrete type due to insufficient context
- Active type variables: Still in the inference process (no diagnostic yet)
Diagnostic behavior:
- Strict mode: All missing annotations are errors (ANN007/ANN008), regardless of inference
- Balanced mode: Warns on both concrete inferred types (ANN004/ANN006) and implicit Any (ANN011/ANN012)
- Relaxed mode: Silent on missing annotations, only hints on explicit mismatches
- NoneType returns: No diagnostic for procedures with implicit None return (void functions)
Example:
# beacon: mode=balanced
# ⚠ Warning ANN004/ANN006 - concrete inferred type (int)
# Suggestion shows the inferred type
def add(x: int, y: int):
    return x + y # Inferred as int
# ⚠ Warning ANN011/ANN012 - implicit Any
# Type inference couldn't determine concrete type
def process(data):
    return transform(data) # Unknown transform behavior
# ✓ No diagnostic - procedure with None return
def log_message(msg: str):
    print(msg) # Inferred as None, no warning needed
Logging and Observability
Beacon uses structured logging via the tracing ecosystem to provide comprehensive observability for both development and production environments.
Architecture
The logging infrastructure is built on three components:
- Core Logging Module (beacon-core::logging) - Centralized configuration and initialization
- LSP Server Instrumentation - Protocol events, file analysis, and type inference logging
- CLI Logs Command - Real-time log viewing and filtering
Local Development
Running with Logs
Start the LSP server with full tracing enabled:
RUST_LOG=trace cargo run --bin beacon-lsp
Logs are written to two destinations:
- File: logs/lsp.log (daily rotating, persistent)
- stderr: Immediate console output during development
Log Levels
Set the log level using the RUST_LOG environment variable:
# Only errors and panics
RUST_LOG=error cargo run --bin beacon-lsp
# Warnings and errors
RUST_LOG=warn cargo run --bin beacon-lsp
# High-level events (default for releases)
RUST_LOG=info cargo run --bin beacon-lsp
# Detailed operation logs (recommended for development)
RUST_LOG=debug cargo run --bin beacon-lsp
# Full verbosity including protocol messages
RUST_LOG=trace cargo run --bin beacon-lsp
Module-Specific Filtering
Target specific modules for detailed logging:
# Debug level for LSP, trace for analysis
RUST_LOG=beacon_lsp=debug,beacon_lsp::analysis=trace cargo run --bin beacon-lsp
# Trace constraint generation only
RUST_LOG=beacon_constraint=trace cargo run --bin beacon-lsp
Watching Logs
CLI Logs Command
View logs in real-time using the debug logs command:
# Show all current logs
beacon debug logs
# Follow mode - continuously watch for new entries
beacon debug logs --follow
# Filter by pattern (regex supported)
beacon debug logs --follow --filter "ERROR|WARN"
# Filter to specific module
beacon debug logs --follow --filter "analysis"
# Use custom log file
beacon debug logs --follow --path /custom/path/to/log.txt
Log Output Format
Logs are colorized by level for easy scanning:
- ERROR - Bright red
- WARN - Yellow
- INFO - White
- DEBUG - Cyan
- TRACE - Dimmed
Example output:
2025-11-08T12:15:42Z [INFO] beacon_lsp: Starting Beacon LSP server version="0.1.0"
2025-11-08T12:15:43Z [DEBUG] beacon_lsp::backend: Received initialize request root_uri=Some(file:///workspace)
2025-11-08T12:15:44Z [INFO] beacon_lsp::analysis: Starting analysis uri="file:///workspace/main.py"
2025-11-08T12:15:44Z [DEBUG] beacon_lsp::analysis: Generating constraints uri="file:///workspace/main.py"
2025-11-08T12:15:45Z [INFO] beacon_lsp::analysis: Analysis completed uri="file:///workspace/main.py" duration_ms=142 type_count=87 error_count=0
What Gets Logged
Protocol Events
All LSP protocol requests and notifications are logged at appropriate levels:
- initialize, shutdown - INFO
- textDocument/didOpen, didChange, didClose - INFO
- workspace/didChangeConfiguration - INFO
- Diagnostics publishing - DEBUG
- Feature requests (hover, completion, etc.) - TRACE
File Analysis
The analysis pipeline logs key stages:
- Analysis Start (INFO) - URI, timestamp
- Document Retrieval (DEBUG) - Version, source length, scope count
- Cache Hit/Miss (DEBUG) - Whether cached results are used
- Constraint Generation (DEBUG) - Number of constraints generated
- Solver Execution (DEBUG) - Constraint count, solver invocation
- Solver Completion (INFO) - Type error count
- Analysis Completion (INFO) - Duration, type count, error count
Example analysis sequence:
INFO Starting analysis uri="file:///app/main.py"
DEBUG Retrieved document data uri="file:///app/main.py" version=3 source_length=1247 scopes=8
DEBUG Generating constraints uri="file:///app/main.py"
DEBUG Constraints generated, starting solver uri="file:///app/main.py" constraint_count=142
INFO Constraint solving completed uri="file:///app/main.py" type_error_count=0
INFO Analysis completed uri="file:///app/main.py" version=3 duration_ms=89 type_count=142 error_count=0
Type Inference
Constraint generation and solving steps are logged:
- Constraint count per file
- Solver initialization
- Type errors encountered
- Substitution application
Workspace Operations
Multi-file workspace operations:
- Workspace indexing start/completion
- Dependency updates
- Module invalidation
- Re-analysis of affected modules
Configuration Changes
Configuration loading and hot-reload:
- Config file discovery
- Validation warnings
- Runtime updates
- Mode changes (strict/balanced/relaxed)
Error Handling
Panic Logging
Panics are automatically captured and logged before termination:
PANIC at src/analysis/mod.rs:245:12: Type variable unification failed unexpectedly
The panic hook logs:
- Panic message
- File location (file:line:column)
- Payload details
Error Propagation
Errors are logged at appropriate points with context:
ERROR Failed to open document uri="file:///invalid.py" error="File not found"
ERROR Failed to update document uri="file:///app.py" error="Invalid UTF-8 in document"
Environment Variables
RUST_LOG
Controls log level filtering using the env_filter syntax:
# Global level
RUST_LOG=debug
# Per-module levels
RUST_LOG=info,beacon_lsp::analysis=trace
# Multiple modules
RUST_LOG=beacon_lsp=debug,beacon_constraint=trace,tower_lsp=warn
LSP_LOG_PATH
Override the default log file location:
LSP_LOG_PATH=/var/log/beacon/custom.log cargo run --bin beacon-lsp
Default: logs/lsp.log
Output Formats
Text (Default)
Human-readable format with timestamps, levels, and targets:
2025-11-08T12:15:42.123Z INFO beacon_lsp::backend: Server initialized version="0.1.0"
JSON
Structured JSON output for machine parsing (configurable via LogFormat::Json):
{
"timestamp": "2025-11-08T12:15:42.123Z",
"level": "INFO",
"target": "beacon_lsp::backend",
"message": "Server initialized",
"version": "0.1.0"
}
Release
In production, logging defaults to WARNING level:
- Minimal performance overhead
- Only errors and warnings logged
- File rotation prevents unbounded growth
- Daily rotation with date-stamped files
Log Rotation
Logs automatically rotate daily:
- Current log: logs/lsp.log
- Rotated logs: logs/lsp.log.2025-11-07, logs/lsp.log.2025-11-06, etc.
Coverage Gaps
The following areas have limited or no logging coverage (documented in roadmap):
- Detailed symbol resolution tracing
- Per-module diagnostic generation
- Configuration hot-reload events (currently limited)
- WebSocket/TCP transport logging (when using --tcp)
- Stub cache operations and introspection queries
Testing Strategy
Beacon’s LSP crate includes both unit tests and async integration tests to ensure feature behavior remains stable as the analyzer evolves.
Provider Unit Tests
Each feature module embeds targeted tests that construct in-memory documents via DocumentManager::new().
Common scenarios include rename edits across nested scopes, workspace symbol searches, and diagnostic generation for simple errors.
Because providers operate on real ASTs and symbol tables, these tests exercise production logic without needing a running language server.
Backend Integration Tests
Async tests spin up an in-process tower_lsp::LspService<Backend> to simulate client interactions.
They call methods like initialize, did_open, did_change, hover, and completion, asserting that responses match expectations and no panics occur.
This pattern verifies protocol wiring, capability registration, and shared state management without external tooling.
Command-line Checks
cargo check and cargo check --tests are run frequently for quick feedback.
cargo fmt --check enforces formatting consistency across Rust code.
Documentation changes are validated with mdbook build docs to catch broken links or syntax errors.
Current Limitations
The Beacon language server already covers core workflows but still has notable constraints. Understanding these limitations helps set expectations for contributors and users.
Open-Document Focus
Most features only inspect documents currently open in the editor.
Closed files are invisible until workspace indexing is implemented, so cross-project references or renames may miss targets.
Analyzer Coupling
Rename and references rely on a mix of AST traversal and simple heuristics; deep semantic queries across modules are not yet available.
Analyzer caches are invalidated wholesale after edits. Incremental typing work is on the roadmap but not implemented.
Performance
Tree-sitter reparses the entire document per change. While acceptable for small files, large modules may benefit from incremental parsing.
Workspace symbol searches iterate synchronously over all open documents, which can lag in large sessions.
Feature Gaps
Code actions support basic quick fixes (removing unused variables/imports, wrapping types with Optional) but many advanced refactorings remain unimplemented.
Formatting endpoints (textDocument/formatting, etc.) are unimplemented.
Configuration (Config) is still a stub and does not honor user settings.
Tooling Ergonomics
Error messages from the analyzer can be terse; improving diagnostics and logs is part of future work.
There is no persistence of analysis results across sessions, so large projects require recomputation on startup.
Next Steps
The following projects are planned to evolve Beacon’s language server from a solid MVP into a full-featured development companion.
Analyzer Integration
Tighten the connection between the LSP and analyzer so rename, references, and completions can operate across modules.
Cache analyzer results to avoid repeated full reanalysis after every edit.
Surface richer hover information (e.g., inferred types with provenance, docstrings).
Workspace Indexing
Build a background indexer that scans the workspace root, populating symbol data for unopened files.
Add file watchers to refresh indexes when on-disk files change outside the editor.
Support multi-root workspaces and remote development scenarios.
Tooling Enhancements
Implement formatting (textDocument/formatting, rangeFormatting) and integrate with Beacon's formatting rules.
Expand code actions beyond the current quick fixes (remove unused, wrap with Optional) to include:
- Insert type annotations from inference
- Add missing imports for undefined symbols
- Implement missing protocol methods
- Extract to function/method refactorings
- Inline variable refactorings
Extend semantic tokens with modifier support (documentation, deprecated symbols) and align with editor theming.
Performance & Reliability
Adopt Tree-sitter’s incremental parsing to reduce reparse costs for large files.
Improve logging and telemetry so users can diagnose performance issues or protocol errors.
Harden handling of unexpected client input, ensuring the server degrades gracefully.
Documentation & Ecosystem
Publish editor-specific setup guides (VS Code, Neovim, Helix, Zed) alongside troubleshooting tips.
Automate documentation deployment (see deploy-docs workflow) and version docs with releases.
Encourage community extensions by documenting provider APIs and expected invariants.
Development Quick Start
Installation
Build from source:
cargo build --release
The CLI will be available at target/release/beacon.
Install system-wide:
cargo install --path crates/cli
This installs the beacon binary to ~/.cargo/bin.
Type Checking
Check Python files for type errors using Hindley-Milner inference:
# Check a file
beacon typecheck example.py
# Check with JSON output for CI
beacon typecheck --format json example.py
# Check from stdin
cat example.py | beacon typecheck
Language Server
Install beacon-lsp system-wide:
cargo install --path crates/server
This installs the beacon-lsp binary to ~/.cargo/bin, making it available in your PATH.
Start the LSP server for editor integration:
beacon-lsp
Or use the CLI:
beacon lsp
For debugging, start with file logging:
beacon lsp --log-file /tmp/beacon.log
Debug Tools
Debug builds include additional tools for inspecting the type system:
# Build in debug mode
cargo build
# View tree-sitter CST
target/debug/beacon debug tree example.py
# Show AST with inferred types
target/debug/beacon debug ast example.py
# Display generated constraints
target/debug/beacon debug constraints example.py
# Show unification results
target/debug/beacon debug unify example.py
Note: Debug commands are only available in debug builds (compiled with cargo build), not in release builds.
Full documentation: CLI Tools
Editor Extensions
Beacon supports VS Code, Zed, and Neovim through the Language Server Protocol.
See Editor Extensions Documentation for setup instructions.
Project Structure
.
├─ crates/
│ ├─ cli/ # `beacon-cli` entry point with clap
│ ├─ server/ # `beacon-lsp` LSP server (tower-lsp or raw) using lsp-types
│ ├─ core/ # `beacon-core` type definitions, solver, unifier
│ ├─ constraints/ # `beacon-constraint` constraint generation
│ └─ parser/ # `beacon-parser` tree-sitter Python adapter
└── pkg/ # Editor extensions & plugins
Typeshed Integration
Beacon integrates Python standard library type stubs from the official python/typeshed repository. The stubs are embedded at build time using a git submodule, providing version-controlled, reproducible type information for the analyzer.
Current Version
Typeshed stubs are tracked as a git submodule at typeshed/. To check the current version:
cd typeshed
git log -1 --format='%H %ci %s'
The submodule points to stormlightlabs/typeshed-stdlib-mirror, which provides a flattened mirror of typeshed's stdlib and _typeshed directories.
Stub Lookup Architecture
Beacon uses a layered stub resolution system with the following priority order:
- Manual stubs - Configured via config.stub_paths (highest priority)
- Stub packages - Directories matching the *-stubs pattern
- Inline stubs - .pyi files located alongside .py files
- Typeshed stubs - Embedded stdlib stubs (fallback)
Builtins are loaded upfront during initialization. Other modules are loaded on-demand during constraint generation when imports are encountered.
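A minimal sketch of that priority walk, assuming hypothetical StubLayer objects with a lookup method (the real resolver is Rust code, not this Python):
from pathlib import Path
from typing import Optional, Protocol

class StubLayer(Protocol):
    def lookup(self, module: str) -> Optional[Path]: ...

def resolve_stub(module: str, layers: list[StubLayer]) -> Optional[Path]:
    # Layers are ordered: manual stubs, *-stubs packages, inline .pyi, typeshed
    for layer in layers:
        path = layer.lookup(module)
        if path is not None:
            return path # First hit wins; lower-priority layers are shadowed
    return None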
Updating Typeshed
The typeshed submodule can be updated to pull in newer stub definitions from upstream.
Check Available Updates
View recent commits from python/typeshed:
cd typeshed
./scripts/metadata.sh --limit 10
Options for filtering commits:
- --since DATE - Show commits after date (YYYY-MM-DD)
- --until DATE - Show commits before date (YYYY-MM-DD)
- --author NAME - Filter by GitHub username or email
- --grep PATTERN - Search commit messages
- --sha-only - Output only commit SHAs
Update to Specific Version
Fetch stubs from a specific python/typeshed commit:
cd typeshed
./scripts/fetch.sh <commit-sha>
This fetches the specified commit, flattens the stdlib and _typeshed directories into stubs/, and creates COMMIT.txt with metadata.
Update to Latest
Fetch and commit the latest typeshed version:
cd typeshed
./scripts/metadata.sh --limit 1 --sha-only | xargs ./scripts/fetch.sh
./scripts/commit.sh
cd ..
git add typeshed
git commit -m "chore: update typeshed submodule"
Manual Commit
After fetching stubs, commit changes manually:
cd typeshed
git add stubs COMMIT.txt
git commit -m "Bump typeshed stdlib to <commit-sha>"
git push
cd ..
git add typeshed
git commit -m "chore: update typeshed submodule"
Build Integration
Typeshed stubs are embedded into the Beacon binary at compile time. The build process:
- Reads stub files from the typeshed/stubs/ directory
- Embeds them using Rust's include_str! macro
- Makes stubs available via the get_embedded_stub(module_name) API
No runtime network access or file system dependency is required for stdlib type information.
Custom Beacon Stubs
Beacon-specific stubs that extend or override standard library behavior are kept in crates/server/stubs/:
- capabilities_support.pyi - Beacon-specific protocol definitions
These stubs have higher priority than embedded typeshed stubs due to the layered lookup system.
Testing
Stub resolution is tested through:
- Unit tests - Verify layered stub lookup and module resolution
- Integration tests - Validate stdlib type checking with typeshed stubs
- Analyzer tests - Check method resolution through inheritance chains
Test fixtures that require stub files are located in crates/server/tests/fixtures/.
Benchmarking
Beacon uses Criterion for performance benchmarking across critical paths.
Running Benchmarks
Execute all benchmarks:
cargo bench
Run specific benchmark suite:
cargo bench --bench type_inference
cargo bench --bench parser_benchmark
cargo bench --bench lsp_handlers
Criterion generates HTML reports in target/criterion/ with detailed statistics, plots, and regression detection.
Benchmark Suites
Type Inference (beacon-core)
Located at crates/core/benches/type_inference.rs, this suite tests the core type system:
- Simple Unification: Concrete types and type variables
- Generic Types: Lists, dictionaries with varying complexity
- Function Types: Simple, multi-argument, and generic functions
- Nested Types: Lists nested to varying depths
- Substitution: Composition and application with different sizes
Parser Performance (beacon-server)
Located at crates/server/benches/parser_benchmark.rs, tests parse operations:
- File Size Scaling: Small, medium, and large Python files
- AST Construction: Parse tree to AST transformation
- Incremental Reparsing: Performance of incremental updates
- Symbol Tables: Generation across different file sizes
LSP Handlers (beacon-server)
Located at crates/server/benches/lsp_handlers.rs, measures LSP operation latency:
- Hover: Variable and function hover info generation
- Completion: Attribute completion and in-body completion
- Go to Definition: Navigation for variables, functions, and classes
- Combined Operations: Parse + hover to measure end-to-end cost
Adding Benchmarks
Create new benchmark files in crates/{crate}/benches/ and register in Cargo.toml:
[[bench]]
name = "my_benchmark"
harness = false
Use Criterion's parametric benchmarks to test performance across different input sizes or scenarios.
Interpreting Results
Criterion provides:
- Throughput and iteration time statistics
- Confidence intervals
- Regression detection against previous runs
- Visual plots in HTML reports
Monitor the benchmark reports to catch performance regressions during development.
Tracing & Observability
Beacon uses the tracing ecosystem for structured logging and instrumentation.
Log Levels
- error: Critical failures that prevent functionality
- warn: Recoverable issues requiring attention
- info: High-level lifecycle events and milestones
- debug: Detailed operational information
- trace: Fine-grained execution details
Enabling Logs
Set RUST_LOG environment variable:
# All debug logs
RUST_LOG=debug beacon lsp
# Module-specific logging
RUST_LOG=beacon_lsp=debug,beacon_core=info beacon lsp
# Trace-level for specific module
RUST_LOG=beacon_lsp::analysis=trace beacon lsp
Instrumentation Points
LSP Lifecycle
Key events in backend.rs:
- Server initialization and configuration loading
- Document lifecycle (open, change, close)
- Shutdown requests
Document Processing
In document.rs:
- Mode directive detection from file comments
- Parse and reparse operations
Analysis Pipeline
Stages in analysis/mod.rs:
- Constraint generation
- Type inference execution
- Analysis result caching
Diagnostic Generation
Each diagnostic phase in features/diagnostics.rs:
- Parse errors
- Linter diagnostics
- Type checking errors
- Unsafe Any warnings
- Import validation
- Variance checking
Caching
Cache operations across cache.rs:
- Hit/miss tracking for type, introspection, and analysis caches
- Invalidation events
- Import dependency tracking
Workspace Operations
In workspace.rs:
- File discovery and indexing
- Stub loading from typeshed and configured paths
- Dependency graph construction
Handler Instrumentation
LSP request handlers use #[tracing::instrument] macro for automatic span creation:
#[tracing::instrument(skip(self), level = "debug")]
async fn completion(&self, params: CompletionParams) -> Result<Option<CompletionResponse>> {
    // Automatically logs entry/exit with debug level
}
Typechecker
The typechecker implements a Hindley-Milner type inference engine with extensions for Python's type system. It performs constraint-based type inference with gradual typing support through the Any type.
How It Works
The typechecker operates in five phases:
- Parse source code into an Abstract Syntax Tree
- Resolve symbols and build scopes
- Generate type constraints by walking the AST
- Solve constraints using unification
- Apply final type substitutions
The core algorithm uses Robinson's unification with an occurs check, extended with Python-specific features like union types, protocols, and row-polymorphic records.
┌─────────────┐
│Source Code │
└──────┬──────┘
│
▼
┌─────────────┐
│ Parser │
└──────┬──────┘
│
▼
┌─────────────────────┐
│ AST + Symbol Table │
└──────────┬──────────┘
│
▼
┌──────────────────────┐
│ Constraint Generator │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Constraint Set │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Constraint Solver │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Unifier │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Type Substitutions │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Type Map │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Type Errors │
└──────────────────────┘
Type System
The type system supports:
- Type variables with variance annotations
- Type constructors with kind checking
- Function types with keyword arguments
- Union types for Python's | operator
- Row-polymorphic records for structural typing
- Protocol types for structural subtyping
- Three special types: Any (gradual typing), Top (universal supertype), Never (bottom type)
Constraint Generation
The constraint generator walks the AST and produces constraints:
- Equal(T1, T2) - types must be identical
- HasAttr(T, name, AttrType) - attribute access
- Call(FuncType, Args, Result) - function calls
- Protocol(T, ProtocolName, Impl) - protocol conformance
- MatchPattern(T, Pattern, Bindings) - pattern matching
- Narrowing(var, predicate, T) - flow-sensitive typing
- Join(var, types, T) - control flow merge points
┌──────────┐
│ AST Node │
└─────┬────┘
│
▼
┌───────────┐
│ Node Type │
└─────┬─────┘
┌─────────────┬─────┼───────┬────────────┐
│ │ │ │ │
Variable│ Call │ │ Attr │ If/Match │
│ │ │ │ │
▼ ▼ ▼ ▼ ▼
┌────────────┐ ┌──────────────┐ ┌──────────────┐
│Lookup Type │ │ Call │ │ HasAttr │
│ │ │ Constraint │ │ Constraint │
└──────┬─────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
│ │ │ ┌──────────────┐
│ │ │ │ Narrowing │
│ │ │ │ Constraint │
│ │ │ └──────┬───────┘
│ │ │ │
└──────────────┴─────────────────┴─────────────────┘
│
▼
┌──────────────┐
│Constraint Set│
└──────────────┘
Constraint Solving
The solver processes constraints in order, applying unification:
- Process equality constraints via unification
- Resolve attribute access using type structure
- Check function call compatibility
- Verify protocol conformance via structural matching
- Handle pattern matching exhaustiveness
- Apply type narrowing in control flow
- Join types at control flow merge points
The unifier maintains a substitution map from type variables to concrete types, applying the occurs check to prevent infinite types.
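For intuition, a compact Python sketch of Robinson-style unification with an occurs check; Beacon's actual unifier lives in Rust (unify.rs), so the names here are simplified stand-ins:
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Con: # A type constructor, e.g. Con("list", (Con("int"),))
    name: str
    args: tuple = ()

def resolve(t, subst):
    # Chase substitution chains for type variables
    while isinstance(t, Var) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    t = resolve(t, subst)
    if t == v:
        return True
    return isinstance(t, Con) and any(occurs(v, a, subst) for a in t.args)

def unify(a, b, subst):
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if isinstance(a, Var):
        if occurs(a, b, subst): # The occurs check prevents infinite types (HM002)
            raise TypeError(f"occurs check failed: {a} in {b}")
        return {**subst, a: b}
    if isinstance(b, Var):
        return unify(b, a, subst)
    if isinstance(a, Con) and isinstance(b, Con) and a.name == b.name and len(a.args) == len(b.args):
        for x, y in zip(a.args, b.args):
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} with {b}") # Reported as HM001

# Unifying list[T] with list[int] binds T to int:
print(unify(Con("list", (Var("T"),)), Con("list", (Con("int"),)), {}))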
Limitations
Value restriction prevents generalization of mutable references, which can be overly conservative for some Python patterns.
Protocol checking handles basic structural conformance but doesn't fully support complex inheritance hierarchies with method overriding.
Type narrowing in conditionals provides basic flow-sensitive typing but lacks sophisticated constraint propagation for complex boolean expressions.
Performance degrades on files exceeding 5000 lines, though scope-level caching mitigates this for incremental edits.
The gradual typing Any type bypasses type checking, which can hide errors when overused.
Key Files
crates/
├── core/
│ ├── types.rs # Type system implementation
│ ├── unify.rs # Unification algorithm
│ └── subst.rs # Type substitution
├── constraints/
│ └── solver.rs # Constraint solver
└── analyzer/
└── walker/mod.rs # Constraint generation
Static Analyzer
The static analyzer performs control flow and data flow analysis to detect code quality issues beyond type errors. It builds a Control Flow Graph and runs analyses to find unreachable code, use-before-definition errors, and unused variables.
How It Works
The analyzer operates in three phases:
- Build Control Flow Graph from the AST
- Perform data flow analyses on the CFG
- Generate diagnostics from analysis results
The CFG captures all possible execution paths through a function, including normal flow, exception handling, loops, and early returns.
┌─────────────────────┐
│ AST │
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ CFG Builder │
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ Control Flow Graph │
└──────────┬──────────┘
│
▼
┌─────────────────────┐
│ Data Flow Analyzer │
└──────────┬──────────┘
│
▼
┌──────────────────────┴──────────────────────┐
│ Analysis Type │
└─┬─────────────────┬──────────────────────┬──┘
│ │ │
┌───────────▼──────┐ ┌───────▼────────┐ ┌──────────▼──────────┐
│ Reachability │ │ Use-Def │ │ Liveness │
└───────────┬──────┘ └───────┬────────┘ └─────────┬───────────┘
│ │ │
│ ┌────────────▼─────────┐ │
└────► Diagnostics ◄───────────┘
└──────────────────────┘
Control Flow Graph
Each function is converted into a CFG with basic blocks and edges:
- Basic blocks contain sequential statements
- Edges represent control flow with kinds: Normal, True, False, Exception, Break, Continue, Finally
- Entry and exit blocks mark function boundaries
- Loop headers have back edges for iteration
┌─────────────────┐
│ Entry Block │
└────────┬────────┘
│
▼
┌────────────────┐
┌───┤ If Statement ├───┐
│ └────────────────┘ │
True │ │ False
▼ ▼
┌────────────┐ ┌────────────┐
│ Then Block │ │ Else Block │
└──────┬─────┘ └──────┬─────┘
│ │
└────────┬───────────────┘
▼
┌─────────────┐
│ Merge Block │
└──────┬──────┘
│
▼
┌─────────────┐
┌───┤ While Loop ├──┐
│ └─────────────┘ │
True │ ▲ │ False
│ │ │
▼ │ ▼
┌─────────┐ │ ┌──────────┐
│Loop Body├─────┘ │Exit Block│
│ ├──────────► │
└─────────┘ Break └──────────┘
Normal│ │Continue
└─────┘
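The shapes above can be summarized in a small data model. The following Python dataclasses are a conceptual sketch, not Beacon's Rust types (those live in crates/analyzer/src/cfg.rs); all names here are hypothetical.

from dataclasses import dataclass, field
from enum import Enum, auto

class EdgeKind(Enum):
    NORMAL = auto()
    TRUE = auto()
    FALSE = auto()
    EXCEPTION = auto()
    BREAK = auto()
    CONTINUE = auto()
    FINALLY = auto()

@dataclass
class BasicBlock:
    id: int
    statements: list = field(default_factory=list)  # sequential, branch-free statements

@dataclass
class Edge:
    source: int        # block id
    target: int        # block id
    kind: EdgeKind

@dataclass
class ControlFlowGraph:
    entry: int
    exit: int
    blocks: dict = field(default_factory=dict)      # id -> BasicBlock
    edges: list = field(default_factory=list)       # list of Edge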
CFG Construction
The builder walks the AST recursively:
- If/Elif/Else creates test blocks with True/False edges to branches, merging at a common successor
- For/While loops create headers with back edges from the body and break edges to exit
- Try/Except creates exception edges from try blocks to each handler
- Return/Raise statements jump to the function exit, passing through finally blocks if present
- Break/Continue statements jump to the appropriate loop boundary
Context tracking maintains loop depth and finally block stacks to generate correct edges.
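As a concrete illustration, the comments below mark the edges the builder would create for each construct in a small function (process is a stand-in helper, and the wording is ours, not Beacon's):

def process(item):          # stand-in helper
    return item

def worker(items):
    for item in items:      # loop header: back edge from the body, False edge to the exit
        if item is None:    # True/False edges out of the test block
            continue        # Continue edge back to the loop header
        try:
            process(item)   # Exception edge from the try block to the handler
        except ValueError:
            break           # Break edge to the loop exit
    return None             # jumps to the function exit block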
Data Flow Analyses
Use-Before-Definition detection performs forward analysis, tracking which variables are defined at each point. Any use of an undefined variable generates a diagnostic.
Unreachable code detection marks blocks reachable from the entry via depth-first search. Unreachable blocks produce warnings.
Unused variable detection tracks variable definitions and uses. Variables defined but never read generate warnings.
Hoisting analysis collects function and class definitions that are available before their textual position, matching Python's hoisting semantics.
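For example, a file like the following would trigger the first three analyses; the diagnostics in the comments are paraphrased, not Beacon's exact messages:

def report(flag):
    if flag:
        x = 1
    print(x)          # use-before-definition: x is undefined when flag is falsy
    return None
    print("done")     # unreachable: no path from the entry reaches this block

def helper():
    temp = 1 + 1      # unused variable: defined but never read
    return 0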
Limitations
CFG construction is function-scoped only. Module-level control flow graphs are not yet implemented, limiting whole-program analyses.
Exception flow is simplified and doesn't track exception types through the CFG. All exception handlers are treated uniformly.
Generator functions and async/await constructs have limited support. Yield points and async boundaries don't create proper CFG edges.
Class method CFGs don't track inheritance or method resolution order, which limits cross-method data flow analysis.
Key Files
crates/
├── analyzer/src/
│ ├── cfg.rs # Control Flow Graph construction
│ └── data_flow.rs # Data flow analyses
└── server/src/
└── analysis/mod.rs # Analysis orchestration
Formatter
The formatter provides PEP8-compliant code formatting with configurable style options. It transforms Python source code into a consistent style while preserving comments and semantic meaning.
How It Works
The formatter operates in four phases:
- Parse source into AST and extract comments
- Sort imports by category
- Generate token stream from AST
- Apply formatting rules and write output
The formatter uses tree-sitter for parsing and comment extraction, ensuring accurate preservation of all source elements.
┌─────────────┐ ┌────────┐ ┌────────────────┐ ┌───────────────┐
│Source Code ├──►│ Parser ├──►│ AST + Comments ├──►│ Import Sorter │
└─────────────┘ └────────┘ └────────────────┘ └───────┬───────┘
│
▼
┌─────────────────┐ ┌──────────────┐ ┌──────────────────────────┐
│Formatted Output │◄──┤Formatting │◄──┤Token Stream Generator │
│ │ │Writer │ │ │
└─────────────────┘ └──────────────┘ └────────┬─────────────────┘
│
▼
┌─────────────────┐
│ Token Stream │
└─────────────────┘
Import Sorting
Imports are categorized and sorted:
- Future imports from `__future__`
- Standard library imports
- Third-party package imports
- Local project imports
Within each category, imports are sorted alphabetically. This matches the style of Black and isort.
┌─────────────┐
│All Imports │
└──────┬──────┘
│
▼
┌─────────────┐
│ Categorize │
└──────┬──────┘
┌────────────────┼────────────────┐
│ │ │
▼ ▼ ▼ ▼
┌────────┐ ┌────────┐ ┌────────────┐ ┌───────┐
│ Future │ │ Stdlib │ │Third-Party │ │ Local │
└────┬───┘ └────┬───┘ └──────┬─────┘ └───┬───┘
│ │ │ │
└────────────────┴─────────────────┴─────────────┘
│
▼
┌────────────────────┐
│Sort Alphabetically │
└──────────┬─────────┘
│
▼
┌────────────────────┐
│Formatted Imports │
└────────────────────┘
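For example, a jumbled import block is regrouped and alphabetized like this (illustrative):

# Before
from __future__ import annotations
import sys
from .models import User
import numpy as np
import os

# After
from __future__ import annotations

import os
import sys

import numpy as np

from .models import User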
Token Stream Generation
The AST is converted to a stream of tokens:
- Keywords: `def`, `class`, `if`, `for`, etc.
- Identifiers: variable and function names
- Literals: strings, numbers, booleans
- Operators: `+`, `-`, `==`, `and`, etc.
- Delimiters: `(`, `)`, `[`, `]`, `:`, `,`
- Whitespace: newlines, indents, dedents
Comments are attached to appropriate tokens based on their position in the source.
Formatting Rules
The writer applies formatting rules:
- Indentation uses 4 spaces by default, configurable
- Line length defaults to 88 characters, configurable
- Two blank lines separate top-level definitions
- One blank line separates method definitions
- String quotes normalize to double quotes
- Trailing commas added in multi-line structures
- Operators surrounded by single spaces
- No spaces inside brackets/parentheses
┌──────────────┐
│ Token Stream │
└──────┬───────┘
│
▼
┌──────────────┐
│ Token Type │
└──────┬───────┘
┌──────────────────┼──────────────────┬─────────────┐
│ │ │ │
Indent│ Newline│ String│ Delimiter │
│ │ │ │
▼ ▼ ▼ ▼
┌───────────────┐ ┌──────────────┐ ┌──────────────┐ ┌─────────────┐
│Apply Indent │ │Check Line │ │Normalize │ │Add/Remove │
│Width │ │Length │ │Quotes │ │Spaces │
└───────┬───────┘ └──────┬───────┘ └──────┬───────┘ └──────┬──────┘
│ │ │ │
└─────────────────┴─────────────────┴─────────────────┘
│
▼
┌───────────────┐
│Output Buffer │
└───────┬───────┘
│
▼
┌───────────────┐
│Formatted Text │
└───────────────┘
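Taken together, these rules turn code like the "Before" snippet into the "After" snippet (an illustrative sketch, not exhaustive):

# Before
x=1
items=[ 1,2,3 ]
greeting='hello'

# After
x = 1
items = [1, 2, 3]
greeting = "hello"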
Caching
The formatter uses two cache layers:
Short-circuit cache records the hash of each source it has already formatted. If an incoming source's hash matches, formatting is skipped entirely.
Result cache stores formatted output keyed by source hash, configuration, and range. This accelerates repeated formatting of the same code.
Both caches use LRU eviction with configurable size limits.
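Conceptually, the result cache behaves like the following Python sketch (hash keying plus LRU eviction); the names are hypothetical and the real implementation is crates/server/src/formatting/cache.rs:

import hashlib
from collections import OrderedDict

def cache_key(source: str, config: str, line_range: str = "") -> str:
    # Key combines source content, configuration, and the formatted range
    data = f"{config}\x00{line_range}\x00{source}".encode()
    return hashlib.sha256(data).hexdigest()

class FormatCache:
    def __init__(self, max_entries: int = 100):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # key -> formatted output

    def get(self, source, config, line_range=""):
        key = cache_key(source, config, line_range)
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as most recently used
            return self.entries[key]
        return None

    def put(self, source, config, output, line_range=""):
        self.entries[cache_key(source, config, line_range)] = output
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)   # evict the least recently used entry

The short-circuit layer is the degenerate case of the same idea: it records hashes of outputs it has already produced and skips work when the incoming source hashes to a known-formatted result.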
Configuration
Formatting behavior is controlled via beacon.toml or pyproject.toml:
[tool.beacon.formatting]
line_length = 88
indent_size = 4
normalize_quotes = true
trailing_commas = true
Suppression comments disable formatting:
# beacon: fmt: off
ugly_code = {"a":1,"b":2}
# beacon: fmt: on
Limitations
Comment preservation is best-effort. Complex nested structures with interleaved comments may lose some comments or place them incorrectly.
Line length is a soft limit. Some constructs like long string literals or deeply nested expressions may exceed the configured limit.
Format-on-type is basic and only reformats the current statement plus surrounding context. It doesn't perform whole-file formatting.
The formatter doesn't understand semantic equivalence, so it may format code in ways that change behavior for dynamic features like globals() manipulation.
Key Files
crates/server/src/formatting/
├── mod.rs # Main formatter
├── token_stream.rs # Token generation
├── writer.rs # Output writer
├── rules.rs # Formatting rules
├── import.rs # Import sorting
└── cache.rs # Result caching
Linter
The linter performs static code quality checks through 30 rules covering imports, control flow, naming, style, and type usage. It detects common Python mistakes and enforces best practices.
How It Works
The linter operates in three phases:
- Walk the AST tracking context (function depth, loop depth, imports, etc.)
- Check symbol table for unused imports and undefined names
- Filter diagnostics by suppression comments
Each rule is identified by a code from BEA001 to BEA030.
┌──────────────────────┐
│ AST + Symbol Table │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Linter Context Init │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ AST Visitor │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Node Type │
└─┬──────┬────┬────┬──┘
│ │ │ │
┌───────────┘ │ │ └───────────┐
│ │ │ │
│ Import Loop │ │ Function Name │
│ │ │ │
▼ ▼ ▼ ▼
┌─────────┐ ┌─────────────┐ ┌─────────────┐
│Check │ │Check Loop │ │Check Name │
│Import │ │Rules │ │Rules │
│Rules │ ├─────────────┤ └──────┬──────┘
└────┬────┘ │Check │ │
│ │Function │ │
│ │Rules │ │
│ └──────┬──────┘ │
│ │ │
└────────────────┴──────────────────┘
│
▼
┌──────────────────┐
│ Diagnostics │
└──────────┬───────┘
│
▼
┌──────────────────┐
│Suppression Filter│
└──────────┬───────┘
│
▼
┌──────────────────┐
│Final Diagnostics │
└──────────────────┘
Rule Categories
Import rules check for star imports, unused imports, and duplicate imports.
Control flow rules detect break/continue outside loops, return/yield outside functions, and unreachable code after control flow statements.
Name rules find undefined variables, duplicate arguments, improper global/nonlocal usage, and shadowing of builtins.
Style rules flag redundant pass statements, assert on tuples, percent format issues, forward annotations, bare except clauses, and identity comparisons with literals.
Type rules validate except handler types, detect constant conditionals, check for duplicate branches, find loop variable overwrites, and verify dataclass and protocol patterns.
┌───────┐
│ Rules │
└───┬───┘
┌───────────────┼──────────────────┬─────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌────────────┐ ┌──────────────┐ ┌────────────┐ ┌──────────┐
│ Import │ │Control Flow │ │ Names │ │ Style │
│BEA001-003 │ │ BEA004-008 │ │BEA009-014 │ │BEA015-020│
└────────────┘ └──────────────┘ └────────────┘ └────┬─────┘
│
▼
┌────────────┐
│ Type │
│ BEA021-027 │
└────────────┘
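The snippet below trips rules from each category; the category labels follow the diagram above, while specific BEA codes are omitted since they depend on the rule table:

from os import *       # star import (import rule)
import json            # imported but never used (import rule)

list = [1, 2, 3]       # shadows the builtin list (name rule)

def f():
    return 1
    print("hi")        # unreachable after return (control flow rule)

try:
    f()
except:                # bare except clause (style rule)
    pass

if None:               # constant conditional (type rule)
    pass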
Context Tracking
The linter maintains context while walking the AST:
Function depth tracks nesting level to detect return/yield outside functions. Loop depth tracks loop nesting to validate break/continue. Class depth tracks class nesting for dataclass and protocol checks.
Import tracking records all imported names to detect unused imports. Loop variable tracking identifies which variables are bound by loop iteration. Global and nonlocal declaration tracking validates proper usage.
Assigned variable tracking finds variables written to, enabling unused variable detection. Dataclass and protocol scope tracking identifies decorated classes for specialized rules.
Function Context:
┌────────────────┐
│ Enter Function │
└───────┬────────┘
│
▼
┌──────────────────────┐
│Increment Function │
│Depth │
└───────┬──────────────┘
│
▼
┌──────────────────────┐
│Visit Function Body │
└───────┬──────────────┘
│
├──────────────────────────┐
│ │
▼ ▼
┌──────────────┐ ┌───────────────────┐
│Found Return? │ │Decrement Function │
└───────┬──────┘ │Depth │
│ └───────────────────┘
┌───────┴────────┐
│ │
▼ ▼
┌────────┐ ┌─────────────────────────┐
│OK │ │Error: Return Outside │
│(Depth │ │Function (Depth = 0) │
│> 0) │ │ │
└────────┘ └─────────────────────────┘
Loop Context:
┌────────────────┐
│ Enter Loop │
└───────┬────────┘
│
▼
┌──────────────────────┐
│Increment Loop Depth │
└───────┬──────────────┘
│
▼
┌──────────────────────┐
│Visit Loop Body │
└───────┬──────────────┘
│
├──────────────────────────┐
│ │
▼ ▼
┌──────────────┐ ┌───────────────────┐
│Found Break? │ │Decrement Loop │
└───────┬──────┘ │Depth │
│ └───────────────────┘
┌───────┴────────┐
│ │
▼ ▼
┌────────┐ ┌─────────────────────────┐
│OK │ │Error: Break Outside │
│(Depth │ │Loop (Depth = 0) │
│> 0) │ │ │
└────────┘ └─────────────────────────┘
Suppression
Rules can be suppressed with comments:
# beacon: ignore[BEA001] # Suppress specific rule
from module import *
# beacon: ignore # Suppress all rules on next line
undefined_variable
The suppression map tracks which lines have suppressions and which rules are disabled.
Symbol Table Integration
After AST traversal, the linter checks the symbol table:
Unused imports are detected by finding symbols marked as imported but never referenced. Undefined names are found by checking all name references against the symbol table. Shadowing detection compares local names against Python builtins.
Limitations
Some rules use pattern matching on decorators and don't verify actual inheritance or runtime behavior. For example, dataclass detection looks for the decorator but doesn't confirm the class truly uses the dataclass module.
Constant evaluation is limited to simple literal expressions. Complex constant folding involving functions or dynamic attributes is not supported.
Control flow analysis for unreachable code is basic and may miss some cases that require interprocedural analysis.
Exception type checking in except handlers uses name-based heuristics and doesn't perform full type analysis.
The linter doesn't track data flow across statements, so it may miss patterns like conditional initialization followed by usage.
Key Files
crates/analyzer/src/
├── linter.rs # Main linter implementation
├── rules.rs # Rule definitions
└── const_eval.rs # Constant evaluation
LSP Implementation
The LSP server orchestrates all analysis components and exposes them through the Language Server Protocol. It provides 15+ features including diagnostics, hover, completion, goto definition, and formatting.
How It Works
The server operates as a JSON-RPC service:
- Initialize server with client capabilities
- Manage document lifecycle (open, change, close)
- Respond to feature requests (hover, completion, etc.)
- Publish diagnostics when documents change
- Index workspace for cross-file features
The backend uses tower-lsp for protocol handling and implements the LanguageServer trait.
┌───────────────┐
│ Editor Client │
└──────┬────────┘
│ JSON-RPC
▼
┌─────────────┐
│ LSP Backend │
└──────┬──────┘
┌───────────┼───────────┬────────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌─────────────┐ ┌─────────┐ ┌──────────┐ ┌─────────────────┐
│ Document │ │Analyzer │ │Workspace │ │Feature Providers│
│ Manager │ └────┬────┘ └────┬─────┘ └────┬──────┬─────┘
└──────┬──────┘ │ │ │ │
│ │ │ │ │
┌─────────┼─────────┐ │ │ │ │
│ │ │ │ │ │ │
▼ ▼ ▼ ▼ ▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌────────┐ ┌───────────┐ ┌──────────────────────┐
│didOpen │ │didChange │ │didClose│ │Diagnostics│ │Hover, Completion, │
│ │ │ │ │ │ │ │ │Definition, +13 More │
└──────────┘ └──────────┘ └────────┘ └───────────┘ └──────────────────────┘
Document Management
The document manager tracks all open files:
When a document opens, the manager stores the URI, version, and text. When a document changes, it applies incremental or full text updates. When a document closes, it removes the document from tracking.
Each document is parsed into an AST and symbol table on demand. Parse results are cached until the document changes.
┌─────────┐ ┌────────────────┐ ┌───────┐ ┌───────────┐
│ didOpen ├────►│ Store Document ├────►│ Parse ├────►│ Cache AST │
└─────────┘ └────────────────┘ └───────┘ └───────────┘
▲
│
┌───────────┐ ┌───────────────┐ ┌───┴────────────────┐
│ didChange ├──►│ Apply Changes ├────►│ Invalidate Cache │
└───────────┘ └───────────────┘ └────────────────────┘
┌──────────┐ ┌─────────────────┐ ┌─────────────┐
│ didClose ├───►│ Remove Document ├──►│ Clear Cache │
└──────────┘ └─────────────────┘ └─────────────┘
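The lifecycle above amounts to a small piece of bookkeeping. A conceptual Python sketch (hypothetical names; the server itself is implemented in Rust):

from dataclasses import dataclass

@dataclass
class Document:
    uri: str
    version: int
    text: str
    ast: object = None  # parsed lazily; None means "not cached yet"

class DocumentManager:
    def __init__(self):
        self.docs = {}

    def did_open(self, uri, version, text):
        self.docs[uri] = Document(uri, version, text)

    def did_change(self, uri, version, new_text):
        doc = self.docs[uri]
        doc.version, doc.text, doc.ast = version, new_text, None  # invalidate cached parse

    def did_close(self, uri):
        self.docs.pop(uri, None)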
Analysis Orchestration
The analyzer coordinates all analysis phases:
- Retrieve cached results if available
- Parse source into AST
- Generate and solve type constraints
- Build CFG and run data flow analysis
- Run linter rules
- Cache results by URI and version
- Return combined diagnostics
Caching occurs at multiple levels. Full document analysis is cached by URI and version. Individual function scope analysis is cached by content hash for incremental updates.
┌─────────────────┐
│ Document Change │
└────────┬────────┘
│
▼
┌────────────────┐
│ Cache Hit? │
└───┬────────┬───┘
Yes │ │ No
┌────────┘ └────────┐
│ │
▼ ▼
┌──────────────────┐ ┌──────────────┐
│ Return Cached │ │ Parse AST │
│ Result │ └──────┬───────┘
└──────────────────┘ │
▼
┌──────────────────────┐
│ Generate Constraints │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Solve Constraints │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Build CFG │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Data Flow Analysis │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Run Linter │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Combine Diagnostics │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Cache Result │
└──────────┬───────────┘
│
▼
┌──────────────────────┐
│ Publish Diagnostics │
└──────────────────────┘
Feature Providers
Each LSP feature is implemented by a dedicated provider:
Diagnostics provider combines parse errors, type errors, linter warnings, and static analysis results into a unified diagnostic list.
Hover provider looks up the symbol at the cursor position in the type map and formats type information for display.
Completion provider searches the symbol table and workspace index for completions, filtering by prefix and ranking by relevance.
Goto definition provider resolves the symbol to its definition location using the symbol table and workspace index.
References provider finds all uses of a symbol across the workspace.
Rename provider validates the rename and computes workspace edits.
Code actions provider offers quick fixes for diagnostics like adding imports or suppressing linter rules.
Semantic tokens provider generates syntax highlighting tokens from the AST.
Inlay hints provider shows inferred types inline in the editor.
Signature help provider displays function parameter information during calls.
Document symbols provider generates an outline tree for the file.
Workspace symbols provider searches all symbols across the workspace.
Folding range provider computes collapsible regions for functions, classes, and imports.
Formatting providers apply code formatting to documents or ranges.
┌─────────────┐
│ LSP Request │
└──────┬──────┘
│
▼
┌──────────────┐
│ Request Type │
└──────┬───────┘
┌──────────────┼──────────────┬─────────────┐
│ │ │ │
Hover │ Completion │ Definition │ References │ Formatting
│ │ │ │
▼ ▼ ▼ ▼ ▼
┌────────────┐ ┌──────────────┐ ┌────────────┐ ┌────────────┐ ┌─────────────┐
│Type Lookup │ │Symbol Search │ │Symbol │ │Usage Search│ │Format Code │
│ │ │ │ │Resolution │ │ │ │ │
└──────┬─────┘ └──────┬───────┘ └──────┬─────┘ └──────┬─────┘ └──────┬──────┘
│ │ │ │ │
└───────────────┴─────────────────┴───────────────┴───────────────┘
│
▼
┌─────────────┐
│ Response │
└─────────────┘
Workspace Indexing
The workspace maintains a global index:
On initialization, the server indexes all Python files in the workspace. The index maps module names to file paths and tracks imported symbols. Import resolution uses the index to find definitions in other files.
The module resolver handles Python's import semantics, searching sys.path and resolving relative imports based on package structure.
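Conceptually, the index maps dotted module names to files, as in this sketch (hypothetical; package handling is simplified):

from pathlib import Path

def build_index(root: Path) -> dict:
    index = {}
    for path in root.rglob("*.py"):
        parts = list(path.relative_to(root).with_suffix("").parts)
        if parts and parts[-1] == "__init__":
            parts = parts[:-1]  # a package's __init__.py maps to the package itself
        index[".".join(parts)] = path
    return index

# build_index(Path("src"))["pkg.models"] would map to src/pkg/models.py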
Configuration
Server behavior is controlled via configuration files:
[tool.beacon.type_checking]
mode = "balanced" # strict/balanced/relaxed
unsafe_any_depth = 2
[tool.beacon.linting]
enabled = true
[tool.beacon.formatting]
line_length = 88
indent_size = 4
Configuration can be specified in beacon.toml, pyproject.toml, or sent via LSP workspace/didChangeConfiguration.
Limitations
Workspace indexing is synchronous during initialization, which can cause delays on large projects with thousands of Python files.
Multi-root workspaces are not supported. Only a single workspace root is handled.
Some features degrade on files exceeding 10,000 lines of code due to parse and analysis time.
Configuration changes require server restart for some settings. Dynamic reconfiguration is not fully implemented.
Cross-file analysis is limited to import resolution and symbol lookup. Whole-program type inference is not performed.
Memory usage grows with workspace size as the index and caches expand. No automatic memory management or eviction exists for the workspace index.
Key Files
crates/server/src/
├── backend.rs # Main LSP backend
├── lib.rs # Server entry point
└── features/
├── diagnostics.rs # Diagnostic generation
├── hover.rs # Hover information
├── completion/ # Auto-completion
├── goto_definition.rs # Jump to definition
├── references.rs # Find references
├── rename.rs # Rename symbol
├── code_actions.rs # Quick fixes
├── semantic_tokens.rs # Syntax highlighting
└── formatting.rs # Code formatting
Editor Extensions
Beacon provides language server integration for multiple editors through the Language Server Protocol (LSP).
All extensions communicate with the same beacon-lsp server, ensuring feature parity across editors.
Supported Editors
VS Code
Full-featured extension with settings UI and marketplace distribution (planned).
VS Code Extension Documentation
Zed
Native WebAssembly extension with TOML-based configuration.
Neovim
Native LSP client integration using built-in LSP support.
Neovim Setup (see below)
Other LSP-Compatible Editors
Beacon works with any editor supporting the Language Server Protocol. See Manual Setup for configuration.
Installation
Prerequisites
All editors require beacon-lsp to be installed and available in your PATH:
# Install from source
cargo install --path crates/server
# Verify installation
which beacon-lsp
Ensure ~/.cargo/bin is in your PATH:
# Add to ~/.zshrc or ~/.bashrc
export PATH="$HOME/.cargo/bin:$PATH"
Neovim Integration
Neovim has built-in LSP support starting from version 0.5.0. Beacon integrates seamlessly with Neovim's native LSP client.
Requirements
- Neovim ≥ 0.8.0 (recommended 0.10.0+)
- `beacon-lsp` installed and in `PATH`
- `nvim-lspconfig` plugin (optional but recommended)
Setup with nvim-lspconfig
Using nvim-lspconfig:
-- ~/.config/nvim/lua/plugins/lsp.lua or init.lua
local lspconfig = require('lspconfig')
local configs = require('lspconfig.configs')
-- Register beacon-lsp if not already registered
if not configs.beacon then
configs.beacon = {
default_config = {
cmd = { 'beacon-lsp' },
filetypes = { 'python' },
root_dir = function(fname)
return lspconfig.util.root_pattern(
'beacon.toml',
'pyproject.toml',
'.git'
)(fname) or lspconfig.util.path.dirname(fname)
end,
settings = {},
init_options = {
typeChecking = {
mode = 'balanced', -- 'strict', 'balanced', or 'relaxed'
},
python = {
version = '3.12',
stubPaths = { 'stubs', 'typings' },
},
workspace = {
sourceRoots = {},
excludePatterns = { '**/venv/**', '**/.venv/**' },
},
inlayHints = {
enable = true,
variableTypes = true,
functionReturnTypes = true,
parameterNames = false,
},
diagnostics = {
unresolvedImports = 'warning',
circularImports = 'warning',
},
advanced = {
incremental = true,
workspaceAnalysis = true,
enableCaching = true,
cacheSize = 100,
},
},
},
}
end
-- Setup beacon-lsp
lspconfig.beacon.setup({
on_attach = function(client, bufnr)
-- Enable completion
vim.api.nvim_buf_set_option(bufnr, 'omnifunc', 'v:lua.vim.lsp.omnifunc')
-- Keybindings
local opts = { noremap = true, silent = true, buffer = bufnr }
vim.keymap.set('n', 'gD', vim.lsp.buf.declaration, opts)
vim.keymap.set('n', 'gd', vim.lsp.buf.definition, opts)
vim.keymap.set('n', 'K', vim.lsp.buf.hover, opts)
vim.keymap.set('n', 'gi', vim.lsp.buf.implementation, opts)
vim.keymap.set('n', '<C-k>', vim.lsp.buf.signature_help, opts)
vim.keymap.set('n', '<space>wa', vim.lsp.buf.add_workspace_folder, opts)
vim.keymap.set('n', '<space>wr', vim.lsp.buf.remove_workspace_folder, opts)
vim.keymap.set('n', '<space>wl', function()
print(vim.inspect(vim.lsp.buf.list_workspace_folders()))
end, opts)
vim.keymap.set('n', '<space>D', vim.lsp.buf.type_definition, opts)
vim.keymap.set('n', '<space>rn', vim.lsp.buf.rename, opts)
vim.keymap.set({ 'n', 'v' }, '<space>ca', vim.lsp.buf.code_action, opts)
vim.keymap.set('n', 'gr', vim.lsp.buf.references, opts)
vim.keymap.set('n', '<space>f', function()
vim.lsp.buf.format({ async = true })
end, opts)
-- Enable inlay hints (Neovim 0.10+)
if client.server_capabilities.inlayHintProvider then
vim.lsp.inlay_hint.enable(true, { bufnr = bufnr })
end
end,
capabilities = require('cmp_nvim_lsp').default_capabilities(),
})
Manual Setup (Without nvim-lspconfig)
For minimal configuration without plugins:
-- ~/.config/nvim/init.lua
vim.api.nvim_create_autocmd('FileType', {
pattern = 'python',
callback = function()
vim.lsp.start({
name = 'beacon-lsp',
cmd = { 'beacon-lsp' },
root_dir = vim.fs.dirname(
vim.fs.find({ 'beacon.toml', 'pyproject.toml', '.git' }, {
upward = true,
})[1]
),
settings = {
typeChecking = { mode = 'balanced' },
python = { version = '3.12' },
inlayHints = { enable = true },
},
})
end,
})
-- Keybindings
vim.api.nvim_create_autocmd('LspAttach', {
callback = function(args)
local opts = { buffer = args.buf }
vim.keymap.set('n', 'gd', vim.lsp.buf.definition, opts)
vim.keymap.set('n', 'K', vim.lsp.buf.hover, opts)
vim.keymap.set('n', 'gr', vim.lsp.buf.references, opts)
vim.keymap.set('n', '<space>rn', vim.lsp.buf.rename, opts)
vim.keymap.set('n', '<space>ca', vim.lsp.buf.code_action, opts)
end,
})
LazyVim Setup
For LazyVim users:
-- ~/.config/nvim/lua/plugins/beacon.lua
return {
{
'neovim/nvim-lspconfig',
opts = {
servers = {
beacon = {
cmd = { 'beacon-lsp' },
filetypes = { 'python' },
root_dir = function(fname)
local util = require('lspconfig.util')
return util.root_pattern('beacon.toml', 'pyproject.toml', '.git')(fname)
end,
settings = {
typeChecking = { mode = 'balanced' },
python = { version = '3.12' },
inlayHints = {
enable = true,
variableTypes = true,
functionReturnTypes = true,
},
},
},
},
},
},
}
Kickstart.nvim Setup
For kickstart.nvim users:
-- Add to your init.lua servers table
local servers = {
-- ... other servers
beacon = {
cmd = { 'beacon-lsp' },
filetypes = { 'python' },
settings = {
typeChecking = { mode = 'balanced' },
python = { version = '3.12' },
},
},
}
Completion Support
Beacon works with popular completion plugins:
nvim-cmp
-- ~/.config/nvim/lua/plugins/completion.lua
local cmp = require('cmp')
local lspconfig = require('lspconfig')
lspconfig.beacon.setup({
capabilities = require('cmp_nvim_lsp').default_capabilities(),
})
cmp.setup({
sources = {
{ name = 'nvim_lsp' },
{ name = 'buffer' },
{ name = 'path' },
},
})
coq_nvim
local coq = require('coq')
lspconfig.beacon.setup(coq.lsp_ensure_capabilities())
Diagnostics Configuration
Customize diagnostic display:
-- Configure diagnostics display
vim.diagnostic.config({
virtual_text = {
prefix = '●',
source = 'beacon',
},
signs = true,
underline = true,
update_in_insert = false,
severity_sort = true,
})
-- Custom diagnostic signs
local signs = { Error = '✘', Warn = '⚠', Hint = '', Info = 'ℹ' }
for type, icon in pairs(signs) do
local hl = 'DiagnosticSign' .. type
vim.fn.sign_define(hl, { text = icon, texthl = hl, numhl = hl })
end
Inlay Hints
Enable inlay hints (Neovim 0.10+):
-- Enable inlay hints globally
vim.lsp.inlay_hint.enable(true)
-- Toggle inlay hints with a keybinding
vim.keymap.set('n', '<leader>th', function()
vim.lsp.inlay_hint.enable(not vim.lsp.inlay_hint.is_enabled())
end, { desc = 'Toggle Inlay Hints' })
Workspace Configuration
Override settings per project using beacon.toml in your project root.
See Configuration Documentation for complete details on all available options and TOML structure.
Manual Setup
For editors not listed above, configure your LSP client to:
- Command: `beacon-lsp`
- File Types: `python`
- Root Patterns: `beacon.toml`, `pyproject.toml`, `.git`
- Communication: stdio (stdin/stdout)
Example Configuration
{
"command": "beacon-lsp",
"filetypes": ["python"],
"rootPatterns": ["beacon.toml", "pyproject.toml", ".git"],
"settings": {
"typeChecking": { "mode": "balanced" },
"python": { "version": "3.12" }
}
}
Feature Comparison
| Feature | VS Code | Zed | Neovim | Other |
|---|---|---|---|---|
| Diagnostics | ✓ | ✓ | ✓ | ✓ |
| Hover | ✓ | ✓ | ✓ | ✓ |
| Completions | ✓ | ✓ | ✓ | ✓ |
| Go to Definition | ✓ | ✓ | ✓ | ✓ |
| Find References | ✓ | ✓ | ✓ | ✓ |
| Document Symbols | ✓ | ✓ | ✓ | ✓ |
| Workspace Symbols | ✓ | ✓ | ✓ | ✓ |
| Semantic Tokens | ✓ | ✓ | ✓ | ✓ |
| Inlay Hints | ✓ | ✓ | ✓ (0.10+) | ✓ |
| Code Actions | ✓ | ✓ | ✓ | ✓ |
| Rename | ✓ | ✓ | ✓ | ✓ |
| Folding Ranges | ✓ | ✓ | ✓ | ✓ |
| Document Highlight | ✓ | ✓ | ✓ | ✓ |
| Signature Help | ✓ | ✓ | ✓ | ✓ |
| Settings UI | ✓ | - | - | - |
| Marketplace | Planned | - | - | - |
All editors share the same language server, ensuring consistent behavior and feature parity.
Configuration
See Configuration Documentation for complete details.
Resources
- Language Server Protocol Specification
- Beacon Configuration
- beacon-lsp Architecture
- Neovim LSP Documentation
- nvim-lspconfig
VS Code Extension
The Beacon VS Code extension (pkg/vscode/) pairs the Rust language server with the VSCode UI.
It activates automatically for Python files and forwards editor requests to the Beacon LSP binary.
Feature Highlights
- On-type diagnostics for syntax and type errors
- Hover tooltips with type information
- Go to definition & find references
- Document and workspace symbols
- Semantic tokens for enhanced highlighting
- Identifier completions and inlay hints
- (Scaffolded) code actions for quick fixes
These capabilities mirror the features exposed by the Rust server in crates/server.
Repository Layout
pkg/vscode/
├── client/ # TypeScript client that binds to VS Code APIs
│ ├── src/extension.ts # Extension entry point; starts the LSP client
│ └── src/test/ # End-to-end tests using the VS Code test runner
├── package.json # Extension manifest (activation, contributions)
├── tsconfig.json # TypeScript project references
├── eslint.config.js # Lint configuration
└── dprint.json # Formatting config for client sources
The client launches the Beacon server binary from target/debug/beacon-lsp (or target/release/beacon-lsp if present). Ensure one of these binaries exists before activating the extension.
Prerequisites
- Rust toolchain (stable) with `cargo` available in `PATH`
- Node.js 18+ (aligned with current VS Code requirements)
- pnpm for dependency management; install globally with `npm install -g pnpm`
- VS Code ≥ 1.100 (see the `engines` field in `package.json`)
- (Optional) `vsce` or `ovsx` for packaging/publishing
Installing Dependencies
From the repository root:
pnpm install
This installs dependencies for all packages, including the VS Code extension.
Building The Extension Client
The extension compiles TypeScript into client/out/:
pnpm --filter beacon-lsp compile
For iterative development, run:
pnpm --filter beacon-lsp watch
This keeps the TypeScript project in watch mode so recompiles happen automatically after you edit client files.
Building The Beacon LSP Server
The client resolves the server binary relative to the repository root:
- `target/debug/beacon-lsp` (default)
- `target/release/beacon-lsp` (used if available)
Build the server before launching the extension:
cargo build -p beacon-lsp # debug binary
# or
cargo build -p beacon-lsp --release # release binary
Running In VS Code
- Open `pkg/vscode` in VS Code.
- Select the Run and Debug panel and choose the Beacon LSP launch configuration (provided in `.vscode/launch.json`).
- Press F5 to start the Extension Development Host.
- In the new window, open a Python file (the repository's `samples/` directory is a good starting point).
The launch configuration compiles the TypeScript client and relies on the previously built Rust binary.
In debug mode, RUST_LOG=beacon_lsp=debug is set automatically so server logs appear in the “Beacon LSP” output channel.
Configuration
The extension provides extensive configuration options accessible through VS Code settings.
All settings are under the beacon.* namespace and can be configured per-workspace or globally.
Type Checking
| Setting | Type | Default | Description |
|---|---|---|---|
| beacon.typeChecking.mode | string | "balanced" | Type checking strictness: "strict", "balanced", or "relaxed" |
Inlay Hints
| Setting | Type | Default | Description |
|---|---|---|---|
| beacon.inlayHints.enable | boolean | true | Enable inlay hints for type information |
| beacon.inlayHints.variableTypes | boolean | true | Show inlay hints for inferred variable types |
| beacon.inlayHints.functionReturnTypes | boolean | true | Show inlay hints for inferred function return types |
| beacon.inlayHints.parameterNames | boolean | false | Show inlay hints for parameter names in calls |
Python Settings
| Setting | Type | Default | Description |
|---|---|---|---|
| beacon.python.version | string | "3.12" | Target Python version: "3.9", "3.10", "3.11", "3.12", "3.13" |
| beacon.python.interpreterPath | string | "" | Path to Python interpreter for runtime introspection |
| beacon.python.stubPaths | string[] | ["stubs"] | Additional paths to search for .pyi stub files |
Workspace Settings
| Setting | Type | Default | Description |
|---|---|---|---|
| beacon.workspace.sourceRoots | string[] | [] | Source roots for module resolution (in addition to workspace root) |
| beacon.workspace.excludePatterns | string[] | [] | Patterns to exclude from workspace scanning (e.g., venv/, .venv/) |
Diagnostics
| Setting | Type | Default | Description |
|---|---|---|---|
| beacon.diagnostics.unresolvedImports | string | "warning" | Severity for unresolved imports: "error", "warning", "info" |
| beacon.diagnostics.circularImports | string | "warning" | Severity for circular imports: "error", "warning", "info" |
Advanced
| Setting | Type | Default | Description |
|---|---|---|---|
| beacon.advanced.maxAnyDepth | number | 3 | Maximum depth for Any type propagation (0-10) |
| beacon.advanced.incremental | boolean | true | Enable incremental type checking |
| beacon.advanced.workspaceAnalysis | boolean | true | Enable workspace-wide analysis |
| beacon.advanced.enableCaching | boolean | true | Enable caching of parse trees and type inference results |
| beacon.advanced.cacheSize | number | 100 | Maximum number of documents to cache (0-1000) |
Debugging
| Setting | Type | Default | Description |
|---|---|---|---|
| beacon.trace.server | string | "off" | JSON-RPC tracing: "off", "messages", or "verbose" |
Enable messages or verbose while debugging protocol issues; traces are written to the "Beacon LSP" output channel.
Example Configuration
Add these settings to your .vscode/settings.json:
{
"beacon.typeChecking.mode": "strict",
"beacon.python.version": "3.12",
"beacon.python.stubPaths": ["stubs", "typings"],
"beacon.workspace.sourceRoots": ["src", "lib"],
"beacon.workspace.excludePatterns": [
"**/venv/**",
"**/.venv/**",
"**/build/**"
],
"beacon.inlayHints.enable": true,
"beacon.inlayHints.variableTypes": true,
"beacon.inlayHints.functionReturnTypes": true,
"beacon.diagnostics.unresolvedImports": "error",
"beacon.diagnostics.circularImports": "warning"
}
Configuration Precedence
Beacon merges configuration from multiple sources:
- Default values - Built-in defaults
- TOML file - `beacon.toml` or `pyproject.toml` in workspace root
- VS Code settings - User/workspace settings (highest precedence)
See Configuration for details on TOML configuration files.
Packaging & Publishing
- Ensure the client is built (`pnpm --filter beacon-lsp compile`) and the server release binary exists (`cargo build -p beacon-lsp --release`).
- From `pkg/vscode`, run `vsce package` (or `ovsx package`) to produce a `.vsix`.
- Publish the package with `vsce publish` or `ovsx publish` once authenticated.
The generated .vsix expects the server binary to be shipped alongside the extension or obtainable on the user’s machine.
Adjust extension.ts if you plan to bundle the binary differently.
Zed Extension
The Beacon Zed extension (pkg/zed/) integrates the Beacon language server with Zed editor.
It activates automatically for Python files and provides Hindley-Milner type checking alongside standard LSP features.
Feature Highlights
- On-type diagnostics for syntax and type errors
- Hover tooltips with type information
- Go to definition & find references
- Document and workspace symbols
- Semantic tokens for enhanced highlighting
- Identifier completions and inlay hints
- Code actions for quick fixes and refactoring
These capabilities mirror the features exposed by the Rust server in crates/server.
Repository Layout
pkg/zed/
├── src/
│ └── lib.rs # Extension implementation
├── Cargo.toml # Rust project manifest
├── extension.toml # Zed extension metadata
└── README.md # Installation instructions
The extension is compiled to WebAssembly (wasm32-wasip1) and communicates with the beacon-lsp binary via the Language Server Protocol.
Prerequisites
- Rust toolchain (stable) with `cargo` available in `PATH`
- wasm32-wasip1 target for Rust (install with `rustup target add wasm32-wasip1`)
- `beacon-lsp` binary installed and available in `PATH`
- Zed editor installed
Installing beacon-lsp
The extension requires beacon-lsp to be available in your system PATH:
# From the repository root
cargo install --path crates/server
This installs the beacon-lsp binary to ~/.cargo/bin. Ensure ~/.cargo/bin is in your PATH.
Verify installation:
which beacon-lsp
# Should output: /Users/<username>/.cargo/bin/beacon-lsp
Building The Extension
The extension must be compiled to WebAssembly:
cd pkg/zed
cargo build --target wasm32-wasip1 --release
The compiled extension will be at:
target/wasm32-wasip1/release/beacon_zed.wasm
Installing The Extension
Development Installation
For local development and testing:
- Build the extension (see above)
- Create a symlink to the extension directory in Zed's extensions folder:

# macOS
mkdir -p ~/.config/zed/extensions
ln -s /path/to/beacon/pkg/zed ~/.config/zed/extensions/beacon

- Restart Zed or reload the window
- Open a Python file to activate the extension
Distribution Installation
To distribute the extension, package it following Zed's extension installation guide.
The extension expects beacon-lsp to be available in the user's PATH. Users should install it via:
cargo install beacon-lsp
Extension Implementation
The extension implements the zed::Extension trait with the following key components:
Language Server Command
Returns the command to launch beacon-lsp:
fn language_server_command(
    &mut self,
    _: &zed::LanguageServerId,
    worktree: &zed::Worktree,
) -> zed::Result<zed::Command> {
    let command = worktree
        .which("beacon-lsp")
        .ok_or_else(|| "beacon-lsp not found in PATH".to_string())?;

    Ok(zed::Command {
        command,
        args: vec![],
        env: vec![("RUST_LOG".to_string(), "info".to_string())],
    })
}
Environment Variables
The extension sets RUST_LOG=info to configure logging. Logs are written to stderr and can be viewed in Zed's log panel.
Arguments
beacon-lsp doesn't require command-line arguments as it communicates via stdin/stdout.
Configuration
See Configuration for details.
Development Workflow
Making Changes
- Edit the extension source in `pkg/zed/src/lib.rs`
- Rebuild the extension:

cargo build --target wasm32-wasip1 --release

- Restart Zed to load the updated extension
Debugging
Enable detailed logging:
RUST_LOG=beacon_lsp=debug zed
Or set the environment variable in your shell before launching Zed. Logs appear in:
- macOS: `~/Library/Logs/Zed/Zed.log`
- Linux: `~/.local/share/zed/logs/Zed.log`
Testing Changes
- Build the language server with your changes:

cargo build -p beacon-lsp
cargo install --path crates/server

- Rebuild the extension if needed
- Open a Python project in Zed
- Test LSP features:
- Hover over variables to see type information
- Use Cmd+Click (macOS) or Ctrl+Click (Linux) for go-to-definition
- Check the Problems panel for diagnostics
- Trigger completions with Ctrl+Space
Comparison with VS Code Extension
| Feature | Zed | VS Code |
|---|---|---|
| Installation | Manual build + PATH | Marketplace (planned) |
| Configuration | TOML files | VS Code settings UI |
| Debugging | Log files | Output panel |
| Language Server | Shared (beacon-lsp) | Shared (beacon-lsp) |
| Features | Full LSP support | Full LSP support |
| Platform | macOS, Linux | macOS, Linux, Windows |
Both extensions use the same beacon-lsp server, so feature parity is guaranteed.
Resources
- Zed Extension Development
- Zed Language Extensions
- Zed Extension API
- Language Server Protocol
- Beacon Configuration
Formatting Overview
Beacon provides built-in PEP8-compliant Python code formatting capabilities through its language server. The formatter is designed to produce consistent, readable code while respecting configuration preferences.
Design Principles
The formatter follows these core principles:
PEP8 Compliance: Adheres to Python Enhancement Proposal 8 style guidelines by default, with configurable options for compatibility with Black and autopep8.
AST-Based: Operates on the abstract syntax tree rather than raw text, ensuring formatting preserves semantic meaning and handles edge cases correctly.
Configurable: Supports workspace and project-level configuration through beacon.toml or pyproject.toml files.
Incremental: Caches already-formatted sources and formatting results to minimize redundant processing.
Formatting Pipeline
The formatter operates in four stages:
- Parsing: Source code is parsed into an AST using the Beacon parser
- Token Generation: AST nodes are converted into a stream of formatting tokens
- Rule Application: Formatting rules are applied based on context and configuration
- Output Generation: Formatted code is written with proper whitespace and indentation
Key Features
Whitespace and Indentation
- Normalizes indentation to 4 spaces (configurable)
- Removes trailing whitespace
- Manages blank lines between definitions and statements
- Controls whitespace around operators, commas, and colons
See Whitespace for detailed formatting rules.
Line Length Management
- Enforces maximum line length (default: 88 characters, matching Black)
- Smart line breaking at appropriate boundaries
- Handles multi-byte Unicode characters correctly
- Preserves user line breaks when under the limit
See Print Width for line length handling.
Structural Formatting
- Function call and definition parameter wrapping
- Collection literal formatting (lists, dicts, sets, tuples)
- Binary expression breaking
- Import statement organization and sorting
See Structure and Imports for structural rules.
Suppression Comments
The formatter respects suppression directives:
- `# fmt: skip` - Skip formatting for a single line
- `# fmt: off` / `# fmt: on` - Disable formatting for regions
See Suppressions for complete documentation on formatter, linter, and type checker suppressions.
Optimizations
The formatter includes intelligent caching to minimize formatting overhead:
Short-Circuit Cache
The formatter maintains a hash-based cache of already-formatted sources. When formatting is requested:
- Source content and configuration are hashed
- Cache is checked for this hash combination
- If found, formatting is skipped entirely (O(1) operation)
- Source is returned unchanged
Incremental Formatting
Formatting results are cached based on:
- Source content hash
- Configuration hash
- Line range
When formatting the same source multiple times (e.g., during editing), cached results are reused if:
- Source hasn't changed
- Configuration remains the same
- Same range is being formatted
The cache uses LRU (Least Recently Used) eviction with configurable size limits to prevent unbounded memory growth.
Cache Configuration
Caching behavior can be controlled through configuration:
[formatting]
cacheEnabled = true # Enable result caching (default: true)
cacheMaxEntries = 100 # Maximum cache entries (default: 100)
Disabling the cache may be useful in scenarios where:
- Memory constraints are tight
- Source changes very frequently
- Deterministic performance is required
Configuration
Formatting behavior is controlled through settings:
[formatting]
enabled = true
lineLength = 88
indentSize = 4
quoteStyle = "double"
trailingCommas = "multiline"
maxBlankLines = 2
importSorting = "pep8"
compatibilityMode = "black"
cacheEnabled = true
cacheMaxEntries = 100
See the Configuration documentation for complete details.
LSP Integration
The formatter integrates with the Language Server Protocol through:
- `textDocument/formatting`: Format entire document
- `textDocument/rangeFormatting`: Format selected range
- `textDocument/willSaveWaitUntil`: Format on save
Compatibility
The formatter provides compatibility modes for popular formatters:
- Black: 88-character line length, minimal configuration
- autopep8: 79-character line length, conservative formatting
- PEP8: Strict adherence to style guide recommendations
Whitespace and Indentation
This document describes Beacon's whitespace and indentation formatting rules.
Indentation
Beacon normalizes indentation according to PEP8 guidelines.
Indent Size
Default indentation is 4 spaces per level:
def example():
    if condition:
        do_something()
Configure indentation via formatting.indentSize:
[formatting]
indentSize = 2 # Use 2 spaces
Tabs vs Spaces
Spaces are strongly recommended and used by default. Tabs can be enabled but are not PEP8-compliant:
[formatting]
useTabs = true # Not recommended
Trailing Whitespace
All trailing whitespace is removed from lines:
# Before
def foo():
    return 42

# After
def foo():
    return 42
This applies to all lines, including blank lines.
Blank Lines
Beacon manages blank lines according to PEP8 conventions.
Top-Level Definitions
Two blank lines separate top-level class and function definitions:
def first_function():
    pass


def second_function():
    pass


class MyClass:
    pass
Configure via formatting.blankLineBeforeClass and formatting.blankLineBeforeFunction:
[formatting]
blankLineBeforeClass = true
blankLineBeforeFunction = true
Method Definitions
One blank line separates methods within a class:
class Example:
    def first_method(self):
        pass

    def second_method(self):
        pass
Maximum Consecutive Blank Lines
By default, at most 2 consecutive blank lines are allowed:
# Before
def foo():
    pass




def bar():
    pass

# After
def foo():
    pass


def bar():
    pass
Configure via formatting.maxBlankLines:
[formatting]
maxBlankLines = 1 # Allow only 1 blank line
Operators
Whitespace around operators depends on the operator type.
Binary Operators
Single space on both sides of binary operators when formatting.spacesAroundOperators is enabled (default):
# Arithmetic
result = x + y
quotient = a / b
power = base ** exponent

# Comparison
if value == expected:
    pass

# Logical
condition = flag and other_flag
Unary Operators
No space between unary operator and operand:
negative = -value
inverted = ~bits
boolean = not flag
Assignment Operators
Single space around assignment operators:
x = 10
count += 1
value *= 2
Delimiters
Parentheses, Brackets, Braces
No whitespace immediately inside delimiters:
# Correct
function(arg1, arg2)
items = [1, 2, 3]
mapping = {'key': 'value'}
# Incorrect
function( arg1, arg2 )
items = [ 1, 2, 3 ]
mapping = { 'key': 'value' }
Commas
No space before comma, single space after:
# Correct
items = [1, 2, 3]
function(a, b, c)
# Incorrect
items = [1 ,2 ,3]
function(a,b,c)
Colons
In dictionaries and slices, no space before colon, single space after:
# Dictionary
mapping = {'key': 'value', 'other': 'data'}
# Slice
subset = items[start:end]
every_other = items[::2]
In function annotations, no space before colon, single space after:
def greet(name: str) -> str:
    return f"Hello, {name}"
In class inheritance and control flow, no space before colon:
class Child(Parent):
    pass

if condition:
    pass
Comments
Inline Comments
Inline comments have two spaces before the hash and one space after:
x = x + 1  # Increment
Block Comments
Block comments start at the beginning of a line or at the current indentation level:
# This is a block comment
# spanning multiple lines
def function():
    # Indented block comment
    pass
Configuration Summary
Related configuration options:
[formatting]
indentSize = 4 # Spaces per indent level
useTabs = false # Use spaces, not tabs
maxBlankLines = 2 # Maximum consecutive blank lines
spacesAroundOperators = true # Add spaces around binary operators
blankLineBeforeClass = true # 2 blank lines before top-level classes
blankLineBeforeFunction = true # 2 blank lines before top-level functions
Line Length and Wrapping
Beacon enforces configurable line length limits and provides smart line breaking for long statements.
Line Length
The default maximum line length is 88 characters, matching Black's default.
Configuration
Set line length via formatting.lineLength:
[formatting]
lineLength = 88 # Black default
# Or for strict PEP8
lineLength = 79
Unicode Width Calculation
Line length is calculated using Unicode display width, not byte count.
# This emoji counts as 2 characters wide
message = "Status: ✅"
Line Breaking
When a line exceeds the configured length, Beacon breaks it at the following boundaries:
Break Points
Lines can break at these locations:
Commas: Highest priority break point
# Before
result = function(very_long_arg1, very_long_arg2, very_long_arg3, very_long_arg4)
# After
result = function(
    very_long_arg1,
    very_long_arg2,
    very_long_arg3,
    very_long_arg4
)
Binary Operators: Secondary break point
# Before
total = first_value + second_value + third_value + fourth_value
# After (when nested)
total = (
    first_value
    + second_value
    + third_value
    + fourth_value
)
Opening Brackets: When deeply nested
# Multiple levels of nesting
data = {
    'key': [
        item1,
        item2
    ]
}
Preserving User Breaks
If your manually inserted line breaks keep the code under the limit, they are preserved:
# This will not be reformatted if under line limit
result = function(
    arg1, arg2
)
Wrapping Strategies
Different constructs use different wrapping strategies.
Function Calls
Function calls use one of three strategies based on argument width:
Horizontal: All arguments on one line when they fit
result = function(arg1, arg2, arg3)
Vertical: One argument per line when arguments are long
result = function(
    very_long_argument_name_1,
    very_long_argument_name_2,
    very_long_argument_name_3
)
Mixed: Multiple arguments per line for medium-length arguments
result = function(
    arg1, arg2,
    arg3, arg4,
    arg5
)
Function Definitions
Function parameters wrap similarly to function calls:
def long_function_name(
    parameter1: str,
    parameter2: int,
    parameter3: bool = False
) -> None:
    pass
Hanging Indents
Parameters can align with the opening delimiter:
result = function(argument1,
                  argument2,
                  argument3)
Or use a consistent indent level:
result = function(
    argument1,
    argument2,
    argument3
)
Beacon prefers consistent indent levels for clarity.
Collection Literals
Collections wrap to vertical layout when they exceed line length:
Lists and Tuples:
items = [
    'first',
    'second',
    'third'
]
Dictionaries:
mapping = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 'value3'
}
Sets:
unique = {
    item1,
    item2,
    item3
}
Binary Expressions
Long binary expressions break before operators:
result = (
    condition1
    and condition2
    or condition3
)
Parenthesized Continuations
Python's implicit line continuation inside parentheses is preferred over backslashes:
# Preferred
total = (
    first_value
    + second_value
    + third_value
)

# Avoid
total = first_value \
    + second_value \
    + third_value
Trailing Commas
Beacon adds trailing commas in multi-line structures when formatting.trailingCommas is set appropriately:
[formatting]
trailingCommas = "multiline" # Add in multi-line structures (default)
# trailingCommas = "always" # Always add
# trailingCommas = "never" # Never add
With multiline setting:
items = [
    'first',
    'second',
    'third',  # Trailing comma added
]
Benefits of trailing commas:
- Cleaner diffs when adding/removing items
- Prevents forgetting commas when reordering
- Consistent formatting
Context-Aware Breaking
Breaking decisions consider context:
Inside Strings and Comments: Never break
# This string won't be broken even if it's very long
message = "This is a very long string that exceeds the line length limit"
Nested Constructs: Allow breaking at higher nesting levels
result = outer(
    inner(
        arg1,
        arg2
    )
)
Statement Boundaries: Prefer breaking between statements
# Break between statements
first = calculate_first()
second = calculate_second()
# Rather than within a statement
first = calculate_first(); second = calculate_second()
Configuration Summary
Related configuration options:
[formatting]
lineLength = 88 # Maximum line length
trailingCommas = "multiline" # Trailing comma strategy
compatibilityMode = "black" # Affects wrapping decisions
String and Comment Formatting
Beacon's formatter provides intelligent string quote normalization and comment formatting while preserving special directives and avoiding unnecessary escaping.
String Quote Normalization
The formatter can normalize string quotes according to your preferred style:
Quote Styles
- Double quotes (default) - Converts strings to use
" - Single quotes - Converts strings to use
' - Preserve - Keeps original quote style
Smart Escaping Avoidance
The formatter intelligently avoids quote normalization when it would introduce escaping:
# Configuration: quote_style = "double"
# Would require escaping, so preserved
'He said "hello" to me'
# No quotes inside, normalized
'simple string' → "simple string"
Prefixed Strings
String prefixes (r, f, rf, etc.) are preserved during normalization:
# Configuration: quote_style = "double"
r'raw string' → r"raw string"
f'formatted {x}' → f"formatted {x}"
rf'raw formatted' → rf"raw formatted"
Docstring Formatting
Triple-quoted strings (docstrings) receive special handling:
Quote Normalization
Docstrings are normalized to the configured quote style unless they contain the target quote sequence:
# Configuration: quote_style = "double"
'''Single quoted docstring''' → """Single quoted docstring"""
# Contains target quotes, preserved
'''String with """quotes""" inside'''
Indentation
Multi-line docstrings maintain consistent indentation:
def example():
    """
    This is a docstring with
    properly normalized indentation
    across all lines
    """
Comment Formatting
Comments are formatted for consistency while preserving special directives.
Standard Comments
Regular comments are formatted with a single space after the #:
#comment → # comment
# multiple spaces → # multiple spaces
Inline Comments
Inline comments (on the same line as code) are preceded by two spaces:
x = 1  # inline comment
Special Directives
Tool-specific comments are preserved exactly as written:
- `# type: ignore` - Type checking suppressions
- `# noqa` - Linting suppressions
- `# pylint:`, `# mypy:`, `# flake8:` - Tool-specific directives
- `# fmt: off` / `# fmt: on`, `# black:` - Formatter control
x = very_long_line() # type: ignore # Preserved exactly
Block Comments
Multi-line block comments at module level may be surrounded by blank lines for better separation.
Configuration
String and comment formatting respects these settings:
- `beacon.formatting.quoteStyle` - Quote normalization style (default: "double")
- `beacon.formatting.normalizeDocstringQuotes` - Apply quote normalization to docstrings (default: true)
Examples
Before Formatting
def greet(name):
    '''Say hello''' #function docstring
    message='Hello, ' + name #create greeting
    return message
After Formatting
def greet(name):
    """Say hello"""  # function docstring
    message = "Hello, " + name  # create greeting
    return message
Import Formatting
Beacon's formatter provides PEP8-compliant import sorting and formatting with intelligent grouping and deduplication.
Import Groups
Imports are automatically organized into three groups following PEP8 style:
- Standard library imports - Python's built-in modules (os, sys, json, etc.)
- Third-party imports - External packages (numpy, django, requests, etc.)
- Local imports - Relative imports from your project (., .., .models, etc.)
Each group is separated by a blank line for clarity.
Sorting Within Groups
Within each group, imports are sorted alphabetically by module name:
- Simple `import` statements are sorted before `from` imports
- Multiple names in `from` imports are alphabetically sorted
- Duplicate imports are automatically removed
Multi-line Imports
When from imports exceed the configured line length, they are automatically wrapped:
# Short enough for one line
from os import environ, path
# Exceeds line length - uses multi-line format
from collections import (
    Counter,
    OrderedDict,
    defaultdict,
    namedtuple,
)
Standard Library Detection
Beacon includes a comprehensive list of Python standard library modules for accurate categorization. Third-party packages are automatically identified when they don't match known stdlib modules.
Configuration
Import formatting respects these configuration options:
- `beacon.formatting.lineLength` - Controls when to wrap multi-line imports
- `beacon.formatting.importSorting` - Set to `pep8` for standard sorting (default)
Example
Input:
from numpy import array
import sys
from .models import User
import os
from django.db import models
Output:
import os
import sys
from django.db import models
from numpy import array
from .models import User
Structural Formatting
Structural formatting rules control the layout of Python constructs beyond basic whitespace and indentation.
Trailing Commas
Trailing commas in multi-line structures improve git diffs and make adding items easier.
Configuration
Controlled by beacon.formatting.trailingCommas:
- `always`: Add trailing commas to all multi-line structures
- `multiline`: Add trailing commas only to multi-line nested structures (default)
- `never`: Never add trailing commas
Behavior
The formatter determines whether to add a trailing comma based on:
- The trailing comma configuration setting
- Whether the structure spans multiple lines
- The nesting depth of the structure
For multiline mode, trailing commas are added when both conditions are met:
- The structure is multi-line
- The structure is nested (inside parentheses, brackets, or braces)
Examples
# multiline mode
items = [
"first",
"second",
"third", # trailing comma added (nested and multiline)
]
func(
    arg1,
    arg2,  # trailing comma added
)
# Top-level, single-line: no trailing comma
top_level = ["a", "b", "c"]
Dictionary Formatting
Dictionary formatting includes key-value spacing and multi-line alignment.
Value Indentation
For multi-line dictionaries, the formatter calculates appropriate indentation:
- Nested dictionaries: Use base indentation + 1 level
- Inline dictionaries: Align values with key width + 2 spaces
# Nested multi-line
config = {
"key": "value",
"nested": {
"inner": "data",
},
}
# Inline alignment
options = {"short": "val", "longer_key": "val"}
Comprehensions
List, dict, set, and generator comprehensions are formatted based on length.
Wrapping Strategy
The formatter chooses between horizontal and vertical layout:
- Horizontal: Entire comprehension fits on one line
- Vertical: Comprehension exceeds available line space
# Horizontal (fits on one line)
squares = [x**2 for x in range(10)]
# Vertical (too long)
result = [
    transform(item)
    for item in collection
    if predicate(item)
]
Lambda Expressions
Lambda expressions wrap to multiple lines when they exceed the line length limit.
Wrapping Decision
Determined by: `current_column + lambda_width > line_length`
# Short lambda: stays on one line
square = lambda x: x**2
# Long lambda: may need refactoring to def
complex = lambda x, y, z: (
    some_complex_calculation(x, y, z)
)
Decorators
Decorators are formatted with one decorator per line and proper spacing.
Rules
- Each decorator on its own line
- Decorators aligned at the same indentation as the function/class
- @ symbol normalized (added if missing)
- No blank lines between consecutive decorators
@property
@lru_cache(maxsize=128)
def expensive_computation(self):
    return result
Class Definitions
Class definitions follow PEP 8 spacing conventions.
Blank Lines
Controlled by `beacon.formatting.blankLineBeforeClass`:
- Top-level classes: 2 blank lines before (when enabled)
- Nested classes: 1 blank line before
# Module-level
class TopLevelClass:
    pass


class AnotherTopLevelClass:
    class NestedClass:
        pass
Function Definitions
Function definitions use similar spacing rules to classes.
Blank Lines
Controlled by `beacon.formatting.blankLineBeforeFunction`:
- Top-level functions: 2 blank lines before (when enabled)
- Methods: 1 blank line before
# Module-level
def top_level_function():
    pass


class MyClass:
    def method_one(self):
        pass

    def method_two(self):
        pass
Type Annotations
Type annotation spacing follows PEP 8 guidelines.
Spacing Rules
Spacing around annotation colons is expressed as a (before, after) pair:
- Variable annotations: No space before colon, one space after
- Return annotations: Space before and after `->`
# Variable annotations
name: str = "value"
count: int = 42
# Function annotations
def greet(name: str, age: int) -> str:
return f"Hello {name}"
Implementation
Structural formatting rules are implemented in `FormattingRules`:
- `should_add_trailing_comma()`: Determines trailing comma insertion
- `format_decorator()`: Normalizes decorator syntax
- `type_annotation_spacing()`: Returns the spacing tuple
- `should_wrap_lambda()`: Decides lambda wrapping
- `dict_value_indent()`: Calculates dictionary value indentation
- `comprehension_wrapping_strategy()`: Returns the wrapping strategy
- `blank_lines_for_class()`: Returns required blank lines for classes
- `blank_lines_for_function()`: Returns required blank lines for functions
All rules are context-aware and respect the current indentation level and nesting depth.
Suppression/Ignore Comments
Beacon supports suppression/ignore comments to selectively disable the formatter, linter, and type checker on specific lines or regions of code.
Formatter Suppressions
Control when the formatter should skip code sections.
Single Line: # fmt: skip
Skip formatting for a single statement or line.
# This line will be formatted normally
x = 1
# This line preserves exact spacing
y=2+3 # fmt: skip
# Back to normal formatting
z = 4
Region: # fmt: off / # fmt: on
Disable formatting for entire code blocks.
# Normal formatting applies here
formatted_dict = {"key": "value"}
# fmt: off
unformatted_dict={"key":"value","no":"spaces"}
complex_expression=1+2+3+4+5+6+7+8+9
# fmt: on
# Back to formatted code
back_to_normal = {"properly": "formatted"}
- Multiple `# fmt: off` / `# fmt: on` pairs are allowed in the same file
- An unclosed `# fmt: off` preserves formatting to the end of the file
- The directive lines themselves are preserved as-is
Alignment Preservation
Common use case: preserving column alignment in matrices or tables.
# Normal list formatting
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]
# fmt: off
# Preserve column alignment
aligned_matrix = [
    [  1,   2,   3],
    [100, 200, 300],
    [ 10,  20,  30],
]
# fmt: on
Linter Suppressions
Suppress specific linter warnings or all warnings on a line.
Suppress All Warnings: # noqa
Disable all linter checks for a line.
x = 1 # noqa
Suppress Specific Rules: # noqa: CODE
Disable specific linter rules by code.
# Suppress unused import warning
import os # noqa: BEA015
# Suppress unused variable warning
result = expensive_computation() # noqa: BEA016
# Suppress multiple specific rules
break # noqa: BEA005, BEA010
Multiple Rules
Separate multiple rule codes with commas:
# Suppress both undefined name and unused variable
x = undefined_variable # noqa: BEA001, BEA016
Case Insensitive
Rule codes are case-insensitive:
x = 1 # noqa: bea016 # Same as BEA016
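To make the matching rules concrete, here is a hypothetical sketch of how a `# noqa` comment could be parsed; it is illustrative only, not Beacon's actual implementation:

```python
import re

# Hypothetical helper; the real linter's parsing differs in detail.
NOQA = re.compile(r"#\s*noqa(?::\s*(?P<codes>[\w ,]+))?", re.IGNORECASE)

def noqa_codes(line: str) -> set[str] | None:
    """None = no suppression; empty set = suppress all rules;
    otherwise the specific, upper-cased rule codes on the line."""
    m = NOQA.search(line)
    if m is None:
        return None
    if m.group("codes") is None:
        return set()
    return {c.strip().upper() for c in m.group("codes").split(",") if c.strip()}

print(noqa_codes("x = 1"))                          # None
print(noqa_codes("x = 1  # noqa"))                  # set()
print(noqa_codes("import os  # noqa: bea015"))      # {'BEA015'}
print(noqa_codes("break  # noqa: BEA005, BEA010"))  # {'BEA005', 'BEA010'}
```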
Type Checker Suppressions
Suppress type checking errors.
Suppress All Type Errors: # type: ignore
Disable all type checking for a line.
x: int = "string" # type: ignore
Suppress Specific Error: # type: ignore[code]
Disable specific type error categories.
# Suppress only assignment type errors
value: str = 42 # type: ignore[assignment]
# Suppress multiple error types
result: int = some_function() # type: ignore[assignment, call-arg]
Common type error codes:
- `assignment` - Type mismatch in assignment
- `arg-type` - Incorrect argument type
- `return-value` - Return type mismatch
- `call-arg` - Function call argument errors
- `attr-defined` - Attribute not defined
Combining Suppression/Ignore Comments
Multiple suppression types can be used on the same line, in any order:
# Suppress both type checker and linter
x: int = "string" # type: ignore # noqa: BEA016
# Formatter skip with linter suppression
y=2+3 # fmt: skip # noqa: BEA020
z = value # noqa: BEA001 # type: ignore
# Same as:
z = value # type: ignore # noqa: BEA001
Quick Reference
| Comment | Scope | Applies To | Example |
|---|---|---|---|
| `# fmt: skip` | Single line | Formatter | `x=1  # fmt: skip` |
| `# fmt: off` | Start region | Formatter | See examples above |
| `# fmt: on` | End region | Formatter | See examples above |
| `# noqa` | Single line | All linter rules | `x=1  # noqa` |
| `# noqa: CODE` | Single line | Specific linter rule(s) | `import os  # noqa: BEA015` |
| `# type: ignore` | Single line | All type errors | `x: int = "s"  # type: ignore` |
| `# type: ignore[code]` | Single line | Specific type error(s) | `x: int = "s"  # type: ignore[assignment]` |
See Also
- Linter Rules - Complete list of BEA rule codes
- Type Checking - Type system documentation
- Formatter Configuration - Global formatter settings
CLI Overview
The Beacon CLI provides command-line tools for parsing, type checking, and analyzing Python code using Hindley-Milner type inference.
Available Commands
Core Commands
- `parse` - Parse Python files and display the AST
- `highlight` - Syntax highlighting with optional colors
- `check` - Validate Python syntax for parse errors
- `resolve` - Analyze name resolution and display symbol tables
- `format` - Run the Beacon formatter without starting the language server
Static Analysis
- `analyze` - Run static analysis on Python code (linting and data flow)
- `lint` - Run linter on Python code
Type Checking
- `typecheck` - Perform Hindley-Milner type inference and report type errors
Language Server
- `lsp` - Start the Beacon Language Server Protocol server
Debug Tools (Debug Builds Only)
- `debug tree` - Display tree-sitter CST structure
- `debug ast` - Show AST with inferred types
- `debug constraints` - Display generated type constraints
- `debug unify` - Show unification trace
Installation
Build from source:
cargo build --release
The binary will be available at target/release/beacon-cli.
Basic Usage
All commands accept either a file path or read from stdin:
# From file
beacon-cli typecheck example.py
# From stdin
cat example.py | beacon-cli typecheck
Getting Help
For detailed help on any command:
beacon-cli help <command>
For the complete list of options:
beacon-cli --help
Static Analysis
The analyze command runs static analysis on Python code, including linting and data flow analysis.
Targets
File Analysis
Analyze an entire file:
beacon analyze file ./src/myapp/core.py
Function Analysis
Analyze a specific function in a file:
beacon analyze function ./src/myapp/core.py:process_data
Class Analysis
Analyze a specific class in a file:
beacon analyze class ./src/myapp/models.py:User
Package Analysis (TODO)
Analyze an entire package (directory with `__init__.py`):
beacon analyze package ./src/myapp
Project Analysis (TODO)
Analyze an entire project (workspace with multiple packages):
beacon analyze project .
Options
Output Format
Control the output format:
# Human-readable output (default)
beacon analyze file main.py --format human
# JSON output for machine processing
beacon analyze file main.py --format json
# Compact single-line format (file:line:col)
beacon analyze file main.py --format compact
Analysis Filters
Run specific analyses:
# Only run linter
beacon analyze file main.py --lint-only
# Only run data flow analysis
beacon analyze file main.py --dataflow-only
Visualization
Show additional information:
# Show control flow graph visualization (TODO)
beacon analyze file main.py --show-cfg
# Show inferred types (TODO)
beacon analyze file main.py --show-types
Examples
Analyze a Complete File
# calculator.py
import os

def greet(name):
    return f'Hello {name}'

def unused_function():
    x = 1
    x = 2
    return x

class Calculator:
    def add(self, a, b):
        return a + b
$ beacon analyze file calculator.py
✗ 2 issues found in calculator.py
▸ calculator.py:1:1 [BEA015]
'os' imported but never used
1 import os
^
▸ calculator.py:8:5 [BEA018]
'x' is redefined before being used
8 x = 2
^
Analyze a Specific Function
$ beacon analyze function calculator.py:greet
✗ 1 issues found in calculator.py
▸ calculator.py:1:1 [BEA015]
'os' imported but never used
1 import os
^
Analyze a Specific Class
$ beacon analyze class calculator.py:Calculator
✗ 1 issues found in calculator.py
▸ calculator.py:1:1 [BEA015]
'os' imported but never used
1 import os
^
Lint-Only Mode
Run only linting without data flow analysis:
beacon analyze file main.py --lint-only
JSON Output
Machine-readable output for tooling integration:
beacon analyze class models.py:User --format json
Linting
The lint command runs the Beacon linter on Python code to detect common coding issues, style violations, and potential bugs.
Usage
beacon lint [OPTIONS] [PATHS]...
Accepts:
- Single file: `beacon lint file.py`
- Multiple files: `beacon lint file1.py file2.py file3.py`
- Directory: `beacon lint src/` (recursively finds all `.py` files)
- Stdin: `beacon lint` (reads from stdin)
Examples
Detecting Unused Imports and Variable Redefinition
# test.py
import os

def greet(name):
    return f'Hello {name}'

def unused_function():
    x = 1
    x = 2  # Redefined before being used
    return x
beacon lint test.py
Output:
✗ 2 issues found in test.py
▸ test.py:1:1 [BEA015]
'os' imported but never used
1 import os
^
▸ test.py:8:5 [BEA018]
'x' is redefined before being used
8 x = 2
^
Clean Code - No Issues
# clean.py
def add(x, y):
    return x + y

result = add(1, 2)
print(result)
$ beacon lint clean.py
✓ No issues found
Output Formats
Human-Readable (Default)
Shows issues with context and line numbers (default format):
$ beacon lint test.py
✗ 2 issues found in test.py
▸ test.py:1:1 [BEA015]
'os' imported but never used
1 import os
^
▸ test.py:8:5 [BEA018]
'x' is redefined before being used
8 x = 2
^
JSON Format
Machine-readable output for CI/CD integration:
beacon lint test.py --format json
Output:
[
{
"rule": "UnusedImport",
"message": "'os' imported but never used",
"filename": "test.py",
"line": 1,
"col": 1
},
{
"rule": "RedefinedWhileUnused",
"message": "'x' is redefined before being used",
"filename": "test.py",
"line": 8,
"col": 5
}
]
Compact Format
Single-line format compatible with many editors:
$ beacon lint test.py --format compact
test.py:1:1: [BEA015] 'os' imported but never used
test.py:8:5: [BEA018] 'x' is redefined before being used
Lint Rules
The linter implements PyFlakes-style rules (BEA001-BEA030):
- Undefined variables
- Unused imports and variables
- Syntax errors in specific contexts
- Potential bugs (assert on tuple, is vs ==, etc.)
- Code style issues
For a complete list of rules, see the Lint Rules documentation.
Multiple Files and Directories
Lint all files in a directory
beacon lint src/
Lint multiple specific files
beacon lint src/main.py src/utils.py tests/test_main.py
Lint for CI with JSON output
beacon lint --format json src/ > lint-results.json
Directory Traversal
When a directory is provided, the command:
- Recursively discovers all `.py` files
- Respects `.gitignore` rules
- Excludes common patterns: `__pycache__/`, `*.pyc`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `venv/`, `.venv/`, `env/`, `.env/`
Exit Codes
- `0` - No issues found
- `1` - Issues found
This makes it easy to use in CI/CD pipelines:
beacon lint src/ || exit 1
Notes
The linter does not fix issues automatically (yet). It only reports them.
Type Checking
The typecheck command performs Hindley-Milner type inference on Python code and reports type errors.
Usage
beacon typecheck [OPTIONS] [PATHS]...
Accepts:
- Single file: `beacon typecheck file.py`
- Multiple files: `beacon typecheck file1.py file2.py file3.py`
- Directory: `beacon typecheck src/` (recursively finds all `.py` files)
- Stdin: `beacon typecheck` (reads from stdin)
Options
- `-f, --format <FORMAT>` - Output format (human, json, compact) [default: human]
Output Formats
Human (Default)
Human-readable output with context and visual pointers:
$ beacon typecheck example.py
Found 1 type error(s):
Error 1: Cannot unify types: Int ~ Str (line 3, col 5)
--> example.py:3:5
|
3 | z = x + y
| ^
JSON
Machine-readable JSON format for tooling integration:
$ beacon typecheck --format json example.py
{
"errors": [
{
"error": "Cannot unify types: Int ~ Str",
"line": 3,
"col": 5,
"end_line": null,
"end_col": null
}
],
"error_count": 1
}
Compact
Single-line format compatible with editor quickfix lists:
$ beacon typecheck --format compact example.py
example.py:3:5: Cannot unify types: Int ~ Str
Examples
Check a single file
beacon typecheck src/main.py
Check multiple files
beacon typecheck src/main.py src/utils.py tests/test_main.py
Check all files in a directory
beacon typecheck src/
Check with JSON output for CI
beacon typecheck --format json src/ > type-errors.json
Check from stdin
cat src/main.py | beacon typecheck
Exit Codes
- `0` - No type errors found
- `1` - Type errors found or analysis failed
Directory Traversal
When a directory is provided, the command:
- Recursively discovers all `.py` files
- Respects `.gitignore` rules
- Excludes common patterns: `__pycache__/`, `*.pyc`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `venv/`, `.venv/`, `env/`, `.env/`
Language Server
The lsp command starts the Beacon Language Server Protocol server for editor integration.
Usage
beacon-cli lsp [OPTIONS]
Options
- `--tcp <PORT>` - Use TCP on the specified port (TODO: not yet implemented)
- `--log-file <PATH>` - Write logs to the specified file
Communication Modes
stdio (Default)
The default mode uses standard input/output for LSP communication. This is the standard mode for editor integration:
beacon-cli lsp
Editors spawn the LSP server and communicate via pipes. This is automatically configured by editor plugins.
TCP Mode (TODO)
TCP mode allows remote LSP connections and easier debugging:
beacon-cli lsp --tcp 9257
Logging
stderr (Default)
By default, logs are written to stderr:
beacon-cli lsp 2> lsp.log
File Logging
Use the --log-file option to write logs to a specific file:
beacon-cli lsp --log-file /tmp/beacon-lsp.log
The log file is created if it doesn't exist and appended to if it does.
Environment Variables
Control log level via the RUST_LOG environment variable:
# Info level (default)
RUST_LOG=info beacon-cli lsp
# Debug level for verbose logging
RUST_LOG=debug beacon-cli lsp
# Trace level for very verbose logging
RUST_LOG=trace beacon-cli lsp
Editor Integration
VS Code
The Beacon VS Code extension automatically spawns the LSP server. No manual configuration needed.
Neovim
Configure nvim-lspconfig:
require'lspconfig'.beacon.setup{
  cmd = { "beacon-cli", "lsp" },
  filetypes = { "python" },
  root_dir = function(fname)
    return vim.fn.getcwd()
  end,
}
Emacs (lsp-mode)
Add to your configuration:
(add-to-list 'lsp-language-id-configuration '(python-mode . "python"))
(lsp-register-client
 (make-lsp-client :new-connection (lsp-stdio-connection '("beacon-cli" "lsp"))
                  :major-modes '(python-mode)
                  :server-id 'beacon))
LSP Features
The Beacon LSP server provides:
- Full type inference (Hindley-Milner)
- Hover information with inferred types
- Go to definition
- Find references
- Document/workspace symbols
- Semantic tokens
- Inlay hints (type annotations)
- Code actions
- Diagnostics (type errors)
- Auto-completion
See the LSP documentation for detailed feature descriptions.
Formatter CLI
The format command exposes Beacon's Python formatter without having to spin up the language server.
It is helpful for debugging formatter behaviour (for example, while comparing
samples/capabilities_support.py against the generated samples/capabilities_support_formatted.py).
Usage
beacon format [OPTIONS] [PATHS]...
Accepts:
- Single file: `beacon format file.py`
- Multiple files: `beacon format file1.py file2.py file3.py`
- Directory: `beacon format src/` (recursively finds all `.py` files)
- Stdin: `beacon format` (reads from stdin)
Options
| Flag | Description |
|---|---|
| `--write` | Overwrite files with formatted output. |
| `--check` | Exit with a non-zero status if formatting would change the input. |
| `--output <PATH>` | Write formatted output to a different file (only works with single-file input). |
`--write` conflicts with both `--check` and `--output` to prevent accidental combinations.
Examples
Format a single file and display to terminal
beacon format samples/capabilities_support.py
Format file in-place
beacon format samples/capabilities_support.py --write
Format all files in a directory
beacon format src/ --write
Format multiple specific files
beacon format src/main.py src/utils.py tests/test_main.py --write
Check formatting in CI
beacon format src/ --check
Write formatted output to a different file
beacon format samples/capabilities_support.py --output samples/capabilities_support_formatted.py
Directory Traversal
When a directory is provided, the command:
- Recursively discovers all `.py` files
- Respects `.gitignore` rules
- Excludes common patterns: `__pycache__/`, `*.pyc`, `.pytest_cache/`, `.mypy_cache/`, `.ruff_cache/`, `venv/`, `.venv/`, `env/`, `.env/`
Suppression Comments
The formatter respects suppression directives in your code:
# Skip formatting for a single line
x=1+2 # fmt: skip
# Skip formatting for a region
# fmt: off
unformatted=code
# fmt: on
See Formatter Suppressions for complete documentation.
Exit Codes
- `0` - All files are formatted correctly (or formatting succeeded)
- `1` - Formatting would change files (with `--check`) or formatting failed
Debug Tools
Debug commands provide low-level inspection of Beacon's parsing and type inference internals. These tools are only available in debug builds.
Availability
Debug commands are compiled only in debug builds:
# Build in debug mode (includes debug commands)
cargo build
# Build in release mode (excludes debug commands)
cargo build --release
Commands
Tree-sitter CST
Display the concrete syntax tree from tree-sitter:
beacon-cli debug tree [OPTIONS] [FILE]
Options:
- `--json` - Output in JSON format
Example output (default S-expression style):
Tree-sitter CST:
(module [0, 0] - [3, 0]
  (expression_statement [0, 0] - [0, 6]
    (assignment
      left: (identifier [0, 0] - [0, 1])
      right: (integer [0, 4] - [0, 6]))))
JSON output:
beacon-cli debug tree --json example.py
AST with Types
Show Beacon's AST with inferred types:
beacon-cli debug ast [OPTIONS] [FILE]
Options:
- `--format <FORMAT>` - Output format (tree, json) [default: tree]
Example:
$ beacon-cli debug ast example.py
AST with inferred types:
Type mappings: 15 nodes
Position mappings: 12 positions
Type errors: 0
Node types:
Node 1: Int
Node 2: (Int, Int) -> Int
Node 3: Int
...
Constraints
Display generated type constraints:
$ beacon-cli debug constraints [FILE]
Generated 23 constraints:
▸ Equal (12 instances)
1. Equal(τ1, Int)
2. Equal(τ2, (Int, Int) -> Int)
3. Equal(τ3, Int)
... and 9 more
▸ Call (5 instances)
1. Call(τ2, [τ1, τ1], {}, τ4)
2. Call(print, [τ4], {}, τ5)
... and 3 more
▸ HasAttr (6 instances)
1. HasAttr(τ6, "append", τ7)
2. HasAttr(τ6, "extend", τ8)
... and 4 more
Unification
Show unification trace (TODO):
beacon-cli debug unify [FILE]
Diagnostics
Run comprehensive diagnostics (parse errors, lint issues, type errors, static analysis) on Python files:
beacon debug diagnostics [OPTIONS] <PATHS>...
Accepts:
- Single file: `beacon debug diagnostics file.py`
- Multiple files: `beacon debug diagnostics file1.py file2.py file3.py`
- Directory: `beacon debug diagnostics src/` (recursively finds all `.py` files)
Options:
- `-f, --format <FORMAT>` - Output format (human, json, compact) [default: human]
Example output (human format):
$ beacon debug diagnostics src/
⚡ Running comprehensive diagnostics on 5 file(s)...
✓ 0 Parse Errors
✗ 3 Lint Issues
▸ src/main.py:5:1 [BEA015] 'os' imported but never used
5 import os
~
▸ src/utils.py:10:5 [BEA018] 'x' is redefined before being used
10 x = 2
~
▸ src/helper.py:3:1 [BEA015] 'sys' imported but never used
3 import sys
~
✗ 2 Type Errors
▸ src/main.py:12:9 Cannot unify types: Int ~ Str
12 z = x + y
~
▸ src/utils.py:20:5 Undefined type variable: τ5
20 result = unknown_func()
~
Summary: 5 total issue(s) found
JSON output:
beacon debug diagnostics --format json src/ > diagnostics.json
Compact output (for editor integration):
beacon debug diagnostics --format compact src/
src/main.py:5:1: [BEA015] 'os' imported but never used
src/utils.py:10:5: [BEA018] 'x' is redefined before being used
src/main.py:12:9: [TYPE] Cannot unify types: Int ~ Str
Research
Reading List
Theory
Hindley–Milner Type Inference
- Principal Type-Schemes for Functional Programs - https://doi.org/10.1145/582153.582176
- Types and Programming Languages (2002), ch. 22-24
- Implementing a Hindley–Milner Type Inference - https://smunix.github.io/dev.stephendiehl.com/fun/006_hindley_milner.html
- Typing Haskell in Haskell - https://web.cecs.pdx.edu/~mpj/pubs/thih.html
- "Typed Racket: Gradual Typing for Dynamic Languages"
- TypeScript Specification, §2–4 (structural subtyping)
- PEP 544 - Protocols: Structural subtyping in Python
Implementation-Level Concepts
- Tree-sitter docs: https://tree-sitter.github.io/tree-sitter/
- "Rust for Rustaceans"
- The Rustonomicon, ch. 3 (Type Layout & Lifetimes)
- https://jeremymikkola.com/posts/2019_01_01_type_inference_intro.html
- MyPy design docs: https://mypy.readthedocs.io/en/stable/internal.html
- PyRight internals (analyzer.py)
- Expert F# 5.0 (Ch. 9–10).
- TypeScript Compiler (specifically `checker.ts`)
Hindley–Milner Type Systems
Hindley–Milner (HM) is the classical polymorphic type system that powers languages such as ML, OCaml, and early versions of Haskell. It strikes a balance between expressiveness (parametric polymorphism) and tractable, annotation-free type inference.
Overview
Parametric polymorphism: functions can operate uniformly over many types without runtime overhead [1].
Type inference: the compiler deduces the most general (principal) type scheme for each expression [1].
Declarative typing judgment: The typing judgment \(\Gamma \vdash e : \sigma\) relates a context \( \Gamma \), an expression \( e \), and a type scheme \( \sigma \).
The result is a system where generic programs remain statically typed without drowning the developer in annotations.
Core Concepts
Why HM?
\(\lambda\)-calculus requires explicit annotations to achieve polymorphism. HM extends the calculus with let-polymorphism and carefully restricted generalization so that inference stays decidable and efficient.
Monotypes vs Polytypes
Monotypes (\(\tau\)): concrete types such as \(\alpha\), \(\text{Int} \to \text{Bool}\), or constructor applications \(C\,\tau_1 \cdots \tau_n\) [2].
Polytypes / type schemes (\(\sigma\)): quantifications over monotypes, e.g. \(\forall \alpha.\,\alpha \to \alpha\).
Principal type: every well-typed expression has a unique (up to renaming) most general type scheme from which all other valid typings can be instantiated [1].
Generalization and Instantiation
Generalization: close a monotype over the free type variables not present in the environment.
Instantiation: specialise a polytype by substituting quantified variables with fresh monotype variables.
Let-Polymorphism
Only let-bound definitions are generalized. Lambda parameters remain monomorphic in HM; this restriction is critical to keep inference decidable [1].
Formal Skeleton
Syntax
e ::= x
| λ x. e
| e₁ e₂
| let x = e₁ in e₂
The associated type grammar and typing environments are:
\[ \begin{aligned} \tau &::= \alpha \mid C(\tau_1,\dots,\tau_n) \mid \tau \to \tau \\ \sigma &::= \tau \mid \forall \alpha.\,\sigma \\ \Gamma &::= \emptyset \mid \Gamma,\; x : \sigma \end{aligned} \]
Typing Rules
Typing judgments take the form \(\Gamma \vdash e : \sigma\). Core rules include:
\[ \frac{x : \sigma \in \Gamma}{\Gamma \vdash x : \sigma} \quad\text{(Var)} \]
\[ \frac{\Gamma, x : \tau \vdash e : \tau'}{\Gamma \vdash \lambda x.\,e : \tau \to \tau'} \quad\text{(Abs)} \]
\[ \frac{\Gamma \vdash e_0 : \tau \to \tau' \qquad \Gamma \vdash e_1 : \tau}{\Gamma \vdash e_0\,e_1 : \tau'} \quad\text{(App)} \]
\[ \frac{\Gamma \vdash e_0 : \sigma \qquad \Gamma, x : \sigma \vdash e_1 : \tau}{\Gamma \vdash \text{let } x = e_0 \text{ in } e_1 : \tau} \quad\text{(Let)} \]
\[ \frac{\Gamma \vdash e : \sigma' \qquad \sigma' \sqsubseteq \sigma}{\Gamma \vdash e : \sigma} \quad\text{(Inst)} \]
\[ \frac{\Gamma \vdash e : \sigma \qquad \alpha \notin \mathrm{free}(\Gamma)}{\Gamma \vdash e : \forall \alpha.\,\sigma} \quad\text{(Gen)} \]
Here \(\sigma' \sqsubseteq \sigma\) means that \(\sigma'\) is an instance of \(\sigma\) (obtained by instantiating quantified variables) [1].
Algorithm W (Inference Sketch)
Algorithm W is the archetypal inference engine for HM [3].
- Annotate sub-expressions with fresh type variables.
- Collect constraints when traversing the AST (especially from applications).
- Unify constraints to solve for unknown types.
- Generalize at each `let` by quantifying over variables not free in the environment.
- Return the principal type scheme produced by the substitutions.
Typical programs are handled in near-linear time, although the theoretical worst case is higher [1].
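The following toy Algorithm W, for a minimal expression language with literals, variables, lambdas, application, and `let`, shows the fresh-variable, unification, and let-generalization steps end to end. It is a deliberately small sketch, not Beacon's Rust implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TVar:
    name: str

@dataclass(frozen=True)
class TCon:
    name: str  # e.g. Int

@dataclass(frozen=True)
class TArrow:
    arg: object
    res: object

@dataclass(frozen=True)
class Scheme:
    vars: tuple  # quantified variable names
    ty: object

_counter = 0
def fresh() -> TVar:
    global _counter
    _counter += 1
    return TVar(f"t{_counter}")

def apply(s: dict, t):
    # Apply a substitution, chasing chains of bound variables.
    if isinstance(t, TVar):
        return apply(s, s[t.name]) if t.name in s else t
    if isinstance(t, TArrow):
        return TArrow(apply(s, t.arg), apply(s, t.res))
    return t

def free(t) -> set:
    if isinstance(t, TVar):
        return {t.name}
    if isinstance(t, TArrow):
        return free(t.arg) | free(t.res)
    return set()

def unify(a, b, s: dict) -> dict:
    a, b = apply(s, a), apply(s, b)
    if isinstance(a, TVar):
        if a == b:
            return s
        if a.name in free(b):
            raise TypeError("occurs check failed")
        return {**s, a.name: b}
    if isinstance(b, TVar):
        return unify(b, a, s)
    if isinstance(a, TCon) and isinstance(b, TCon) and a.name == b.name:
        return s
    if isinstance(a, TArrow) and isinstance(b, TArrow):
        s = unify(a.arg, b.arg, s)
        return unify(a.res, b.res, s)
    raise TypeError(f"Cannot unify types: {a} ~ {b}")

# Expressions: ("lit", n) ("var", x) ("lam", x, body) ("app", f, a) ("let", x, e1, e2)
def infer(env: dict, e, s: dict):
    match e:
        case ("lit", _):
            return TCon("Int"), s
        case ("var", x):
            scheme = env[x]
            inst = {v: fresh() for v in scheme.vars}  # instantiate with fresh vars
            return apply(inst, scheme.ty), s
        case ("lam", x, body):
            a = fresh()
            t_body, s = infer({**env, x: Scheme((), a)}, body, s)
            return TArrow(apply(s, a), t_body), s
        case ("app", f, arg):
            t_f, s = infer(env, f, s)
            t_a, s = infer(env, arg, s)
            r = fresh()
            s = unify(apply(s, t_f), TArrow(apply(s, t_a), r), s)
            return apply(s, r), s
        case ("let", x, e1, e2):
            t1, s = infer(env, e1, s)
            env_free = set()
            for sc in env.values():
                env_free |= free(apply(s, sc.ty)) - set(sc.vars)
            gen = tuple(free(apply(s, t1)) - env_free)  # generalize here
            return infer({**env, x: Scheme(gen, apply(s, t1))}, e2, s)
        case _:
            raise ValueError(f"unknown expression: {e!r}")

# let id = \x. x in id 3  ==>  Int
expr = ("let", "id", ("lam", "x", ("var", "x")),
        ("app", ("var", "id"), ("lit", 3)))
ty, subst = infer({}, expr, {})
print(apply(subst, ty))  # TCon(name='Int')
```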
Strengths and Limitations
Strengths
Minimal annotations with strong static guarantees.
Principled parametric polymorphism with predictable runtime behaviour.
A deterministic, well-understood inference algorithm.
Limitations
No native subtyping; adding it naively renders inference undecidable [1].
Higher-rank polymorphism (e.g., passing polymorphic functions as arguments) requires extensions that typically sacrifice automatic inference.
Recursive bindings and mutation demand additional care to avoid unsound generalization.
Extensions: Type Classes
Many ML-derived languages extend HM with type classes to model constrained polymorphism [4]. Type classes capture ad-hoc behavior (equality, ordering, pretty-printing) without abandoning the core inference model.
Motivation
Developers often need functions that work only for types supporting specific operations (equality, ordering, etc.).
Type classes describe those obligations once and then allow generic code to depend on them declaratively.
Integration with HM
A type class \(C\) packages a set of operations. A type \(T\) becomes an instance of \(C\) by providing implementations.
Type schemes gain constraint contexts, e.g. \(\forall a.\,(\mathrm{Eq}\;a) \Rightarrow a \to a\), read as “for all \(a\) that implement Eq, this function maps \(a\) to \(a\)”.
Environments track both type bindings and accumulated constraints, written informally as \(\Gamma \vdash e : \sigma \mid \Delta\).
During generalization, constraints that do not mention the generalized variables can be abstracted over; during instantiation, remaining constraints must be satisfied (dictionary passing, instance resolution, etc.).
Type classes preserve type safety while keeping user code concise, but introduce design questions about coherence (no conflicting instances), instance search termination, and tooling ergonomics.
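A small sketch of dictionary passing in Python may help: the `Eq` class below plays the role of a type-class dictionary, and the constraint `(Eq a)` becomes an explicit parameter. Names here are illustrative, not part of any real API:

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

A = TypeVar("A")

@dataclass
class Eq(Generic[A]):
    """A type-class 'dictionary': the evidence that A supports equality."""
    eq: Callable[[A, A], bool]

# Instances are just dictionary values.
eq_int: Eq[int] = Eq(lambda x, y: x == y)
eq_str: Eq[str] = Eq(lambda x, y: x == y)

# A constrained-polymorphic function: forall a. (Eq a) => a -> [a] -> bool.
def member(d: Eq[A], x: A, xs: list[A]) -> bool:
    return any(d.eq(x, y) for y in xs)

print(member(eq_int, 3, [1, 2, 3]))     # True
print(member(eq_str, "z", ["a", "b"]))  # False
```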
Extensions: Higher-Rank Types
Higher-rank polymorphism allows universal quantifiers to appear inside function arguments, enabling functions that consume polymorphic functions [5].
HM is rank-1: all \(\forall\) quantifiers appear at the outermost level.
Why Higher Rank?
Certain abstractions require accepting polymorphic functions as arguments, e.g.
applyTwice :: (forall a. a -> a) -> Int -> Int
applyTwice f x = f (f x)
HM cannot express this because the quantifier lives to the left of an arrow. Extending to rank-2 (or higher) types unlocks APIs like `runST :: ∀a. (∀s. ST s a) -> a` [6].
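Modern Python checkers can approximate a rank-2 type via a Protocol with a generic `__call__`; the sketch below mirrors `applyTwice` and is verified by tools like mypy or pyright rather than at runtime:

```python
from typing import Protocol, TypeVar

T = TypeVar("T")

class PolyIdentity(Protocol):
    # A generic __call__ makes the argument itself polymorphic:
    # apply_twice demands forall t. t -> t, a rank-2 constraint.
    def __call__(self, x: T) -> T: ...

def apply_twice(f: PolyIdentity, x: int) -> int:
    return f(f(x))

def ident(x: T) -> T:
    return x

print(apply_twice(ident, 21))  # 21
```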
Typing Considerations
The grammar generalizes to allow quantified types within arrow positions; checking such programs typically relies on bidirectional type checking [7].
Full type inference for arbitrary rank is undecidable; practical compilers require annotations or rely on heuristics [8].
Despite the cost, higher-rank types enable powerful encapsulation patterns and stronger invariants.
Design Trade-offs
Pros: Expressiveness for APIs manipulating polymorphic functions; better information hiding (e.g., ST).
Cons: Additional annotations, more complex error messages, heavier implementation burden.
Further Reading
- Implementing HM (Stimsina)
- Parametricity and type classes (Well-Typed)
Language Server Protocol
Why LSP Exists
Before LSP, editor integrations for language tooling (completion, diagnostics, refactors) were bespoke. Every compiler or analyzer needed plug-ins for VS Code, Vim, IntelliJ, Sublime, etc., and each editor duplicated work to support many languages. This matrix of per-language, per-editor plug-ins slowed innovation and made advanced tooling inaccessible outside first-party IDEs.
The Language Server Protocol, initiated by Microsoft for VS Code and now maintained as an open specification, solves this coupling. It defines a JSON-RPC protocol so a single language server can speak to any compliant editor. Editors implement the client half once and gain tooling support for every language that implements the server half.
Problems It Solves
- Shared investment: Language teams implement the protocol once instead of maintaining multiple editor-specific plug-ins.
- Editor freedom: Developers choose tools without sacrificing language-aware features.
- Feature parity: Diagnostics, go-to-definition, workspace symbols, rename, and more behave consistently across environments.
- Incremental updates: The protocol is designed for streaming updates as the user types, enabling responsive experiences.
How LSP Works
- Transport: Client and server communicate over stdin/stdout pipes, TCP, or WebSockets. Messages use JSON-RPC 2.0 framed with `Content-Length` headers.
- Initialization: Client sends `initialize` with capabilities and workspace metadata. Server responds with supported features (`ServerCapabilities`). A follow-up `initialized` notification signals readiness.
- Document Synchronization: The client streams document lifecycle notifications (`didOpen`, `didChange`, `didSave`, `didClose`) so the server maintains up-to-date views of open files.
- Feature Requests: Once documents are synchronized, the client issues requests such as:
  - `textDocument/completion` for completion items.
  - `textDocument/hover` for inline info.
  - `textDocument/definition` and `textDocument/references` for navigation.
  - `textDocument/documentSymbol` and `workspace/symbol` for structure searches.
  - `textDocument/codeAction`, `textDocument/rename`, `textDocument/semanticTokens`, and more.
- Responses and Notifications: Servers send responses with payloads defined in the protocol. They can also push diagnostics (`textDocument/publishDiagnostics`) or log messages asynchronously.
- Shutdown: Clients request graceful shutdown via `shutdown` followed by `exit`.
The protocol evolves through versioned specifications (currently 3.x). Beacon targets the subset required for an ergonomic Python workflow, while keeping the implementation modular so new methods can be added as needed.
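As a concrete illustration of the framing, the following minimal Python sketch writes a valid `initialize` request to stdout; a real client would also read the response and continue the handshake:

```python
import json
import sys

def send(msg: dict) -> None:
    # LSP messages are JSON-RPC bodies framed by a Content-Length header.
    body = json.dumps(msg).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    sys.stdout.buffer.write(header + body)
    sys.stdout.buffer.flush()

send({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {"processId": None, "rootUri": None, "capabilities": {}},
})
```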
Tree-sitter
This document contains notes I've compiled based on learnings about tree-sitter.
Tree-sitter is both a parser-generator tool and an incremental parsing library [1]. It’s optimized for embedding in editors and tooling (rather than being only a compiler backend parser). It supports many languages, with language-specific grammars [2].
From the official site:
Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited.
What problems it solves
Here are its key value-propositions and the issues it addresses:
Better than regex/highlight hacks
Traditional editors often use regular expressions or ad-hoc syntax rules for things like syntax highlighting, folding, code navigation. These approaches tend to fail with complex nested constructs or incomplete code (common in live editing). Tree-sitter uses a proper parse tree (Concrete Syntax Tree) rather than purely regex heuristics, giving more accurate structure.
Incremental parsing / live editing
In an editor context, users are typing and modifying files constantly. Re-parsing the entire file on every keystroke is expensive and slow. Tree-sitter supports incremental parsing, meaning it updates only the changed portion of the tree rather than rebuilding everything. This means edits are reflected quickly and the tree remains coherent, which enables features like structured selection, live syntax highlighting, etc.
Unified API / language-agnostic tooling
Because each language has a Tree-sitter grammar, you can build tooling (highlighting, navigation, refactoring) in a language-agnostic way: query the tree, capture nodes of interest, etc. This reduces duplication of effort: editor vendors don’t have to write custom parsing logic per language to support advanced features.
Error-tolerant parsing for editing
Since code is often incomplete/invalid in the middle of editing, a robust parser needs to recover gracefully. Tree-sitter is designed to continue to provide a usable tree under such conditions so editors can rely on the tree structure even when the file is only partially valid.
Enables richer editor tooling
Because you have a full tree, you can support advanced features: structural selection (e.g., select "function" or "if block"), code folding by AST node, refactorings, cross-language injections (e.g., embedded languages). For example, using queries you can capture specific nodes in the tree and apply tooling logic.
Internals
Grammar / Parser Generation
For each language you want support for, you write a grammar file, typically grammar.js (or some variant) describing the language’s syntax in a DSL provided by Tree-sitter.
Example: you describe rules like `sum: ...` and `product: ...`, and define precedence and associativity via helpers like `prec.left` and `prec.right`.
You then run the Tree-sitter CLI (or build process) to generate a parser.c file (and possibly scanner.c) that formalizes the grammar into C code.
That generated parser becomes the actual runtime component for that language.
Lexer/Tokenization
The generated parser includes a lexer (scanner) component that tokenizes the source code (turning characters into tokens).
In some languages, you may supply a custom external scanner to handle tricky lexing cases (e.g., indent-based blocks, embedded languages) via scanner.c.
Parser Engine (GLR / LR)
The core algorithm is a generalized LR (GLR) parser. GLR means it can handle grammars with some ambiguity and still produce valid parse trees. In simple terms, the parser uses a parse table (states × tokens) to decide shift/reduce actions. The grammar defines precedence/associativity to resolve ambiguities. In addition to traditional LR parsing, Tree-sitter is optimized for incremental operation (see next).
Tree Representation & Node Structure
After parsing, you obtain a Concrete Syntax Tree (CST): a tree of nodes representing lexical tokens and syntactic constructs. Nodes carry source-range information (start and end positions). Nodes can be named or hidden; a rule whose name starts with an underscore in the grammar is hidden, so it doesn’t appear in the final tree, keeping the tree cleaner.
Incremental Parsing
A key feature: when the source text changes (e.g., editing in an editor), Tree-sitter avoids re-parsing the whole file. Instead it reuses existing subtrees for unchanged regions and re-parses only the changed region plus a small margin around it.
- Editor notifies parser of changes (range of changed characters, old/new text)
- Parser identifies which nodes’ source ranges are invalidated
- It re-parses the minimal region and re-connects to reused nodes outside that region
- It produces an updated tree with source ranges corrected.
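A small sketch with the Python bindings shows this edit-then-reparse flow. It assumes the `tree_sitter` and `tree_sitter_python` packages are installed, and constructor details vary slightly across binding versions:

```python
import tree_sitter_python as tspython
from tree_sitter import Language, Parser

parser = Parser(Language(tspython.language()))

old_source = b"x = 1\n"
tree = parser.parse(old_source)

# Describe the edit: replace "1" (bytes 4..5) with "100" (bytes 4..7).
new_source = b"x = 100\n"
tree.edit(
    start_byte=4, old_end_byte=5, new_end_byte=7,
    start_point=(0, 4), old_end_point=(0, 5), new_end_point=(0, 7),
)

# Passing the edited old tree lets the parser reuse unchanged subtrees.
new_tree = parser.parse(new_source, tree)
print(new_tree.root_node)
```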
Querying & Tree Walk / API
Once you have a tree, you can run queries (S-expression style) to find sets of nodes matching patterns.
For example, capture all if_statement nodes or function declarations.
The API (C API, plus language bindings) allows you to walk nodes, inspect children, get start/end positions, text, etc. [3]
The query system is powerful: you can specify patterns, nested structures, predicates (e.g., #eq? @attr_name "class").
Embedding / Use in Editors & Tools
Tree-sitter is designed to be embedded: the parsing library is written in C, and there are bindings in many languages (Rust, JS, Python, etc.) [2]. Editor plugins (for example nvim‑treesitter for Neovim) use Tree-sitter for syntax highlighting, structural editing, text-objects.
[1] Tree-sitter: Introduction - https://tree-sitter.github.io/
[2] Tree-sitter (parser generator) - https://en.wikipedia.org/wiki/Tree-sitter_%28parser_generator%29
[3] Using Parsers - Tree-sitter - https://tree-sitter.github.io/tree-sitter/using-parsers/
PEP8
Philosophy
Purpose
Provide coding conventions for the Python standard library, to enhance readability and consistency [1].
Underlying principle
"Code is read much more often than it is written."
Consistency matters
Within a project > within a module > within a function.
Exceptions permitted
When strictly following the guideline reduces clarity or conflicts with surrounding code.
Encoding
Use UTF-8 encoding for source files (in the core distribution).
Avoid non-ASCII identifiers in standard library modules; if used, limit noisy Unicode characters.
Layout
Indentation
Use 4 spaces per indentation level. Tabs are strongly discouraged. Never mix tabs and spaces.
Line Length
Preferred maximum: 79 characters for code.
For long blocks of text (comments/docstrings): ~72 characters.
Blank Lines and Vertical Whitespace
Insert blank lines to separate top-level functions and classes, and within classes to separate method groups.
Avoid extraneous blank lines within code structure.
Imports
Imports at top of file, after module docstring and before module globals/constants.
Group imports in the following order:
- Standard library imports
- Related third-party imports
- Local application/library-specific imports Insert a blank line between each group.
Absolute imports preferred; explicit relative imports acceptable for intra-package use.
Wildcard imports (from module import *) should be avoided except in rare cases (e.g., to publish a public API).
Whitespace
Avoid extra spaces in the following contexts:
- Immediately inside parentheses, brackets or braces.
- Between a trailing comma and a closing bracket.
- Before a comma, semicolon, or colon.
- More than one space around an assignment operator (aligning multiple statements this way is discouraged)
Usage
# Correct:
spam(ham[1], {eggs: 2})
# Avoid:
spam( ham[ 1 ], { eggs: 2 } )
Comments
Good comments improve readability, explain why, not how.
Use full sentences, capitalize first word, leave a space after the #.
Inline comments should be used sparingly and separated by at least two spaces from the statement.
Block comments should align with code indentation and be separated by blank lines where appropriate.
Docstrings
Use triple-quoted strings for modules, functions, classes.
The first line should be a short summary; following lines provide more detail if necessary.
For conventions specific to docstrings see PEP 257 – Docstring Conventions.
Naming Conventions
| Kind | Convention |
|---|---|
| Modules | Short, lowercase, may use underscores |
| Packages | All-lowercase, preferably no underscores |
| Classes | Use CapWords (CamelCase) convention |
| Exceptions | Typically CapWords |
| Functions and methods | Lowercase with underscores (snake_case) |
| Variables | Use lowercase_with_underscores |
| Constants | All UPPERCASE_WITH_UNDERSCORES |
| Private identifiers | One leading underscore _private; name mangling via __double_leading_underscore. |
| Type Vars (in generics) | CapWords |
Avoid single-character names like `l`, `O`, and `I` (they are easily confused with `1` and `0`).
Recommendations
Avoid pointless object wrappers, redundant code; prefer simple, explicit approaches. This matches the ethos "explicit is better than implicit" from The Zen of Python.
When offering interfaces, design them so it is difficult to misuse them (i.e., "avoid programming errors").
Avoid using mutable default arguments in functions (illustrated below).
In comparisons to singletons (e.g., `None`), use `is` or `is not` rather than equality operators.
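A quick sketch of why mutable defaults bite: the default value is created once, at function definition time, and shared across calls:

```python
def append_bad(item, items=[]):      # one shared list, created at def time
    items.append(item)
    return items

def append_good(item, items=None):   # fresh list per call
    if items is None:
        items = []
    items.append(item)
    return items

print(append_bad(1), append_bad(2))    # [1, 2] [1, 2] -- surprising
print(append_good(1), append_good(2))  # [1] [2]
```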
Exceptions to the Rules
The style guide states that while adherence is recommended, there are legitimate cases for deviation.
Reasons to deviate:
- Strict adherence would reduce readability in context.
- Code must remain consistent with surrounding non-PEP8 code (especially legacy).
- The code predates the rule and rewriting it isn’t justified.
Tooling
Tools exist to help enforce or auto-format code to PEP 8 style (e.g., linters, auto-formatters).
Using such tools helps maintain style consistency especially on teams or open-source projects.
Summary
- Readability and consistency are the primary goals.
- Follow conventions: 4 spaces, line length ~79 chars, snake_case for functions/variables, CapWords for classes, uppercase for constants.
- Imports at top, grouped logically.
- Whitespace matters—used meaningfully, not decoratively.
- Use comments and docstrings effectively: explain why, not how.
- Be pragmatic: if strictly following every rule makes things worse, depart in favour of clarity.
- Use automation tools to assist but don’t treat the guide as dogma—interpret intelligently.
[1] PEP 8 – Style Guide for Python Code - https://peps.python.org/pep-0008/
Type Hints (484) & Annotations (585)
PEP 484 - Type Hints
Overview
PEP 484 introduced a standardized system for adding type hints to Python code.
Its goal was not to enforce static typing at runtime but to establish a formal syntax for optional type checking via external tools like mypy, pytype, and later Pyright [1].
This marked a pivotal moment for Python’s type ecosystem — bridging the gap between dynamic and statically analyzable Python.
It defined the foundations of the typing module and introduced the concept of gradual typing, where type hints coexist with dynamic typing [1].
Concepts
Gradual Typing
Type annotations are optional, enabling progressive adoption without breaking existing code.
Type System Syntax
Function signatures, variables, and class members can be annotated using syntax like `def greet(name: str) -> str:` [1].
typing module
Adds classes like `List`, `Dict`, `Tuple`, `Optional`, `Union`, `Any`, `Callable` [1].
Type Checkers
External tools (e.g., mypy) use these annotations for static analysis, error detection, and IDE autocompletion.
Runtime Neutrality
Annotations are stored in `__annotations__` and ignored by Python itself; type enforcement is delegated to external tools [1].
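A two-line experiment demonstrates this runtime neutrality:

```python
def greet(name: str) -> str:
    return f"Hello {name}"

print(greet.__annotations__)  # {'name': <class 'str'>, 'return': <class 'str'>}
print(greet(42))              # "Hello 42" -- nothing is enforced at runtime
```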
Motivation
Before PEP 484, large Python projects (e.g., Dropbox, Google) developed internal type systems to manage complexity.
PEP 484 unified these under a common specification inspired by mypy and by research in gradual typing [1].
Impact
- Established a shared foundation for static analysis across the ecosystem.
- Enabled downstream standards like PEP 561 (distributable type stubs), PEP 563 (deferred evaluation of annotations), and PEP 604/649 (modernized syntax and semantics).
PEP 585 - Type Hinting Generics in Standard Collections
Overview
PEP 585 streamlined the use of generics by allowing the built-in collection types (e.g., `list`, `dict`, `set`) to be used directly as generic types, replacing `typing.List`, `typing.Dict`, etc. [2]
For example, code such as:
from typing import List
def f(x: List[int]) -> None: ...
can now be written as:
def f(x: list[int]) -> None: ...
Motivation
PEP 484’s design relied on importing type aliases from the typing module. This indirection created redundancy, confusion, and runtime overhead.
By 2020, with `from __future__ import annotations` and runtime type information improvements, it became viable to use built-ins directly [2].
Core Changes
- Built-in classes (`list`, `dict`, `tuple`, `set`, etc.) now support subscripting (`[]`) at runtime.
- A new `types.GenericAlias` class is introduced internally to represent these parameterized generics [2] (demonstrated below).
- Backwards compatibility is preserved: `typing.List` and others remain but are considered deprecated [3].
- Simplified syntax aligns Python with other typed languages’ ergonomics.
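The runtime effect is easy to observe:

```python
import types

alias = list[int]        # subscripting a builtin at runtime
print(type(alias))       # <class 'types.GenericAlias'>
print(alias.__origin__)  # <class 'list'>
print(alias.__args__)    # (<class 'int'>,)
```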
Impact
- Improved readability and ergonomics: encourages `list[int]` over `List[int]`.
- Reduces the mental split between runtime and static type worlds.
- Opens the door for the removal of redundant wrappers in future releases.
Summary
Together, PEP 484 and PEP 585 represent Python’s maturing type system:
- PEP 484 built the scaffolding by defining syntax, semantics, and conventions.
- PEP 585 modernized it by integrating type information natively into Python’s core language model. This reflects a shift from externalized static typing toward first-class optional typing. It preserves Python’s philosophy of flexibility while offering stronger correctness guarantees for large-scale codebases.
[1] PEP 484 – Type Hints - https://peps.python.org/pep-0484/
[2] PEP 585 – Type Hinting Generics In Standard Collections - https://peps.python.org/pep-0585/
[3] typing — Support for type hints - https://docs.python.org/3/library/typing.html
Distributing and Packaging Python Type Information (.pyi/stubs)
Abstract
PEP 561 establishes a standardized method for distributing and packaging type information in Python. It builds upon PEP 484, addressing how type information, both inline and in separate stub files, can be discovered, packaged, and used by type checkers across environments.
This allows:
- Package maintainers to declare their code as typed,
- Third parties to publish independent stub packages, and
- Type checkers to resolve imports consistently across mixed environments.
Background
Prior to PEP 561:
- There was no consistent way to distribute typing information with Python packages.
- Stub files had to be manually placed in `MYPYPATH` or equivalent.
- Community stubs were collected centrally in Typeshed, which became a scalability bottleneck.
The goals are:
- To use existing packaging infrastructure (distutils/setuptools).
- To provide clear markers for type-aware packages.
- To define resolution rules so that tools like mypy, pyright, or pylance can locate and prioritize type information uniformly.
PEP 561 recognizes three models: inline-typed, stub-typed, and third-party stub-only packages.
Packaging Type Information
Inline
Inline-typed packages must include a marker file named py.typed inside the package root.
Example setup:
from setuptools import setup

setup(
    name="foopkg",
    packages=["foopkg"],
    package_data={"foopkg": ["py.typed"]},
)
This file signals to type checkers that the package and all its submodules are typed. For namespace packages (PEP 420), the marker should be placed in submodules to avoid conflicts.
Stub-Only
- Stub-only packages contain `.pyi` files without any runtime code.
- Naming convention: `foopkg-stubs` provides types for `foopkg`.
- `py.typed` is not required for these packages.
- Version compatibility should be expressed in dependencies (e.g. via `install_requires`).
Example layout:
shapes-stubs/
└── polygons/
├── pentagon/__init__.pyi
└── hexagon/__init__.pyi
Partial Stubs
Partial stubs (incomplete libraries) must include the line `partial\n` inside `py.typed`.
These instruct type checkers to:
- Merge the stub directory with the runtime or typeshed directory.
- Continue searching through later steps in the resolution order.
Module Resolution Order
Type checkers must resolve type information using the following ordered search path:
| Priority | Source | Description |
|---|---|---|
| 1 | Manual stubs / MYPYPATH | User-specified patches override all. |
| 2 | User code | The project’s own files. |
| 3 | Stub packages (*-stubs) | Distributed stubs take precedence over inline types. |
| 4 | py.typed packages | Inline or bundled types inside installed packages. |
| 5 | Typeshed | Fallback for stdlib and untyped third-party libs. |
If a stub-only namespace package lacks a desired module, type checkers continue searching through the inline and typeshed steps.
When checking against another Python version, the checker must look up that version’s site-packages path.
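As a rough illustration of the search order (the paths below are invented for the example), a resolver can probe each root in priority order, preferring stubs over inline sources:

```python
from pathlib import Path

def find_type_info(module: str, roots: list[Path]) -> Path | None:
    # Within each root, stubs (.pyi) shadow inline sources (.py).
    rel = Path(*module.split("."))
    for root in roots:
        for candidate in (root / f"{rel}.pyi", root / rel / "__init__.pyi",
                          root / f"{rel}.py", root / rel / "__init__.py"):
            if candidate.exists():
                return candidate
    return None

# Roots listed in the priority order of the table above (hypothetical paths).
roots = [Path("local-stubs"), Path("src"),
         Path(".venv/lib/python3.12/site-packages/foopkg-stubs"),
         Path(".venv/lib/python3.12/site-packages"), Path("typeshed")]
print(find_type_info("foopkg.api", roots))
```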
Conventions
Library Interface
When `py.typed` is present:
- All `.py` and `.pyi` files are considered importable.
- Files beginning with `_` are private.
- Public symbols are controlled via `__all__`.
Valid __all__ idioms include:
__all__ = ['a', 'b']
__all__ += submodule.__all__
__all__.extend(['c', 'd'])
These restrictions allow static determination of public exports by type checkers.
Imports and Re-Exports
Certain import forms signal that an imported symbol should be re-exported as part of the module’s public interface:
import X as X # re-export X
from Y import X as X # re-export X
from Y import * # re-exports __all__ or all public symbols
All other imports are private by default.
Implementation and Tooling
- `mypy` implements full PEP 561 resolution, allowing users to inspect installed package metadata (`py.typed`, stub presence, etc.).
- Tools like pyright, pylance, and Pytype adopt the same ordering and conventions.
This design remains fully backward compatible, requiring no changes to Python’s runtime or packaging systems.
Structural Pattern Matching in Python
Structural Pattern Matching extends if/elif logic with declarative, data-shape-based matching.
It allows code to deconstruct complex data structures and branch based on both type and content.
Unlike `switch` in other languages, pattern matching inspects structure and value, not just equality [1].
match command:
case ("move", x, y):
handle_move(x, y)
case ("stop",):
handle_stop()
case _:
print("Unknown command")
Syntax
Basic
match subject:
    case pattern_1 if guard_1:
        ...
    case pattern_2 if guard_2:
        ...
    case _:
        ...
- The
subjectis evaluated once. - Each
casepattern is tested in order. - The first pattern that matches (and whose optional
if guardsucceeds) executes. - The
_pattern matches anything (a wildcard).
Pattern Types
Literal
Match exact constants or values:
case 0 | 1 | 2:
    ...
case "quit":
    ...
Multiple literals can be combined with | (OR patterns).
Capture
Assign matched values to variables:
case ("move", x, y):
# binds x and y
⚠️ Names in patterns always bind; they do not compare. To compare against an existing value, use a guard or a dotted value pattern:
case Point(x, y) if x == origin.x:
Sequence
Match list or tuple structure:
case [x, y, z]:
    ...
case [first, *rest]:
    ...
Mapping
Match dictionaries:
case {"type": "point", "x": x, "y": y}:
...
Keys are matched literally; missing keys cause no match.
Class
Deconstruct class instances via their attributes or positional parameters:
case Point(x, y):
    ...
This uses the class’s `__match_args__` attribute to define positional fields.
Example:
class Point:
    __match_args__ = ("x", "y")
    def __init__(self, x, y):
        self.x, self.y = x, y
OR
Combine multiple alternatives:
case "quit" | "exit":
...
AS
Bind the entire match while destructuring:
case [x, y] as pair:
    ...
Wildcard
The _ pattern matches anything and never binds.
Guards (if clauses)
Optional if conditions refine matches:
match point:
    case Point(x, y) if x == y:
        print("on diagonal")
Guards are evaluated after successful structural match and can use bound names.
Semantics
| Concept | Behavior |
|---|---|
| Evaluation | subject evaluated once; patterns checked in order |
| Binding | Successful match creates new local bindings |
| Failure | Non-matching case continues to next pattern |
| Exhaustiveness | No implicit else; always include case _: for completeness |
| Guards | Boolean expressions using pattern-bound variables |
Examples [2]
Algebraic Data Types (ADTs)
Pattern matching elegantly models variant data:
class Node: pass

class Leaf(Node): ...

class Branch(Node):
    __match_args__ = ("left", "right")
    def __init__(self, left, right):
        self.left, self.right = left, right

def depth(tree):
    match tree:
        case Leaf():
            return 1
        case Branch(l, r):
            return 1 + max(depth(l), depth(r))
Command Parsing
import sys

def process(cmd):
    match cmd.split():
        case ["load", filename]:
            load_file(filename)
        case ["quit" | "exit"]:
            sys.exit()
        case _:
            print("Unknown command")
HTTP-like Routing
match (method, path):
    case ("GET", "/"):
        return homepage()
    case ("GET", "/users"):
        return list_users()
    case ("POST", "/users"):
        return create_user(data)  # data taken from the request body
Design [3]
Goals
- Provide clarity and conciseness for branching on structured data.
- Support static analysis: patterns are explicit and compositional.
- Encourage declarative code, replacing complex `if` ladders.
Why Not Switch?
- Structural, not value-only: matches shape, type, and contents.
- Integrates with Python’s dynamic typing and destructuring capabilities.
Why Not Functions?
While if statements or dispatch tables can emulate simple branching,
pattern matching better communicates intent and is easier to read and verify.
Spec
| Category | Rule |
|---|---|
| Subject types | Any object, including sequences, mappings, and classes |
| Match protocol | For class patterns, Python checks __match_args__ and attributes |
| Sequence match | Requires __len__ and __getitem__ methods |
| Mapping match | Requires .keys() and __getitem__; ignores extra keys |
| Pattern scope | Variables bound within a case are local to that block |
| Evaluation order | Top-to-bottom, left-to-right |
| Errors | SyntaxError for invalid pattern constructs |
Pitfalls
- Shadowing: Every bare name in a pattern binds; it does not compare:

  color = "red"
  match color:
      case color:  # always matches and binds a new variable!
          ...

  Use constants or enums instead:

  match color:
      case "red":
          ...
- Ignoring guards: Guards run after a successful structural match, not during it; expensive side effects inside guards are discouraged.
- Over-matching: Sequence pattern lengths must align exactly unless `*rest` is used.
Tooling
- Linters: flake8, ruff, and pyright support pattern syntax.
- Static analyzers: Type checkers can verify exhaustive matches on enums and dataclasses.
- Refactoring tools: can replace nested `if` trees with `match` statements.
Usage Patterns
| Use Case | Pattern Example |
|---|---|
| Enum dispatch | case Status.OK: |
| Dataclasses | case Point(x, y): |
| Command tuples | case ("move", x, y): |
| JSON-like dicts | case {"user": name, "id": uid}: |
| Error handling | case {"error": msg} if "fatal" in msg: |
Backwards Compatibility and Evolution
- Introduced in Python 3.10 [4].
- Future extensions may include:
- Better exhaustiveness checking
- Improved IDE refactoring tools
  - Expanded type integration for dataclasses and `typing` constructs
Backward-incompatible syntax changes are unlikely; the match semantics are stable.
Summary
Pattern matching provides:
- Declarative branching over structured data
- Readable syntax for destructuring and filtering
- Powerful composition of match conditions and guards
It is not a replacement for if statements. It is a new control structure for expressing shape-based logic cleanly and expressively.
Contributor Testing Guide
The repository contains several layers of tests that keep the formatter, language server, and static analysis features aligned. Run the suites below before submitting formatter or LSP changes.
Formatter
- `cargo test --package beacon-lsp --test formatting_regression_tests` - End-to-end regression coverage for real-world Python snippets. Recent additions include:
  - `test_typevar_assignments` for covariant bounds and keyword spacing.
  - `test_walrus_operator_patterns` and `test_generators_and_yield` for modern syntax.
  - `test_data_science_method_calls` and `test_complex_lambda_and_functional` for keyword-heavy method chains and nested lambda expressions.
- `cargo test --package beacon-lsp --test formatting_tests` - Unit tests exercising individual formatting rules (whitespace, imports, docstrings, range formatting, etc.).
- `cargo test --package beacon-lsp --test lsp_formatting_integration_tests` - Validates document/range formatting via the LSP pipeline.
When debugging a regression, you can run an individual test (for example `cargo test --package beacon-lsp --test formatting_regression_tests test_type_annotations_basic`) to inspect the formatter output.
Language Server & Analysis
- `cargo test --package beacon-lsp` - Runs all LSP providers, static analysis (CFG, data-flow, lint rules), and supporting infrastructure. This is the canonical smoke test before opening a pull request.
- `cargo test --workspace` - Optional full sweep that executes parser, core utilities, and the CLI in addition to the language server crate.
Guidelines
- Prefer targeted regression tests for every formatting or analysis bug fix.
- Keep tests deterministic: avoid timing assumptions or filesystem-global state.
- If a test case documents a known gap, annotate it with `#[ignore]` and file an issue so it can be tracked explicitly.
- Mention the exact command you executed when reporting failures; most suites now emit additional context to help diagnose spacing or tokenization issues.
Running the formatter suites plus `cargo test --package beacon-lsp` provides confidence that contributor changes respect the documented behavior across the entire toolchain.