Introduction
Beacon is an experimental Python type checker and developer experience platform written in Rust. This documentation set describes the architecture, design decisions, and research that power the project. Whether you are contributing to the codebase, evaluating the language server, or exploring the type system, start here to orient yourself.
What You’ll Find
- LSP Overview: A deep dive into our Language Server Protocol implementation, including its goals, building blocks, and feature set.
- Type System Research: Summaries of the academic and practical references influencing Beacon’s approach to Hindley–Milner inference, gradual typing, and structural subtyping.
- Contributor Guides (planned): Setup instructions, style guidelines, and workflows for building and testing Beacon.
Project Vision
Beacon aims to combine precise type checking with interactive tooling that stays responsive for everyday Python development. The project embraces:
- Fast feedback loops enabled by incremental analysis.
- Interoperability with modern editors via LSP.
- A pragmatic blend of theoretical rigor and implementable engineering.
Getting Started
- Clone the repository and install Rust 1.70+ (stable).
- Run `cargo check` from the workspace root to verify the build.
- Launch the LSP server with `cargo run -p beacon-lsp` or integrate with an editor using the provided configuration (see the LSP chapter).
- Browse the documentation sidebar for in-depth topics.
Contributing
We welcome pull requests and discussions. To get involved:
- Review open issues
- Read the upcoming contributor guide (work in progress).
- Join the conversation in our community channels (details to be added).
Beacon is evolving quickly; expect iteration, experimentation, and plenty of opportunities to help shape the future of type checking for Python.
Beacon Language Server
Beacon’s Language Server Protocol (LSP) implementation bridges the Rust-based analyzer with editors such as VS Code, Neovim, and Helix. This chapter documents the system from high-level goals to feature-by-feature behaviour.
Use the sidebar to jump into any topic, or start with the sections below:
- Goals And Scope - what the server delivers today and what is intentionally out of scope.
- Architecture Overview - how shared state, concurrency, and feature wiring are structured.
- Document Pipeline - how file contents become parse trees, ASTs, and symbol tables.
- Feature Providers - the capabilities exposed via LSP requests and notifications.
- Request Lifecycles - end-to-end flows for initialization, diagnostics, completions, and more.
- Workspace Services - cross-file features and emerging workspace indexing plans.
- Testing Strategy - automated coverage for providers and backend flows.
- Current Limitations - known gaps and trade-offs in the current implementation.
- Next Steps - near-term improvements on the roadmap.
If you are new to the Language Server Protocol itself, read the primer in Learn → Language Server Protocol before diving into these implementation details.
Goals And Scope
Beacon’s LSP focuses on delivering a fast, editor-friendly surface for the Beacon analyzer without overcommitting to unfinished infrastructure. The current goals fall into five themes.
Primary Goals
Immediate feedback: run parsing and type analysis on every edit so diagnostics stay in sync with the buffer.
Core navigation: support hover, go-to-definition, references, and symbol search for rapid code exploration.
Authoring assistance: provide completions, document symbols, inlay hints, and semantic tokens to guide editing.
Refactoring primitives: offer reliable rename support and lay the groundwork for richer code actions.
Modular design: isolate feature logic behind provider traits so contributors can evolve features independently.
Out-of-Scope (For Now)
- Full workspace indexing: we limit operations to open documents until indexing and cache management mature.
- Formatting and linting: formatting endpoints and lint integrations are planned but not part of the initial release.
- Editor-specific UX: we stick to LSP-standard capabilities instead of bespoke VS Code UI components.
- Heavy configuration: configuration parsing is minimal; user options will be respected in a future milestone.
Architecture Overview
The language server lives in crates/server and centres on the Backend type, which implements tower_lsp::LanguageServer. The architecture is deliberately modular so feature work and analyzer development can proceed in parallel.
Core Components
- Backend: receives every LSP request/notification and routes it to feature providers. It owns the shared state required by multiple features.
- Client (`tower_lsp::Client`): handles outbound communication, including diagnostics, logs, and custom notifications.
- DocumentManager: thread-safe cache of open documents. Each `Document` stores:
  - Source text (`ropey::Rope` for cheap edits).
  - Tree-sitter parse tree.
  - Beacon AST.
  - Symbol table produced by the name resolver.
- Analyzer: the Beacon type checker wrapped in an `Arc<RwLock<_>>` because many features need mutable access to its caches.
- Workspace: tracks the workspace root URI and will later manage module resolution and indexing.
- Features: a simple struct that instantiates each provider with shared dependencies and exposes them to the backend.
Concurrency Model
tower_lsp::LspService drives the backend on the Tokio runtime.
Read-heavy operations borrow documents or analyzer state immutably; diagnostics and rename take write locks to update caches.
Documents store text in a ropey::Rope, so incremental edits only touch the modified spans.
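As a concrete illustration, here is a minimal sketch of the kind of splice a `ropey::Rope` makes cheap. The snippet and its character indices are illustrative only, not Beacon code.

```rust
use ropey::Rope;

fn main() {
    // Edits splice into the middle of the buffer without copying the whole text.
    let mut text = Rope::from_str("def f():\n    return 1\n");

    // Replace the "1" (character range 20..21) with "42".
    text.remove(20..21);
    text.insert(20, "42");

    assert_eq!(text.to_string(), "def f():\n    return 42\n");
}
```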
Error Handling
Feature methods typically return Option<T>: None means the feature has no answer for the request rather than hard-failing.
When unrecoverable errors occur (e.g., document not found), providers log via the client instead of crashing the server process.
Extensibility
Adding a new LSP method involves creating a provider (or extending an existing one) and exposing it through the Features struct.
Because providers depend only on DocumentManager and optionally the analyzer, they are easy to test in isolation.
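As a rough sketch of that shape (all names and signatures here are illustrative, not the actual Beacon API), a provider typically looks something like this:

```rust
use std::sync::Arc;
use tower_lsp::lsp_types::{Hover, Position, Url};

// Stand-ins for the real document cache and its per-file snapshot.
struct DocumentManager;
struct DocumentSnapshot;

impl DocumentManager {
    fn get_document(&self, _uri: &Url) -> Option<DocumentSnapshot> {
        None // the real cache returns text + parse tree + AST + symbols
    }
}

struct HoverProvider {
    documents: Arc<DocumentManager>,
}

impl HoverProvider {
    // Returning Option means "no answer" is not an error, matching the
    // error-handling convention described above.
    fn hover(&self, uri: &Url, _position: Position) -> Option<Hover> {
        let _doc = self.documents.get_document(uri)?;
        None // assemble Hover content from the AST/analyzer here
    }
}
```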
This architecture keeps protocol plumbing concentrated in the backend while feature logic stays modular and testable.
Document Pipeline
The document pipeline keeps Beacon’s view of each open file synchronized with the editor. DocumentManager orchestrates the lifecycle and ensures every feature works from the same parse tree, AST, and symbol table.
Lifecycle Events
- Open (`textDocument/didOpen`)
  - Create a `Document` with the initial text, version, and URI.
  - Parse immediately via `LspParser` to populate the parse tree, AST, and symbol table.
  - Insert the document into the manager's map.
- Change (`textDocument/didChange`)
  - Apply full or incremental edits to the document's rope.
  - Re-run the parser to refresh derived data.
  - Invalidate analyzer caches so diagnostics and semantic queries recompute with fresh information.
- Save (`textDocument/didSave`)
  - Trigger diagnostics for the new persisted content. Behaviour matches the change handler today.
- Close (`textDocument/didClose`)
  - Remove the document and send an empty diagnostics array to clear markers in the editor.
Data Stored per Document
Text: stored as a ropey::Rope for efficient splicing.
Parse tree: Tree-sitter syntax tree produced by the parser.
AST: Beacon’s simplified abstract syntax tree used by features and the analyzer.
Symbol table: scope-aware mapping created during name resolution.
Version: latest client-supplied document version, echoed back when publishing diagnostics.
Access Patterns
get_document: exposes an immutable snapshot to consumers like hover or completion.
get_document_mut: allows controlled mutation when necessary (rare in practice).
all_documents: lists URIs so workspace-level features can iterate through open files.
By centralizing parsing and symbol management, the pipeline guarantees consistent snapshots across diagnostics, navigation, and refactoring features.
Feature Providers
Each capability exposed by the language server lives in its own provider under crates/server/src/features. Providers share the DocumentManager and, when needed, the analyzer. This modular design keeps logic focused and testable.
Diagnostics
DiagnosticProvider aggregates:
- Parse errors emitted by the parser.
- Unbound variable checks.
- Type errors and warnings from the analyzer.
- Additional semantic warnings (e.g., annotation mismatches).
Results are published with document versions to prevent stale diagnostics in the editor.
Hover
HoverProvider returns context-sensitive information for the symbol under the cursor—typically inferred types or documentation snippets. It reads the current AST and analyzer output to assemble Hover responses.
Completion
CompletionProvider uses symbol tables to surface in-scope identifiers. Trigger characters (currently ".") allow editors to request completions proactively.
Navigation
GotoDefinitionProvider locates definitions using symbol table lookups.
ReferencesProvider returns all occurrences of a symbol across open documents.
DocumentHighlightProvider highlights occurrences within a single file.
Symbols
DocumentSymbolsProvider walks the AST to produce hierarchical outlines (classes, functions, variables).
WorkspaceSymbolsProvider scans all open documents, performing case-insensitive matching. It falls back to sensible defaults when nested symbols are missing from the symbol table.
Semantic Enhancements
SemanticTokensProvider projects syntax nodes into semantic token types and modifiers, enabling advanced highlighting.
InlayHintsProvider emits type annotations or other inline hints derived from the analyzer.
Refactoring
RenameProvider validates proposed identifiers, gathers edits via both AST traversal and Tree-sitter scans, deduplicates overlapping ranges, and returns a WorkspaceEdit. Future refinements will integrate deeper analyzer data for cross-file renames.
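The deduplication step can be sketched roughly as follows; the helper is hypothetical, and the real provider merges AST- and Tree-sitter-derived edits before building the WorkspaceEdit.

```rust
use tower_lsp::lsp_types::TextEdit;

// Sort edits by start position and drop any edit that overlaps the last
// one we kept, so the returned WorkspaceEdit never contains conflicts.
fn dedup_edits(mut edits: Vec<TextEdit>) -> Vec<TextEdit> {
    edits.sort_by_key(|e| (e.range.start.line, e.range.start.character));
    let mut kept: Vec<TextEdit> = Vec::new();
    for edit in edits {
        let overlaps = kept.last().map_or(false, |prev| {
            (edit.range.start.line, edit.range.start.character)
                < (prev.range.end.line, prev.range.end.character)
        });
        if !overlaps {
            kept.push(edit);
        }
    }
    kept
}
```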
Code Actions
CodeActionsProvider is scaffolded for quick fixes. It currently returns empty results until specific code actions are implemented.
Adding new features typically means introducing a provider that consumes DocumentManager, optionally the analyzer, and wiring it through the Features struct so the backend can route requests.
Request Lifecycles
This section traces how the server handles key LSP interactions from start to finish.
Initialization
- `initialize` request
  - Captures the workspace root (`root_uri`) from the client.
  - Builds `ServerCapabilities`, advertising supported features: incremental sync, hover, completion, definitions, references, highlights, code actions, inlay hints, semantic tokens (full & range), document/workspace symbols, rename, and workspace symbol resolve.
  - Returns `InitializeResult` with an optional `ServerInfo`.
- `initialized` notification
  - Currently logs an info message. Future work will kick off workspace scanning or indexing.
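A trimmed-down sketch of the capability advertisement, using the `lsp_types` re-exported by tower_lsp (the set Beacon actually registers is broader, as listed above):

```rust
use tower_lsp::lsp_types::{
    HoverProviderCapability, OneOf, ServerCapabilities, TextDocumentSyncCapability,
    TextDocumentSyncKind,
};

fn capabilities() -> ServerCapabilities {
    ServerCapabilities {
        // Incremental sync keeps didChange payloads small.
        text_document_sync: Some(TextDocumentSyncCapability::Kind(
            TextDocumentSyncKind::INCREMENTAL,
        )),
        hover_provider: Some(HoverProviderCapability::Simple(true)),
        definition_provider: Some(OneOf::Left(true)),
        references_provider: Some(OneOf::Left(true)),
        rename_provider: Some(OneOf::Left(true)),
        // Completion, semantic tokens, inlay hints, etc. are advertised similarly.
        ..Default::default()
    }
}
```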
Text Synchronization & Diagnostics
didOpen → store the document, parse it, and call publish_diagnostics.
didChange → apply edits, reparse, invalidate analyzer caches, then re-run diagnostics.
didSave → trigger diagnostics again; behaviour matches the change handler.
didClose → remove the document and publish empty diagnostics to clear markers.
publish_diagnostics collects issues via DiagnosticProvider, tagging them with the current document version to avoid race conditions.
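For reference, tower_lsp's `Client::publish_diagnostics` accepts the optional version directly; a minimal sketch of the republish step (the helper name is illustrative) might look like this:

```rust
use tower_lsp::lsp_types::{Diagnostic, Url};
use tower_lsp::Client;

// Republish diagnostics tagged with the document version so the editor
// can drop results that arrive for an older revision of the buffer.
async fn republish(client: &Client, uri: Url, diagnostics: Vec<Diagnostic>, version: i32) {
    client
        .publish_diagnostics(uri, diagnostics, Some(version))
        .await;
}
```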
Hover, Completion, and Navigation
hover → query HoverProvider, which reads the AST and analyzer to produce Hover content.
completion → call CompletionProvider, returning a CompletionResponse (list or completion list).
gotoDefinition, typeDefinition, references, documentHighlight → use symbol table lookups to answer navigation requests.
These operations are pure reads when possible, avoiding locks beyond short-lived document snapshots.
Symbols
documentSymbol → returns either DocumentSymbol trees or SymbolInformation lists.
workspace/symbol → aggregates symbols from every open document, performing case-insensitive matching.
workspaceSymbol/resolve → currently a no-op passthrough; in future it will supplement symbols with locations on demand.
Semantic Tokens & Inlay Hints
textDocument/semanticTokens/full and /range → run the semantic tokens provider to emit delta-encoded token sequences for supported types/modifiers.
textDocument/inlayHint → acquire a write lock on the analyzer and compute inline hints for the requested range.
Refactoring
textDocument/rename → validate the new identifier, locate the target symbol, collect edits (AST traversal + Tree-sitter identifiers), deduplicate, and return a WorkspaceEdit.
textDocument/codeAction → placeholder; currently returns an empty list until specific actions are implemented.
Shutdown
shutdown returns Ok(()), signalling graceful teardown.
exit follows to terminate the process. We do not persist state yet, so shutdown is effectively stateless.
Workspace Services
While most features operate on individual documents, Beacon’s language server already supports several cross-file capabilities and is laying groundwork for broader workspace awareness.
Workspace Symbols
Iterates over URIs retrieved from DocumentManager::all_documents.
For each document, fetches the AST and symbol table, then performs case-insensitive matching against the query string.
Returns SymbolInformation with ranges, optional container names, and deprecation tags (SymbolTag::DEPRECATED where applicable).
Falls back to reasonable defaults when nested symbols (e.g., class methods) are missing from the symbol table.
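The matching itself is a straightforward case-insensitive substring filter, roughly like the sketch below (the function name is illustrative):

```rust
use tower_lsp::lsp_types::SymbolInformation;

// Keep only symbols whose name contains the query, ignoring case.
fn filter_symbols(query: &str, symbols: Vec<SymbolInformation>) -> Vec<SymbolInformation> {
    let needle = query.to_lowercase();
    symbols
        .into_iter()
        .filter(|symbol| symbol.name.to_lowercase().contains(&needle))
        .collect()
}
```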
Document Symbols
Provides structured outlines per file, organising classes, functions, assignments, and nested items.
Editors use the resulting tree to populate outline panes, breadcrumbs, or navigation search.
Workspace State
- The `Workspace` struct records the `root_uri` supplied during initialization.
- Future enhancements will:
  - Crawl the filesystem to discover modules outside the current editor session.
  - Populate caches for unopened files, enabling cross-file references and renames.
  - Track configuration and environment settings at the workspace level.
Notifications and Logging
The backend emits window/logMessage notifications for status updates and window/showMessage for user-facing alerts.
Diagnostics are republished after changes so editors update their inline markers and problems panels.
Long-Term Plans
Implement persistent symbol indexing keyed by the workspace root.
Add background tasks that refresh indexes when files change on disk.
Support multi-root workspaces and remote filesystems where applicable.
Although the current implementation focuses on open buffers, the architecture is designed to scale to full-project workflows as these enhancements land.
PyDoc Retrieval
The language server enriches hover and completion items for third-party Python packages by executing a short-lived Python subprocess to read real docstrings and signatures from the user's environment.
Interpreter Discovery
find_python_interpreter in crates/server/src/interpreter.rs walks common virtual environment managers (Poetry, Pipenv, uv) before falling back to python on the PATH.
Each probe shells out (poetry env info -p, pipenv --venv, uv python find) and returns the interpreter inside the virtual environment when successful.
The search runs per workspace and only logs at debug level on success. Missing tools or failures are tolerated—only a final warn! is emitted if no interpreter can be located.
Interpreter lookups currently rely on external commands and inherit their environment; this will eventually be an explicit path via LSP settings.
Introspection Flow
When a hover needs documentation for module.symbol, we call introspect in crates/server/src/introspection.rs with the discovered interpreter.
introspect constructs a tiny Python script that imports the target module, fetches the attribute, and prints two sentinel sections: SIGSTART (signature) and DOCSTART (docstring).
The async path spawns tokio::process::Command, while introspect_sync uses std::process::Command.
Both share parsing logic via parse_introspection_output.
The script uses inspect.signature and inspect.getdoc, so it respects docstring inheritance and returns cleaned whitespace.
Failures to inspect still return whatever data is available.
Parsing and Error Handling
Results are parsed by scanning for the sentinel lines and trimming the sections, yielding an IntrospectionResult { signature, docstring }.
Timeouts (3 seconds) protect the async path from hanging interpreters.
Other errors—missing module, attribute, or import failure—come back as IntrospectionError::ExecutionFailed with the stderr payload for debugging.
We log subprocess stderr on failure but avoid surfacing internal exceptions directly to the client.
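A rough sketch of the sentinel parsing, assuming `SIGSTART` and `DOCSTART` appear as marker lines and `IntrospectionResult` carries optional strings (the real field types may differ):

```rust
struct IntrospectionResult {
    signature: Option<String>,
    docstring: Option<String>,
}

fn parse_introspection_output(stdout: &str) -> IntrospectionResult {
    // Extract the text between a marker and the next marker (or end of output).
    let section = |marker: &str| -> Option<String> {
        let start = stdout.find(marker)? + marker.len();
        let rest = &stdout[start..];
        let end = rest
            .find("SIGSTART")
            .or_else(|| rest.find("DOCSTART"))
            .unwrap_or(rest.len());
        let text = rest[..end].trim();
        (!text.is_empty()).then(|| text.to_string())
    };

    IntrospectionResult {
        signature: section("SIGSTART"),
        docstring: section("DOCSTART"),
    }
}
```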
Testing Guarantees
Unit tests cover the parser, confirm the generated script embeds the sentinels, and run best-effort smoke tests against standard library symbols when a Python interpreter is available. Tests skip gracefully if Python cannot be located, keeping CI green on machines without Python.
Static Analyzer
Beacon's language server leans on a modular static-analysis stack housed in crates/server/src/analysis.
The subsystem ingests a parsed document, infers types, builds control-flow graphs, and produces diagnostics that drive editor features like hovers and squiggles.
The sections below highlight the moving pieces without diving into implementation minutiae.
Pipeline Overview
Analyzer::analyze is the high-level orchestration point:
- Grab a consistent AST + symbol table snapshot from the `DocumentManager`.
- Walk the tree with a `TypeEnvironment` to emit lightweight constraints that describe how expressions relate to each other.
- Invoke the shared `beacon_core` unifier to solve those constraints, capturing any mismatches as `TypeErrorInfo`.
- Build function-level control-flow graphs and run data-flow passes to uncover use-before-def, unreachable code, and unused symbols.
- Package the inputs, inferred data, and diagnostics into an `AnalysisResult`, which the cache stores per URI for quick repeat lookups.
Hover/type-at-position still rely on a future type_map, but the analyzer already produces the substitution data required to implement it.
Type Inference in Brief
type_env.rs supplies the Hindley–Milner style environment that powers constraint generation.
It seeds built-in symbols, hydrates annotations, and hands out fresh type variables whenever the AST does not provide one.
Each visit to a FunctionDef, assignment, call, or control-flow node updates the environment and records the relationships that must hold; the actual solving is deferred so the analyzer can collect all obligations before touching the unifier.
This keeps the pass linear, side-effect free, and easy to extend with new AST constructs.
Once constraints reach solve_constraints, they are unified in order. Successful unifications compose into a substitution map, while failures persist with span metadata so editor clients can render precise diagnostics.
Attribute constraints are stubbed out for now, leaving room for structural typing or row-polymorphic records later.
Control & Data Flow
cfg.rs and data_flow.rs provide the structural analyses that complement pure typing:
- The CFG builder splits a function body into `BasicBlock`s linked by typed edges (normal flow, branch outcomes, loop exits, exception edges, etc.), mirroring Python semantics closely enough for downstream passes to reason about reachability.
- The data-flow analyzer consumes that graph plus the original AST slice to flag common hygiene issues: variables read before assignment, code that cannot execute, and symbols that never get used.

Results surface through `DataFlowResult` and end up in the final `AnalysisResult`.
This layered approach lets the LSP report both type-level and flow-level problems in a single request, keeping feedback tight while avoiding duplicate walks of the AST.
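For orientation, the data shapes involved look roughly like the sketch below. These are illustrative types, not the actual definitions in cfg.rs.

```rust
// Which kind of control transfer an edge represents.
#[derive(Debug)]
enum EdgeKind {
    Normal,
    BranchTrue,
    BranchFalse,
    LoopExit,
    Exception,
}

#[derive(Debug)]
struct BasicBlock {
    // Indices into the function's statement list covered by this block.
    statements: Vec<usize>,
    // Outgoing edges: (target block index, kind of transfer).
    successors: Vec<(usize, EdgeKind)>,
}

#[derive(Debug)]
struct ControlFlowGraph {
    blocks: Vec<BasicBlock>,
    entry: usize,
}
```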
Diagnostics, Utilities, and Future Work
Beyond inference and CFG analysis, the module exposes helpers for locating unbound identifiers, invalidating cached results when documents change, and bridging between symbol-table scopes and LSP positions.
Outstanding work includes filling the type_map for hover support, tightening attribute/type-guard handling, and scaling CFG/data-flow analysis to whole modules.
The current design purposefully isolates each responsibility so new passes (e.g., constant propagation) can slot in without reworking the rest of the stack.
Beacon Linter
The Beacon Rule Engine is a modular static analysis system powering diagnostics in Beacon.
At its core, it is a pure-Rust reimplementation of PyFlakes.
Legend: ⚠ = Warning ✕ = Error ⓘ = Info
| Code | Name / RuleKind | Level | Category | Description |
|---|---|---|---|---|
| BEA001 | UndefinedName | ✕ | Naming | Variable or function used before being defined. |
| BEA002 | DuplicateArgument | ✕ | Functions | Duplicate parameter names in a function definition. |
| BEA003 | ReturnOutsideFunction | ✕ | Flow | return statement outside of a function or method body. |
| BEA004 | YieldOutsideFunction | ✕ | Flow | yield or yield from used outside a function context. |
| BEA005 | BreakOutsideLoop | ✕ | Flow | break used outside a for/while loop. |
| BEA006 | ContinueOutsideLoop | ✕ | Flow | continue used outside a for/while loop. |
| BEA007 | DefaultExceptNotLast | ⚠ | Exception | A bare except: is not the final exception handler in a try block. |
| BEA008 | RaiseNotImplemented | ⚠ | Semantics | Using raise NotImplemented instead of raise NotImplementedError. |
| BEA009 | TwoStarredExpressions | ✕ | Syntax | Two or more * unpacking expressions in assignment. |
| BEA010 | TooManyExpressionsInStarredAssignment | ✕ | Syntax | Too many expressions when unpacking into a starred target. |
| BEA011 | IfTuple | ⚠ | Logic | A tuple literal used as an if condition — always True. |
| BEA012 | AssertTuple | ⚠ | Logic | Assertion always true due to tuple literal. |
| BEA013 | FStringMissingPlaceholders | ⚠ | Strings | f-string declared but contains no {} placeholders. |
| BEA014 | TStringMissingPlaceholders | ⚠ | Strings | t-string declared but contains no placeholders. |
| BEA015 | UnusedImport | ⚠ | Symbols | Import is never used within the file. |
| BEA016 | UnusedVariable | ⚠ | Symbols | Local variable assigned but never used. |
| BEA017 | UnusedAnnotation | ⚠ | Symbols | Annotated variable never referenced. |
| BEA018 | RedefinedWhileUnused | ⚠ | Naming | Variable redefined before original was used. |
| BEA019 | ImportShadowedByLoopVar | ⚠ | Scope | Import name shadowed by a loop variable. |
| BEA020 | ImportStarNotPermitted | ✕ | Imports | from module import * used inside a function or class. |
| BEA021 | ImportStarUsed | ⚠ | Imports | import * prevents detection of undefined names. |
| BEA022 | UnusedIndirectAssignment | ⚠ | Naming | Global or nonlocal declared but never reassigned. |
| BEA023 | ForwardAnnotationSyntaxError | ✕ | Typing | Syntax error in forward type annotation. |
| BEA024 | MultiValueRepeatedKeyLiteral | ⚠ | Dict | Dictionary literal repeats key with different values. |
| BEA025 | PercentFormatInvalidFormat | ⚠ | Strings | Invalid % format string. |
| BEA026 | IsLiteral | ⚠ | Logic | Comparing constants with is or is not instead of ==/!=. |
| BEA027 | DefaultExceptNotLast | ⚠ | Exception | Bare except: must appear last. |
| BEA028 | UnreachableCode | ⚠ | Flow | Code after a return, raise, or break is never executed. |
| BEA029 | RedundantPass | ⓘ | Cleanup | pass used in a block that already has content. |
| BEA030 | EmptyExcept | ⚠ | Exception | except: with no handling code (silent failure). |
Rules
BEA001
Example
print(foo) before foo is defined.
Fix
Define the variable before use or fix the typo.
BEA002
Example
def f(x, x):
    pass
Fix
Rename one of the parameters.
BEA003
Example
Top-level return 5 in a module.
Fix
Remove or move inside a function.
BEA004
Example
yield x at module scope.
Fix
Wrap in a generator function.
BEA005
Example
break in global scope or in a function without loop.
Fix
Remove or restructure the code to include a loop.
BEA006
Example
continue in a function with no loop.
Fix
Remove or replace with control flow logic.
BEA007
Example
except: followed by except ValueError:
Fix
Move the except: block to the end of the try.
BEA008
Example
raise NotImplemented
Fix
Replace with raise NotImplementedError.
BEA009
Example
a, *b, *c = d
Fix
Only one starred target is allowed.
BEA010
Example
a, b, c, d = (1, 2, 3)
Fix
Adjust unpacking count.
BEA011
Example
if (x,):
    ...
Fix
Remove accidental comma or rewrite condition.
BEA012
Example
assert (x, y)
Fix
Remove parentheses or fix expression.
BEA013
Example
f"Hello world"
Fix
Remove the f prefix if unnecessary.
BEA014
Example
t"foo"
Fix
Remove the t prefix if unused.
BEA015
Example
import os not referenced.
Fix
Remove the unused import.
BEA016
Example
x = 10 never referenced again.
Fix
Remove assignment or prefix with _ if intentional.
BEA017
Example
x: int declared but unused.
Fix
Remove or use variable.
BEA018
Example
x = 1; x = 2 without reading x.
Fix
Remove unused definition.
BEA019
Example
import os
for os in range(3):
    ...
Fix
Rename loop variable.
BEA020
Example
def f():
    from math import *
Fix
Move import to module level.
BEA021
Example
from os import *
Fix
Replace with explicit imports.
BEA022
Example
global foo never used.
Fix
Remove redundant declaration.
BEA023
Example
def f() -> "List[int": ...
Fix
Correct the annotation syntax so the forward reference parses as valid Python.
BEA024
Example
{'a': 1, 'a': 2}
Fix
Merge or remove duplicate keys.
BEA025
Example
"%q" % 3
Fix
Correct format specifier.
BEA026
Example
x is 5
Fix
Use ==/!=.
BEA027
Example
Same as BEA007: a bare except: followed by except ValueError:.
Fix
Reorder exception handlers.
BEA028
Example
return 5; print("unreachable")
Fix
Remove or refactor code.
BEA029
Example
def f():
    pass
    return 1
Fix
Remove redundant pass.
BEA030
Example
try:
    ...
except:
    pass
Fix
Handle exception or remove block.
Planned
| Name | Kind | Category | Severity | Rationale |
|---|---|---|---|---|
| Mutable Default Argument | MutableDefaultArgument | Semantic | ✕ | Detect functions that use a mutable object (e.g., list, dict, set) as a default argument. |
| Return in Finally | ReturnInFinally | Flow | ✕ | Catch a return, break, or continue inside a finally block: this often suppresses the original exception and leads to subtle bugs. |
| For-Else Without Break | ForElseWithoutBreak | Flow | ⚠ | An else: clause on a for/while loop whose body contains no break always runs, which is confusing and usually signals a misunderstanding of the construct. |
| Wrong Exception Caught | BroadExceptionCaught | Exception | ⚠ | Catching overly broad exceptions (e.g., except Exception: or except:) instead of specific types can hide bugs. This extends the existing empty-except rule to cover overly broad handlers. |
| Inconsistent Return Types | InconsistentReturnTypes | Function | ⚠ | A function that returns different types on different paths (e.g., return int in one branch, return None in another) may lead to consuming code bugs especially if not annotated. |
| Index / Key Errors Likely | UnsafeIndexOrKeyAccess | Data | ⚠ | Detect patterns that likely lead to IndexError or KeyError, e.g., accessing list/dict without checking length/keys, especially inside loops. |
| Unused Coroutine / Async Function | UnusedCoroutine | Symbol | ⚠ | In async code: an async def function is defined but never awaited or returned anywhere, which is likely a bug. |
| Resource Leak / Unclosed Descriptor | UnclosedResource | Symbol | ⚠ | Detect file or network resource opened (e.g., open(...)) without being closed or managed via context manager (with). |
| Logging Format String Errors | LoggingFormatError | String | ⚠ | Using % or f-string incorrectly in logging calls (e.g., logging format mismatches number of placeholders) can cause runtime exceptions or silent failures. |
| Comparison to None Using == / != | NoneComparison | Logic | ⚠ | Discourage == None or != None in favor of is None / is not None. |
Testing Strategy
Beacon’s LSP crate includes both unit tests and async integration tests to ensure feature behaviour remains stable as the analyzer evolves.
Provider Unit Tests
Each feature module embeds targeted tests that construct in-memory documents via DocumentManager::new().
Common scenarios include rename edits across nested scopes, workspace symbol searches, and diagnostic generation for simple errors.
Because providers operate on real ASTs and symbol tables, these tests exercise production logic without needing a running language server.
Backend Integration Tests
Async tests spin up an in-process tower_lsp::LspService<Backend> to simulate client interactions.
They call methods like initialize, did_open, did_change, hover, and completion, asserting that responses match expectations and no panics occur.
This pattern verifies protocol wiring, capability registration, and shared state management without external tooling.
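A skeletal version of that setup, assuming a `Backend::new(client)` constructor (the real tests then drive initialize, didOpen, hover, and so on through the service):

```rust
#[cfg(test)]
mod backend_tests {
    use tower_lsp::LspService;

    use crate::Backend; // hypothetical path to the server's Backend type

    #[tokio::test]
    async fn builds_an_in_process_service() {
        // LspService::new wires a Client to the Backend without any I/O,
        // so requests can be issued directly against the service in tests.
        let (_service, _socket) = LspService::new(Backend::new);
    }
}
```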
Command-line Checks
cargo check and cargo check --tests are run frequently for quick feedback.
cargo fmt --check enforces formatting consistency across Rust code.
Documentation changes are validated with mdbook build docs to catch broken links or syntax errors.
Current Limitations
The Beacon language server already covers core workflows but still has notable constraints. Understanding these limitations helps set expectations for contributors and users.
Open-Document Focus
Most features only inspect documents currently open in the editor.
Closed files are invisible until workspace indexing is implemented, so cross-project references or renames may miss targets.
Analyzer Coupling
Rename and references rely on a mix of AST traversal and simple heuristics; deep semantic queries across modules are not yet available.
Analyzer caches are invalidated wholesale after edits. Incremental typing work is on the roadmap but not implemented.
Performance
Tree-sitter reparses the entire document per change. While acceptable for small files, large modules may benefit from incremental parsing.
Workspace symbol searches iterate synchronously over all open documents, which can lag in large sessions.
Feature Gaps
Code actions return placeholders; no concrete quick fixes ship yet.
Formatting endpoints (textDocument/formatting, etc.) are unimplemented.
Configuration (Config) is still a stub and does not honour user settings.
Tooling Ergonomics
Error messages from the analyzer can be terse; improving diagnostics and logs is part of future work.
There is no persistence of analysis results across sessions, so large projects require recomputation on startup.
Next Steps
The following projects are planned to evolve Beacon’s language server from a solid MVP into a full-featured development companion.
Analyzer Integration
Tighten the connection between the LSP and analyzer so rename, references, and completions can operate across modules.
Cache analyzer results to avoid repeated full reanalysis after every edit.
Surface richer hover information (e.g., inferred types with provenance, docstrings).
Workspace Indexing
Build a background indexer that scans the workspace root, populating symbol data for unopened files.
Add file watchers to refresh indexes when on-disk files change outside the editor.
Support multi-root workspaces and remote development scenarios.
Tooling Enhancements
Implement formatting (textDocument/formatting, rangeFormatting) and integrate with Beacon’s formatting rules.
Deliver concrete code actions (e.g., quick fixes for undefined variables, import suggestions).
Extend semantic tokens with modifier support (documentation, deprecated symbols) and align with editor theming.
Performance & Reliability
Adopt Tree-sitter’s incremental parsing to reduce reparse costs for large files.
Improve logging and telemetry so users can diagnose performance issues or protocol errors.
Harden handling of unexpected client input, ensuring the server degrades gracefully.
Documentation & Ecosystem
Publish editor-specific setup guides (VS Code, Neovim, Helix, Zed) alongside troubleshooting tips.
Automate documentation deployment (see deploy-docs workflow) and version docs with releases.
Encourage community extensions by documenting provider APIs and expected invariants.
VS Code Extension
The Beacon VS Code extension (pkg/vscode/) pairs the Rust language server with the VS Code UI. It activates automatically for Python files and forwards editor requests to the Beacon LSP binary.
Feature Highlights
- On-type diagnostics for syntax and type errors
- Hover tooltips with type information
- Go to definition & find references
- Document and workspace symbols
- Semantic tokens for enhanced highlighting
- Identifier completions and inlay hints
- (Scaffolded) code actions for quick fixes
These capabilities mirror the features exposed by the Rust server in crates/server.
Repository Layout
pkg/vscode/
├── client/ # TypeScript client that binds to VS Code APIs
│ ├── src/extension.ts # Extension entry point; starts the LSP client
│ └── src/test/ # End-to-end tests using the VS Code test runner
├── package.json # Extension manifest (activation, contributions)
├── tsconfig.json # TypeScript project references
├── eslint.config.js # Lint configuration
└── dprint.json # Formatting config for client sources
The client launches the Beacon server binary from target/debug/beacon-lsp (or target/release/beacon-lsp if present). Ensure one of these binaries exists before activating the extension.
Prerequisites
- Rust toolchain (stable) with `cargo` available in `PATH`
- Node.js 18+ (aligned with current VS Code requirements)
- pnpm for dependency management. Install globally with `npm install -g pnpm`
- VS Code ≥ 1.100 (see the `engines` field in `package.json`)
- (Optional) `vsce` or `ovsx` for packaging/publishing
Installing Dependencies
From the repository root:
pnpm install
This installs dependencies for all packages, including the VS Code extension.
Building The Extension Client
The extension compiles TypeScript into client/out/:
pnpm --filter beacon-lsp compile
For iterative development, run:
pnpm --filter beacon-lsp watch
This keeps the TypeScript project in watch mode so recompiles happen automatically after you edit client files.
Building The Beacon LSP Server
The client resolves the server binary relative to the repository root:
target/debug/beacon-lsp (default)
target/release/beacon-lsp (used if available)
Build the server before launching the extension:
cargo build -p beacon-lsp # debug binary
# or
cargo build -p beacon-lsp --release # release binary
Running In VS Code
- Open `pkg/vscode` in VS Code.
- Select the Run and Debug panel and choose the Beacon LSP launch configuration (provided in `.vscode/launch.json`).
- Press F5 to start the Extension Development Host.
- In the new window, open a Python file (the repository's `samples/` directory is a good starting point).
The launch configuration compiles the TypeScript client and relies on the previously built Rust binary. In debug mode, RUST_LOG=beacon_lsp=debug is set automatically so server logs appear in the “Beacon LSP” output channel.
Configuration
The extension contributes a single user/workspace setting:
| Setting | Values | Description |
|---|---|---|
| `beacon.trace.server` | `off`, `messages`, `verbose` | Controls JSON-RPC tracing between VS Code and the Beacon server. |
Enable messages or verbose while debugging protocol issues; traces are written to the “Beacon LSP” output channel.
Packaging & Publishing
- Ensure the client is built (`pnpm --filter beacon-lsp compile`) and the server release binary exists (`cargo build -p beacon-lsp --release`).
- From `pkg/vscode`, run `vsce package` (or `ovsx package`) to produce a `.vsix`.
- Publish the package with `vsce publish` or `ovsx publish` once authenticated.
The generated .vsix expects the server binary to be shipped alongside the extension or obtainable on the user’s machine. Adjust extension.ts if you plan to bundle the binary differently.
Troubleshooting
- Extension activates but features are missing: confirm the `beacon-lsp` binary exists in `target/debug` or `target/release`.
- No diagnostics shown: check the “Beacon LSP” output channel for JSON-RPC logs and run the server manually (`cargo run -p beacon-lsp`) to ensure it starts cleanly.
- TypeScript errors: rerun `pnpm --filter beacon-lsp compile` and ensure dependencies are installed.
- Protocol tracing: set `beacon.trace.server` to `verbose` and inspect the output to verify requests/responses.
With the extension compiled and the Rust server built, you can iterate quickly—edit the TypeScript client, rebuild with pnpm watch, and reload the Extension Development Host (Developer: Reload Window) to pick up changes.
Research
Reading List
Theory
Hindley–Milner Type Inference
- Principal Type-Schemes for Functional Programs - https://doi.org/10.1145/582153.582176
- Types and Programming Languages (2002), ch. 22-24
- Implementing a Hindley–Milner Type Inference - https://smunix.github.io/dev.stephendiehl.com/fun/006_hindley_milner.html
- Typing Haskell in Haskell - https://web.cecs.pdx.edu/~mpj/pubs/thih.html
- "Typed Racket: Gradual Typing for Dynamic Languages"
- TypeScript Specification - 2–4 (structural subtyping)
- PEP 544 - Protocols: Structural subtyping in Python
Implementation-Level Concepts
- Tree-sitter docs: https://tree-sitter.github.io/tree-sitter/
- "Rust for Rustaceans"
- The Rustonomicon - 3 (Type Layout & Lifetimes)
- https://jeremymikkola.com/posts/2019_01_01_type_inference_intro.html
- MyPy design docs: https://mypy.readthedocs.io/en/stable/internal.html
- PyRight internals (analyzer.py)
- Expert F# 5.0 (Ch. 9–10).
- TypeScript Compiler (specifically `checker.ts`)
Hindley–Milner Type Systems
Hindley–Milner (HM) is the classical polymorphic type system that powers languages such as ML, OCaml, and early versions of Haskell. It strikes a balance between expressiveness (parametric polymorphism) and tractable, annotation-free type inference.
Overview
Parametric polymorphism: functions can operate uniformly over many types without runtime overhead1.
Type inference: the compiler deduces the most general (principal) type scheme for each expression1.
Declarative typing judgment: The typing judgment \(\Gamma \vdash e : \sigma\) relates a context \( \Gamma \), an expression \( e \), and a type scheme \( \sigma \).
The result is a system where generic programs remain statically typed without drowning the developer in annotations.
Core Concepts
Why HM?
The polymorphic \(\lambda\)-calculus (System F) requires explicit type annotations to achieve polymorphism. HM instead extends the simply typed calculus with let-polymorphism and carefully restricted generalization so that inference stays decidable and efficient.
Monotypes vs Polytypes
Monotypes (\(\tau\)): concrete types such as \(\alpha\), \(\text{Int} \to \text{Bool}\), or constructor applications \(C\,\tau_1 \cdots \tau_n\)2.
Polytypes / type schemes (\(\sigma\)): quantifications over monotypes, e.g. \(\forall \alpha.\,\alpha \to \alpha\).
Principal type: every well-typed expression has a unique (up to renaming) most general type scheme from which all other valid typings can be instantiated1.
Generalization and Instantiation
Generalization: close a monotype over the free type variables not present in the environment.
Instantiation: specialise a polytype by substituting quantified variables with fresh monotype variables.
Let-Polymorphism
Only let-bound definitions are generalized. Lambda parameters remain monomorphic in HM; this restriction is critical to keep inference decidable1.
Formal Skeleton
Syntax
e ::= x
| λ x. e
| e₁ e₂
| let x = e₁ in e₂
The associated type grammar and typing environments are:
\[ \begin{aligned} \tau &::= \alpha \mid C(\tau_1,\dots,\tau_n) \mid \tau \to \tau \\ \sigma &::= \tau \mid \forall \alpha.\,\sigma \\ \Gamma &::= \emptyset \mid \Gamma, x : \sigma \end{aligned} \]
Typing Rules
Typing judgments take the form \(\Gamma \vdash e : \sigma\). Core rules include:
\[ \frac{x : \sigma \in \Gamma}{\Gamma \vdash x : \sigma} \quad\text{(Var)} \]
\[ \frac{\Gamma, x : \tau \vdash e : \tau'}{\Gamma \vdash \lambda x.\,e : \tau \to \tau'} \quad\text{(Abs)} \]
\[ \frac{\Gamma \vdash e_0 : \tau \to \tau' \qquad \Gamma \vdash e_1 : \tau}{\Gamma \vdash e_0\,e_1 : \tau'} \quad\text{(App)} \]
\[ \frac{\Gamma \vdash e_0 : \sigma \qquad \Gamma, x : \sigma \vdash e_1 : \tau}{\Gamma \vdash \text{let } x = e_0 \text{ in } e_1 : \tau} \quad\text{(Let)} \]
\[ \frac{\Gamma \vdash e : \sigma' \qquad \sigma' \sqsubseteq \sigma}{\Gamma \vdash e : \sigma} \quad\text{(Inst)} \]
\[ \frac{\Gamma \vdash e : \sigma \qquad \alpha \notin \mathrm{free}(\Gamma)}{\Gamma \vdash e : \forall \alpha.\,\sigma} \quad\text{(Gen)} \]
Here \(\sigma' \sqsubseteq \sigma\) means that \(\sigma'\) is an instance of \(\sigma\) (obtained by instantiating quantified variables)1.
Algorithm W (Inference Sketch)
Algorithm W is the archetypal inference engine for HM3.
- Annotate sub-expressions with fresh type variables.
- Collect constraints when traversing the AST (especially from applications).
- Unify constraints to solve for unknown types.
- Generalize at each `let` by quantifying over variables not free in the environment.
- Return the principal type scheme produced by the substitutions.
Typical programs are handled in near-linear time, although the theoretical worst case is higher1.
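Below is a compact sketch of the unification step at the heart of Algorithm W, written for a toy type language; all names here are illustrative and unrelated to Beacon's actual unifier.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Type {
    Var(usize),
    Int,
    Bool,
    Arrow(Box<Type>, Box<Type>),
}

type Subst = HashMap<usize, Type>;

// Apply the substitution, chasing variable bindings.
fn apply(s: &Subst, t: &Type) -> Type {
    match t {
        Type::Var(v) => s.get(v).map(|u| apply(s, u)).unwrap_or_else(|| t.clone()),
        Type::Arrow(a, b) => Type::Arrow(Box::new(apply(s, a)), Box::new(apply(s, b))),
        _ => t.clone(),
    }
}

fn occurs(v: usize, t: &Type) -> bool {
    match t {
        Type::Var(w) => *w == v,
        Type::Arrow(a, b) => occurs(v, a) || occurs(v, b),
        _ => false,
    }
}

// Unify two types, extending the substitution or reporting a mismatch.
fn unify(t1: &Type, t2: &Type, s: &mut Subst) -> Result<(), String> {
    match (apply(s, t1), apply(s, t2)) {
        (Type::Var(v), other) | (other, Type::Var(v)) => {
            if other == Type::Var(v) {
                Ok(())
            } else if occurs(v, &other) {
                Err("occurs check failed: infinite type".into())
            } else {
                s.insert(v, other);
                Ok(())
            }
        }
        (Type::Arrow(a1, r1), Type::Arrow(a2, r2)) => {
            unify(&a1, &a2, s)?;
            unify(&r1, &r2, s)
        }
        (a, b) if a == b => Ok(()),
        (a, b) => Err(format!("cannot unify {a:?} with {b:?}")),
    }
}
```

Generalization at `let` then quantifies over the variables that remain free after applying the accumulated substitution.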
Strengths and Limitations
Strengths
Minimal annotations with strong static guarantees.
Principled parametric polymorphism with predictable runtime behaviour.
A deterministic, well-understood inference algorithm.
Limitations
No native subtyping; adding it naively renders inference undecidable1.
Higher-rank polymorphism (e.g., passing polymorphic functions as arguments) requires extensions that typically sacrifice automatic inference.
Recursive bindings and mutation demand additional care to avoid unsound generalization.
Extensions: Type Classes
Many ML-derived languages extend HM with type classes to model constrained polymorphism4. Type classes capture ad-hoc behavior (equality, ordering, pretty-printing) without abandoning the core inference model.
Motivation
Developers often need functions that work only for types supporting specific operations (equality, ordering, etc.).
Type classes describe those obligations once and then allow generic code to depend on them declaratively.
Integration with HM
A type class \(C\) packages a set of operations. A type \(T\) becomes an instance of \(C\) by providing implementations.
Type schemes gain constraint contexts, e.g. \(\forall a.\,(Eq\,a) \Rightarrow a \to a\), read as “for all \(a\) that implement Eq, this function maps \(a\) to \(a\)”.
Environments track both type bindings and accumulated constraints, written informally as \(\Gamma \vdash e : \sigma \mid \Delta\).
During generalization, constraints that do not mention the generalized variables can be abstracted over; during instantiation, remaining constraints must be satisfied (dictionary passing, instance resolution, etc.).
Type classes preserve type safety while keeping user code concise, but introduce design questions about coherence (no conflicting instances), instance search termination, and tooling ergonomics.
Extensions: Higher-Rank Types
Higher-rank polymorphism allows universal quantifiers to appear inside function arguments, enabling functions that consume polymorphic functions5.
HM is rank-1: all \(\forall\) quantifiers appear at the outermost level.
Why Higher Rank?
Certain abstractions require accepting polymorphic functions as arguments, e.g.
applyTwice :: (forall a. a -> a) -> Int -> Int
applyTwice f x = f (f x)
HM cannot express this because the quantifier lives to the left of an arrow. Extending to rank-2 (or higher) types unlocks APIs like runST :: ∀a.(∀s. ST s a) -> a6.
Typing Considerations
The grammar generalizes to allow quantified types within arrow positions; checking such programs typically relies on bidirectional type checking7.
Full type inference for arbitrary rank is undecidable; practical compilers require annotations or rely on heuristics8.
Despite the cost, higher-rank types enable powerful encapsulation patterns and stronger invariants.
Design Trade-offs
Pros: Expressiveness for APIs manipulating polymorphic functions; better information hiding (e.g., ST).
Cons: Additional annotations, more complex error messages, heavier implementation burden.
Further Reading
- Implementing HM (Stimsina)
- Parametricity and type classes (Well-Typed)
Language Server Protocol
Why LSP Exists
Before LSP, editor integrations for language tooling (completion, diagnostics, refactors) were bespoke. Every compiler or analyzer needed plug-ins for VS Code, Vim, IntelliJ, Sublime, etc., and each editor duplicated work to support many languages. This matrix of per-language, per-editor plug-ins slowed innovation and made advanced tooling inaccessible outside first-party IDEs.
The Language Server Protocol, initiated by Microsoft for VS Code and now maintained as an open specification, solves this coupling. It defines a JSON-RPC protocol so a single language server can speak to any compliant editor. Editors implement the client half once and gain tooling support for every language that implements the server half.
Problems It Solves
- Shared investment: Language teams implement the protocol once instead of maintaining multiple editor-specific plug-ins.
- Editor freedom: Developers choose tools without sacrificing language-aware features.
- Feature parity: Diagnostics, go-to-definition, workspace symbols, rename, and more behave consistently across environments.
- Incremental updates: The protocol is designed for streaming updates as the user types, enabling responsive experiences.
How LSP Works
- Transport: Client and server communicate over stdin/stdout pipes, TCP, or WebSockets. Messages use JSON-RPC 2.0 framed with `Content-Length` headers.
- Initialization: Client sends `initialize` with capabilities and workspace metadata. Server responds with supported features (`ServerCapabilities`). A follow-up `initialized` notification signals readiness.
- Document Synchronization: The client streams document lifecycle notifications (`didOpen`, `didChange`, `didSave`, `didClose`) so the server maintains up-to-date views of open files.
- Feature Requests: Once documents are synchronized, the client issues requests such as:
  - `textDocument/completion` for completion items.
  - `textDocument/hover` for inline info.
  - `textDocument/definition` and `textDocument/references` for navigation.
  - `textDocument/documentSymbol` and `workspace/symbol` for structure searches.
  - `textDocument/codeAction`, `textDocument/rename`, `textDocument/semanticTokens`, and more.
- Responses and Notifications: Servers send responses with payloads defined in the protocol. They can also push diagnostics (`textDocument/publishDiagnostics`) or log messages asynchronously.
- Shutdown: Clients request graceful shutdown via `shutdown` followed by `exit`.
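To make the framing concrete, here is a small Rust sketch that writes one `initialize` request with a `Content-Length` header; the payload shown is deliberately minimal.

```rust
use std::io::Write;

// Every LSP message is a JSON-RPC 2.0 payload preceded by a header block;
// Content-Length counts the bytes of the JSON body.
fn write_message(out: &mut impl Write, body: &serde_json::Value) -> std::io::Result<()> {
    let payload = body.to_string();
    write!(out, "Content-Length: {}\r\n\r\n{}", payload.len(), payload)
}

fn main() -> std::io::Result<()> {
    let initialize = serde_json::json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": { "rootUri": null, "capabilities": {} }
    });
    write_message(&mut std::io::stdout(), &initialize)
}
```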
The protocol evolves through versioned specifications (currently 3.x). Beacon targets the subset required for an ergonomic Python workflow, while keeping the implementation modular so new methods can be added as needed.
Tree-sitter
This document contains notes I've compiled based on learnings about tree-sitter.
Tree-sitter is both a parser-generator tool and an incremental parsing library1. It’s optimized for embedding in editors and tooling (rather than being only a compiler backend parser). It supports many languages, with language-specific grammars2.
From the official site:
Tree-sitter is a parser generator tool and an incremental parsing library. It can build a concrete syntax tree for a source file and efficiently update the syntax tree as the source file is edited.
What problems it solves
Here are its key value-propositions and the issues it addresses:
Better than regex/highlight hacks
Traditional editors often use regular expressions or ad-hoc syntax rules for things like syntax highlighting, folding, code navigation. These approaches tend to fail with complex nested constructs or incomplete code (common in live editing). Tree-sitter uses a proper parse tree (Concrete Syntax Tree) rather than purely regex heuristics, giving more accurate structure.
Incremental parsing / live editing
In an editor context, users are typing and modifying files constantly. Re-parsing the entire file on every keystroke is expensive and slow. Tree-sitter supports incremental parsing, meaning it updates only the changed portion of the tree rather than rebuilding everything. This means edits are reflected quickly and the tree remains coherent, which enables features like structured selection, live syntax highlighting, etc.
Unified API / language-agnostic tooling
Because each language has a Tree-sitter grammar, you can build tooling (highlighting, navigation, refactoring) in a language-agnostic way: query the tree, capture nodes of interest, etc. This reduces duplication of effort: editor vendors don’t have to write custom parsing logic per language to support advanced features.
Error-tolerant parsing for editing
Since code is often incomplete/invalid in the middle of editing, a robust parser needs to recover gracefully. Tree-sitter is designed to continue to provide a usable tree under such conditions so editors can rely on the tree structure even when the file is only partially valid.
Enables richer editor tooling
Because you have a full tree, you can support advanced features: structural selection (e.g., select "function" or "if block"), code folding by AST node, refactorings, cross-language injections (e.g., embedded languages). For example, using queries you can capture specific nodes in the tree and apply tooling logic.
Internals
Grammar / Parser Generation
For each language you want support for, you write a grammar file, typically grammar.js (or some variant) describing the language’s syntax in a DSL provided by Tree-sitter.
Example: You describe rules like sum: ..., product: ..., define precedence, associativity (via helpers like prec.left, prec.right).
You then run the Tree-sitter CLI (or build process) to generate a parser.c file (and possibly scanner.c) that formalizes the grammar into C code.
That generated parser becomes the actual runtime component for that language.
Lexer/Tokenization
The generated parser includes a lexer (scanner) component that tokenizes the source code (turning characters into tokens).
In some languages, you may supply a custom external scanner to handle tricky lexing cases (e.g., indent-based blocks, embedded languages) via scanner.c.
Parser Engine (GLR / LR)
The core algorithm is a generalized LR (GLR) parser. GLR means it can handle grammars with some ambiguity and still produce valid parse trees. In simple terms, the parser uses a parse table (states × tokens) to decide shift/reduce actions. The grammar defines precedence/associativity to resolve ambiguities. In addition to traditional LR parsing, Tree-sitter is optimized for incremental operation (see next).
Tree Representation & Node Structure
After parsing, you obtain a Concrete Syntax Tree (CST): a tree of nodes representing lexical tokens and syntactic constructs. Nodes carry source-range information (start and end positions). Nodes can be named or anonymous, and grammar rules prefixed with an underscore are hidden from the final tree to keep it cleaner.
Incremental Parsing
A key feature: when the source text changes (e.g., editing in an editor), Tree-sitter avoids re-parsing the whole file. Instead it reuses existing subtrees for unchanged regions and re-parses only the changed region plus a small margin around it.
- Editor notifies parser of changes (range of changed characters, old/new text)
- Parser identifies which nodes’ source ranges are invalidated
- It re-parses the minimal region and re-connects to reused nodes outside that region
- It produces an updated tree with source ranges corrected.
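Using the Rust bindings, the flow looks roughly like this; exact signatures vary between tree-sitter crate versions, and a grammar crate such as `tree_sitter_python` is assumed to be available.

```rust
use tree_sitter::{InputEdit, Parser, Point};

fn main() {
    let mut parser = Parser::new();
    parser
        .set_language(tree_sitter_python::language())
        .expect("load Python grammar");

    let mut source = String::from("def f():\n    return 1\n");
    let mut tree = parser.parse(&source, None).expect("initial parse");

    // Replace "1" with "42" and describe the edit to the old tree.
    source = source.replace("return 1", "return 42");
    tree.edit(&InputEdit {
        start_byte: 20,
        old_end_byte: 21,
        new_end_byte: 22,
        start_position: Point::new(1, 11),
        old_end_position: Point::new(1, 12),
        new_end_position: Point::new(1, 13),
    });

    // Passing the edited old tree lets the parser reuse unchanged subtrees.
    let new_tree = parser.parse(&source, Some(&tree)).expect("reparse");
    assert!(!new_tree.root_node().has_error());
}
```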
Querying & Tree Walk / API
Once you have a tree, you can run queries (S-expression style) to find sets of nodes matching patterns.
For example, capture all if_statement nodes or function declarations.
The API (C API, plus language bindings) allows you to walk nodes, inspect children, get start/end positions, text, etc3.
The query system is powerful: you can specify patterns, nested structures, predicates (e.g., #eq? @attr_name "class").
Embedding / Use in Editors & Tools
Tree-sitter is designed to be embedded: the parsing library is written in C, and there are bindings in many languages (Rust, JS, Python, etc.)2. Editor plugins (for example nvim‑treesitter for Neovim) use Tree-sitter for syntax highlighting, structural editing, text-objects.
https://tree-sitter.github.io/ "Tree-sitter: Introduction"
https://en.wikipedia.org/wiki/Tree-sitter_%28parser_generator%29 "Tree-sitter (parser generator)"
https://tree-sitter.github.io/tree-sitter/using-parsers/ "Using Parsers - Tree-sitter"
PEP8
Philosophy
Purpose
Provide coding conventions for the Python standard library, to enhance readability and consistency1.
Underlying principle
"Code is read much more often than it is written."
Consistency matters
Within a function or module > within a project > with the guide itself.
Exceptions permitted
When strictly following the guideline reduces clarity or conflicts with surrounding code.
Encoding
Use UTF-8 encoding for source files (in the core distribution).
Avoid non-ASCII identifiers in standard library modules; if used, limit noisy Unicode characters.
Layout
Indentation
Use 4 spaces per indentation level. Tabs are strongly discouraged. Never mix tabs and spaces.
Line Length
Preferred maximum: 79 characters for code.
For long blocks of text (comments/docstrings): ~72 characters.
Blank Lines and Vertical Whitespace
Insert blank lines to separate top-level functions and classes, and within classes to separate method groups.
Avoid extraneous blank lines within code structure.
Imports
Imports at top of file, after module docstring and before module globals/constants.
Group imports in the following order:
- Standard library imports
- Related third-party imports
- Local application/library-specific imports

Insert a blank line between each group.
Absolute imports preferred; explicit relative imports acceptable for intra-package use.
Wildcard imports (from module import *) should be avoided except in rare cases (e.g., to publish a public API).
Whitespace
Avoid extra spaces in the following contexts:
- Immediately inside parentheses, brackets or braces.
- Between a trailing comma and a closing bracket.
- Before a comma, semicolon, or colon.
- More than one space around an assignment operator to align multiple statements (alignment discouraged)
Usage
# Correct:
spam(ham[1], {eggs: 2})
# Avoid:
spam( ham[ 1 ], { eggs: 2 } )
Comments
Good comments improve readability, explain why, not how.
Use full sentences, capitalize first word, leave a space after the #.
Inline comments should be used sparingly and separated by at least two spaces from the statement.
Block comments should align with code indentation and be separated by blank lines where appropriate.
Docstrings
Use triple-quoted strings for modules, functions, classes.
The first line should be a short summary; following lines provide more detail if necessary.
For conventions specific to docstrings see PEP 257 – Docstring Conventions.
Naming Conventions
| Kind | Convention |
|---|---|
| Modules | Short, lowercase, may use underscores |
| Packages | All-lowercase, preferably no underscores |
| Classes | Use CapWords (CamelCase) convention |
| Exceptions | Typically CapWords |
| Functions and methods | Lowercase with underscores (snake_case) |
| Variables | Use lowercase_with_underscores |
| Constants | All UPPERCASE_WITH_UNDERSCORES |
| Private identifiers | One leading underscore _private; name mangling via __double_leading_underscore. |
| Type Vars (in generics) | CapWords |
Avoid single character names like l, O, I (they are easily confused with 1 and 0).
Recommendations
Avoid pointless object wrappers, redundant code; prefer simple, explicit approaches. This matches the ethos "explicit is better than implicit" from The Zen of Python.
When offering interfaces, design them so it is difficult to misuse them (i.e., "avoid programming errors").
Avoid using mutable default arguments in functions.
In comparisons to singletons (e.g., None), use is or is not rather than equality operators.
Exceptions to the Rules
The style guide states that while adherence is recommended, there are legitimate cases for deviation.
Reasons to deviate:
- Strict adherence would reduce readability in context.
- Code must remain consistent with surrounding non-PEP8 code (especially legacy).
- The code predates the rule and rewriting it isn’t justified.
Tooling
Tools exist to help enforce or auto-format code to PEP 8 style (e.g., linters, auto-formatters).
Using such tools helps maintain style consistency especially on teams or open-source projects.
Summary
- Readability and consistency are the primary goals.
- Follow conventions: 4 spaces, line length ~79 chars, snake_case for functions/variables, CapWords for classes, uppercase for constants.
- Imports at top, grouped logically.
- Whitespace matters—used meaningfully, not decoratively.
- Use comments and docstrings effectively: explain why, not how.
- Be pragmatic: if strictly following every rule makes things worse, depart in favour of clarity.
- Use automation tools to assist but don’t treat the guide as dogma—interpret intelligently.
https://peps.python.org/pep-0008/ "PEP 8 – Style Guide for Python Code"
Type Hints (484) & Annotations (585)
PEP 484 - Type Hints
Overview
PEP 484 introduced a standardized system for adding type hints to Python code.
Its goal was not to enforce static typing at runtime but to establish a formal syntax for optional type checking via external tools such as mypy, pytype, and later Pyright [1].
This marked a pivotal moment for Python’s type ecosystem — bridging the gap between dynamic and statically analyzable Python.
It defined the foundations of the typing module and introduced the concept of gradual typing, where type hints coexist with dynamic typing [1].
Concepts
Gradual Typing
Type annotations are optional, enabling progressive adoption without breaking existing code.
Type System Syntax
Function signatures, variables, and class members can be annotated using syntax like def greet(name: str) -> str: [1].
typing module
Adds classes such as List, Dict, Tuple, Optional, Union, Any, and Callable [1].
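A small sketch of these typing constructs in use (the lookup function is hypothetical):
from typing import Callable, Dict, Optional

def lookup(table: Dict[str, int], key: str) -> Optional[int]:
    # Return the value if present, otherwise None.
    return table.get(key)

on_missing: Callable[[str], None] = print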
Type Checkers
External tools (e.g., mypy) use these annotations for static analysis, error detection, and IDE autocompletion.
Runtime Neutrality
Annotations are stored in __annotations__ and ignored by Python itself; type enforcement is delegated to external tools [1].
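Because the interpreter only records the hints, they can be inspected at runtime without affecting behaviour; a minimal illustration:
def greet(name: str) -> str:
    return f"Hello, {name}"

# Python stores the hints but never enforces them.
print(greet.__annotations__)   # {'name': <class 'str'>, 'return': <class 'str'>}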
Motivation
Before PEP 484, large Python projects (e.g., Dropbox, Google) developed internal type systems to manage complexity.
PEP 484 unified these under a common specification inspired by mypy and by research in gradual typing [1].
Impact
- Established a shared foundation for static analysis across the ecosystem.
- Enabled downstream standards like PEP 561 (distributable type stubs), PEP 563 (deferred evaluation of annotations), and PEP 604/649 (modernized syntax and semantics).
PEP 585 - Type Hinting Generics in Standard Collections
Overview
PEP 585 streamlined the use of generics by allowing the built-in collection types (e.g., list, dict, set) to be used directly as generic types, replacing typing.List, typing.Dict, etc. [2]
For example, code such as:
from typing import List
def f(x: List[int]) -> None: ...
can now be written as:
def f(x: list[int]) -> None: ...
Motivation
PEP 484’s design relied on importing type aliases from the typing module. This indirection created redundancy, confusion, and runtime overhead.
By 2020, with from __future__ import annotations and runtime type information improvements, it became viable to use built-ins directly [2].
Core Changes
- Built-in classes (list, dict, tuple, set, etc.) now support subscripting ([]) at runtime (see the snippet below).
- A new types.GenericAlias class is introduced internally to represent these parameterized generics [2].
- Backwards compatibility is preserved: typing.List and the other aliases remain available but are considered deprecated [3].
- Simplified syntax aligns Python with the ergonomics of other typed languages.
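The runtime subscripting mentioned above can be observed directly (Python 3.9+):
alias = list[int]
print(alias)         # list[int]
print(type(alias))   # <class 'types.GenericAlias'>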
Impact
- Improved readability and ergonomics: encourages list[int] over List[int].
- Reduces the mental split between the runtime and static type worlds.
- Opens the door for the removal of redundant wrappers in future releases.
Summary
Together, PEP 484 and PEP 585 represent Python’s maturing type system:
- PEP 484 built the scaffolding by defining syntax, semantics, and conventions.
- PEP 585 modernized it by integrating type information natively into Python’s core language model.
This reflects a shift from externalized static typing toward first-class optional typing: Python keeps its philosophy of flexibility while offering stronger correctness guarantees for large-scale codebases.
[1] https://peps.python.org/pep-0484/ "PEP 484 – Type Hints"
[2] https://peps.python.org/pep-0585/ "PEP 585 – Type Hinting Generics In Standard Collections"
[3] https://docs.python.org/3/library/typing.html "typing — Support for type hints — Python 3.13.5 documentation"
Structural Pattern Matching in Python
Structural Pattern Matching extends if/elif logic with declarative, data-shape-based matching.
It allows code to deconstruct complex data structures and branch based on both type and content.
Unlike switch in other languages, pattern matching inspects structure and value, not just equality.
match command:
case ("move", x, y):
handle_move(x, y)
case ("stop",):
handle_stop()
case _:
print("Unknown command")
Syntax
Basic
match subject:
case pattern_1 if guard_1:
...
case pattern_2 if guard_2:
...
case _:
...
- The subject is evaluated once.
- Each case pattern is tested in order.
- The first pattern that matches (and whose optional if guard succeeds) executes.
- The _ pattern matches anything (a wildcard).
Pattern Types
Literal
Match exact constants or values:
case 0 | 1 | 2:
...
case "quit":
...
Multiple literals can be combined with | (OR patterns).
Capture
Assign matched values to variables:
case ("move", x, y):
# binds x and y
⚠️ Names in patterns always bind; they do not compare. To compare against an existing value, use a value pattern (a dotted name such as Color.RED) or a guard:
case Point(x, y) if x == origin.x:
Sequence
Match list or tuple structure:
case [x, y, z]:
...
case [first, *rest]:
...
Mapping
Match dictionaries:
case {"type": "point", "x": x, "y": y}:
...
Keys are matched literally; missing keys cause no match.
Class
Deconstruct class instances via their attributes or positional parameters:
case Point(x, y):
...
This uses the class’s __match_args__ attribute to define positional fields.
Example:
class Point:
    __match_args__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y
OR
Combine multiple alternatives:
case "quit" | "exit":
...
AS
Bind the entire match while destructuring:
case [x, y] as pair:
...
Wildcard
The _ pattern matches anything and never binds.
Guards (if clauses)
Optional if conditions refine matches:
match point:
case Point(x, y) if x == y:
print("on diagonal")
Guards are evaluated after successful structural match and can use bound names.
Semantics
| Concept | Behavior |
|---|---|
| Evaluation | subject evaluated once; patterns checked in order |
| Binding | Successful match creates new local bindings |
| Failure | Non-matching case continues to next pattern |
| Exhaustiveness | No implicit else; always include case _: for completeness |
| Guards | Boolean expressions using pattern-bound variables |
Examples
Algebraic Data Types (ADTs)
Pattern matching elegantly models variant data:
class Node: pass

class Leaf(Node): ...

class Branch(Node):
    __match_args__ = ("left", "right")
    def __init__(self, left, right):
        self.left, self.right = left, right

def depth(tree):
    match tree:
        case Leaf():
            return 1
        case Branch(left, right):
            return 1 + max(depth(left), depth(right))
Command Parsing
import sys

def process(cmd):
match cmd.split():
case ["load", filename]:
load_file(filename)
case ["quit" | "exit"]:
sys.exit()
case _:
print("Unknown command")
HTTP-like Routing
# request is a tuple such as ("GET", "/") or ("POST", "/users", payload)
match request:
    case ("GET", "/"):
        return homepage()
    case ("GET", "/users"):
        return list_users()
    case ("POST", "/users", data):
        return create_user(data)
Design
Goals
- Provide clarity and conciseness for branching on structured data.
- Support static analysis: patterns are explicit and compositional.
- Encourage declarative code, replacing complex if ladders.
Why Not Switch?
- Structural, not value-only: matches shape, type, and contents.
- Integrates with Python’s dynamic typing and destructuring capabilities.
Why Not Functions?
While if statements or dispatch tables can emulate simple branching,
pattern matching better communicates intent and is easier to read and verify.
Spec
| Category | Rule |
|---|---|
| Subject types | Any object, including sequences, mappings, and classes |
| Match protocol | For class patterns, Python checks __match_args__ and attributes |
| Sequence match | Subject must be a sequence (collections.abc.Sequence), excluding str, bytes, and bytearray |
| Mapping match | Subject must be a mapping (collections.abc.Mapping); extra keys in the subject are ignored |
| Pattern scope | Names bound by a pattern become ordinary locals of the enclosing scope and persist after the match |
| Evaluation order | Top-to-bottom, left-to-right |
| Errors | SyntaxError for invalid pattern constructs |
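To illustrate the mapping-match rule above (extra keys in the subject are ignored), a short sketch:
payload = {"type": "point", "x": 1, "y": 2, "z": 99}

match payload:
    case {"type": "point", "x": x, "y": y}:
        # Matches even though "z" is an extra key in the subject.
        print(x, y)
    case _:
        print("not a point")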
Pitfalls
- Shadowing: every bare name in a pattern binds; it does not compare.
  color = "red"
  match color:
      case color:  # always matches and binds a new variable!
          ...
  Use literal values, constants, or enums instead:
  match color:
      case "red":
          ...
- Ignoring guards: guards run after the structural match succeeds, not during it. Expensive side effects inside guards are discouraged.
- Over-matching: the pattern length must align with the subject unless *rest is used.
Tooling
- Linters: flake8, ruff, and pyright support pattern syntax.
- Static analyzers: type checkers can verify exhaustive matches on enums and dataclasses.
- Refactoring tools: can replace nested if trees with match statements.
Usage Patterns
| Use Case | Pattern Example |
|---|---|
| Enum dispatch | case Status.OK: |
| Dataclasses | case Point(x, y): |
| Command tuples | case ("move", x, y): |
| JSON-like dicts | case {"user": name, "id": uid}: |
| Error handling | case {"error": msg} if "fatal" in msg: |
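For instance, the enum-dispatch row might look like this in practice (Status is a hypothetical enum):
from enum import Enum

class Status(Enum):
    OK = 0
    ERROR = 1

def describe(status: Status) -> str:
    match status:
        case Status.OK:
            return "all good"
        case Status.ERROR:
            return "something failed"
        case _:
            return "unknown status"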
Backwards Compatibility and Evolution
- Introduced in Python 3.10.
- Future extensions may include:
- Better exhaustiveness checking
- Improved IDE refactoring tools
- Expanded type integration for dataclasses and typing constructs
Backward-incompatible syntax changes are unlikely; the match semantics are stable.
Summary
Pattern matching provides:
- Declarative branching over structured data
- Readable syntax for destructuring and filtering
- Powerful composition of match conditions and guards
It is not a replacement for if statements. It is a new control structure for expressing shape-based logic cleanly and expressively.
Distributing and Packaging Python Type Information (.pyi/stubs)
Abstract
PEP 561 establishes a standardized method for distributing and packaging type information in Python. It builds upon PEP 484, addressing how type information, whether inline or in separate stub files, can be discovered, packaged, and used by type checkers across environments.
This allows:
- Package maintainers to declare their code as typed,
- Third parties to publish independent stub packages, and
- Type checkers to resolve imports consistently across mixed environments.
Background
Prior to PEP 561:
- There was no consistent way to distribute typing information with Python packages.
- Stub files had to be manually placed in MYPYPATH or an equivalent search path.
- Community stubs were collected centrally in Typeshed, which became a scalability bottleneck.
The goals are:
- To use existing packaging infrastructure (distutils/setuptools).
- To provide clear markers for type-aware packages.
- To define resolution rules so that tools like mypy, pyright, or pylance can locate and prioritize type information uniformly.
PEP 561 recognizes three models: inline-typed, stub-typed, and third-party stub-only packages.
Packaging Type Information
Inline
Inline-typed packages must include a marker file named py.typed inside the package root.
Example setup:
setup(
name="foopkg",
packages=["foopkg"],
package_data={"foopkg": ["py.typed"]},
)
This file signals to type checkers that the package and all its submodules are typed. For namespace packages (PEP 420), the marker should be placed in submodules to avoid conflicts.
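The resulting installed layout is simply the package with the marker file alongside its modules (a sketch for the hypothetical foopkg above; core.py is illustrative):
foopkg/
├── __init__.py
├── core.py
└── py.typed        # empty marker: "this package ships inline types"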
Stub-Only
- Stub-only packages contain .pyi files without any runtime code.
- Naming convention: foopkg-stubs provides types for foopkg.
- py.typed is not required for these packages.
- Version compatibility should be expressed in dependencies (e.g., via install_requires).
Example layout:
shapes-stubs/
└── polygons/
├── pentagon/__init__.pyi
└── hexagon/__init__.pyi
Partial Stubs
Partial stubs (for incompletely typed libraries) must include the line partial inside py.typed (see the layout sketch after this list).
These instruct type checkers to:
- Merge the stub directory with the runtime or typeshed directory.
- Continue searching through later steps in the resolution order.
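A partial stub-only package might therefore look like the following sketch (package and module names are illustrative):
foopkg-stubs/
├── __init__.pyi
├── core.pyi
└── py.typed        # contains the single line: partial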
Module Resolution Order
Type checkers must resolve type information using the following ordered search path:
| Priority | Source | Description |
|---|---|---|
| 1 | Manual stubs / MYPYPATH | User-specified patches override all. |
| 2 | User code | The project’s own files. |
| 3 | Stub packages (*-stubs) | Distributed stubs take precedence over inline types. |
| 4 | py.typed packages | Inline or bundled types inside installed packages. |
| 5 | Typeshed | Fallback for stdlib and untyped third-party libs. |
If a stub-only namespace package lacks a desired module, type checkers continue searching through the inline and typeshed steps.
When checking against another Python version, the checker must look up that version’s site-packages path.
Conventions
Library Interface
When py.typed is present:
- All .py and .pyi files are considered importable.
- Files beginning with _ are private.
- Public symbols are controlled via __all__.
Valid __all__ idioms include:
__all__ = ['a', 'b']
__all__ += submodule.__all__
__all__.extend(['c', 'd'])
These restrictions allow static determination of public exports by type checkers.
Imports and Re-Exports
Certain import forms signal that an imported symbol should be re-exported as part of the module’s public interface:
import X as X # re-export X
from Y import X as X # re-export X
from Y import * # re-exports __all__ or all public symbols
All other imports are private by default.
Implementation and Tooling
- mypy implements full PEP 561 resolution, allowing users to inspect installed package metadata (py.typed, stub presence, etc.).
- Tools like pyright, pylance, and pytype adopt the same ordering and conventions.
This design remains fully backward compatible, requiring no changes to Python’s runtime or packaging systems.