Shawn Hurley aff9d3674a
Conform to ast-index proof: fix P7 span violation, implement R17/R18, add parameter resolution
- Fix P7 violation: tuple/starred unpacking now uses per-child spans
  instead of sharing the assignment node span across all names
- Implement receiver_ref_span (R17): method chains like a.b().c()
  now link each ref to the previous via resolve_receiver
- Implement initializer_ref_span (R18): unannotated assignments like
  x = Factory.create() now point to the RHS expression ref span
- Add parameter reference resolution: refs inside function bodies
  resolve to parameter spans as Local origin per Python LEGB rules
- Fix resolve_origin disambiguation: module-level defs preferred
  over class-scoped defs for unqualified references
- Fix nested class qualified_name chaining: OuterClass.InnerClass
  methods get fully-chained qualified names
- Populate parent class package from imports for cross-module
  inheritance resolution
- Fix dependency synthetic spans to use Span::new(0,0) per patterns
- Thread enclosing_params through all ref extraction functions
- Update AGENTS.md with new test counts (412), dependency_tests,
  and design decision sections 13-16
- Add 30 new conformance tests covering all fixes
2026-04-23 12:23:32 -04:00

python-indexer

Python language indexer for the ast-index library. Parses Python source files with tree-sitter, extracts definitions, references, and imports, and exposes them through the FileAnalyzer trait. Ships as both a Rust library crate and a CLI binary.

Quick Start

# Build
cargo build --release

# Index a project (full JSON output)
python-indexer index /path/to/project

# Index, stats only
python-indexer index /path/to/project --stats-only

# Query for a symbol
python-indexer query /path/to/project --pattern "mypackage.MyClass" -k class

Installation

Requires a Rust toolchain with 2024 edition support (Rust 1.85 or newer). Clone the repo and build:

cargo build --release

The binary is written to target/release/python-indexer. The examples below assume it is on your PATH.

CLI Usage

index -- Index a Python project

python-indexer index <REPO_PATH> [OPTIONS]

Walks the project tree, parses every .py file with tree-sitter, and outputs the full index as JSON to stdout.

| Flag | Description |
| --- | --- |
| <REPO_PATH> | Path to the Python project root (required) |
| --stats-only | Output only build statistics, not the full index |
| --deps-path <PATH> | Path to a dependency directory (repeatable; see Dependency Resolution) |

# Full index
python-indexer index ./my-project

# Just stats (file counts, unresolved modules)
python-indexer index ./my-project --stats-only

# With dependency resolution
python-indexer index ./my-project \
  --deps-path ./typeshed/stdlib \
  --deps-path ./.venv/lib/python3.12/site-packages

query -- Search the index

python-indexer query <REPO_PATH> --pattern <PATTERN> [OPTIONS]

Builds the index, then searches it for symbols matching the pattern. Results are written to stdout as JSON.

| Flag | Description |
| --- | --- |
| <REPO_PATH> | Path to the Python project root |
| -p, --pattern <PAT> | Search pattern (dotted name, wildcard, or regex) |
| -k, --kind <KIND> | Filter by symbol kind |
| --stdin | Read the query as JSON from stdin |
| --deps-path <PATH> | Path to a dependency directory (repeatable) |

Symbol kind values: function, class (or typedef), enum, enum_member, variable (or var), const (or constant), type_alias, property, field, constructor
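The kind names and aliases above amount to a small lookup table. As a rough sketch only (the Kind enum and parse_kind function are hypothetical, not this crate's or ast-index's actual types, and exact, case-sensitive matching is an assumption), the mapping could look like this:

```rust
/// Hypothetical mirror of the values accepted by -k/--kind.
/// This is NOT ast_index::SymbolKindTag; it only illustrates the aliases.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Function,
    Class,
    Enum,
    EnumMember,
    Variable,
    Const,
    TypeAlias,
    Property,
    Field,
    Constructor,
}

/// Map a --kind string to a Kind, accepting the documented aliases.
/// Unknown strings yield None (assumed behavior; matching is case-sensitive).
fn parse_kind(s: &str) -> Option<Kind> {
    match s {
        "function" => Some(Kind::Function),
        "class" | "typedef" => Some(Kind::Class),
        "enum" => Some(Kind::Enum),
        "enum_member" => Some(Kind::EnumMember),
        "variable" | "var" => Some(Kind::Variable),
        "const" | "constant" => Some(Kind::Const),
        "type_alias" => Some(Kind::TypeAlias),
        "property" => Some(Kind::Property),
        "field" => Some(Kind::Field),
        "constructor" => Some(Kind::Constructor),
        _ => None,
    }
}
```

With this sketch, -k class and -k typedef select the same kind, as do var/variable and const/constant.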

# Find a class by fully-qualified name
python-indexer query ./my-project -p "myapp.models.User" -k class

# Find all functions in a module
python-indexer query ./my-project -p "myapp.utils.*" -k function

# Wildcard across subpackages
python-indexer query ./my-project -p "myapp.services..*" -k class

# Regex pattern
python-indexer query ./my-project -p "myapp..*Handler"

# With dependency resolution
python-indexer query ./my-project \
  -p "flask.Flask" -k class \
  --deps-path ./.venv/lib/python3.12/site-packages

JSON stdin mode

For programmatic use, pass query parameters as JSON on stdin:

echo '{"repo_path": "/path/to/project", "search_pattern": "myapp.MyClass", "symbol_kind": "class"}' \
  | python-indexer query --stdin

Dependency Resolution

By default the indexer only knows about files inside your project. Imports of stdlib modules (os, sys, pathlib, ...) and third-party packages (flask, requests, ...) show up as unresolved modules in the build stats.

The --deps-path flag enables lazy dependency resolution: when a query matches an unresolved import, the indexer locates the dependency's source or stub files in the given directories, parses them, and returns the matching symbols, all on demand.

How it works

  1. During indexing, imports that don't resolve to project files are recorded as unresolved.
  2. During a query, if the search pattern matches an unresolved module, resolve_dependency_symbols() is called.
  3. The resolver searches each --deps-path directory in order for the module file.
  4. The first match is parsed and its exported symbols are returned.
  5. Results are cached -- subsequent queries for the same module skip parsing.
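Steps 2, 4 and 5 can be sketched in Rust as follows, assuming the unresolved imports are available as a list of module names. LazyDeps, matching_module and the load callback are hypothetical: load stands in for "probe the --deps-path directories, parse the first match, return its exported names" and is not the signature of the crate's resolve_dependency_symbols(). Picking the longest matching module is also an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical lazy resolver (not the crate's API).
/// `load` returns the exported symbol names of a module, or None when
/// no deps directory contains it.
struct LazyDeps<F: FnMut(&str) -> Option<Vec<String>>> {
    load: F,
    // Step 5: per-module cache; misses (None) are cached as well as hits.
    cache: HashMap<String, Option<Vec<String>>>,
}

impl<F: FnMut(&str) -> Option<Vec<String>>> LazyDeps<F> {
    fn new(load: F) -> Self {
        LazyDeps { load, cache: HashMap::new() }
    }

    /// Step 4: locate and parse `module` only on its first request;
    /// every later request is answered from the cache.
    fn symbols(&mut self, module: &str) -> Option<&[String]> {
        if !self.cache.contains_key(module) {
            let loaded = (self.load)(module);
            self.cache.insert(module.to_string(), loaded);
        }
        self.cache.get(module).and_then(|found| found.as_deref())
    }
}

/// Step 2: the longest unresolved module that `pattern` names exactly or
/// sits inside, matching only at a dot boundary ("os" does not match "osx").
fn matching_module<'a>(pattern: &str, unresolved: &'a [String]) -> Option<&'a str> {
    unresolved
        .iter()
        .map(String::as_str)
        .filter(|m| pattern == *m || pattern.starts_with(&format!("{m}.")))
        .max_by_key(|m| m.len())
}
```

Caching misses as well as hits means a module that no --deps-path provides is probed only once per run in this sketch.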

File resolution order

For a module name like os.path, each --deps-path is probed in this order:

  1. <deps-path>/os/path.pyi -- stub file (single module)
  2. <deps-path>/os/path/__init__.pyi -- stub package
  3. <deps-path>/os/path.py -- source file (single module)
  4. <deps-path>/os/path/__init__.py -- source package

.pyi stubs always take precedence over .py source within the same directory. When multiple --deps-path values are given, they are searched in order and the first match wins.
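The probe order can be sketched as two small functions. The candidates and resolve_module helpers are hypothetical, not the crate's code; the exists callback is injected so the ordering logic can be checked without touching the filesystem (in real use it could be |p| p.is_file()).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: the four files probed for `module` inside one deps
/// directory, in the documented order: stub module, stub package,
/// source module, source package.
fn candidates(deps_dir: &Path, module: &str) -> [PathBuf; 4] {
    // "os.path" -> <deps_dir>/os/path
    let base = module
        .split('.')
        .fold(deps_dir.to_path_buf(), |dir, part| dir.join(part));
    [
        base.with_extension("pyi"),
        base.join("__init__.pyi"),
        base.with_extension("py"),
        base.join("__init__.py"),
    ]
}

/// Hypothetical helper: try every candidate of the first directory before
/// moving to the next, so .pyi beats .py within a directory and an earlier
/// --deps-path beats a later one. Returns the first path `exists` accepts.
fn resolve_module(
    deps_dirs: &[PathBuf],
    module: &str,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    deps_dirs
        .iter()
        .flat_map(|dir| candidates(dir, module))
        .find(|candidate| exists(candidate.as_path()))
}
```

Note that a .py file in an earlier directory still wins over a .pyi stub in a later one, matching "the first match wins".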

What to point --deps-path at

| Source | Typical path | What it provides |
| --- | --- | --- |
| virtualenv site-packages | .venv/lib/python3.x/site-packages | Third-party packages |
| typeshed stubs | path/to/typeshed/stdlib | Stdlib type stubs |
| CPython stdlib source | /usr/lib/python3.x | Stdlib source (fallback) |

# Resolve both stdlib (via typeshed) and third-party deps
python-indexer query ./my-project \
  -p "flask.Flask" -k class \
  --deps-path ./typeshed/stdlib \
  --deps-path ./.venv/lib/python3.12/site-packages

Re-export handling

Most Python packages use __init__.py to re-export symbols from submodules:

# flask/__init__.py
from .app import Flask
from .blueprints import Blueprint

The resolver handles this automatically. When resolving flask, it parses flask/__init__.py, finds the imports, and synthesizes definitions for the re-exported names (Flask, Blueprint). If __all__ is defined, only the names it lists are exported. Otherwise, the visibility convention applies: names without a leading underscore are public.
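The export rule can be sketched as a filter over the names imported in __init__.py. exported_names is a hypothetical helper, not the crate's code; as a simplification, names listed in __all__ that were not imported are ignored here.

```rust
/// Hypothetical helper: which imported names become re-exports.
/// - With `__all__` present: exactly the imported names it lists
///   (a listed `_name` is still exported).
/// - Without `__all__`: every imported name not starting with `_`.
fn exported_names<'a>(imported: &[&'a str], dunder_all: Option<&[&str]>) -> Vec<&'a str> {
    imported
        .iter()
        .copied()
        .filter(|name| match dunder_all {
            Some(all) => all.contains(name),
            None => !name.starts_with('_'),
        })
        .collect()
}
```

For the flask example, exported_names(&["Flask", "Blueprint"], None) keeps both names, while an __all__ of ["Flask"] would drop Blueprint.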

Library Usage

The indexer can also be used as a Rust library:

use std::path::Path;

use ast_index::{ProjectIndex, QueryParams, SymbolKindTag};
use python_indexer::analyzer::PythonAnalyzer;

fn main() {
    // Basic usage: build() returns the build statistics
    let analyzer = PythonAnalyzer::new("/path/to/project".into());
    let index = ProjectIndex::new(analyzer);
    let stats = index.build(Path::new("/path/to/project"));

    // With dependency resolution: directories searched lazily at query time
    let analyzer = PythonAnalyzer::with_deps_paths(
        "/path/to/project".into(),
        vec!["/path/to/site-packages".into()],
    );
    let index = ProjectIndex::new(analyzer);
    index.build(Path::new("/path/to/project"));

    // Query: Python classes are matched with SymbolKindTag::TypeDef
    let traces = index
        .params(QueryParams {
            search_pattern: "myapp.models.User".to_string(),
            symbol_kind: Some(SymbolKindTag::TypeDef),
        })
        .query()
        .expect("query failed");
}

What Gets Extracted

The indexer extracts the following from Python source files:

Definitions: functions, classes (with inheritance), constructors, methods, properties, fields (self.x assignments), variables, constants (UPPER_CASE), enums, enum members, type aliases, nested functions/classes

Imports: import x, from x import y, from . import z, from __future__ import, wildcard imports, aliased imports

References: function calls, decorator usage, type annotations, attribute access, del targets, augmented assignments

Python-specific metadata: visibility (public/private/name-mangled), decorators, async, static/classmethod, abstract, @property, @overload, enum member values, dataclass/protocol/ABC/NamedTuple/TypedDict type kinds
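Visibility and the UPPER_CASE constant rule follow Python naming conventions. A sketch of such a classification (the Visibility enum and both functions are hypothetical and ASCII-only; treating dunder names like __init__ as public is an assumption about the indexer, not something this README states):

```rust
/// Hypothetical visibility classes following Python naming conventions.
#[derive(Debug, PartialEq, Eq)]
enum Visibility {
    Public,
    Private,
    NameMangled,
}

fn visibility(name: &str) -> Visibility {
    // Dunder names such as __init__ are special methods, treated as public here.
    let is_dunder = name.starts_with("__") && name.ends_with("__");
    if is_dunder {
        Visibility::Public
    } else if name.starts_with("__") {
        // __x (without a trailing __) is name-mangled inside a class body.
        Visibility::NameMangled
    } else if name.starts_with('_') {
        Visibility::Private
    } else {
        Visibility::Public
    }
}

/// UPPER_CASE heuristic: after any leading underscores, at least one
/// uppercase letter and only uppercase letters, digits, or underscores.
fn looks_like_constant(name: &str) -> bool {
    let trimmed = name.trim_start_matches('_');
    trimmed.chars().any(|c| c.is_ascii_uppercase())
        && trimmed
            .chars()
            .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
}
```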

Logging

Set RUST_LOG to control log output (written to stderr):

RUST_LOG=debug python-indexer index ./my-project --stats-only
RUST_LOG=python_indexer=trace python-indexer query ./my-project -p "app.*"

Testing

# Run all tests (382 tests)
cargo test

# Run a specific suite
cargo test --test dependency_tests    # 33 dependency resolution tests
cargo test --test analyzer_tests      # 13 core extraction tests
cargo test --test search_tests        # 124 language semantics tests
cargo test --test query_tests         # 88 query pattern tests
cargo test --test coverage_tests      # 124 coverage gap tests

# Run a single test
cargo test --test dependency_tests e2e_project_imports_dep_function