python-indexer
Python language indexer for the ast-index library. Parses Python source files with tree-sitter, extracts definitions, references, and imports, and exposes them through the FileAnalyzer trait. Ships as both a Rust library crate and a CLI binary.
Quick Start
```bash
# Build
cargo build --release

# Index a project (full JSON output)
python-indexer index /path/to/project

# Index, stats only
python-indexer index /path/to/project --stats-only

# Query for a symbol
python-indexer query /path/to/project --pattern "mypackage.MyClass" -k class
```
Installation
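Requires a Rust toolchain that supports the 2024 edition (Rust 1.85 or newer). Clone the repository and build:
```bash
cargo build --release
```
The binary is at target/release/python-indexer. The examples in this README assume it is on your PATH.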
CLI Usage
index -- Index a Python project
```bash
python-indexer index <REPO_PATH> [OPTIONS]
```
Walks the project tree, parses every .py file with tree-sitter, and outputs the full index as JSON to stdout.
| Flag | Description |
|---|---|
| `<REPO_PATH>` | Path to the Python project root (required) |
| `--stats-only` | Only output build statistics, not the full index |
| `--deps-path <PATH>` | Path to a dependency directory (repeatable; see Dependency Resolution) |
```bash
# Full index
python-indexer index ./my-project

# Just stats (file counts, unresolved modules)
python-indexer index ./my-project --stats-only

# With dependency resolution
python-indexer index ./my-project \
  --deps-path ./typeshed/stdlib \
  --deps-path ./.venv/lib/python3.12/site-packages
```
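The tree walk that `index` performs can be pictured as a recursive scan that keeps only `.py` files. The following is a minimal sketch under stated assumptions, not the crate's actual walker: `collect_py_files` is a hypothetical name, it does not detect symlink cycles, and it does not skip virtualenvs or hidden directories the way a production walker probably should.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: recursively collect every `.py` file under `root`.
/// Stubs (`.pyi`) and other files are ignored; the result is sorted because
/// `read_dir` order is not guaranteed.
fn collect_py_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    // Explicit stack instead of recursion, so deep trees cannot overflow the call stack.
    let mut stack = vec![root.to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                // `is_dir` follows symlinks; cycles are not detected in this sketch.
                stack.push(path);
            } else if path.extension().and_then(|e| e.to_str()) == Some("py") {
                out.push(path);
            }
        }
    }
    out.sort();
    Ok(out)
}

fn main() {
    // Usage: pass the project root as the first argument.
    let Some(root) = std::env::args().nth(1) else {
        eprintln!("usage: <program> <PROJECT_ROOT>");
        return;
    };
    match collect_py_files(Path::new(&root)) {
        Ok(files) => files.iter().for_each(|f| println!("{}", f.display())),
        Err(err) => eprintln!("walk failed: {err}"),
    }
}
```

An explicit stack keeps the walk iterative; sorting afterwards gives stable output regardless of filesystem order.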
query -- Search the index
```bash
python-indexer query <REPO_PATH> --pattern <PATTERN> [OPTIONS]
```
Builds the index, then searches for symbols matching the pattern. Results are output as JSON to stdout.
| Flag | Description |
|---|---|
| `<REPO_PATH>` | Path to the Python project root |
| `-p`, `--pattern <PAT>` | Search pattern (dotted name, wildcard, or regex) |
| `-k`, `--kind <KIND>` | Filter by symbol kind |
| `--stdin` | Read query as JSON from stdin |
| `--deps-path <PATH>` | Path to a dependency directory (repeatable) |
Symbol kind values: function, class (or typedef), enum, enum_member, variable (or var), const (or constant), type_alias, property, field, constructor
```bash
# Find a class by fully-qualified name
python-indexer query ./my-project -p "myapp.models.User" -k class

# Find all functions in a module
python-indexer query ./my-project -p "myapp.utils.*" -k function

# Wildcard across subpackages
python-indexer query ./my-project -p "myapp.services..*" -k class

# Regex pattern
python-indexer query ./my-project -p "myapp..*Handler"

# With dependency resolution
python-indexer query ./my-project \
  -p "flask.Flask" -k class \
  --deps-path ./.venv/lib/python3.12/site-packages
```
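The kind aliases listed above fold into one canonical kind each. Here is a minimal sketch of that mapping, assuming exact lowercase matching. `Kind` and `parse_kind` are hypothetical names; the crate's own type is `SymbolKindTag` (the library example uses `SymbolKindTag::TypeDef` for classes), and its full variant list is not shown in this README.

```rust
/// Hypothetical canonical kinds mirroring the `-k/--kind` values listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Function,
    TypeDef, // `class` or `typedef`
    Enum,
    EnumMember,
    Variable, // `variable` or `var`
    Const,    // `const` or `constant`
    TypeAlias,
    Property,
    Field,
    Constructor,
}

/// Map a `--kind` string to its canonical kind, folding the documented aliases.
fn parse_kind(s: &str) -> Option<Kind> {
    Some(match s {
        "function" => Kind::Function,
        "class" | "typedef" => Kind::TypeDef,
        "enum" => Kind::Enum,
        "enum_member" => Kind::EnumMember,
        "variable" | "var" => Kind::Variable,
        "const" | "constant" => Kind::Const,
        "type_alias" => Kind::TypeAlias,
        "property" => Kind::Property,
        "field" => Kind::Field,
        "constructor" => Kind::Constructor,
        _ => return None, // unknown kind strings are rejected
    })
}

fn main() {
    for arg in ["class", "typedef", "var", "constant", "method"] {
        println!("{arg:>9} -> {:?}", parse_kind(arg));
    }
}
```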
JSON stdin mode
For programmatic use, pass query parameters as JSON on stdin:
```bash
echo '{"repo_path": "/path/to/project", "search_pattern": "myapp.MyClass", "symbol_kind": "class"}' \
  | python-indexer query --stdin
```
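To call the CLI from another Rust program, stdin mode can be driven with `std::process::Command`. This sketch makes several assumptions: `python-indexer` is on PATH, the three keys from the example above are all the payload needs, and omitting `symbol_kind` is accepted. `json_escape`, `stdin_payload`, and `run_query` are hypothetical helpers; a real project would more likely build the JSON with `serde_json`.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

/// Escape a string for use inside a JSON string literal (quotes, backslashes, control chars).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the stdin payload with the keys from the example; `symbol_kind` is omitted when `None`.
fn stdin_payload(repo_path: &str, pattern: &str, kind: Option<&str>) -> String {
    let mut json = format!(
        "{{\"repo_path\": \"{}\", \"search_pattern\": \"{}\"",
        json_escape(repo_path),
        json_escape(pattern)
    );
    if let Some(kind) = kind {
        json.push_str(&format!(", \"symbol_kind\": \"{}\"", json_escape(kind)));
    }
    json.push('}');
    json
}

/// Pipe the payload into `python-indexer query --stdin` and return its stdout.
/// The child's exit status is not checked in this sketch.
fn run_query(payload: &str) -> std::io::Result<String> {
    let mut child = Command::new("python-indexer")
        .args(["query", "--stdin"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // Write the payload; the handle is dropped at the end of the statement,
    // which closes the pipe so the child sees end-of-input.
    child.stdin.take().expect("stdin was piped").write_all(payload.as_bytes())?;
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() {
    let payload = stdin_payload("/path/to/project", "myapp.MyClass", Some("class"));
    println!("{payload}");
    match run_query(&payload) {
        Ok(json) => println!("{json}"),
        Err(err) => eprintln!("could not run python-indexer: {err}"),
    }
}
```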
Dependency Resolution
By default the indexer only knows about files inside your project. Imports of stdlib modules (os, sys, pathlib, ...) and third-party packages (flask, requests, ...) show up as unresolved modules in the build stats.
The --deps-path flag enables lazy dependency resolution: when a query matches an unresolved import, the indexer locates the dependency's source or stub files in the provided directories, parses them, and returns their symbols -- all on demand.
How it works
- During `index`, imports that don't resolve to project files are recorded as unresolved.
- During `query`, if the search pattern matches an unresolved module, `resolve_dependency_symbols()` is called.
- The resolver searches each `--deps-path` directory, in order, for the module file.
- The first match is parsed and its exported symbols are returned.
- Results are cached, so subsequent queries for the same module skip parsing.
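The lookup-then-cache flow in these steps can be sketched as a small struct. Only `resolve_dependency_symbols()` comes from this README; `LazyDeps`, `symbols_for`, and the `load` closure (standing in for "locate and parse the module") are hypothetical, and matching is simplified to exact module names rather than search patterns.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-in for the symbols a parsed module exports (the real records are richer).
type Symbols = Vec<String>;

/// Hypothetical sketch of on-demand dependency resolution with a per-module cache.
struct LazyDeps<F: FnMut(&str) -> Option<Symbols>> {
    /// Modules recorded as unresolved during `index`.
    unresolved: HashSet<String>,
    /// Parsed results, filled on the first query that needs them.
    cache: HashMap<String, Symbols>,
    /// Locates and parses a module in the deps paths; `None` if nothing was found.
    load: F,
}

impl<F: FnMut(&str) -> Option<Symbols>> LazyDeps<F> {
    fn symbols_for(&mut self, module: &str) -> Option<&Symbols> {
        // Only imports that failed to resolve inside the project are looked up lazily.
        if !self.unresolved.contains(module) {
            return None;
        }
        // Parse on the first request only; a failed lookup is not cached and will be retried.
        if !self.cache.contains_key(module) {
            let symbols = (self.load)(module)?;
            self.cache.insert(module.to_string(), symbols);
        }
        self.cache.get(module)
    }
}

fn main() {
    let mut deps = LazyDeps {
        unresolved: ["flask".to_string()].into_iter().collect(),
        cache: HashMap::new(),
        load: |module: &str| {
            println!("parsing {module} (cache miss)");
            Some(vec![format!("{module}.Flask")])
        },
    };
    println!("{:?}", deps.symbols_for("flask")); // triggers the load
    println!("{:?}", deps.symbols_for("flask")); // served from the cache
}
```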
File resolution order
For a module name like os.path, each --deps-path is probed in this order:
- `<deps-path>/os/path.pyi` -- stub file (single module)
- `<deps-path>/os/path/__init__.pyi` -- stub package
- `<deps-path>/os/path.py` -- source file (single module)
- `<deps-path>/os/path/__init__.py` -- source package
.pyi stubs always take precedence over .py source within the same directory. When multiple --deps-path values are given, they are searched in order and the first match wins.
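The probe order above can be written as a candidate list per directory, with directories tried in the order given. This is a sketch of the documented rules, not the crate's code; `candidates` and `resolve_module` are hypothetical names.

```rust
use std::path::{Path, PathBuf};

/// Candidate files for a dotted module name inside one deps directory, in the
/// documented probe order: stubs before source, single module before package.
fn candidates(deps_path: &Path, module: &str) -> [PathBuf; 4] {
    // "os.path" -> <deps_path>/os/path
    let base = deps_path.join(module.split('.').collect::<PathBuf>());
    [
        base.with_extension("pyi"),
        base.join("__init__.pyi"),
        base.with_extension("py"),
        base.join("__init__.py"),
    ]
}

/// First existing candidate across all deps paths. Directories are tried in the
/// order given, so an earlier `--deps-path` wins even over a later directory's stub.
fn resolve_module(deps_paths: &[PathBuf], module: &str) -> Option<PathBuf> {
    deps_paths
        .iter()
        .flat_map(|dir| candidates(dir, module))
        .find(|path| path.is_file())
}

fn main() {
    // Usage: pass one or more deps directories; "os.path" is resolved against them.
    let deps: Vec<PathBuf> = std::env::args().skip(1).map(PathBuf::from).collect();
    for candidate in candidates(Path::new("<deps-path>"), "os.path") {
        println!("probe {}", candidate.display());
    }
    println!("resolved: {:?}", resolve_module(&deps, "os.path"));
}
```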
What to point --deps-path at
| Source | Typical path | What it provides |
|---|---|---|
| virtualenv site-packages | `.venv/lib/python3.x/site-packages` | Third-party packages |
| typeshed stubs | `path/to/typeshed/stdlib` | Stdlib type stubs |
| CPython stdlib source | `/usr/lib/python3.x` | Stdlib source (fallback) |
```bash
# Resolve both stdlib (via typeshed) and third-party deps
python-indexer query ./my-project \
  -p "flask.Flask" -k class \
  --deps-path ./typeshed/stdlib \
  --deps-path ./.venv/lib/python3.12/site-packages
```
Re-export handling
Many Python packages use `__init__.py` to re-export symbols from submodules:
```python
# flask/__init__.py
from .app import Flask
from .blueprints import Blueprint
```
The resolver handles this automatically. When resolving `flask`, it parses `flask/__init__.py`, finds the imports, and synthesizes definitions for the re-exported names (`Flask`, `Blueprint`). If `__all__` is defined, only the listed names are exported; otherwise the visibility convention applies (names without a leading `_` are public).
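The export rule in the previous paragraph reduces to a filter over the names bound in `__init__.py`. A minimal sketch, assuming the bound names and the contents of `__all__` have already been extracted; `exported_names` is a hypothetical helper, and names listed in `__all__` but never bound are simply ignored here.

```rust
/// Hypothetical sketch: decide which names bound in a package's `__init__.py`
/// are exported. `all` holds the contents of `__all__` when the module defines it.
fn exported_names<'a>(bound: &[&'a str], all: Option<&[&'a str]>) -> Vec<&'a str> {
    match all {
        // `__all__` present: only the listed names are exported.
        Some(listed) => bound.iter().copied().filter(|name| listed.contains(name)).collect(),
        // No `__all__`: fall back to the convention that a leading `_` means private.
        None => bound.iter().copied().filter(|name| !name.starts_with('_')).collect(),
    }
}

fn main() {
    // Names bound by imports in an illustrative flask/__init__.py.
    let bound = ["Flask", "Blueprint", "_internal_helper"];
    println!("{:?}", exported_names(&bound, None));
    println!("{:?}", exported_names(&bound, Some(&["Flask"][..])));
}
```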
Library Usage
The indexer can also be used as a Rust library:
```rust
use std::path::Path;

use ast_index::{ProjectIndex, QueryParams, SymbolKindTag};
use python_indexer::analyzer::PythonAnalyzer;

fn main() {
    // Basic usage: index only the project's own files
    let analyzer = PythonAnalyzer::new("/path/to/project".into());
    let index = ProjectIndex::new(analyzer);
    let stats = index.build(Path::new("/path/to/project"));

    // With dependency resolution: unresolved imports are looked up
    // lazily in the given directories at query time
    let analyzer = PythonAnalyzer::with_deps_paths(
        "/path/to/project".into(),
        vec!["/path/to/site-packages".into()],
    );
    let index = ProjectIndex::new(analyzer);
    index.build(Path::new("/path/to/project"));

    // Query: `SymbolKindTag::TypeDef` matches classes (CLI `-k class` / `-k typedef`)
    let traces = index
        .params(QueryParams {
            search_pattern: "myapp.models.User".to_string(),
            symbol_kind: Some(SymbolKindTag::TypeDef),
        })
        .query()
        .expect("query failed");
}
```
What Gets Extracted
The indexer extracts the following from Python source files:
Definitions: functions, classes (with inheritance), constructors, methods, properties, fields (self.x assignments), variables, constants (UPPER_CASE), enums, enum members, type aliases, nested functions/classes
Imports: import x, from x import y, from . import z, from __future__ import, wildcard imports, aliased imports
References: function calls, decorator usage, type annotations, attribute access, del targets, augmented assignments
Python-specific metadata: visibility (public/private/name-mangled), decorators, async, static/classmethod, abstract, @property, @overload, enum member values, dataclass/protocol/ABC/NamedTuple/TypedDict type kinds
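The visibility classes and the UPPER_CASE constant rule follow standard Python naming conventions, which can be sketched as two small predicates. `Visibility`, `visibility`, and `looks_like_constant` are hypothetical names, not the crate's API; the constant check only considers ASCII letters, and edge cases such as names made entirely of underscores are not handled specially.

```rust
/// Visibility classes from Python naming conventions (hypothetical mirror of the metadata).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Visibility {
    Public,
    Private,
    NameMangled,
}

/// `__x` (two leading underscores, not a dunder) is name-mangled, `_x` is private,
/// and dunders such as `__init__` plus all other names are public.
fn visibility(name: &str) -> Visibility {
    let dunder = name.len() > 4 && name.starts_with("__") && name.ends_with("__");
    if dunder {
        Visibility::Public
    } else if name.starts_with("__") {
        Visibility::NameMangled
    } else if name.starts_with('_') {
        Visibility::Private
    } else {
        Visibility::Public
    }
}

/// UPPER_CASE heuristic for constants: at least one uppercase letter and
/// nothing but uppercase ASCII letters, digits, and underscores.
fn looks_like_constant(name: &str) -> bool {
    name.chars().any(|c| c.is_ascii_uppercase())
        && name.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
}

fn main() {
    for name in ["run", "_cache", "__secret", "__init__", "MAX_RETRIES"] {
        println!("{name:>12}: {:?}, constant = {}", visibility(name), looks_like_constant(name));
    }
}
```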
Logging
Set RUST_LOG to control log output (written to stderr):
```bash
RUST_LOG=debug python-indexer index ./my-project --stats-only
RUST_LOG=python_indexer=trace python-indexer query ./my-project -p "app.*"
```
Testing
```bash
# Run all tests (382 tests)
cargo test

# Run a specific suite
cargo test --test dependency_tests   # 33 dependency resolution tests
cargo test --test analyzer_tests     # 13 core extraction tests
cargo test --test search_tests       # 124 language semantics tests
cargo test --test query_tests        # 88 query pattern tests
cargo test --test coverage_tests     # 124 coverage gap tests

# Run a single test
cargo test --test dependency_tests e2e_project_imports_dep_function
```