java-indexer
A Java source code indexer built on ast-index. Parses Java source files using tree-sitter, extracts definitions, imports, and references with full Java language semantics, builds a cross-file index, and provides an interactive query REPL.
Given a Java project, the indexer answers questions like:
- Where is `com.example.model.User` defined, and where is it used?
- What classes are in the `com.example.service` package?
- Who calls `getName()` across the entire codebase?
- What does `UserService` import, and from where?
Quick Start
```bash
# Build
cargo build --release

# Index a project and enter the query REPL
./target/release/java-indexer /path/to/your/java/project

# With dependency resolution from JARs
./target/release/java-indexer /path/to/project --classpath /path/to/lib
```
On startup, the indexer scans all .java files, builds the cross-file index, and drops you into an interactive REPL:
```text
Indexing Java project: /path/to/project
Classpath: 42 JAR(s) configured
Build complete:
{
  "files_scanned": 19,
  "files_cached": 19,
  "files_failed": 0,
  "unresolved_modules": [
    "java.util.stream",
    "java.util.concurrent"
  ],
  "errors": []
}

Interactive query REPL. Commands:
  {"symbol": "com.example.*", "kind": "class"}  -- query the index
  help                                          -- show this help
  stats                                         -- show build stats
  exit / quit / Ctrl+D                          -- exit

query>
```
Installation
Requires Rust 1.85+ (edition 2024).
```bash
git clone <repo>
cd ast-index-languages/java
cargo build --release
```
The binary is at ./target/release/java-indexer.
CLI Reference
```text
java-indexer [OPTIONS] <REPO_PATH>
```
Arguments
| Argument | Required | Description |
|---|---|---|
| `REPO_PATH` | Yes | Path to the Java project root to index |
Options
| Flag | Description |
|---|---|
| `--classpath <PATH:PATH:...>` | Colon-separated list of JAR files or directories containing JARs for dependency resolution. Directories are searched recursively. |
| `--no-auto-classpath` | Disable automatic classpath detection from `pom.xml` / `build.gradle`. |
| `--log-level <LEVEL>` | Log verbosity: `error`, `warn` (default), `info`, `debug`, `trace`. |
Examples
```bash
# Basic usage -- index with JDK stub fallback only
java-indexer /path/to/project

# Explicit classpath -- point at your dependency JARs
java-indexer /path/to/project --classpath /path/to/lib:/path/to/other-lib

# Single JAR
java-indexer /path/to/project --classpath /path/to/guava-31.1.jar

# Maven project -- auto-detects from pom.xml + ~/.m2/repository
java-indexer /path/to/maven-project

# Disable auto-detection
java-indexer /path/to/project --no-auto-classpath

# Verbose logging to see dependency resolution
java-indexer /path/to/project --classpath /path/to/lib --log-level debug
```
Query Reference
Queries are JSON objects submitted at the query> prompt.
Format
```json
{"symbol": "<search pattern>", "kind": "<optional kind filter>"}
```
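If you generate queries from a script or another Rust program, the only subtlety is escaping the pattern as a JSON string. The sketch below builds one query line in the format above using only the standard library; `build_query` and `json_escape` are hypothetical helpers, not part of the indexer.

```rust
/// Escape a string for use inside a JSON string literal
/// (quotes, backslashes and control characters).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \u00XX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build one line for the `query>` prompt; `kind` is optional.
fn build_query(symbol: &str, kind: Option<&str>) -> String {
    match kind {
        Some(k) => format!(
            "{{\"symbol\": \"{}\", \"kind\": \"{}\"}}",
            json_escape(symbol),
            json_escape(k)
        ),
        None => format!("{{\"symbol\": \"{}\"}}", json_escape(symbol)),
    }
}
```

For example, `build_query("*.getName", Some("function"))` produces the same text as the "find a method" query shown later in this README.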
Search Patterns
| Pattern | Meaning | Use case |
|---|---|---|
| `com.example.model.User` | Exact fully-qualified name | Find one specific class |
| `com.example.model.*` | All symbols in a package | Browse a package |
| `com.example.model.User.*` | All members of a class | See methods, fields, inner classes |
| `com.example.*.*` | All symbols in any sub-package | Explore a namespace |
| `*.User` | Symbol named `User` in any package | Find by short name |
| `*.*Utils` | Any symbol ending in `Utils` | Find utility classes |
| `*.getName` | Method named `getName` anywhere | Find method definitions |
| `com.example.model.User.Builder` | Inner class by FQN | Navigate nested types |
| `com.example.model.UserStatus.ACTIVE` | Enum member by FQN | Find specific constants |
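The table describes the patterns by example but does not spell out the wildcard rules. The following is a hypothetical matcher that reproduces every row above under two assumptions: a `*` inside a dotted segment matches any run of characters within that segment, and a leading `*.` matches any (non-empty) package prefix. It is an illustration of those assumed semantics, not the indexer's actual matching code.

```rust
/// Glob-match one dotted segment; `*` matches any run of characters.
fn segment_matches(pat: &str, seg: &str) -> bool {
    let p: Vec<char> = pat.chars().collect();
    let s: Vec<char> = seg.chars().collect();
    // dp[i][j] is true when p[..i] matches s[..j].
    let mut dp = vec![vec![false; s.len() + 1]; p.len() + 1];
    dp[0][0] = true;
    for i in 1..=p.len() {
        if p[i - 1] == '*' {
            dp[i][0] = dp[i - 1][0];
        }
        for j in 1..=s.len() {
            dp[i][j] = if p[i - 1] == '*' {
                dp[i - 1][j] || dp[i][j - 1]
            } else {
                dp[i - 1][j - 1] && p[i - 1] == s[j - 1]
            };
        }
    }
    dp[p.len()][s.len()]
}

/// Match a dotted pattern against a fully-qualified name (assumed semantics).
fn fqn_matches(pattern: &str, fqn: &str) -> bool {
    let pat: Vec<&str> = pattern.split('.').collect();
    let name: Vec<&str> = fqn.split('.').collect();
    if pat.len() > 1 && pat[0] == "*" {
        // Leading `*.`: skip at least one leading segment, then match the tail.
        let rest = &pat[1..];
        if name.len() <= rest.len() {
            return false;
        }
        let tail = &name[name.len() - rest.len()..];
        return rest.iter().zip(tail).all(|(p, s)| segment_matches(p, s));
    }
    // Otherwise every segment lines up one-to-one.
    pat.len() == name.len() && pat.iter().zip(&name).all(|(p, s)| segment_matches(p, s))
}
```

Under these rules `com.example.model.*` matches `com.example.model.User` but not its members, which is consistent with the separate `com.example.model.User.*` row.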
Kind Filters
| Kind | Matches |
|---|---|
| `class` | Classes (not interfaces or enums) |
| `type_def` | All type definitions (classes, interfaces, records) |
| `interface` | Interfaces only |
| `enum` | Enum types |
| `enum_member` | Individual enum constants |
| `function` / `method` | Methods |
| `constructor` | Constructors |
| `field` | Fields |
| `variable` | Local variables (limited) |
| `const` | Constants |
| `type_alias` | Type aliases |
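One way to read the table is as a predicate over a definition's kind plus, for type definitions, its `type_kind` (the field shown under `language_data` in the output below). The sketch models that reading; the `DefKind`/`TypeKind` enums and `kind_filter_matches` are hypothetical, and keeping enums as their own kind follows the table's wording rather than any confirmed internal representation.

```rust
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DefKind { TypeDef, Enum, EnumMember, Function, Constructor, Field, Variable, Const, TypeAlias }

/// Sub-kind of a `type_def`, as in `language_data.type_kind`.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TypeKind { Class, Interface, Record, Annotation }

/// Returns `None` for an unknown filter string, otherwise whether it matches.
fn kind_filter_matches(filter: &str, kind: DefKind, type_kind: Option<TypeKind>) -> Option<bool> {
    use DefKind::*;
    let hit = match filter {
        // `class` and `interface` narrow type_def by its type_kind.
        "class" => kind == TypeDef && type_kind == Some(TypeKind::Class),
        "interface" => kind == TypeDef && type_kind == Some(TypeKind::Interface),
        "type_def" => kind == TypeDef,
        "enum" => kind == Enum,
        "enum_member" => kind == EnumMember,
        // Two spellings for the same kind.
        "function" | "method" => kind == Function,
        "constructor" => kind == Constructor,
        "field" => kind == Field,
        "variable" => kind == Variable,
        "const" => kind == Const,
        "type_alias" => kind == TypeAlias,
        _ => return None,
    };
    Some(hit)
}
```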
REPL Commands
| Command | Description |
|---|---|
| `stats` | Show build statistics (files scanned, unresolved modules) |
| `help` | Show query format help |
| `exit` / `quit` / Ctrl+D | Exit the REPL |
Output Format
Each query returns an array of traces. A trace connects a symbol's definition to all the places it's used across the codebase:
```json
{
  "definition": {
    "type": "project_file",
    "file": "src/main/java/com/example/model/User.java",
    "span": { "start": 180, "end": 1894 },
    "def": {
      "name": "User",
      "qualified_name": "com.example.model.User",
      "kind": "type_def",
      "exported": true,
      "language_data": {
        "visibility": "public",
        "type_kind": "class",
        "is_abstract": false,
        "is_final": false,
        "is_sealed": false
      },
      "parents": [
        { "kind": "extends", "type_ref": "Entity" }
      ]
    }
  },
  "chain": [
    {
      "file": "src/main/java/com/example/service/UserService.java",
      "kind": "re_export",
      "span": { "start": 30, "end": 60 },
      "original_name": "User"
    }
  ],
  "usage_sites": [
    {
      "file": "src/main/java/com/example/service/UserService.java",
      "span": { "start": 374, "end": 378 },
      "usage": "type"
    },
    {
      "file": "src/main/java/com/example/model/Admin.java",
      "span": { "start": 204, "end": 208 },
      "usage": "type"
    }
  ]
}
```
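The `span` values in this output look like byte offsets into the source file (for example `374..378` covers the four characters of `User` at a usage site). The README does not state this explicitly, so treat it as an assumption. Under that assumption, a small helper like the hypothetical `offset_to_line_col` below turns a span start into an editor-friendly 1-based line and column.

```rust
/// Convert a byte offset into a 1-based (line, column) pair.
/// The column counts bytes from the start of the line. Returns `None`
/// when the offset is past the end of the source.
fn offset_to_line_col(source: &str, offset: usize) -> Option<(usize, usize)> {
    if offset > source.len() {
        return None;
    }
    let before = &source.as_bytes()[..offset];
    // Every newline before the offset starts a new line.
    let line = before.iter().filter(|&&b| b == b'\n').count() + 1;
    // The current line starts just after the last newline (or at 0).
    let line_start = before.iter().rposition(|&b| b == b'\n').map_or(0, |i| i + 1);
    Some((line, offset - line_start + 1))
}
```

In practice you would read the file named in `"file"` with `std::fs::read_to_string` and pass `span.start` as the offset.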
Definition types:
- `project_file`: defined in a project source file (has `file` and `span`)
- `dependency`: resolved from a JAR on the classpath or from JDK stubs (has `package`)
- `unresolved`: imported but could not be resolved (has `package`)

Usage types: `type` (type reference), `read` (value read), `write` (assignment), `read_write` (both read and written)
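A consumer of this JSON can model the three definition types and four usage types directly as Rust enums. The types below are a hypothetical client-side model for illustration, not the indexer's own output structs (which live in `output.rs` and are not shown here).

```rust
#[derive(Debug, Clone, PartialEq)]
struct Span { start: usize, end: usize }

/// The three `definition.type` values and the fields each one carries.
#[derive(Debug, Clone, PartialEq)]
enum Definition {
    ProjectFile { file: String, span: Span },
    Dependency { package: String },
    Unresolved { package: String },
}

/// The four `usage` values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Usage { Type, Read, Write, ReadWrite }

impl Usage {
    fn parse(s: &str) -> Option<Usage> {
        match s {
            "type" => Some(Usage::Type),
            "read" => Some(Usage::Read),
            "write" => Some(Usage::Write),
            "read_write" => Some(Usage::ReadWrite),
            _ => None,
        }
    }
    fn reads(self) -> bool { matches!(self, Usage::Read | Usage::ReadWrite) }
    fn writes(self) -> bool { matches!(self, Usage::Write | Usage::ReadWrite) }
}

impl Definition {
    /// Short human-readable location for printing.
    fn location(&self) -> String {
        match self {
            Definition::ProjectFile { file, span } => format!("{file}@{}..{}", span.start, span.end),
            Definition::Dependency { package } => format!("dependency {package}"),
            Definition::Unresolved { package } => format!("unresolved {package}"),
        }
    }
}
```

Treating `read_write` as satisfying both `reads()` and `writes()` keeps filters like "all places that modify this field" to a single check.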
Query Examples
Find a class and see where it's used:
```text
query> {"symbol": "com.example.model.User", "kind": "class"}
```
Find all classes in a package:
```text
query> {"symbol": "com.example.model.*", "kind": "class"}
```
Find a method across the codebase:
```text
query> {"symbol": "*.getName", "kind": "function"}
```
Query a JDK type (resolved through dependency stubs):
```text
query> {"symbol": "java.lang.String", "kind": "type_def"}
```
This returns a `dependency` definition with `span: {start: 0, end: 0}`, since it comes from stubs rather than source.

Find all members of a class:
```text
query> {"symbol": "com.example.model.User.*"}
```
Dependency Resolution
The indexer resolves references to external libraries (JDK, third-party JARs) through the --classpath flag or auto-detection.
How It Works
- At index time: imports like `import java.util.List;` are recorded as unresolved modules.
- At query time: when you query for a symbol in an unresolved module, the indexer lazily resolves it:
  - Scans JAR central directories to build a `package -> JAR` index (fast: only the ZIP table of contents is read)
  - Parses `.class` files from matching JARs using `cafebabe`
  - Extracts types, methods, fields, and constructors with full modifiers and type information
  - Caches the results for subsequent queries
- Fallback: built-in JDK stubs for `java.lang`, `java.util`, and `java.io` provide coverage when no classpath is configured.
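The resolution order described above (cache, then classpath JARs, then JDK stubs) can be sketched as follows. Everything here is hypothetical: `LazyResolver` and `DepSymbol` are illustrative names, and a plain map of package to class names stands in for the real step of parsing `.class` files with `cafebabe`. The point is that the expensive JAR step runs at most once per package.

```rust
use std::collections::HashMap;

/// A resolved dependency symbol (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct DepSymbol {
    qualified_name: String,
    from_stub: bool,
}

/// Build symbols for `package` from a list of simple class names.
fn to_symbols(package: &str, classes: &[String], from_stub: bool) -> Vec<DepSymbol> {
    classes
        .iter()
        .map(|c| DepSymbol { qualified_name: format!("{package}.{c}"), from_stub })
        .collect()
}

struct LazyResolver {
    /// Stand-in for "parse the .class files of the JARs that contain this package".
    jar_classes: HashMap<String, Vec<String>>,
    /// Built-in JDK stubs, used only when no JAR provides the package.
    stubs: HashMap<String, Vec<String>>,
    /// Earlier results, keyed by package name.
    cache: HashMap<String, Vec<DepSymbol>>,
    /// How many times the expensive JAR step ran.
    jar_parses: usize,
}

impl LazyResolver {
    fn new(jar_classes: HashMap<String, Vec<String>>, stubs: HashMap<String, Vec<String>>) -> Self {
        LazyResolver { jar_classes, stubs, cache: HashMap::new(), jar_parses: 0 }
    }

    fn resolve_package(&mut self, package: &str) -> Vec<DepSymbol> {
        // Cache hit: no JAR work at all.
        if let Some(hit) = self.cache.get(package) {
            return hit.clone();
        }
        // Classpath JARs first, then JDK stubs, otherwise nothing.
        let symbols = if let Some(classes) = self.jar_classes.get(package) {
            self.jar_parses += 1;
            to_symbols(package, classes, false)
        } else if let Some(classes) = self.stubs.get(package) {
            to_symbols(package, classes, true)
        } else {
            Vec::new()
        };
        self.cache.insert(package.to_string(), symbols.clone());
        symbols
    }
}
```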
Classpath Configuration
Explicit classpath (most reliable):
```bash
# Point at a directory of JARs (searched recursively)
java-indexer /project --classpath /project/lib

# Multiple entries, colon-separated (standard Java convention)
java-indexer /project --classpath /project/lib:/other/deps:/path/to/specific.jar

# After running Maven dependency:copy-dependencies
mvn dependency:copy-dependencies -DoutputDirectory=target/deps
java-indexer /project --classpath target/deps
```
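The classpath rules above (colon-separated entries; directories searched recursively; single JAR files accepted) can be expressed with the standard library alone. `expand_classpath` is a hypothetical sketch of that expansion, not the code in `classpath/jar.rs`; it also assumes Unix-style `:` separators.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

fn is_jar(path: &Path) -> bool {
    path.extension().map_or(false, |ext| ext == "jar")
}

/// Recursively collect `.jar` files under `dir`.
fn collect_jars(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_jars(&path, out)?;
        } else if is_jar(&path) {
            out.push(path);
        }
    }
    Ok(())
}

/// Expand a colon-separated classpath into a sorted list of existing JAR files.
fn expand_classpath(classpath: &str) -> io::Result<Vec<PathBuf>> {
    let mut jars = Vec::new();
    for entry in classpath.split(':').filter(|e| !e.is_empty()) {
        let path = Path::new(entry);
        if path.is_dir() {
            collect_jars(path, &mut jars)?;
        } else if path.is_file() && is_jar(path) {
            jars.push(path.to_path_buf());
        }
        // Missing entries and non-JAR files are silently skipped.
    }
    jars.sort();
    Ok(jars)
}
```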
Maven auto-detection (when `pom.xml` exists):
```bash
# Auto-detects dependencies from pom.xml, resolves from ~/.m2/repository
java-indexer /path/to/maven-project
```
It reads `<dependency>` blocks from `pom.xml` and constructs paths like:
```text
~/.m2/repository/com/google/guava/guava/31.1-jre/guava-31.1-jre.jar
```
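The example path follows Maven's standard local-repository layout: the `groupId` with dots turned into directories, then `artifactId`, then `version`, then `<artifactId>-<version>.jar`. A minimal sketch of that construction is below; `m2_jar_path` is a hypothetical name, and the caller is assumed to pass the repository root explicitly (for example `$HOME/.m2/repository`), since `~` is not expanded by `Path`.

```rust
use std::path::{Path, PathBuf};

/// <repo>/<groupId as dirs>/<artifactId>/<version>/<artifactId>-<version>.jar
fn m2_jar_path(repo: &Path, group_id: &str, artifact_id: &str, version: &str) -> PathBuf {
    let mut path = repo.to_path_buf();
    for part in group_id.split('.') {
        path.push(part);
    }
    path.push(artifact_id);
    path.push(version);
    path.push(format!("{artifact_id}-{version}.jar"));
    path
}
```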
Limitations of auto-detection:
- Does not resolve Maven properties/variables (`${project.version}`)
- Does not follow parent POM inheritance
- Does not resolve transitive dependencies
- Only includes compile/runtime scope (excludes test/provided)
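For full fidelity, pass an explicit `--classpath` built by `mvn dependency:build-classpath`. Because the plugin prints the classpath as log output (which `-q` suppresses), write it to a file with `-Dmdep.outputFile=cp.txt` and pass `--classpath "$(cat cp.txt)"`.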
JDK coverage:
When JAVA_HOME is set (Java 9+), JDK module files (.jmod) are auto-detected from $JAVA_HOME/jmods/, providing full JDK type coverage with complete method signatures and generics. This happens automatically alongside Maven/Gradle detection.
Without JAVA_HOME, built-in stubs cover common types in java.lang, java.util, and java.io as a fallback.
For Java 8, add the JDK runtime JAR to the classpath manually:
```bash
java-indexer /project --classpath $JAVA_HOME/jre/lib/rt.jar
```
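The JDK detection described above amounts to listing `$JAVA_HOME/jmods/*.jmod`. A hypothetical sketch is below; it returns an empty list when the directory is absent (as on Java 8, which has no `jmods`), which is the case where the stubs or a manual `rt.jar` entry take over.

```rust
use std::env;
use std::fs;
use std::path::{Path, PathBuf};

/// Sorted `.jmod` files under `<java_home>/jmods`, or empty if the directory is missing.
fn jmod_files(java_home: &Path) -> Vec<PathBuf> {
    let mut mods: Vec<PathBuf> = match fs::read_dir(java_home.join("jmods")) {
        Ok(entries) => entries
            .filter_map(|e| e.ok().map(|e| e.path()))
            .filter(|p| p.extension().map_or(false, |ext| ext == "jmod"))
            .collect(),
        Err(_) => Vec::new(),
    };
    mods.sort();
    mods
}

/// Same lookup, reading JAVA_HOME from the environment.
fn jmods_from_env() -> Vec<PathBuf> {
    env::var_os("JAVA_HOME")
        .map(|home| jmod_files(Path::new(&home)))
        .unwrap_or_default()
}
```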
What Gets Extracted from .class Files
For each public/protected class in a JAR:
| Element | Extracted Data |
|---|---|
| Type | Name, qualified name, kind (class/interface/enum/record/annotation), modifiers, parents (extends/implements) |
| Methods | Name, return type, parameter types and names (if available), modifiers (public/static/abstract/etc.) |
| Constructors | Parameter types and names, modifiers |
| Fields | Name, type, modifiers (static/final/transient/volatile) |
| Enum constants | Name |
Private and package-private members are skipped. Generic type arguments are fully extracted from the JVM Signature attribute (e.g., List<String>, Map<K, V>, ? extends Number), falling back to erased descriptor types when no signature is present.
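The erased-descriptor fallback mentioned above works on JVM descriptors as defined in the JVM Specification (section 4.3): `I` is `int`, `[` adds an array dimension, `Lpkg/Name;` is a class type, and a method descriptor is `(params)return`. The sketch below converts them to Java source syntax. It is a hypothetical illustration, not the code in `classpath/descriptors.rs`; nested classes keep their binary `$` separator.

```rust
/// Parse one field descriptor at the start of `desc`.
/// Returns the Java type and how many bytes were consumed.
fn parse_field_descriptor(desc: &str) -> Option<(String, usize)> {
    let bytes = desc.as_bytes();
    let mut dims = 0;
    while bytes.get(dims) == Some(&b'[') {
        dims += 1;
    }
    let (base, len) = match *bytes.get(dims)? {
        b'B' => ("byte".to_string(), 1),
        b'C' => ("char".to_string(), 1),
        b'D' => ("double".to_string(), 1),
        b'F' => ("float".to_string(), 1),
        b'I' => ("int".to_string(), 1),
        b'J' => ("long".to_string(), 1),
        b'S' => ("short".to_string(), 1),
        b'Z' => ("boolean".to_string(), 1),
        b'L' => {
            // `end` is the index of ';' relative to the 'L'.
            let end = desc[dims..].find(';')?;
            (desc[dims + 1..dims + end].replace('/', "."), end + 1)
        }
        _ => return None,
    };
    Some((format!("{base}{}", "[]".repeat(dims)), dims + len))
}

/// Parse a method descriptor into (parameter types, return type).
fn parse_method_descriptor(desc: &str) -> Option<(Vec<String>, String)> {
    let rest = desc.strip_prefix('(')?;
    let close = rest.find(')')?;
    let (mut params_str, ret) = (&rest[..close], &rest[close + 1..]);
    let mut params = Vec::new();
    while !params_str.is_empty() {
        let (ty, used) = parse_field_descriptor(params_str)?;
        params.push(ty);
        params_str = &params_str[used..];
    }
    let ret = if ret == "V" {
        "void".to_string()
    } else {
        let (ty, used) = parse_field_descriptor(ret)?;
        if used != ret.len() {
            return None; // trailing garbage after the return type
        }
        ty
    };
    Some((params, ret))
}
```

Generic information such as `List<String>` is not present in descriptors at all; it only appears in the `Signature` attribute, which is why the descriptor path is a fallback.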
Architecture
```text
Source files (.java)
       |
       v
[tree-sitter-java parser] --> Concrete Syntax Tree
       |
       v
[Extraction] --> AnalysisResult { package, imports, definitions, references }
       |                                     |
       | .scm query files define             | Each reference gets a SymbolOrigin:
       | what to capture:                    |   - Import (from an import statement)
       |   - definitions.scm                 |   - Local (same-file definition)
       |   - imports.scm                     |   - Global (unresolved)
       |   - references.scm                  |
       v                                     v
[ast-index::ProjectIndex::build()] --> Cross-file index
       |
       v
[Query + Trace] --> SymbolTrace { definition, chain, usage_sites }
       |                  |
       |                  | Lazy dependency resolution:
       |                  |   1. Try classpath JARs (parse .class files)
       |                  |   2. Fall back to JDK stubs
       v
[JSON output]
```
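The diagram assigns each reference one of three `SymbolOrigin` values. The sketch below shows one plausible classification of a simple name under an assumed precedence (same-file definitions first, then single-type imports, otherwise left global for the cross-file index). The enum mirrors the diagram's names, but `classify` and its precedence are hypothetical, not taken from `extraction/references.rs`.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, Clone, PartialEq, Eq)]
enum SymbolOrigin {
    /// Named by an import; carries the imported fully-qualified name.
    Import(String),
    /// Defined in the same file.
    Local,
    /// Not resolved in this file; left to the cross-file index.
    Global,
}

/// Classify a simple name (assumed precedence: local, then import, then global).
fn classify(name: &str, local_defs: &HashSet<&str>, imports: &HashMap<&str, &str>) -> SymbolOrigin {
    if local_defs.contains(name) {
        SymbolOrigin::Local
    } else if let Some(fqn) = imports.get(name) {
        SymbolOrigin::Import(fqn.to_string())
    } else {
        SymbolOrigin::Global
    }
}
```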
Module Map
| Module | Purpose |
|---|---|
| `analyzer.rs` | `FileAnalyzer` trait implementation, JDK stubs |
| `extraction/` | Tree-sitter query execution and symbol extraction |
| `extraction/definitions.rs` | Type, method, field, constructor extraction |
| `extraction/imports.rs` | Import statement extraction (regular, wildcard, static) |
| `extraction/references.rs` | Reference extraction and origin resolution |
| `classpath/mod.rs` | `Classpath` struct: lazy JAR-based dependency resolver |
| `classpath/classfile.rs` | `.class` file to `SymbolDef` conversion |
| `classpath/descriptors.rs` | JVM type descriptors to ast-index types |
| `classpath/signatures.rs` | JVM generic signature parser (type arguments, wildcards, bounds) |
| `classpath/jar.rs` | JAR/ZIP scanning and class extraction |
| `classpath/autodetect.rs` | Maven/Gradle classpath auto-detection |
| `lang_data.rs` | Java-specific metadata types (visibility, modifiers) |
| `output.rs` | JSON serialization wrappers |
| `main.rs` | CLI and REPL |
For detailed architecture documentation, see AGENTS.md.
Testing
Running Tests
```bash
# All tests (unit, integration, classpath, diagnostic, and inline)
cargo test

# Unit tests only (extraction correctness)
cargo test --test java_language_tests

# Integration tests only (cross-file index queries)
cargo test --test index_integration_tests

# Diagnostic dumps (use --nocapture to see output)
cargo test --test dump_analysis -- --nocapture
```
Test Categories
| Category | File | Tests | What It Validates |
|---|---|---|---|
| Unit | `tests/java_language_tests.rs` | 132 | Extraction correctness per the Java Language Specification. Each test parses a Java snippet and asserts the extracted definitions, imports, and references match JLS semantics. |
| Integration | `tests/index_integration_tests.rs` | 102 | Full pipeline (parse -> extract -> index -> query -> trace). Builds a 19-file fixture project and runs queries against it. |
| Diagnostic | `tests/dump_analysis.rs` | 1 | Dumps raw `AnalysisResult` for method call chains. |
| Diagnostic | `tests/dump_getname.rs` | 1 | Dumps `getName()` call site analysis. |
| Classpath | `tests/classpath_tests.rs` | 32 | JAR scanning, `.class` extraction, generic signatures, `Classpath` struct, analyzer integration, full index with classpath. |
| Inline | `src/classpath/signatures.rs` | 21 | JVM generic signature parser (field, method, class signatures). |
| Inline | `src/classpath/descriptors.rs` | 6 | JVM descriptor/name conversion helpers. |
| Inline | `src/classpath/autodetect.rs` | 2 | Maven `pom.xml` parsing and path construction. |
Remaining Test Gaps
Most classpath functionality is now tested (32 tests in classpath_tests.rs + 21 inline signature parser tests). Remaining gaps:
| Priority | Area | What to Test |
|---|---|---|
| Medium | Auto-detect e2e | Create a temp directory with a pom.xml and matching JARs in ~/.m2-like structure. Verify detect_classpath() finds the right JARs. |
| Low | Edge cases | Empty JARs, corrupt .class files, JARs with only module-info.class, inner/anonymous classes, multi-release JARs. |
Known Limitations
- Lambda parameter type inference: only works via the collection generic type argument heuristic. See `AGENTS.md` for details.
- Maven auto-detection: does not resolve properties, parent POMs, or transitive dependencies. Use `mvn dependency:build-classpath` for full fidelity.
License
Apache-2.0