Shawn Hurley f13bc91bae
Add java-pattern-matching conformance tests and fix Maven auto-detection
- Add 30 integration tests and 5 unit tests covering all pattern-matching
  scenarios from analyzer-lsp's java-pattern-matching.md (PACKAGE, IMPORT,
  TYPE, METHOD, METHOD_CALL, FIELD, ANNOTATION, CLASS/METHOD/FIELD+annotated,
  IMPLEMENTS_TYPE)

- Add conformance bash script (tests/conformance/test_pattern_matching.sh)
  that runs the indexer CLI against analyzer-lsp's fixture projects and
  validates results with jq against demo-output.yaml expectations, including
  exact line-number verification via byte-offset-to-line conversion

- Fix Maven auto-detection to resolve pom.xml <properties> variables
  (e.g. ${spring.version} -> 5.3.30), include provided-scope dependencies,
  fall back to any cached version when exact version is missing, and scan
  sibling artifacts under the same groupId for transitive dependency
  discovery

- Add @Audited annotation and AuditedService fixtures for METHOD+annotated
  test coverage
2026-04-23 13:57:55 -04:00
src Add java-pattern-matching conformance tests and fix Maven auto-detection 2026-04-23 13:57:55 -04:00
tests Add java-pattern-matching conformance tests and fix Maven auto-detection 2026-04-23 13:57:55 -04:00
.gitignore Add local variable, chained member access, JDK stubs, and inherited method resolution 2026-04-21 15:43:30 -04:00
AGENTS.md Align Java indexer with ast-index formal proof preconditions 2026-04-23 11:42:27 -04:00
Cargo.lock Align Java indexer with ast-index formal proof preconditions 2026-04-23 11:42:27 -04:00
Cargo.toml Add README, classpath test suite, and test JAR fixture 2026-04-22 09:58:14 -04:00
README.md Add .jmod support for full JDK type coverage from JAVA_HOME 2026-04-22 10:54:55 -04:00

java-indexer

A Java source code indexer built on ast-index. It parses Java source files with tree-sitter, extracts definitions, imports, and references with full Java language semantics, builds a cross-file index, and provides an interactive query REPL.

Given a Java project, the indexer answers questions like:

  • Where is com.example.model.User defined and where is it used?
  • What classes are in the com.example.service package?
  • Who calls getName() across the entire codebase?
  • What does UserService import and from where?

Quick Start

# Build
cargo build --release

# Index a project and enter the query REPL
./target/release/java-indexer /path/to/your/java/project

# With dependency resolution from JARs
./target/release/java-indexer /path/to/project --classpath /path/to/lib

On startup, the indexer scans all .java files, builds the cross-file index, and drops you into an interactive REPL:

Indexing Java project: /path/to/project
Classpath: 42 JAR(s) configured
Build complete:
{
  "files_scanned": 19,
  "files_cached": 19,
  "files_failed": 0,
  "unresolved_modules": [
    "java.util.stream",
    "java.util.concurrent"
  ],
  "errors": []
}

Interactive query REPL. Commands:
  {"symbol": "com.example.*", "kind": "class"}  -- query the index
  help                                           -- show this help
  stats                                          -- show build stats
  exit / quit / Ctrl+D                           -- exit

query>

Installation

Requires Rust 1.85+ (edition 2024).

git clone <repo>
cd ast-index-languages/java
cargo build --release

The binary is at ./target/release/java-indexer.

CLI Reference

java-indexer [OPTIONS] <REPO_PATH>

Arguments

Argument | Required | Description
REPO_PATH | Yes | Path to the Java project root to index

Options

Flag | Description
--classpath <PATH:PATH:...> | Colon-separated list of JAR files or directories containing JARs for dependency resolution. Directories are searched recursively.
--no-auto-classpath | Disable automatic classpath detection from pom.xml / build.gradle.
--log-level <LEVEL> | Log verbosity: error, warn (default), info, debug, trace.
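
As an illustration of the colon-separated --classpath value, the sketch below splits such a string into entries. It is not the indexer's implementation: ClasspathEntry and parse_classpath are hypothetical names, and classifying entries by a .jar extension (instead of checking the filesystem) is an assumption made to keep the example self-contained.

```rust
use std::path::PathBuf;

/// Hypothetical classification of one --classpath entry.
#[derive(Debug, PartialEq)]
enum ClasspathEntry {
    /// A single JAR file.
    Jar(PathBuf),
    /// A directory that would be searched recursively for JARs.
    Directory(PathBuf),
}

/// Split a colon-separated classpath string into entries.
/// Assumption: anything ending in ".jar" is a JAR, everything else a directory.
fn parse_classpath(arg: &str) -> Vec<ClasspathEntry> {
    arg.split(':')
        .filter(|entry| !entry.is_empty()) // tolerate "a::b" and trailing colons
        .map(|entry| {
            let path = PathBuf::from(entry);
            let is_jar = path
                .extension()
                .map_or(false, |ext| ext.eq_ignore_ascii_case("jar"));
            if is_jar {
                ClasspathEntry::Jar(path)
            } else {
                ClasspathEntry::Directory(path)
            }
        })
        .collect()
}
```

For example, parse_classpath("/project/lib:/path/to/specific.jar") yields one Directory entry followed by one Jar entry.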

Examples

# Basic usage -- index with JDK stub fallback only
java-indexer /path/to/project

# Explicit classpath -- point at your dependency JARs
java-indexer /path/to/project --classpath /path/to/lib:/path/to/other-lib

# Single JAR
java-indexer /path/to/project --classpath /path/to/guava-31.1.jar

# Maven project -- auto-detects from pom.xml + ~/.m2/repository
java-indexer /path/to/maven-project

# Disable auto-detection
java-indexer /path/to/project --no-auto-classpath

# Verbose logging to see dependency resolution
java-indexer /path/to/project --classpath /path/to/lib --log-level debug

Query Reference

Queries are JSON objects submitted at the query> prompt.

Format

{"symbol": "<search pattern>", "kind": "<optional kind filter>"}

Search Patterns

Pattern | Meaning | Example
com.example.model.User | Exact fully-qualified name | Find one specific class
com.example.model.* | All symbols in a package | Browse a package
com.example.model.User.* | All members of a class | See methods, fields, inner classes
com.example.*.* | All symbols in any sub-package | Explore a namespace
*.User | Symbol named "User" in any package | Find by short name
*.*Utils | Any symbol ending in "Utils" | Find utility classes
*.getName | Method named "getName" anywhere | Find method definitions
com.example.model.User.Builder | Inner class by FQN | Navigate nested types
com.example.model.UserStatus.ACTIVE | Enum member by FQN | Find specific constants
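
The patterns above can be read as wildcard matches over qualified names. The sketch below is an illustration only, not the indexer's matcher: glob_match is a hypothetical helper, and it assumes that * matches any run of characters, including dots. Under that assumption com.example.model.* would also match members such as com.example.model.User.getName, which may differ from the real behavior.

```rust
/// Hypothetical wildcard matcher: `*` matches any (possibly empty) run of
/// characters, dots included; every other character must match literally.
fn glob_match(pattern: &str, name: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let s: Vec<char> = name.chars().collect();
    let (mut pi, mut si) = (0usize, 0usize);
    // Position of the most recent `*` in the pattern, and the position in
    // `name` where that star's match currently ends.
    let mut star: Option<usize> = None;
    let mut mark = 0usize;

    while si < s.len() {
        if pi < p.len() && p[pi] == '*' {
            // Remember the star and first try matching it against nothing.
            star = Some(pi);
            mark = si;
            pi += 1;
        } else if pi < p.len() && p[pi] == s[si] {
            pi += 1;
            si += 1;
        } else if let Some(star_pos) = star {
            // Mismatch: let the last star absorb one more character and retry.
            pi = star_pos + 1;
            mark += 1;
            si = mark;
        } else {
            return false;
        }
    }
    // Any trailing stars can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}
```

With this helper, glob_match("*.User", "com.example.model.User") holds, while glob_match("*.User", "com.example.model.UserService") does not.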

Kind Filters

Kind | Matches
class | Classes (not interfaces or enums)
type_def | All type definitions (classes, interfaces, records)
interface | Interfaces only
enum | Enum types
enum_member | Individual enum constants
function / method | Methods
constructor | Constructors
field | Fields
variable | Local variables (limited)
const | Constants
type_alias | Type aliases

REPL Commands

Command | Description
stats | Show build statistics (files scanned, unresolved modules)
help | Show query format help
exit / quit / Ctrl+D | Exit the REPL

Output Format

Each query returns an array of traces. A trace connects a symbol's definition to all the places it's used across the codebase:

{
  "definition": {
    "type": "project_file",
    "file": "src/main/java/com/example/model/User.java",
    "span": { "start": 180, "end": 1894 },
    "def": {
      "name": "User",
      "qualified_name": "com.example.model.User",
      "kind": "type_def",
      "exported": true,
      "language_data": {
        "visibility": "public",
        "type_kind": "class",
        "is_abstract": false,
        "is_final": false,
        "is_sealed": false
      },
      "parents": [
        { "kind": "extends", "type_ref": "Entity" }
      ]
    }
  },
  "chain": [
    {
      "file": "src/main/java/com/example/service/UserService.java",
      "kind": "re_export",
      "span": { "start": 30, "end": 60 },
      "original_name": "User"
    }
  ],
  "usage_sites": [
    {
      "file": "src/main/java/com/example/service/UserService.java",
      "span": { "start": 374, "end": 378 },
      "usage": "type"
    },
    {
      "file": "src/main/java/com/example/model/Admin.java",
      "span": { "start": 204, "end": 208 },
      "usage": "type"
    }
  ]
}

Definition types:

  • project_file -- defined in a project source file (has file and span)
  • dependency -- resolved from a JAR on the classpath or JDK stubs (has package)
  • unresolved -- imported but could not be resolved (has package)

Usage types: type (type reference), read (value read), write (assignment), read_write
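
To relate a span to a line in an editor, the start offset can be converted to a line number. This is a minimal sketch, assuming span.start and span.end are byte offsets into the file and that lines are numbered from 1; byte_offset_to_line is a hypothetical helper, not part of the indexer's API.

```rust
/// Convert a byte offset into a 1-based line number by counting the
/// newline bytes that appear before the offset.
/// Offsets past the end of the source are clamped to the last line.
fn byte_offset_to_line(source: &str, offset: usize) -> usize {
    let end = offset.min(source.len());
    let newlines_before = source.as_bytes()[..end]
        .iter()
        .filter(|&&byte| byte == b'\n')
        .count();
    newlines_before + 1
}
```

Reading the file once and calling this helper on each span.start is enough for small projects; for many lookups, precomputing the offsets of all line starts and binary-searching them would avoid rescanning the source.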

Query Examples

Find a class and see where it's used:

query> {"symbol": "com.example.model.User", "kind": "class"}

Find all classes in a package:

query> {"symbol": "com.example.model.*", "kind": "class"}

Find a method across the codebase:

query> {"symbol": "*.getName", "kind": "function"}

Query a JDK type (resolved through dependency stubs):

query> {"symbol": "java.lang.String", "kind": "type_def"}

Returns a dependency definition with a zero span ({start: 0, end: 0}), since the type is resolved from stubs or the classpath rather than from project source.

Find all members of a class:

query> {"symbol": "com.example.model.User.*"}

Dependency Resolution

The indexer resolves references to external libraries (JDK, third-party JARs) through the --classpath flag or auto-detection.

How It Works

  1. At index time: imports like import java.util.List; are recorded as unresolved modules
  2. At query time: when you query for a symbol in an unresolved module, the indexer lazily resolves it:
    • Scans JAR central directories to build a package -> JAR index (fast -- only reads ZIP TOC)
    • Parses .class files from matching JARs using cafebabe
    • Extracts types, methods, fields, constructors with full modifiers and type information
    • Caches the results for subsequent queries
  3. Fallback: built-in JDK stubs for java.lang, java.util, java.io provide coverage when no classpath is configured
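
The lazy resolution steps above can be sketched with an in-memory index and cache. This is an illustration under stated assumptions, not the indexer's code: LazyResolver, add_jar, resolve, and package_of_entry are hypothetical names, JAR contents are passed in as lists of entry names instead of being read from ZIP central directories, and the parse_jar closure stands in for the real .class parsing.

```rust
use std::collections::HashMap;

/// Package name for a JAR entry such as "com/google/common/collect/Lists.class".
/// Returns None for non-class entries and for classes in the default package.
fn package_of_entry(entry: &str) -> Option<String> {
    let class_path = entry.strip_suffix(".class")?;
    let (dir, _simple_name) = class_path.rsplit_once('/')?;
    Some(dir.replace('/', "."))
}

/// Hypothetical lazy resolver: a package -> JAR index plus a per-package cache.
#[derive(Default)]
struct LazyResolver {
    package_to_jars: HashMap<String, Vec<String>>,
    cache: HashMap<String, Vec<String>>,
}

impl LazyResolver {
    /// Record which packages a JAR provides, based on its entry names only.
    fn add_jar(&mut self, jar: &str, entries: &[&str]) {
        for entry in entries {
            if let Some(package) = package_of_entry(entry) {
                let jars = self.package_to_jars.entry(package).or_default();
                if !jars.iter().any(|known| known == jar) {
                    jars.push(jar.to_string());
                }
            }
        }
    }

    /// Resolve a package on first use and serve later queries from the cache.
    /// `parse_jar` is only called for JARs that contain the package.
    fn resolve<F>(&mut self, package: &str, mut parse_jar: F) -> &[String]
    where
        F: FnMut(&str) -> Vec<String>,
    {
        if !self.cache.contains_key(package) {
            let mut symbols = Vec::new();
            for jar in self.package_to_jars.get(package).into_iter().flatten() {
                symbols.extend(parse_jar(jar.as_str()));
            }
            self.cache.insert(package.to_string(), symbols);
        }
        &self.cache[package]
    }
}
```

The point of the design is that building the index only needs entry names, while the expensive parsing is deferred until a query touches a package and is then done once.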

Classpath Configuration

Explicit classpath (most reliable):

# Point at a directory of JARs (searched recursively)
java-indexer /project --classpath /project/lib

# Multiple entries, colon-separated (standard Java convention)
java-indexer /project --classpath /project/lib:/other/deps:/path/to/specific.jar

# After running Maven dependency:copy-dependencies
mvn dependency:copy-dependencies -DoutputDirectory=target/deps
java-indexer /project --classpath target/deps

Maven auto-detection (when pom.xml exists):

# Auto-detects dependencies from pom.xml, resolves from ~/.m2/repository
java-indexer /path/to/maven-project

Reads <dependency> blocks from pom.xml and constructs paths like:

~/.m2/repository/com/google/guava/guava/31.1-jre/guava-31.1-jre.jar
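
The path above follows the standard local-repository layout: the groupId split on dots, then the artifactId, the version, and a file named artifactId-version.jar. A minimal sketch of that construction follows; m2_jar_path is a hypothetical helper, and it ignores classifiers and non-JAR packaging.

```rust
use std::path::{Path, PathBuf};

/// Build the expected location of a dependency JAR inside a local Maven
/// repository root (for example ~/.m2/repository, already expanded).
fn m2_jar_path(repo_root: &Path, group_id: &str, artifact_id: &str, version: &str) -> PathBuf {
    let mut path = repo_root.to_path_buf();
    // com.google.guava -> com/google/guava
    for segment in group_id.split('.') {
        path.push(segment);
    }
    path.push(artifact_id);
    path.push(version);
    path.push(format!("{artifact_id}-{version}.jar"));
    path
}
```

Called with the group com.google.guava, the artifact guava, and the version 31.1-jre, this produces the guava-31.1-jre.jar path shown above, relative to the given repository root.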

Limitations of auto-detection:

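  • Resolves variables defined in the pom.xml <properties> block (e.g. ${spring.version}); other placeholders such as ${project.version} may remain unresolved
  • Does not follow parent POM inheritance
  • Does not perform full transitive resolution; it only scans sibling artifacts under the same groupId
  • Includes compile, runtime, and provided scope (excludes test)
  • Falls back to any cached version of an artifact when the exact version is missing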

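For full fidelity, pass the classpath computed by mvn dependency:build-classpath to --classpath explicitly. Writing it to a file with -Dmdep.outputFile=<file> is the most reliable way to capture it, since -q on its own suppresses the classpath that Maven would otherwise log.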
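
Substituting a pom.xml <properties> variable such as ${spring.version} into a dependency version amounts to a placeholder replacement. The sketch below is illustrative only: resolve_placeholders is a hypothetical helper, the properties are assumed to be already parsed into a map, and unknown or unterminated placeholders are left untouched.

```rust
use std::collections::HashMap;

/// Replace every `${name}` in `value` with the matching entry from `props`.
/// Placeholders without a matching property are copied through unchanged.
fn resolve_placeholders(value: &str, props: &HashMap<String, String>) -> String {
    let mut out = String::new();
    let mut rest = value;
    while let Some(start) = rest.find("${") {
        // Copy the literal text before the placeholder.
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find('}') {
            Some(end) => {
                let key = &after[..end];
                match props.get(key) {
                    Some(replacement) => out.push_str(replacement),
                    None => {
                        // Unknown property: keep the placeholder as written.
                        out.push_str("${");
                        out.push_str(key);
                        out.push('}');
                    }
                }
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated "${": keep the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}
```

With a map containing spring.version = 5.3.30, the version string ${spring.version} becomes 5.3.30, while ${project.version} stays as it is.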

JDK coverage:

When JAVA_HOME is set (Java 9+), JDK module files (.jmod) are auto-detected from $JAVA_HOME/jmods/, providing full JDK type coverage with complete method signatures and generics. This happens automatically alongside Maven/Gradle detection.

Without JAVA_HOME, built-in stubs cover common types in java.lang, java.util, and java.io as a fallback.

For Java 8, add the JDK to the classpath manually:

java-indexer /project --classpath $JAVA_HOME/jre/lib/rt.jar

What Gets Extracted from .class Files

For each public/protected class in a JAR:

Element | Extracted Data
Type | Name, qualified name, kind (class/interface/enum/record/annotation), modifiers, parents (extends/implements)
Methods | Name, return type, parameter types and names (if available), modifiers (public/static/abstract/etc.)
Constructors | Parameter types and names, modifiers
Fields | Name, type, modifiers (static/final/transient/volatile)
Enum constants | Name

Private and package-private members are skipped. Generic type arguments are fully extracted from the JVM Signature attribute (e.g., List<String>, Map<K, V>, ? extends Number), falling back to erased descriptor types when no signature is present.
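
The erased descriptor fallback relies on the JVM field descriptor grammar: single letters for primitives, L<internal/name>; for classes, and one leading [ per array dimension. The sketch below shows that conversion in isolation; descriptor_to_java is a hypothetical helper, not the function in classpath/descriptors.rs, and it leaves the $ in nested class names as is.

```rust
/// Convert a JVM field descriptor (e.g. "[Ljava/lang/String;") into a
/// Java-style type name (e.g. "java.lang.String[]").
/// Returns None when the descriptor is not well formed.
fn descriptor_to_java(desc: &str) -> Option<String> {
    // Each leading '[' adds one array dimension.
    let dims = desc.bytes().take_while(|&b| b == b'[').count();
    let base = &desc[dims..];
    let name = match base {
        "B" => "byte".to_string(),
        "C" => "char".to_string(),
        "D" => "double".to_string(),
        "F" => "float".to_string(),
        "I" => "int".to_string(),
        "J" => "long".to_string(),
        "S" => "short".to_string(),
        "Z" => "boolean".to_string(),
        "V" => "void".to_string(), // only valid as a method return type
        _ => {
            // Object types look like "Ljava/util/List;".
            let inner = base.strip_prefix('L')?.strip_suffix(';')?;
            if inner.is_empty() {
                return None;
            }
            inner.replace('/', ".")
        }
    };
    Some(format!("{}{}", name, "[]".repeat(dims)))
}
```

A descriptor carries no type arguments, so Ljava/util/List; becomes java.util.List; the element type is only available from the Signature attribute.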

Architecture

Source files (.java)
    |
    v
[tree-sitter-java parser] --> Concrete Syntax Tree
    |
    v
[Extraction]  --> AnalysisResult { package, imports, definitions, references }
    |                                    |
    |    .scm query files define         |  Each reference gets a SymbolOrigin:
    |    what to capture:                |    - Import (from an import statement)
    |    - definitions.scm              |    - Local (same-file definition)
    |    - imports.scm                  |    - Global (unresolved)
    |    - references.scm              |
    v                                    v
[ast-index::ProjectIndex::build()]  --> Cross-file index
    |
    v
[Query + Trace]  --> SymbolTrace { definition, chain, usage_sites }
    |                      |
    |                      |  Lazy dependency resolution:
    |                      |    1. Try classpath JARs (parse .class files)
    |                      |    2. Fall back to JDK stubs
    v
[JSON output]
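
The three origins in the diagram can be pictured as a small classification step per reference. This is a sketch under assumptions, not the ast-index type: the SymbolOrigin enum here only mirrors the names in the diagram, classify is a hypothetical helper, and checking same-file definitions before imports is an arbitrary choice for the example.

```rust
use std::collections::{HashMap, HashSet};

/// Illustrative stand-in for the origin attached to each reference.
#[derive(Debug, PartialEq)]
enum SymbolOrigin {
    /// The name was brought in by an import statement.
    Import { qualified_name: String },
    /// The name is defined in the same file.
    Local,
    /// The name could not be tied to an import or a same-file definition.
    Global,
}

/// Classify a simple name given the file's imports (simple name -> FQN)
/// and the set of names defined in the same file.
fn classify(
    name: &str,
    imports: &HashMap<String, String>,
    local_defs: &HashSet<String>,
) -> SymbolOrigin {
    if local_defs.contains(name) {
        SymbolOrigin::Local
    } else if let Some(qualified_name) = imports.get(name) {
        SymbolOrigin::Import {
            qualified_name: qualified_name.clone(),
        }
    } else {
        SymbolOrigin::Global
    }
}
```

In a file that imports com.example.model.User and defines UserService, a reference to User is classified as Import, UserService as Local, and an unimported name such as List as Global.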

Module Map

Module | Purpose
analyzer.rs | FileAnalyzer trait implementation, JDK stubs
extraction/ | Tree-sitter query execution and symbol extraction
extraction/definitions.rs | Type, method, field, constructor extraction
extraction/imports.rs | Import statement extraction (regular, wildcard, static)
extraction/references.rs | Reference extraction and origin resolution
classpath/mod.rs | Classpath struct -- lazy JAR-based dependency resolver
classpath/classfile.rs | .class file to SymbolDef conversion
classpath/descriptors.rs | JVM type descriptors to ast-index types
classpath/signatures.rs | JVM generic signature parser (type arguments, wildcards, bounds)
classpath/jar.rs | JAR/ZIP scanning and class extraction
classpath/autodetect.rs | Maven/Gradle classpath auto-detection
lang_data.rs | Java-specific metadata types (visibility, modifiers)
output.rs | JSON serialization wrappers
main.rs | CLI and REPL

For detailed architecture documentation, see AGENTS.md.

Testing

Running Tests

# All tests (unit + integration + diagnostics)
cargo test

# Unit tests only (extraction correctness)
cargo test --test java_language_tests

# Integration tests only (cross-file index queries)
cargo test --test index_integration_tests

# Diagnostic dumps (use --nocapture to see output)
cargo test --test dump_analysis -- --nocapture

Test Categories

Category | File | Tests | What It Validates
Unit | tests/java_language_tests.rs | 132 | Extraction correctness per Java Language Specification. Each test parses a Java snippet and asserts the extracted definitions, imports, and references match JLS semantics.
Integration | tests/index_integration_tests.rs | 102 | Full pipeline (parse -> extract -> index -> query -> trace). Builds a 19-file fixture project and runs queries against it.
Diagnostic | tests/dump_analysis.rs | 1 | Dumps raw AnalysisResult for method call chains.
Diagnostic | tests/dump_getname.rs | 1 | Dumps getName() call site analysis.
Classpath | tests/classpath_tests.rs | 32 | JAR scanning, .class extraction, generic signatures, Classpath struct, analyzer integration, full index with classpath.
Inline | src/classpath/signatures.rs | 21 | JVM generic signature parser (field, method, class signatures).
Inline | src/classpath/descriptors.rs | 6 | JVM descriptor/name conversion helpers.
Inline | src/classpath/autodetect.rs | 2 | Maven pom.xml parsing and path construction.

Remaining Test Gaps

Most classpath functionality is now tested (32 tests in classpath_tests.rs + 21 inline signature parser tests). Remaining gaps:

Priority | Area | What to Test
Medium | Auto-detect e2e | Create a temp directory with a pom.xml and matching JARs in ~/.m2-like structure. Verify detect_classpath() finds the right JARs.
Low | Edge cases | Empty JARs, corrupt .class files, JARs with only module-info.class, inner/anonymous classes, multi-release JARs.

Known Limitations

  • Lambda parameter type inference -- only works via the collection generic type argument heuristic. See AGENTS.md for details.
  • Maven auto-detection -- resolves pom.xml <properties> variables, but does not follow parent POMs or perform full transitive resolution. Use mvn dependency:build-classpath for full fidelity.

License

Apache-2.0