MethodAtlas

MethodAtlas is a CLI tool that scans Java source trees for JUnit test methods and emits one structured record per discovered method — with optional AI-assisted security classification.

It is built for teams that must demonstrate test coverage of security properties to auditors, regulators, or security review boards: it separates deterministic source analysis from optional AI interpretation so that every result is traceable, repeatable, and defensible.

Why MethodAtlas

Security-focused teams in regulated industries need more than a passing test suite. They need to demonstrate which tests cover which security controls, at a level of detail that satisfies external review.

MethodAtlas addresses this by turning an existing Java test suite (JUnit 5, JUnit 4, or TestNG — detected automatically) into a structured inventory with minimal setup:

| Challenge | What MethodAtlas provides |
| --- | --- |
| "Show us your security test coverage" | AI-classified inventory with rationale per method |
| "Prove the tests haven't changed since last audit" | Per-class SHA-256 content fingerprints (-content-hash) |
| "Integrate this into our SAST pipeline" | Native SARIF 2.1.0 output, compatible with GitHub Advanced Security, VS Code, Azure DevOps, and SonarQube |
| "We can't send source code to external AI APIs" | Local inference via Ollama, or a two-phase manual AI workflow for air-gapped environments |
| "Classification must be consistent and auditable" | Closed, versioned security taxonomy with optional custom taxonomy aligned to your controls framework |
| "We need confidence scores, not just yes/no" | Per-method AI confidence scores (0.0–1.0) for threshold-based filtering and human-review queues |
| "Annotate the source files for us" | Apply-tags mode writes @DisplayName and @Tag annotations directly into source files |
| "Our @Tag annotations look stale" | Tag vs AI drift detection flags disagreements between source annotations and AI classification |

Key capabilities

  • Deterministic test discovery — JavaParser AST analysis; no inference, no false positives on method existence; JUnit 5, JUnit 4, and TestNG detected automatically from import declarations
  • SARIF 2.1.0 output — first-class integration with static analysis platforms and IDE tooling
  • AI security classification — classifies each test method against a closed security taxonomy; supports Ollama, OpenAI, Anthropic, Azure OpenAI, Groq, xAI, GitHub Models, Mistral, and OpenRouter
  • Confidence scoring — per-method decimal score (-ai-confidence); filter by threshold for audit packages
  • Content hash fingerprints — SHA-256 of the class AST text (-content-hash); all methods in the same class share the same hash; enables incremental scanning and change detection
  • AI result cache — reuse previous AI classifications by hash (-ai-cache); unchanged classes cost zero API calls
  • Tag vs AI drift detection: -drift-detect flags methods where @Tag("security") in source disagrees with the AI classification
  • Classification overrides: -override-file records human-reviewed corrections; overrides persist across re-runs and set confidence to 1.0 or 0.0
  • Delta report: -diff compares two CSV scans and emits a change report listing methods added, removed, or modified between runs; useful for CI regression gates
  • Security-only filter: -security-only suppresses non-security methods from CSV/plain output; applied automatically in SARIF mode
  • Mismatch limit: -mismatch-limit is a safety gate for -apply-tags-from-csv; it aborts without touching source files when the CSV diverges from the current codebase
  • GitHub Actions annotations: -github-annotations emits inline PR annotations for security-relevant methods without requiring a GitHub Advanced Security licence
  • Apply-tags — writes AI-suggested @DisplayName and @Tag annotations back into source files; idempotent
  • Apply-tags-from-csv — applies human-reviewed annotation decisions from a CSV back to source; separates the review step from the write-back
  • Manual AI workflow — two-phase prepare/consume workflow for environments where API access is blocked
  • Local inference — Ollama support keeps source code entirely within your network
  • YAML configuration — share scan settings across a team or CI pipeline without repeating CLI flags
  • Custom taxonomy — supply an external taxonomy file aligned to ISO 27001, NIST SP 800-53, PCI DSS, or your own controls framework
  • Scan provenance: -emit-metadata prepends tool version and timestamp to CSV; embed in evidence packages
  • Multiple output modes — CSV (default), plain text, SARIF, and GitHub Actions annotations

Quick start

Build and unpack the distribution archive, then:

cd methodatlas-<version>/bin

# Static scan — outputs fqcn, method, loc, tags
./methodatlas /path/to/project

# AI security classification (local Ollama)
./methodatlas -ai /path/to/project

# SARIF output — pipe to a file for upload to GitHub Advanced Security
./methodatlas -sarif /path/to/project > results.sarif

# SARIF + AI enrichment + content hash fingerprints
./methodatlas -ai -sarif -content-hash /path/to/project > results.sarif

# Apply AI-suggested annotations back into source files
./methodatlas -ai -apply-tags /path/to/tests

# Apply reviewed CSV decisions back into source files
./methodatlas -apply-tags-from-csv reviewed.csv /path/to/tests

# GitHub Actions inline PR annotations
./methodatlas -ai -github-annotations /path/to/tests

See docs/cli-reference.md for the complete option reference.

What MethodAtlas reports

For each discovered test method (JUnit 5, JUnit 4, or TestNG), MethodAtlas emits one record.

Source-derived fields:

| Field | Present when | Description |
| --- | --- | --- |
| fqcn | Always | Fully qualified class name |
| method | Always | Test method name |
| loc | Always | Inclusive line count of the method declaration |
| tags | Always | Existing JUnit @Tag values declared on the method |
| content_hash | -content-hash | SHA-256 fingerprint of the enclosing class |

AI enrichment fields (present when -ai is enabled):

| Field | Present when | Description |
| --- | --- | --- |
| ai_security_relevant | -ai | Whether the model classified the test as security-relevant |
| ai_display_name | -ai | Suggested security-oriented @DisplayName value |
| ai_tags | -ai | Suggested security taxonomy tags (e.g. security;auth, security;crypto) |
| ai_reason | -ai | Short rationale for the classification |
| ai_interaction_score | -ai | Fraction of assertions that only verify method calls rather than outcomes (0.0 = all outcome checks, 1.0 = all interaction checks) |
| ai_confidence | -ai + -ai-confidence | Model confidence score 0.0–1.0 |
| tag_ai_drift | -ai + -drift-detect | Disagreement between source @Tag("security") and AI classification |
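
The ai_interaction_score definition above is a simple ratio. The sketch below works through that arithmetic in Python. It is only an illustration of the stated definition: in MethodAtlas the value is reported by the AI model, and the function name and the idea of counting assertions by hand are assumptions made for this example.

```python
def interaction_score(interaction_checks, outcome_checks):
    """Fraction of assertions that only verify method calls.

    Hypothetical helper: both counts are supplied by the caller.
    Returns None when a test contains no assertions at all.
    """
    total = interaction_checks + outcome_checks
    if total == 0:
        return None
    return interaction_checks / total


if __name__ == "__main__":
    # One call-verification assertion and three outcome assertions.
    print(interaction_score(1, 3))
```

A test that only verifies that collaborators were called would score 1.0 under this definition, which is why a high score can point to a test that checks little about actual outcomes.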

Output modes

CSV (default)

fqcn,method,loc,tags,display_name,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_interaction_score
com.acme.auth.LoginTest,testLoginWithValidCredentials,12,,,true,SECURITY: auth - validates session token,security;auth,Verifies session token is issued on successful login.,0.0
com.acme.util.DateTest,format_returnsIso8601,5,,,false,,,,0.1
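
As a rough illustration of consuming this output, the following Python sketch reads the sample rows above by column name and lists the methods flagged as security-relevant. The function name is hypothetical, and treating ai_tags as a semicolon-separated list is an assumption based on the example values.

```python
import csv
import io

# Sample rows copied from the CSV example above.
SAMPLE = """\
fqcn,method,loc,tags,display_name,ai_security_relevant,ai_display_name,ai_tags,ai_reason,ai_interaction_score
com.acme.auth.LoginTest,testLoginWithValidCredentials,12,,,true,SECURITY: auth - validates session token,security;auth,Verifies session token is issued on successful login.,0.0
com.acme.util.DateTest,format_returnsIso8601,5,,,false,,,,0.1
"""


def security_methods(csv_text):
    """Return (fqcn, method, tags) for rows flagged as security-relevant."""
    found = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["ai_security_relevant"].strip().lower() == "true":
            # Assumption: ai_tags holds tags separated by semicolons.
            tags = [t for t in row["ai_tags"].split(";") if t]
            found.append((row["fqcn"], row["method"], tags))
    return found


if __name__ == "__main__":
    for fqcn, method, tags in security_methods(SAMPLE):
        print(f"{fqcn}#{method}: {', '.join(tags)}")
```

Reading by column name rather than position keeps a script like this working when optional columns such as content_hash are added.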

SARIF 2.1.0

./methodatlas -ai -sarif /path/to/tests > results.sarif

Produces a single valid SARIF 2.1.0 JSON document. Security-relevant methods receive SARIF level note; all other test methods receive level none. Rule IDs are derived from AI taxonomy tags (security/auth, security/crypto, etc.).

SARIF is natively consumed by:

  • GitHub Advanced Security — upload via the upload-sarif action to surface findings in the Security tab
  • VS Code: the SARIF Viewer extension renders results inline
  • Azure DevOps — SARIF viewer pipeline extension
  • SonarQube — import via the generic issue import format after conversion

Plain text

./methodatlas -plain /path/to/project

Human-readable line-oriented output, useful for terminal inspection and shell scripting.

GitHub Actions annotations

./methodatlas -ai -github-annotations /path/to/tests

Emits ::notice / ::warning workflow commands that GitHub Actions renders as inline annotations on the PR diff. Does not require a GitHub Advanced Security licence.
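
For readers unfamiliar with workflow commands, the sketch below shows in Python how a ::notice line can be built and escaped. The escaping follows GitHub's workflow-command conventions; the title and message contents are hypothetical, since the exact text MethodAtlas emits is not shown here.

```python
def escape_data(text):
    """Escape the message part of a workflow command."""
    return text.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(text):
    """Escape a property value such as title (also escapes ':' and ',')."""
    return escape_data(text).replace(":", "%3A").replace(",", "%2C")


def notice(title, message):
    """Build one ::notice command line (hypothetical title/message layout)."""
    return f"::notice title={escape_property(title)}::{escape_data(message)}"


if __name__ == "__main__":
    print(notice("security/auth", "LoginTest.testLoginWithValidCredentials"))
```

Any process that prints such a line to standard output inside a GitHub Actions step produces an annotation, which is why no additional licence or upload step is involved.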

See docs/output-formats.md for full format descriptions and examples.

SARIF for regulated environments

SARIF is the OASIS standard interchange format for static analysis results. Adopting it means that MethodAtlas findings can be imported into any SARIF-compatible platform without custom tooling, and the format itself provides a stable, auditable record.

A SARIF result from MethodAtlas includes:

  • Physical location — source file path relative to %SRCROOT% and the method's start line
  • Logical location — fully qualified method name (com.acme.auth.LoginTest.testLoginWithValidCredentials) with kind member
  • Properties bag: loc, optional contentHash, and all AI enrichment fields including tagAiDrift

This makes each SARIF finding independently traceable to a specific method in a specific class at a specific revision.
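
To make the three parts listed above concrete, here is a Python sketch that builds a hypothetical SARIF document and extracts the level, rule ID, logical location, and physical location of each result. The object layout follows SARIF 2.1.0, but the file path, line number, and property values are invented for illustration and are not actual MethodAtlas output.

```python
import json

# Hypothetical document: structure per SARIF 2.1.0, values are assumptions.
SARIF_TEXT = json.dumps({
    "version": "2.1.0",
    "runs": [{
        "tool": {"driver": {"name": "MethodAtlas"}},
        "results": [{
            "ruleId": "security/auth",
            "level": "note",
            "message": {"text": "Verifies session token is issued on successful login."},
            "locations": [{
                "physicalLocation": {
                    "artifactLocation": {
                        "uri": "com/acme/auth/LoginTest.java",
                        "uriBaseId": "%SRCROOT%",
                    },
                    "region": {"startLine": 42},
                },
                "logicalLocations": [{
                    "fullyQualifiedName":
                        "com.acme.auth.LoginTest.testLoginWithValidCredentials",
                    "kind": "member",
                }],
            }],
            "properties": {"loc": 12},
        }],
    }],
})


def summarize(sarif_text):
    """Yield (level, ruleId, method, file, line) for every result."""
    doc = json.loads(sarif_text)
    for run in doc.get("runs", []):
        for result in run.get("results", []):
            location = result["locations"][0]
            physical = location["physicalLocation"]
            logical = location["logicalLocations"][0]
            yield (
                result.get("level"),
                result.get("ruleId"),
                logical["fullyQualifiedName"],
                physical["artifactLocation"]["uri"],
                physical["region"]["startLine"],
            )


if __name__ == "__main__":
    for entry in summarize(SARIF_TEXT):
        print(entry)
```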

AI security classification

When -ai is enabled, MethodAtlas submits each parsed test class to a configured AI provider for security classification. The model receives:

  1. The closed security taxonomy — a controlled set of tags that constrains what the model can return
  2. The exact list of JUnit methods discovered by the parser — the model cannot invent or skip methods
  3. The full class source as context for semantic interpretation

Because discovery is AST-based and AI classification is constrained by a fixed tag set, the structural inventory is deterministic even when the semantic interpretation uses a language model.
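
The constraints described above can be pictured as a validation step applied to each model response. The Python sketch below is one way such a check might look; the response shape (a list of objects with method and tags keys), the taxonomy contents, and the function name are all assumptions, not MethodAtlas internals.

```python
def validate_response(discovered, taxonomy, response):
    """Return a list of problems; an empty list means the response is acceptable.

    discovered: method names found by the parser
    taxonomy:   the closed set of allowed tags
    response:   hypothetical model output, e.g. [{"method": ..., "tags": [...]}]
    """
    problems = []
    expected = set(discovered)
    returned = {item["method"] for item in response}
    for name in sorted(returned - expected):
        problems.append(f"invented method: {name}")
    for name in sorted(expected - returned):
        problems.append(f"skipped method: {name}")
    for item in response:
        for tag in item.get("tags", []):
            if tag not in taxonomy:
                problems.append(f"unknown tag on {item['method']}: {tag}")
    return problems


if __name__ == "__main__":
    taxonomy = {"security", "auth", "crypto"}  # illustrative subset only
    response = [{"method": "testLogin", "tags": ["security", "auth"]}]
    print(validate_response(["testLogin"], taxonomy, response))
```

Because the method list and the tag set are fixed before the model is consulted, a check of this kind can reject a response without any judgement about its semantic content.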

Supported providers

| Provider value | AI product / platform | Deployment | Free tier |
| --- | --- | --- | --- |
| ollama | Any locally installed model | Local (source never leaves the machine) | |
| auto | Ollama → API key fallback | Local first, cloud fallback | |
| openai | ChatGPT / OpenAI API | Cloud | No |
| anthropic | Claude / Anthropic API | Cloud | No |
| xai | Grok / xAI API | Cloud | Limited |
| groq | Groq (fast LPU inference) | Cloud | Yes |
| github_models | GitHub Models | Cloud | Yes (GitHub account) |
| mistral | Mistral AI | Cloud (EU) | Limited |
| openrouter | Many models via OpenRouter | Cloud | Yes (free models) |
| azure_openai | Azure OpenAI Service | Customer's Azure tenant | No |

See docs/ai/providers.md for per-provider setup instructions, including which well-known AI assistant corresponds to which provider value.

Confidence scoring

Pass -ai-confidence to add a 0.0–1.0 confidence score per method:

./methodatlas -ai -ai-confidence /path/to/tests | \
  awk -F',' 'NR==1 || ($11+0) >= 0.7'   # keep only high-confidence findings

ai_confidence is column 11 in standard output (column 12 when -content-hash is also passed).
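
The awk one-liner splits on every comma, so it may miscount columns if a quoted field such as ai_reason contains a comma (whether MethodAtlas quotes such fields is an assumption here). A minimal Python sketch that selects the column by name avoids both that problem and the column 11 versus 12 distinction; the function name and sample data are hypothetical.

```python
import csv
import io


def filter_by_confidence(csv_text, threshold=0.7):
    """Keep the header plus rows whose ai_confidence is >= threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames or [],
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        try:
            score = float(row["ai_confidence"])
        except (KeyError, TypeError, ValueError):
            continue  # row has no usable confidence score
        if score >= threshold:
            writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "fqcn,method,ai_reason,ai_confidence\n"
        'com.acme.auth.LoginTest,testLogin,"Checks login, then logout.",0.9\n'
        "com.acme.util.DateTest,formatDate,,0.0\n"
    )
    print(filter_by_confidence(sample), end="")
```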

| Score | Meaning |
| --- | --- |
| 1.0 | Explicitly and unambiguously tests a named security property |
| ~0.7 | Clearly tests a security-adjacent concern |
| ~0.5 | Plausible but ambiguous; candidate for manual review |
| 0.0 | Not security-relevant |

See docs/ai/confidence.md for the full interpretation guide.

Content hash fingerprints and incremental scanning

Pass -content-hash to append a SHA-256 fingerprint of each class to every emitted record:

./methodatlas -content-hash -sarif /path/to/tests > results.sarif

The hash is computed from the JavaParser AST text of the enclosing class. All methods in the same class share the same value, and the hash changes only when the class body changes — not when unrelated files are modified.

Practical applications:

  • Incremental scanning — skip classes whose hash has not changed since the last run
  • Audit traceability — correlate a SARIF finding back to the exact class revision that produced it
  • CI change detection — detect modified test classes between two pipeline stages without diffing source files
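
The CI change-detection use case above could be scripted roughly as follows. This Python sketch compares the content_hash column of two CSV scans and reports which classes were added, removed, or modified; the function names are hypothetical, and it assumes both scans were produced with -content-hash in CSV mode.

```python
import csv
import io


def class_hashes(csv_text):
    """Map each fqcn to its content_hash (all methods of a class share one)."""
    return {row["fqcn"]: row["content_hash"]
            for row in csv.DictReader(io.StringIO(csv_text))}


def changed_classes(old_csv, new_csv):
    """Classify classes as added, removed, or modified between two scans."""
    old, new = class_hashes(old_csv), class_hashes(new_csv)
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "modified": sorted(c for c in new.keys() & old.keys()
                           if new[c] != old[c]),
    }


if __name__ == "__main__":
    old = "fqcn,method,content_hash\ncom.acme.A,t1,aaa\ncom.acme.B,t1,bbb\n"
    new = "fqcn,method,content_hash\ncom.acme.A,t1,aaa\ncom.acme.B,t1,ccc\n"
    print(changed_classes(old, new))
```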

AI result cache

Pass -ai-cache <prev-scan.csv> to reuse AI classifications from a previous run. Before calling the AI provider for a class, MethodAtlas checks whether that class's content hash appears in the cache file. On a hit, the stored result is used directly — no API call is made. Only changed or new classes incur a provider call.

# Day 1 — full scan; save the result as the cache
./methodatlas -ai -content-hash src/test/java > scan.csv

# Day 2, 3, … — unchanged classes cost nothing
./methodatlas -ai -content-hash -ai-cache scan.csv src/test/java > scan-new.csv
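
Conceptually, the cache lookup described above is a dictionary keyed by content hash. The Python sketch below illustrates that idea only; it is not the MethodAtlas implementation, and the function names and the in-memory layout are assumptions.

```python
import csv
import io
from collections import defaultdict


def load_cache(prev_scan_csv):
    """Index the rows of a previous scan by content_hash."""
    cache = defaultdict(list)
    for row in csv.DictReader(io.StringIO(prev_scan_csv)):
        if row.get("content_hash"):
            cache[row["content_hash"]].append(row)
    return cache


def plan_calls(current_hashes, cache):
    """Split classes into cache hits (reuse rows) and misses (call provider).

    current_hashes: mapping of fqcn to the hash computed in the current run
    """
    hits, misses = {}, []
    for fqcn, digest in current_hashes.items():
        if digest in cache:
            hits[fqcn] = cache[digest]
        else:
            misses.append(fqcn)
    return hits, misses


if __name__ == "__main__":
    prev = ("fqcn,method,content_hash,ai_security_relevant\n"
            "com.acme.A,t1,aaa,true\n")
    hits, misses = plan_calls({"com.acme.A": "aaa", "com.acme.B": "bbb"},
                              load_cache(prev))
    print(sorted(hits), misses)
```

Only the classes in the miss list would need a provider call, which matches the cost behaviour described for unchanged classes.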

When producing SARIF output, use a two-pass approach: the first pass refreshes the CSV cache (calling AI only for changed classes), the second pass generates SARIF from the cache with zero AI calls.

# Pass 1: refresh cache (AI called only for changed classes)
./methodatlas -ai -content-hash -ai-cache scan.csv src/test/java > scan-new.csv

# Pass 2: generate SARIF from cache — zero AI calls
./methodatlas -ai -content-hash -ai-cache scan-new.csv -sarif src/test/java > results.sarif

See docs/ai/caching.md for the full cache documentation and docs/ci/github-actions.md for the complete GitHub Actions workflow that implements this pattern.

Manual AI workflow

For environments where direct AI API access is blocked by corporate policy, MethodAtlas supports a two-phase manual workflow:

# Phase 1 — write prompts to files
./methodatlas -manual-prepare ./work ./responses /path/to/tests

# (paste each work file's AI prompt into a chat window, save the response)

# Phase 2 — consume responses and emit the enriched CSV (or apply tags)
./methodatlas -manual-consume ./work ./responses /path/to/tests
./methodatlas -manual-consume ./work ./responses -apply-tags /path/to/tests

All taxonomy and confidence flags apply equally in manual mode. The consume phase is incremental — you can process classes as responses arrive rather than waiting for the full batch.

See docs/usage-modes/manual.md for the complete workflow.

YAML configuration

Store shared settings in a YAML file so that CI pipelines and team members use consistent options without repeating flags:

outputMode: sarif
contentHash: true
ai:
  enabled: true
  provider: ollama
  model: qwen2.5-coder:7b
  confidence: true
  taxonomyMode: optimized

./methodatlas -config ./methodatlas.yaml /path/to/tests

Command-line flags always override YAML values. See docs/cli-reference.md for the complete field reference.
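
The precedence rule can be pictured as a simple merge in which any value given on the command line replaces the YAML value. The Python sketch below is purely illustrative: the flat key layout and the convention that None means "flag not given" are assumptions, not a description of how MethodAtlas parses its options.

```python
def effective_settings(yaml_values, cli_values):
    """Merge settings so that explicitly given CLI values win over YAML.

    Hypothetical convention: a CLI value of None means the flag was not passed.
    """
    merged = dict(yaml_values)
    merged.update({key: value for key, value in cli_values.items()
                   if value is not None})
    return merged


if __name__ == "__main__":
    yaml_values = {"outputMode": "sarif", "contentHash": True}
    cli_values = {"outputMode": "csv", "contentHash": None}
    print(effective_settings(yaml_values, cli_values))
```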

Distribution layout

methodatlas-<version>/
├── bin/
│   ├── methodatlas
│   └── methodatlas.bat
└── lib/
    ├── methodatlas-<version>.jar
    └── *.jar  (runtime dependency libraries)

The startup scripts in bin/ configure the classpath automatically to include all JARs in lib/, so no manual setup is required after extraction.

Documentation

Full documentation is available at accenture.github.io/MethodAtlas.

| Document | Contents |
| --- | --- |
| docs/cli-reference.md | Complete option reference, YAML schema, exit codes, and example commands |
| docs/output-formats.md | CSV, plain text, SARIF, and GitHub Annotations format descriptions |
| docs/usage-modes/ | All operating modes: static inventory, API AI, manual workflow, apply-tags, apply-tags-from-csv, delta, security-only |
| docs/ai/providers.md | Per-provider setup: Ollama, OpenAI, Anthropic, Azure OpenAI, Groq, xAI, GitHub Models, Mistral, OpenRouter |
| docs/ai/overrides.md | Classification override file: format, governance, and CI integration |
| docs/ai/confidence.md | Confidence scoring: interpretation and threshold guidance |
| docs/ai/caching.md | AI result caching: skip unchanged classes, two-pass SARIF pattern, CI cache key strategy |
| docs/ai/drift-detection.md | Tag vs AI drift detection: detecting stale @Tag("security") annotations |
| docs/ai/interaction-score.md | Placebo-test detection: interaction-score semantics and CI thresholds |
| docs/compliance.md | Compliance framework mapping: OWASP SAMM, NIST SSDF, ISO 27001, DORA; reproducibility statement |
| docs/deployment/ | Regulated environment guidance: PCI-DSS, ISO 27001, NIST SSDF, DORA, SOC 2, air-gapped |
| docs/deployment/onboarding.md | Onboarding a brownfield codebase: six-phase progression from static scan to CI gate |
| docs/concepts/data-governance.md | What data is submitted to AI providers, data residency options, enterprise secret management |
