Proposal: First-class integration with OpenCodeReview — benchmark results & collaboration offer

Hi @colbymchenry,

I'm the author of [OpenCodeReview](https://alibaba.github.io/open-code-review) (OCR), an open-source AI code review CLI. We recently integrated CodeGraph as an MCP tool provider in our review pipeline and ran a benchmark across **200 real-world pull requests** from production repositories. The results were impressive enough that I wanted to share them and explore a deeper collaboration.

## What we did

OCR spawns CodeGraph via MCP (`codegraph serve --mcp`) during code review. The agent uses `codegraph_explore` alongside our built-in tools (`file_read`, `code_search`, etc.) to understand code structure before generating review comments. We evaluated on the same 200 PRs (6 were auto-skipped as test-only changes) with and without CodeGraph, using Claude Opus 4.6.

## Benchmark results (200 PRs, Claude Opus 4.6)

### Review quality improvement

| Metric | Without CodeGraph | With CodeGraph | Change |
|--------|------------------|----------------|--------|
| **Precision** | 30.6% (285/931) | **31.9%** (307/963) | **+1.3pp** |
| **Recall** | 18.9% (285/1505) | **20.4%** (307/1505) | **+1.5pp** |
| True positive issues found | 285 | **307** | **+7.7%** |
| Zero-line comments (low quality) | 1.27% | **0.72%** | **-43%** |

CodeGraph helped the agent find **22 more real issues** across the same PR set while cutting the share of low-quality (zero-line) comments from 1.27% to 0.72%, a 43% relative reduction.
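For anyone who wants to check the table, here is a minimal sketch that recomputes the percentages from the raw counts. It assumes the precision denominator (931 / 963) is the number of comments each run emitted and that 1505 is the number of known issues in the PR set; the variable names are ours, not part of OCR.

```python
# Recompute the review-quality metrics from the counts in the table above.
# Assumption: "comments" is every comment a run emitted; KNOWN_ISSUES is the
# ground-truth issue count shared by both runs.
KNOWN_ISSUES = 1505

runs = {
    "without": {"true_positives": 285, "comments": 931},
    "with": {"true_positives": 307, "comments": 963},
}


def precision(run):
    """Share of emitted comments that matched a real issue."""
    return run["true_positives"] / run["comments"]


def recall(run):
    """Share of known issues that the run found."""
    return run["true_positives"] / KNOWN_ISSUES


for name, run in runs.items():
    print(f"{name}: precision={precision(run):.1%} recall={recall(run):.1%}")

# Relative gain in true positives, and relative drop in the zero-line rate.
extra = runs["with"]["true_positives"] - runs["without"]["true_positives"]
relative_gain = extra / runs["without"]["true_positives"]
zero_line_change = (0.72 - 1.27) / 1.27

print(f"extra true positives: {extra} ({relative_gain:+.1%})")
print(f"zero-line rate change: {zero_line_change:+.0%}")
```

The `+1.3pp` and `+1.5pp` columns are absolute differences in percentage points, while `+7.7%` and `-43%` are relative changes.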

### Efficiency improvement

| Metric | Without CodeGraph | With CodeGraph | Change |
|--------|------------------|----------------|--------|
| Total tool calls | 7,363 (avg 37/PR) | **7,107** (avg 36/PR) | **-3.5%** |
| `file_read` calls | 3,101 (avg 15) | **2,779** (avg 14) | **-10.4%** |
| `code_search` calls | 2,253 (avg 11) | **1,973** (avg 10) | **-12.4%** |
| `file_find` calls | 128 | **114** | **-10.9%** |
| Wall-clock time | 4h 46m 21s | **4h 41m 32s** | **-1.7%** |

Even though CodeGraph added 328 `codegraph_explore` calls, **net tool calls still decreased**: the agent read fewer files and ran fewer searches because `codegraph_explore` gave it the structural context it needed upfront. This matches the design goal described in CodeGraph's docs: return an answer sufficient for the agent to stop reading files.
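The "Change" column can be reproduced from the before/after values with a few lines of arithmetic. This is only a verification sketch; the helper names are ours.

```python
# Recompute the relative changes in the efficiency table.
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


def to_seconds(hours, minutes, seconds):
    """Convert a wall-clock duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds


rows = {
    "total tool calls": (7363, 7107),
    "file_read": (3101, 2779),
    "code_search": (2253, 1973),
    "file_find": (128, 114),
    "wall-clock (s)": (to_seconds(4, 46, 21), to_seconds(4, 41, 32)),
}

changes = {name: round(pct_change(b, a), 1) for name, (b, a) in rows.items()}
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")

# Calls saved by the three built-in tools listed in the table, to compare
# against the 328 codegraph_explore calls that were added.
saved = (3101 - 2779) + (2253 - 1973) + (128 - 114)
print(f"built-in calls saved: {saved}, codegraph_explore calls added: 328")
```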

### Token usage

| Metric | Without CodeGraph | With CodeGraph |
|--------|------------------|----------------|
| Total tokens | 79.5M | 89.6M |
| Output tokens | 2.0M | 2.0M |
| Avg per PR | 410K | 462K |

Total tokens rose about 13% (79.5M to 89.6M). Output stayed flat at 2.0M, so the entire increase is input, i.e. the context CodeGraph returns. The agent used that extra context to make better decisions, not to write more.

## Current integration approach

Right now, OCR treats CodeGraph as an external MCP server configured by the user:

```json
{
  "mcp_servers": {
    "codegraph": {
      "command": "codegraph",
      "args": ["serve", "--mcp"],
      "setup": "codegraph init && codegraph index"
    }
  }
}
```

This works, but requires users to install CodeGraph separately and configure it manually. We'd like to make it zero-config.
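To make the manual setup concrete, here is a rough sketch of how an entry like the one above could be turned into commands. It assumes `setup` is a chain of commands joined by `&&` that runs once before the server starts; how OCR actually interprets the field is not shown here, and `launch_plan` is a hypothetical helper.

```python
import json
import shlex

CONFIG = """
{
  "mcp_servers": {
    "codegraph": {
      "command": "codegraph",
      "args": ["serve", "--mcp"],
      "setup": "codegraph init && codegraph index"
    }
  }
}
"""


def launch_plan(config_text, name):
    """Return (setup_commands, serve_command) as argv lists for one server."""
    server = json.loads(config_text)["mcp_servers"][name]
    # Assumption: "setup" is a sequence of commands separated by "&&".
    setup = [
        shlex.split(step)
        for step in server.get("setup", "").split("&&")
        if step.strip()
    ]
    serve = [server["command"], *server.get("args", [])]
    return setup, serve


setup, serve = launch_plan(CONFIG, "codegraph")
print(setup)
print(serve)
```

Splitting into argv lists avoids handing the string to a shell, which is one reason a built-in provider would be easier to reason about than free-form user configuration.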

## Proposal: make CodeGraph a built-in provider in OCR

We want to ship CodeGraph as a **built-in, zero-configuration** code intelligence provider in OCR. When a user runs `ocr review`, OCR would automatically:

1. Discover a locally installed `codegraph` binary (PATH lookup)
2. If not found, download the platform-specific bundle from GitHub Releases (same self-heal logic as `npm-shim.js`) and cache it in `~/.codegraph/bundles/`
3. Run `codegraph init && codegraph index` on the target repo
4. Connect via MCP and register tools
5. Clean up after review

Users can disable it with `ocr config set codegraph.enabled false` or `OCR_NO_CODEGRAPH=1`.
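The discovery order in steps 1 and 2, plus the opt-out, could look roughly like the sketch below. It is illustrative Python rather than OCR's Go code; the function names and the file layout inside `~/.codegraph/bundles/` are assumptions, and the download step itself is left out.

```python
import shutil
from pathlib import Path

# Cache location from step 2; the file name inside it is an assumption.
CACHE_DIR = Path.home() / ".codegraph" / "bundles"


def codegraph_enabled(env, config_enabled=True):
    """False when the user opted out via config or OCR_NO_CODEGRAPH=1."""
    return config_enabled and env.get("OCR_NO_CODEGRAPH") != "1"


def resolve_binary(env, cache_dir=CACHE_DIR, which=shutil.which):
    """Decide where the codegraph binary comes from.

    Returns None when the provider is disabled, otherwise a (source, path)
    pair where source is "path", "cache", or "download".
    """
    if not codegraph_enabled(env):
        return None
    found = which("codegraph", path=env.get("PATH"))
    if found:
        return ("path", found)
    cached = cache_dir / "codegraph"  # hypothetical bundle layout
    if cached.is_file():
        return ("cache", str(cached))
    # Nothing local: the caller would fetch the platform bundle to this path.
    return ("download", str(cached))
```

For example, `resolve_binary(dict(os.environ))` (with `import os`) returns `("path", ...)` on a machine where `codegraph` is already installed, and `None` whenever `OCR_NO_CODEGRAPH=1` is set.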

### What we need from CodeGraph (all optional, nothing blocking)

The integration works today with no changes to CodeGraph. But these would improve the experience:

1. **`--no-daemon` / `--no-watch` CLI flags** — We currently pass `CODEGRAPH_NO_DAEMON=1` and `CODEGRAPH_NO_WATCH=1` as env vars. Explicit flags would make the subprocess invocation more self-documenting.

2. **Stable version manifest** — A `version.json` release asset listing the latest version and checksums would let us check for updates without hitting GitHub API rate limits.

3. **`codegraph index --timeout <duration>`** — For large repos, we kill the index process externally after a timeout. A built-in timeout with graceful partial-index checkpoint would be cleaner.

None of these are blockers — we can ship the integration with CodeGraph as-is.
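For context on items 1 and 3, this is approximately what the external workaround looks like today, sketched in Python for brevity. The two environment variables are the ones named above; `index_env` and `run_with_timeout` are hypothetical helper names, and the real implementation lives in OCR's Go code.

```python
import os
import subprocess


def index_env(base=None):
    """Environment for the indexing subprocess, with daemon and watcher off."""
    env = dict(os.environ if base is None else base)
    env["CODEGRAPH_NO_DAEMON"] = "1"
    env["CODEGRAPH_NO_WATCH"] = "1"
    return env


def run_with_timeout(argv, timeout_s, cwd=None, env=None):
    """Run a command and kill it if it exceeds timeout_s seconds.

    Returns "ok", "failed", or "timed_out". subprocess.run kills the child
    itself when the timeout expires, so no partial-index checkpoint is
    possible from the outside.
    """
    try:
        done = subprocess.run(
            argv, cwd=cwd, env=env, timeout=timeout_s,
            capture_output=True, text=True,
        )
    except subprocess.TimeoutExpired:
        return "timed_out"
    return "ok" if done.returncode == 0 else "failed"


# Usage sketch (not executed here; the repo path and timeout are placeholders):
# status = run_with_timeout(["codegraph", "index"], 600,
#                           cwd="/path/to/repo", env=index_env())
```

The hard kill is the part a built-in `--timeout` would improve, since only CodeGraph itself can decide what a usable partial index looks like.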

## What we offer in return

- **Promotion in OCR's README and docs**: CodeGraph as the recommended code intelligence provider, with a link to the CodeGraph repo.
- **Real-world case study**: We're happy to be featured as an integration case study in CodeGraph's docs/README, as an AI code review tool that uses CodeGraph to find 7.7% more true-positive issues on real PRs.
- **Upstream contributions**: We're willing to submit PRs for any of the improvements above.
- **Ongoing benchmark data**: As we iterate, we can share updated benchmark results to help validate CodeGraph improvements.

## About OpenCodeReview

- Open-source AI code review CLI for Git repositories
- Supports multiple LLM providers (Anthropic, OpenAI, custom)
- Distributed via npm (`@alibaba-group/open-code-review`) and GitHub Releases
- Go binary, cross-compiled for 6 platforms (darwin/linux/windows x amd64/arm64)
- GitHub: https://github.com/alibaba/open-code-review

Looking forward to your thoughts! Happy to discuss any technical details or alternative approaches.

