Skip to content

Add a distinct C++ parser#4

Merged
jonyoder merged 6 commits intomainfrom
cm/add-cpp-parser
Apr 23, 2026
Merged

Add a distinct C++ parser#4
jonyoder merged 6 commits intomainfrom
cm/add-cpp-parser

Conversation

@cm421
Copy link
Copy Markdown
Contributor

@cm421 cm421 commented Apr 10, 2026

There's already some C++ parsing in the parser_c.go parser, but there were parts missing, and it'd be nice to pull them apart so I can continue to iterate on C++ parsing if needed an not worry about accidentally breaking parser_c.go

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR separates C++ parsing into a dedicated CPPParser, removing C++-specific behavior from the existing C parser so C++ parsing can evolve independently without risking regressions in parser_c.go.

Changes:

  • Added a new Tree-sitter–based C++ parser (CPPParser) with namespace/template awareness and class member extraction.
  • Updated CLI parse command routing to use cCParser and cppCPPParser.
  • Extended test fixtures and unit tests to cover parsing of a new C++ header (container.hpp) and to validate C++ parsing separately from C.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
testdata/c/container.hpp Adds C++ header fixtures (namespaces, template class, struct, enum) for parser validation.
indexer/parser_test.go Splits C and C++ tests; adds assertions for C++ header parsing.
indexer/parser_cpp.go Introduces new C++ parser implementation using Tree-sitter C++ grammar.
indexer/parser_c.go Restricts C parser to .c/.h and removes C++ grammar handling.
cmd/code-index/cmd/parse.go Routes c and cpp languages to different parsers and updates error messages accordingly.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread indexer/parser_cpp.go Outdated
Comment thread indexer/parser_cpp.go Outdated
Comment thread cmd/code-index/cmd/parse.go
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@cm421 cm421 requested a review from jonyoder April 22, 2026 21:36
Copy link
Copy Markdown
Collaborator

@jonyoder jonyoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! This is a solid first contribution — thoughtful separation of concerns, namespace and template tracking that the old combined parser lacked, and good test coverage with the container.hpp fixture. Verified locally: go build, go test ./..., and golangci-lint run all pass.

Real issues

  1. Doc/code mismatch on .h extension. docs/configuration.md lists .h under both c and cpp parsers, but parser_cpp.go only handles .cpp/.cc/.hpp/.cxx/.hxx.h is intentionally absent. That's actually the right choice (prevents double-parsing when a project configures both c and cpp sources over the same tree), but the docs need to reflect it. Either drop .h from the cpp row, or add an explicit note that .h files go to the C parser by convention.

Worth discussing (not blocking)

  1. Template qualified name format. The test asserts "template<typename T> testlib::collections::Container" as the type Name. That's functional but reads awkwardly in search results as [class] template<typename T> testlib::collections::Container. Consider moving template parameters to Signature (which we already use for functions) and keeping Name as just the qualified name. Not a blocker — if it indexes and searches well in practice, it's fine.

  2. extern "C" blocks. C++ headers frequently wrap declarations in extern "C" { ... } (tree-sitter's linkage_specification node). Neither the old nor the new parser recurses into these, so declarations inside are currently missed. Not a regression, but a candidate for a follow-up.

  3. Duplication between parser_c.go and parser_cpp.goparseFile, the walker, extractStructOrUnion, extractEnum, extractTypedef are structurally similar. I don't think pulling them into shared helpers would help — the CPP versions need namespace/template context, and the signatures get ugly. Keeping them separate is fine; just wanted to acknowledge I thought about it.

Nits

  1. The 200-char signature truncation is repeated in buildSignature, buildDeclarationSignature, buildFieldDeclSignature, extractConstructorDecl, and extractCSignature. Could be a const maxSignatureLen = 200 with a helper. Not worth a round-trip if you'd rather land as-is.

Verified

  • go build ./... clean
  • go test ./... -count=1 all pass including TestCParser and TestCPPParser
  • golangci-lint run ./... — 0 issues
  • No regressions in existing parser tests

Approving once the docs are fixed. Thanks again for the contribution!

@cm421
Copy link
Copy Markdown
Contributor Author

cm421 commented Apr 23, 2026

Sorry @jonyoder to push changes after you approved, but I wanted to address your nit, and the 2nd issue you brought up. I'm not 100% confident I addressed your 2nd issue, and if not I'll probably revert it and take a stab at it next PR.

I think the extern "C" callout is excellent and will include that in a follow on PR

@cm421 cm421 requested a review from jonyoder April 23, 2026 19:25
Copy link
Copy Markdown
Collaborator

@jonyoder jonyoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good iteration! Just one remaining issue and one nit.

Issue

The note breaks the markdown table. In docs/configuration.md the blockquote is inserted between rows of the language table:

| `cpp` | `.cpp`, `.cc`, `.hpp`, `.cxx`, `.hxx` | tree-sitter — ... |

> **Note:** `.h` files are handled by the `c` parser...
| `r` | `.R`, `.r` | ...

The blank line + blockquote ends the table, so the r and markdown rows below will render as orphaned text (not as table rows). Move the note to after the full table:

| `markdown` | `.md`, `.qmd` | Regex — ... |

> **Note:** `.h` files are handled by the `c` parser, not `cpp`. ...

Nit

maxSignatureLen and truncateSignature live in parser_cpp.go but parser_c.go now uses them too. It's fine functionally since they're in the same package, but conceptually the helper isn't C++-specific. Consider moving to a new parser_common.go or just the top of parser_c.go (the older file). Not a blocker.

Verified

  • Template format change is clean: TypeInfo.Signature now carries "template<typename T>", Name is the qualified name. Reads much better in search results.
  • truncateSignature properly replaces all 5 inlined occurrences.
  • go test ./... -count=1 — all pass, including TestCPPParser with the new Signature assertion.
  • golangci-lint run ./... — 0 issues.
  • CI passing.

Once the doc formatting is fixed, happy to approve!

@jonyoder jonyoder merged commit e2b4cd5 into main Apr 23, 2026
2 checks passed
@jonyoder jonyoder deleted the cm/add-cpp-parser branch April 23, 2026 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants