Conversation
There was a problem hiding this comment.
Pull request overview
This PR separates C++ parsing into a dedicated CPPParser, removing C++-specific behavior from the existing C parser so C++ parsing can evolve independently without risking regressions in parser_c.go.
Changes:
- Added a new Tree-sitter–based C++ parser (
CPPParser) with namespace/template awareness and class member extraction. - Updated CLI parse command routing to use
c→CParserandcpp→CPPParser. - Extended test fixtures and unit tests to cover parsing of a new C++ header (
container.hpp) and to validate C++ parsing separately from C.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
testdata/c/container.hpp |
Adds C++ header fixtures (namespaces, template class, struct, enum) for parser validation. |
indexer/parser_test.go |
Splits C and C++ tests; adds assertions for C++ header parsing. |
indexer/parser_cpp.go |
Introduces new C++ parser implementation using Tree-sitter C++ grammar. |
indexer/parser_c.go |
Restricts C parser to .c/.h and removes C++ grammar handling. |
cmd/code-index/cmd/parse.go |
Routes c and cpp languages to different parsers and updates error messages accordingly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Agent-Logs-Url: https://github.com/posit-dev/code-index/sessions/fd6b6fd0-d232-4b04-91f7-0951b3514988 Co-authored-by: cm421 <56326543+cm421@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
jonyoder
left a comment
There was a problem hiding this comment.
Nice work! This is a solid first contribution — thoughtful separation of concerns, namespace and template tracking that the old combined parser lacked, and good test coverage with the container.hpp fixture. Verified locally: go build, go test ./..., and golangci-lint run all pass.
Real issues
- Doc/code mismatch on
.hextension.docs/configuration.mdlists.hunder bothcandcppparsers, butparser_cpp.goonly handles.cpp/.cc/.hpp/.cxx/.hxx—.his intentionally absent. That's actually the right choice (prevents double-parsing when a project configures bothcandcppsources over the same tree), but the docs need to reflect it. Either drop.hfrom the cpp row, or add an explicit note that.hfiles go to the C parser by convention.
Worth discussing (not blocking)
-
Template qualified name format. The test asserts
"template<typename T> testlib::collections::Container"as the typeName. That's functional but reads awkwardly in search results as[class] template<typename T> testlib::collections::Container. Consider moving template parameters toSignature(which we already use for functions) and keepingNameas just the qualified name. Not a blocker — if it indexes and searches well in practice, it's fine. -
extern "C"blocks. C++ headers frequently wrap declarations inextern "C" { ... }(tree-sitter'slinkage_specificationnode). Neither the old nor the new parser recurses into these, so declarations inside are currently missed. Not a regression, but a candidate for a follow-up. -
Duplication between
parser_c.goandparser_cpp.go—parseFile, the walker,extractStructOrUnion,extractEnum,extractTypedefare structurally similar. I don't think pulling them into shared helpers would help — the CPP versions need namespace/template context, and the signatures get ugly. Keeping them separate is fine; just wanted to acknowledge I thought about it.
Nits
- The 200-char signature truncation is repeated in
buildSignature,buildDeclarationSignature,buildFieldDeclSignature,extractConstructorDecl, andextractCSignature. Could be aconst maxSignatureLen = 200with a helper. Not worth a round-trip if you'd rather land as-is.
Verified
go build ./...cleango test ./... -count=1all pass includingTestCParserandTestCPPParsergolangci-lint run ./...— 0 issues- No regressions in existing parser tests
Approving once the docs are fixed. Thanks again for the contribution!
|
Sorry @jonyoder to push changes after you approved, but I wanted to address your nit, and the 2nd issue you brought up. I'm not 100% confident I addressed your 2nd issue, and if not I'll probably revert it and take a stab at it next PR. I think the extern "C" callout is excellent and will include that in a follow on PR |
jonyoder
left a comment
There was a problem hiding this comment.
Good iteration! Just one remaining issue and one nit.
Issue
The note breaks the markdown table. In docs/configuration.md the blockquote is inserted between rows of the language table:
| `cpp` | `.cpp`, `.cc`, `.hpp`, `.cxx`, `.hxx` | tree-sitter — ... |
> **Note:** `.h` files are handled by the `c` parser...
| `r` | `.R`, `.r` | ...
The blank line + blockquote ends the table, so the r and markdown rows below will render as orphaned text (not as table rows). Move the note to after the full table:
| `markdown` | `.md`, `.qmd` | Regex — ... |
> **Note:** `.h` files are handled by the `c` parser, not `cpp`. ...
Nit
maxSignatureLen and truncateSignature live in parser_cpp.go but parser_c.go now uses them too. It's fine functionally since they're in the same package, but conceptually the helper isn't C++-specific. Consider moving to a new parser_common.go or just the top of parser_c.go (the older file). Not a blocker.
Verified
- Template format change is clean:
TypeInfo.Signaturenow carries"template<typename T>",Nameis the qualified name. Reads much better in search results. truncateSignatureproperly replaces all 5 inlined occurrences.go test ./... -count=1— all pass, includingTestCPPParserwith the newSignatureassertion.golangci-lint run ./...— 0 issues.- CI passing.
Once the doc formatting is fixed, happy to approve!
There's already some C++ parsing in the parser_c.go parser, but there were parts missing, and it'd be nice to pull them apart so I can continue to iterate on C++ parsing if needed an not worry about accidentally breaking parser_c.go