Refactor: columnvalues to use strings builder#353

Open
micyen wants to merge 3 commits into databricks:main from micyen:refactor/columnvalues-strings-builder

micyen commented May 2, 2026

Summary

Replace repeated string concatenation with strings.Builder in complex type Value() methods to improve performance.

Problem

listValueContainer, mapValueContainer, and structValueContainer build their output strings with + inside loops. Each concatenation copies the string accumulated so far, so the total work is O(n^2).

When a query such as COLLECT_SET(...) returns many elements, rows.Next() / rows.Scan() can appear to hang, with most of the time spent in runtime.concatstring2.
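
To illustrate the cost, here is a minimal sketch of the concatenation pattern. It is not the driver's actual code: joinWithConcat, modelBytesCopied, and the "[a,b,c]" output format are hypothetical names and assumptions chosen for illustration. The rough model counts the bytes copied if every += copies the whole intermediate string.

```go
package main

import "fmt"

// joinWithConcat is a hypothetical stand-in for the loop pattern described
// above. Every += builds a new string and copies what was accumulated so far.
func joinWithConcat(elems []string) string {
	out := "["
	for i, e := range elems {
		if i > 0 {
			out += ","
		}
		out += e
	}
	return out + "]"
}

// modelBytesCopied is a rough model (an assumption, not a measurement): it
// sums the length of the intermediate string after each += in the loop above,
// for n elements of elemLen bytes each.
func modelBytesCopied(n, elemLen int) int {
	total, cur := 0, 1 // cur starts at len("[")
	for i := 0; i < n; i++ {
		if i > 0 {
			cur++ // separator appended
			total += cur
		}
		cur += elemLen // element appended
		total += cur
	}
	return total
}

func main() {
	fmt.Println(joinWithConcat([]string{"a", "b", "c"})) // prints [a,b,c]
	// Doubling the element count roughly quadruples the modeled copy volume.
	fmt.Println(modelBytesCopied(1000, 1), modelBytesCopied(2000, 1))
}
```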

Observed Workaround

Changing the SQL from COLLECT_SET() (array type) to ARRAY_JOIN(COLLECT_SET(...), '\n') (string type) bypasses the problem because the driver reads the column as a plain string value, avoiding calls to the complex type Value() methods.

Fix

Use strings.Builder for string construction in all three Value() methods, which reduces time complexity from O(n^2) to O(n).
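
A minimal sketch of the strings.Builder approach, under the same assumptions as the description above. joinWithBuilder and the "[a,b,c]" format are hypothetical and do not reproduce the driver's Value() methods; the Grow call is an optional extra that may not be part of this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// joinWithBuilder is a hypothetical example of building the output with
// strings.Builder, which appends into a growing buffer instead of copying
// the whole accumulated string on every step.
func joinWithBuilder(elems []string) string {
	var sb strings.Builder

	// Optional pre-sizing (an assumption, not necessarily in the PR):
	// brackets plus one separator byte per element is a safe upper bound.
	size := 2 + len(elems)
	for _, e := range elems {
		size += len(e)
	}
	sb.Grow(size)

	sb.WriteByte('[')
	for i, e := range elems {
		if i > 0 {
			sb.WriteByte(',')
		}
		sb.WriteString(e)
	}
	sb.WriteByte(']')
	return sb.String()
}

func main() {
	fmt.Println(joinWithBuilder([]string{"a", "b", "c"})) // prints [a,b,c]
}
```

Appends to a strings.Builder are amortized constant time per byte, so the total work is linear in the output length.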

Testing

Pre-existing unit tests pass; no new tests were added.

Notes

No behavioral changes; the output format is identical.
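
One way such a claim could be checked is to compare a concatenation-based renderer against a builder-based one on the same inputs. The sketch below is hypothetical and not part of this PR: the pair type, both render functions, and the "{k:v,k:v}" format are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// pair is a hypothetical key/value entry; a slice keeps the order stable.
type pair struct{ key, val string }

// renderConcat builds the output with + in a loop (the old pattern).
func renderConcat(ps []pair) string {
	out := "{"
	for i, p := range ps {
		if i > 0 {
			out += ","
		}
		out += p.key + ":" + p.val
	}
	return out + "}"
}

// renderBuilder builds the same output with strings.Builder.
func renderBuilder(ps []pair) string {
	var sb strings.Builder
	sb.WriteByte('{')
	for i, p := range ps {
		if i > 0 {
			sb.WriteByte(',')
		}
		sb.WriteString(p.key)
		sb.WriteByte(':')
		sb.WriteString(p.val)
	}
	sb.WriteByte('}')
	return sb.String()
}

// sameOutput reports whether both renderers agree on every input case.
func sameOutput(cases [][]pair) bool {
	for _, c := range cases {
		if renderConcat(c) != renderBuilder(c) {
			return false
		}
	}
	return true
}

func main() {
	cases := [][]pair{
		nil,
		{{"a", "1"}},
		{{"a", "1"}, {"b", "2"}, {"c", ""}},
	}
	fmt.Println(sameOutput(cases)) // prints true
}
```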

Signed-off-by: Michael Yen <29710093+micyen@users.noreply.github.com>

micyen added 3 commits May 2, 2026 15:27
…ValueContainer.Value()

Signed-off-by: Michael Yen <29710093+micyen@users.noreply.github.com>
…alueContainer.Value()

Signed-off-by: Michael Yen <29710093+micyen@users.noreply.github.com>
…ctValueContainer.Value()

Signed-off-by: Michael Yen <29710093+micyen@users.noreply.github.com>