Signed-off-by: Andrew Stein <steinlink@gmail.com>
This PR mainly adds `View.with_typed_arrays`, a new zero-copy JS API (febaf25f5):

- `with_typed_arrays(window, callback)` on the JS `View`. Calls `callback(names, values, validities, dictionaries)` with zero-copy `TypedArray` views over the Arrow buffers. Numeric columns map 1:1 to `Int32Array`/`Uint32Array`/`Float32Array`/`Float64Array`; `Dictionary<Int32, Utf8>` columns surface as `(Int32Array keys, string[] values)` pairs; validity bitmaps are `Uint8Array` views. Arrays are only valid inside the callback.
- `float32` option downcasts `Float64`/`Date32`/`Timestamp`/`Int64` columns to `Float32Array` for half-memory GPU uploads.
- `ViewWindow.emit_legacy_row_path_names`: when `false`, group-by columns are named `__ROW_PATH_N__` to match the SQL backend; `with_typed_arrays` forces this off. Wired through protobuf (`ViewPort.emit_legacy_row_path_names`) and `View::to_arrow`. This helps identify which columns are which when they don't need to be human readable.

As a drive-by, the internals of `View` registration have been optimized for memory and CPU performance. This was originally an experiment to share state between `Table` and `View` when their columns would otherwise be identical (e.g. `ctx0`), which spilled over into all `View` loading and touched CSV loading as well:

- `t_ctx0`/`t_ctx1`/`t_ctx2`/`t_ctxunit`: skip populating `m_delta_pkeys` (saves a hopscotch_set insert per row).
- `t_ctx0`: new bulk-load fast path for unsorted/unfiltered/empty traversal. It appends pkeys directly into `m_index` via new `t_ftrav::bulk_load_reserve`/`append`/`finalize`, skipping the `m_new_elems` hopscotch_map round-trip.
- `t_ftrav::step_end` now short-circuits when no step work happened.
- `Table::init_bulk` API that aliases `shared_ptr<t_column>`s into the master table instead of deep-cloning. Preconditions: empty gstate, all `OP_INSERT`, unique `psp_pkey`.
- `Table::from_csv` uses the new bulk path when the index is implicit (no explicit index, no `__INDEX__`); otherwise it falls back to the flatten-and-merge path, since those cases allow duplicate pkeys.
- `fill_master_table`/`update_master_table` now take `shared_ptr<t_data_table>` and alias columns (zero-copy) instead of `clone()`-ing each column.
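For reviewers who want a feel for consuming the `with_typed_arrays` callback arguments, here is a minimal sketch of helpers that could run inside the callback. It does not call Perspective itself. The helper names (`isValid`, `decodeDictionary`, `copyOut`, `survivesFloat32`) are hypothetical and not part of this PR, and treating an absent validity bitmap as "no nulls" is an assumption. The bit layout (LSB-first, 1 = valid) is the standard Arrow validity bitmap layout.

```typescript
// Hypothetical helpers for use inside a with_typed_arrays callback.
// None of these names are part of the PR.

// Arrow validity bitmaps are LSB-first: row i is bit (i & 7) of byte (i >> 3).
// Assumption: a missing bitmap means the column has no nulls.
function isValid(validity: Uint8Array | null | undefined, i: number): boolean {
    if (!validity) return true;
    return ((validity[i >> 3] ?? 0) & (1 << (i & 7))) !== 0;
}

// Decode a Dictionary<Int32, Utf8> column surfaced as (keys, values).
// Null rows and out-of-range keys both decode to null in this sketch.
function decodeDictionary(
    keys: Int32Array,
    dict: string[],
    validity?: Uint8Array | null,
): (string | null)[] {
    const out: (string | null)[] = [];
    for (let i = 0; i < keys.length; i++) {
        const k = keys[i] ?? -1;
        out.push(isValid(validity, i) ? dict[k] ?? null : null);
    }
    return out;
}

// The views are only valid inside the callback, so anything kept
// afterwards has to be copied. Constructing a typed array from another
// typed array copies the elements into a fresh buffer.
function copyOut(view: Float32Array): Float32Array {
    return new Float32Array(view);
}

// The float32 option trades precision for memory: float32 represents
// integers exactly only up to 2^24, so large Int64/Timestamp values
// may be rounded. Math.fround shows what a value becomes.
function survivesFloat32(x: number): boolean {
    return Math.fround(x) === x;
}
```

In real use these would be called from the function passed to `view.with_typed_arrays(window, callback)`. Whether `values`, `validities`, and `dictionaries` are positional arrays parallel to `names` or keyed some other way is not stated here, so the sketch takes individual columns as plain arguments.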