
panic: capacity overflow in generate_series / range with large i64 arguments #22188

@Dandandan

Describe the bug

generate_series and range panic with "capacity overflow" when given an integer range so large that the element count times 8 bytes (the size of an i64) exceeds isize::MAX. The panic comes from Vec::reserve inside the integer-range implementation and is hit during planning, when the optimizer constant-folds the call.

To Reproduce

```rust
use datafusion::prelude::SessionContext;

#[tokio::main]
async fn main() {
    let ctx = SessionContext::new();
    // Panics with "capacity overflow" during planning instead of returning an Err.
    let _ = ctx
        .sql("SELECT generate_series(0, 9223372036854775807)")
        .await
        .unwrap()
        .create_physical_plan()
        .await;
}
```

Panic:

```text
thread 'main' panicked at .../alloc/src/raw_vec/mod.rs:28:5:
capacity overflow
```

Also reproduces with:

  • SELECT range(0, 9223372036854775807)
  • SELECT range(9223372036854775807)
  • SELECT generate_series(-9223372036854775808, 9223372036854775807)

Bounded ranges like SELECT generate_series(1, 100) are fine.

Expected behavior

Return a planning or execution error along the lines of "range too large to materialize" (or, ideally, use a streaming implementation that does not materialize the full sequence eagerly). The public SQL API should never panic on user-supplied SQL.
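A minimal sketch of the streaming idea, assuming an inclusive upper bound and a positive step. RangeChunks and its batch size are hypothetical, not DataFusion types. The iterator yields the sequence in bounded chunks (for example, one record batch at a time), so memory use depends on the chunk size rather than on the total count, and checked_add ends the sequence cleanly at i64::MAX instead of overflowing. This only helps where the output can actually be streamed; a single list value returned per row still has to be materialized whole.

```rust
/// Hypothetical streaming generator for generate_series(start, stop, step)
/// with an inclusive upper bound and step > 0. Each call to `next` builds at
/// most `batch_size` values, so memory is bounded by the chunk size.
struct RangeChunks {
    cursor: Option<i64>, // next value to emit; None once the sequence has ended
    stop: i64,
    step: i64,
    batch_size: usize,
}

impl RangeChunks {
    fn new(start: i64, stop: i64, step: i64, batch_size: usize) -> Self {
        assert!(step > 0 && batch_size > 0, "sketch only handles step > 0");
        Self { cursor: Some(start), stop, step, batch_size }
    }
}

impl Iterator for RangeChunks {
    type Item = Vec<i64>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut chunk = Vec::new();
        while chunk.len() < self.batch_size {
            let Some(value) = self.cursor else { break };
            if value > self.stop {
                self.cursor = None;
                break;
            }
            chunk.push(value);
            // checked_add returns None on overflow, ending the sequence at
            // i64::MAX instead of wrapping around or panicking.
            self.cursor = value.checked_add(self.step);
        }
        if chunk.is_empty() { None } else { Some(chunk) }
    }
}

fn main() {
    // generate_series(0, i64::MAX) as a stream: only the first chunk is built.
    let first = RangeChunks::new(0, i64::MAX, 1, 8192).next().unwrap();
    println!("{} values, last = {}", first.len(), first[first.len() - 1]);

    let small: Vec<i64> = RangeChunks::new(1, 10, 3, 2).flatten().collect();
    println!("{small:?}"); // [1, 4, 7, 10]
}
```

Tracking the next value as an Option<i64> is what lets the sketch reach i64::MAX exactly once without a separate overflow flag.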

Root cause

datafusion/functions-nested/src/range.rs, in generate_range_values:

```rust
// line 563-565   (step > 0 branch)
let count =
    (start.abs_diff(limit) / step.unsigned_abs()).saturating_add(1) as usize;
values.reserve(count);                                  // ← panics here

// line 583-585   (step < 0 branch — identical pattern)
let count =
    (start.abs_diff(limit) / step.unsigned_abs()).saturating_add(1) as usize;
values.reserve(count);
```

For generate_series(0, i64::MAX, 1) the count after saturating_add(1) is i64::MAX + 1 = 2^63 (about 9.2 × 10^18, roughly u64::MAX / 2), and on a 64-bit target the as usize cast keeps that value unchanged. Vec::<i64>::reserve multiplies it by size_of::<i64>() = 8, giving 2^66 bytes, which exceeds isize::MAX (2^63 - 1), so it panics.
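The arithmetic can be checked in isolation. The sketch below is a standalone copy of the count expression from the step > 0 branch; the function names range_count and required_bytes are hypothetical. checked_mul shows that the byte size needed for 2^63 i64 values does not even fit in a u64, let alone under isize::MAX.

```rust
use std::mem::size_of;

/// Hypothetical standalone copy of the count expression in
/// generate_range_values (step > 0 branch), before the `as usize` cast.
/// `step` must be nonzero.
fn range_count(start: i64, limit: i64, step: i64) -> u64 {
    (start.abs_diff(limit) / step.unsigned_abs()).saturating_add(1)
}

/// Bytes a Vec<i64> would need for `count` elements, or None if that size
/// overflows u64 or exceeds isize::MAX (roughly the condition under which
/// Vec::reserve panics with "capacity overflow").
fn required_bytes(count: u64) -> Option<u64> {
    count
        .checked_mul(size_of::<i64>() as u64)
        .filter(|&bytes| bytes <= isize::MAX as u64)
}

fn main() {
    // generate_series(0, i64::MAX, 1)
    let count = range_count(0, i64::MAX, 1);
    println!("count = {count}"); // 9223372036854775808 (= 2^63)
    println!("bytes = {:?}", required_bytes(count)); // None: 2^66 overflows u64

    // generate_series(i64::MIN, i64::MAX, 1): the count saturates at u64::MAX.
    println!("count = {}", range_count(i64::MIN, i64::MAX, 1));

    // generate_series(1, 100, 1) is fine.
    println!("bytes = {:?}", required_bytes(range_count(1, 100, 1))); // Some(800)
}
```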

Suggested fix

Bound count at allocation time:

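```rust
use datafusion_common::exec_err;

// Reject counts whose byte size would exceed isize::MAX before reserving.
const MAX_RANGE_ELEMENTS: usize = isize::MAX as usize / std::mem::size_of::<i64>();
if count > MAX_RANGE_ELEMENTS {
    return exec_err!(
        "range too large: would produce {count} elements (max {MAX_RANGE_ELEMENTS})"
    );
}
values.reserve(count);
```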

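This bound only turns the capacity-overflow panic into an error: a count just below it still asks the allocator for nearly isize::MAX bytes, which will fail, and Vec::reserve aborts the process on allocation failure. A friendlier, configurable limit (say, 1 GiB / 8 B = 2^27 = 134,217,728 elements) would also keep this from being a memory-exhaustion DoS.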
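A self-contained sketch of the bounded variant, using plain String errors instead of exec_err! so it compiles with std alone. The helper name reserve_range and the 1 GiB cap are hypothetical. Besides the bound check, it replaces the as usize cast with usize::try_from (the cast silently truncates on 32-bit targets) and uses Vec::try_reserve, so an allocation the system cannot satisfy also surfaces as an error rather than an abort.

```rust
/// Hypothetical cap: 1 GiB worth of i64 values, i.e. 2^27 = 134,217,728 elements.
const MAX_RANGE_ELEMENTS: usize = (1 << 30) / std::mem::size_of::<i64>();

/// Hypothetical helper, not DataFusion's API: validates the u64 element count
/// computed by the range code and reserves space for it without panicking.
fn reserve_range(values: &mut Vec<i64>, count: u64) -> Result<(), String> {
    // `count as usize` would silently truncate on 32-bit targets; try_from does not.
    let n = usize::try_from(count)
        .map_err(|_| format!("range too large: {count} elements do not fit in usize"))?;
    if n > MAX_RANGE_ELEMENTS {
        return Err(format!(
            "range too large: would produce {n} elements (max {MAX_RANGE_ELEMENTS})"
        ));
    }
    // try_reserve reports allocation failure as an Err instead of aborting.
    values
        .try_reserve(n)
        .map_err(|e| format!("cannot allocate range of {n} elements: {e}"))
}

fn main() {
    let mut values: Vec<i64> = Vec::new();
    // generate_series(1, 100): 100 elements, accepted.
    assert!(reserve_range(&mut values, 100).is_ok());
    // generate_series(0, i64::MAX): 2^63 elements, rejected with an error.
    if let Err(msg) = reserve_range(&mut values, 1u64 << 63) {
        println!("{msg}");
    }
}
```

Inside DataFusion the String errors would presumably become exec_err! calls and the cap would come from a configuration option; both are assumptions of this sketch.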

Additional context

Found by a cargo fuzz target (fuzz/fuzz_targets/sql_physical_plan.rs) seeded with SQL extracted from datafusion/sqllogictest/test_files/. The fuzzer triggered it from a mutated generate_series example by replacing a small numeric literal with 9223372036854775807 (i64::MAX).
