Skip to content

bugfix (routing): fix min_vehicles bug#1405

Open
np96 wants to merge 7 commits into
NVIDIA:mainfrom
np96:bug/min_vehicles_fix
Open

bugfix (routing): fix min_vehicles bug#1405
np96 wants to merge 7 commits into
NVIDIA:mainfrom
np96:bug/min_vehicles_fix

Conversation

@np96

@np96 np96 commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Description

Fix min_vehicles ignored with vehicle fixed costs (#904)

When vehicle_fixed_costs were set and min_vehicles == fleet_size, the
solver returned fewer vehicles than min_vehicles and, in debug builds,
crashed with device-side asserts. Two coordinated issues:

  1. move_candidates.cuh: move_path::reset() zeroed n_insertions via
    device_scalar::set_value_async(), which copies from a host stack local.
    reset() runs inside a captured CUDA graph, so that host source dangled
    by graph-launch time and left n_insertions garbage (~32k) — perform_moves
    then launched tens of thousands of out-of-bounds blocks. Use the
    memset-based set_value_to_zero_async(), which is capture-safe.

  2. ox_recombiner.cuh: in fixed_route mode (fleet_size == min_vehicles), the
    VEHICLE_FIXED_COST branch enabled optimal-routes search, letting
    Bellman-Ford vary the route count. That contradicts the fixed vehicle
    count: it broke the recreate_solution invariant (routes removed == routes
    added) and drifted the solution below min_vehicles. Skip optimal-routes
    search when the count is fixed, and reject any count-mismatched offspring
    instead of applying it.

Issue

#904

Checklist

  • I am familiar with the Contributing Guidelines.
  • Testing
    • New or existing tests cover these changes
    • Added tests
    • Created an issue to follow-up
    • NA
  • Documentation
    • The documentation is up to date with these changes
    • Added new documentation
    • NA

@np96 np96 requested review from a team as code owners June 8, 2026 19:51
@np96 np96 requested review from Iroy30, hlinsen and kaatish June 8, 2026 19:51
@copy-pr-bot

copy-pr-bot Bot commented Jun 8, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Solver selection now treats fleet_size >= min_vehicles as the fixed-route path; OX recombiner skips variable route-count search in fixed-route mode and rejects offspring on route-count mismatch; cycle finder adds buffer bounds checks; move-candidates uses async zeroing; tests add a parametrized regression ensuring min_vehicles is respected.

Changes

Issue #904: Fixed-route and min-vehicles behavior enforcement

Layer / File(s) Summary
Solver target-vehicles selection logic
cpp/src/routing/solver.cu
Set target_vehicles when fleet_size >= min_vehicles (uses cached min_vehicles) and update SPDX year.
Fixed-route recombination enforcement
cpp/src/routing/crossovers/ox_recombiner.cuh
Disable optimal_routes_search when fixed_route is true and reject offspring in recreate_solution if route-counts mismatch (assert + return false).
Cycle finder buffer safety
cpp/src/routing/local_search/cycle_finder/cycle.hpp
Add utilities/macros.cuh include and expand ret_cycles_t::view_t::push_back / append_cycle with cuopt_assert bounds checks for paths and offsets.
Move-candidates async reset utility
cpp/src/routing/local_search/move_candidates/move_candidates.cuh
Use set_value_to_zero_async on provided stream to clear n_insertions in move_path_t::reset.
Regression test for issue #904
python/cuopt/cuopt/tests/routing/test_solver.py
Add pytest import, _ISSUE_904_COST_MATRIX, _build_min_vehicles_data_model(...), and a single @pytest.mark.parametrize test validating min_vehicles is respected for non-zero and zero vehicle fixed costs.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly identifies the main change as a bug fix for the min_vehicles issue, which aligns with the changeset's primary purpose.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description check ✅ Passed The pull request description clearly relates to the changeset, providing specific technical details about the bug fixes in three files and their purposes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@np96 np96 force-pushed the bug/min_vehicles_fix branch 3 times, most recently from f51933b to de69f9e Compare June 8, 2026 20:18
@np96 np96 changed the title bugfix (routing): fix incorrect min_vehicle_count bugfix (routing): fix incorrect min_vehicles Jun 9, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python/cuopt/cuopt/tests/routing/test_solver.py (1)

335-335: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use the established solution message accessor in the assert payload.

This file consistently uses get_message(), but this assertion uses get_error_message(). If that accessor is unavailable, a failing status check can raise AttributeError and mask the real solver failure reason.

Suggested fix
-    assert sol.get_status() == 0, sol.get_error_message()
+    assert sol.get_status() == 0, sol.get_message()
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/cuopt/cuopt/tests/routing/test_solver.py` at line 335, Replace the
assert payload that references the non-standard accessor get_error_message()
with the established accessor get_message(); specifically, update the assertion
that checks sol.get_status() to use sol.get_message() as the second argument
(referencing sol.get_status(), sol.get_message(), and removing
get_error_message()) so the failure message uses the standard solution message
accessor.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@python/cuopt/cuopt/tests/routing/test_solver.py`:
- Line 335: Replace the assert payload that references the non-standard accessor
get_error_message() with the established accessor get_message(); specifically,
update the assertion that checks sol.get_status() to use sol.get_message() as
the second argument (referencing sol.get_status(), sol.get_message(), and
removing get_error_message()) so the failure message uses the standard solution
message accessor.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d55296de-e57d-4d47-85a8-6696511bc14a

📥 Commits

Reviewing files that changed from the base of the PR and between de69f9e and 5c49546.

📒 Files selected for processing (5)
  • cpp/src/routing/crossovers/ox_recombiner.cuh
  • cpp/src/routing/local_search/cycle_finder/cycle.hpp
  • cpp/src/routing/local_search/move_candidates/move_candidates.cuh
  • cpp/src/routing/solver.cu
  • python/cuopt/cuopt/tests/routing/test_solver.py
✅ Files skipped from review due to trivial changes (1)
  • cpp/src/routing/local_search/move_candidates/move_candidates.cuh
🚧 Files skipped from review as they are similar to previous changes (3)
  • cpp/src/routing/crossovers/ox_recombiner.cuh
  • cpp/src/routing/local_search/cycle_finder/cycle.hpp
  • cpp/src/routing/solver.cu

@np96

np96 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@hlinsen please take a look. Current main branch and all releases do have UB here:
https://github.com/NVIDIA/cuopt/pull/1405/changes#diff-bd4a0f52972e8aa5df38c9f1ceefd3fb3fd326d1d227f7116d41ad6cfa80fd26L126

Particulary, when I tested at large-scale instances with heterogenous fleet, this caused non-converging solver behavior. After the fix the solver started to converge correctly.

@np96 np96 force-pushed the bug/min_vehicles_fix branch from 5c49546 to 2a98a4f Compare June 11, 2026 16:30
@np96 np96 changed the title bugfix (routing): fix incorrect min_vehicles bugfix (routing): fix min_vehicles bug Jun 11, 2026
Comment thread cpp/src/routing/solver.cu Outdated
target_vehicles = data_view_ptr_->get_min_vehicles();
auto target_vehicles = -1;
const auto min_vehicles = data_view_ptr_->get_min_vehicles();
if (min_vehicles > 0 && data_view_ptr_->get_fleet_size() >= min_vehicles) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can revert this change. min_vehicle > 0 should be checked in data_model and here we handle a special mode called "fixed route" that avoids route minimization paths so data_view_ptr_->get_fleet_size() == min_vehicles needs to hold. If fleet_size > min_vehicle we still want to use route minimization path.

@np96 np96 Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Reverting.

Just to confirm my understanding, in this mode each vehicle must serve exactly 1 order. If there's a situation when such solution is impossible, the problem must be considered infeasible.

For me this looks unusual — this looks like an assignment problem solved via Hungarian algorithm: N*N complexity scoring, infeasible assignment gets large score, then polynomial weighted matching. If any part of the solution has large score, consider infeasible.

I ran the solver with this mode enabled and it didn't consolidate routes, just naive 1-to-1 assignment, which matches that interpretation.

I don't understand why VRP solver may be needed for this kind of problem. Guess I misunderstand something, and would be happy to be corrected.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hlinsen reverted this one.

/// append_cycle() finalizes it.
DI void push_back(i_t val)
{
const auto write_pos = offsets[*n_cycles_] + curr_cycle_size++;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose of the change simply to add asserts?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
I believe it is good to fire assertions as early as possible.
With assertions enabled the corrupted async-copied stack value propagated downstream and the assertion happened only at perform_moves.cu:78. If this assertion existed the origin of the problem would be bisected faster.

If you don't want this addition, I will revert it.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hlinsen reverted.

@np96 np96 force-pushed the bug/min_vehicles_fix branch from 60c0264 to 8d643ef Compare June 11, 2026 20:08
@np96 np96 force-pushed the bug/min_vehicles_fix branch from 5fec8cb to 76ded91 Compare June 11, 2026 21:23
@np96

np96 commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@hlinsen both comments addressed, only necessary fixes present. Also edited PR message to match current state.

@hlinsen hlinsen added bug Something isn't working non-breaking Introduces a non-breaking change labels Jun 11, 2026
@hlinsen

hlinsen commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

/ok to test 76ded91

@np96

np96 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

A kind reminder to consider this PR.
@kaatish @Iroy30
Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working non-breaking Introduces a non-breaking change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants