bugfix (routing): fix min_vehicles bug#1405
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughSolver selection now treats fleet_size >= min_vehicles as the fixed-route path; OX recombiner skips variable route-count search in fixed-route mode and rejects offspring on route-count mismatch; cycle finder adds buffer bounds checks; move-candidates uses async zeroing; tests add a parametrized regression ensuring min_vehicles is respected. ChangesIssue
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
f51933b to
de69f9e
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
python/cuopt/cuopt/tests/routing/test_solver.py (1)
335-335:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse the established solution message accessor in the assert payload.
This file consistently uses
get_message(), but this assertion usesget_error_message(). If that accessor is unavailable, a failing status check can raiseAttributeErrorand mask the real solver failure reason.Suggested fix
- assert sol.get_status() == 0, sol.get_error_message() + assert sol.get_status() == 0, sol.get_message()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@python/cuopt/cuopt/tests/routing/test_solver.py` at line 335, Replace the assert payload that references the non-standard accessor get_error_message() with the established accessor get_message(); specifically, update the assertion that checks sol.get_status() to use sol.get_message() as the second argument (referencing sol.get_status(), sol.get_message(), and removing get_error_message()) so the failure message uses the standard solution message accessor.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@python/cuopt/cuopt/tests/routing/test_solver.py`:
- Line 335: Replace the assert payload that references the non-standard accessor
get_error_message() with the established accessor get_message(); specifically,
update the assertion that checks sol.get_status() to use sol.get_message() as
the second argument (referencing sol.get_status(), sol.get_message(), and
removing get_error_message()) so the failure message uses the standard solution
message accessor.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d55296de-e57d-4d47-85a8-6696511bc14a
📒 Files selected for processing (5)
cpp/src/routing/crossovers/ox_recombiner.cuhcpp/src/routing/local_search/cycle_finder/cycle.hppcpp/src/routing/local_search/move_candidates/move_candidates.cuhcpp/src/routing/solver.cupython/cuopt/cuopt/tests/routing/test_solver.py
✅ Files skipped from review due to trivial changes (1)
- cpp/src/routing/local_search/move_candidates/move_candidates.cuh
🚧 Files skipped from review as they are similar to previous changes (3)
- cpp/src/routing/crossovers/ox_recombiner.cuh
- cpp/src/routing/local_search/cycle_finder/cycle.hpp
- cpp/src/routing/solver.cu
|
@hlinsen please take a look. Current main branch and all releases do have UB here: Particulary, when I tested at large-scale instances with heterogenous fleet, this caused non-converging solver behavior. After the fix the solver started to converge correctly. |
5c49546 to
2a98a4f
Compare
| target_vehicles = data_view_ptr_->get_min_vehicles(); | ||
| auto target_vehicles = -1; | ||
| const auto min_vehicles = data_view_ptr_->get_min_vehicles(); | ||
| if (min_vehicles > 0 && data_view_ptr_->get_fleet_size() >= min_vehicles) { |
There was a problem hiding this comment.
I believe we can revert this change. min_vehicle > 0 should be checked in data_model and here we handle a special mode called "fixed route" that avoids route minimization paths so data_view_ptr_->get_fleet_size() == min_vehicles needs to hold. If fleet_size > min_vehicle we still want to use route minimization path.
There was a problem hiding this comment.
Oh, I see. Reverting.
Just to confirm my understanding, in this mode each vehicle must serve exactly 1 order. If there's a situation when such solution is impossible, the problem must be considered infeasible.
For me this looks unusual — this looks like an assignment problem solved via Hungarian algorithm: N*N complexity scoring, infeasible assignment gets large score, then polynomial weighted matching. If any part of the solution has large score, consider infeasible.
I ran the solver with this mode enabled and it didn't consolidate routes, just naive 1-to-1 assignment, which matches that interpretation.
I don't understand why VRP solver may be needed for this kind of problem. Guess I misunderstand something, and would be happy to be corrected.
| /// append_cycle() finalizes it. | ||
| DI void push_back(i_t val) | ||
| { | ||
| const auto write_pos = offsets[*n_cycles_] + curr_cycle_size++; |
There was a problem hiding this comment.
Is the purpose of the change simply to add asserts?
There was a problem hiding this comment.
Yes.
I believe it is good to fire assertions as early as possible.
With assertions enabled the corrupted async-copied stack value propagated downstream and the assertion happened only at perform_moves.cu:78. If this assertion existed the origin of the problem would be bisected faster.
If you don't want this addition, I will revert it.
There was a problem hiding this comment.
We use spans so out of bounds should already trigger: https://github.com/NVIDIA/cuopt/blob/main/cpp/src/routing/local_search/cycle_finder/cycle.hpp#L69
60c0264 to
8d643ef
Compare
5fec8cb to
76ded91
Compare
|
@hlinsen both comments addressed, only necessary fixes present. Also edited PR message to match current state. |
|
/ok to test 76ded91 |
Description
Fix min_vehicles ignored with vehicle fixed costs (#904)
When vehicle_fixed_costs were set and min_vehicles == fleet_size, the
solver returned fewer vehicles than min_vehicles and, in debug builds,
crashed with device-side asserts. Two coordinated issues:
move_candidates.cuh: move_path::reset() zeroed n_insertions via
device_scalar::set_value_async(), which copies from a host stack local.
reset() runs inside a captured CUDA graph, so that host source dangled
by graph-launch time and left n_insertions garbage (~32k) — perform_moves
then launched tens of thousands of out-of-bounds blocks. Use the
memset-based set_value_to_zero_async(), which is capture-safe.
ox_recombiner.cuh: in fixed_route mode (fleet_size == min_vehicles), the
VEHICLE_FIXED_COST branch enabled optimal-routes search, letting
Bellman-Ford vary the route count. That contradicts the fixed vehicle
count: it broke the recreate_solution invariant (routes removed == routes
added) and drifted the solution below min_vehicles. Skip optimal-routes
search when the count is fixed, and reject any count-mismatched offspring
instead of applying it.
Issue
#904
Checklist