Fast MPS parser for free-format MPS files#1429
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds an opt-in experimental fast MPS reader with SIMD tokenization and Eisel–Lemire FP64 conversion, mmap-backed input streams, multi-threaded LZ4 frame decoding, a phase-based section scanner, and extended test coverage with CMake wiring for reader selection and optional LZ4 support. ChangesExperimental Fast MPS Parser with LZ4 Support
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Suggested reviewers
✨ Finishing Touches🧪 Generate unit tests (beta)
|
There was a problem hiding this comment.
Actionable comments posted: 12
🧹 Nitpick comments (2)
cpp/tests/linear_programming/parser_test.cpp (1)
2553-2675: ⚡ Quick winAdd direct coverage for the new fast-reader rejection branches.
These dispatch tests still only exercise success paths. They never assert the two new guards in
read(...): rejectingfast_experimentalwithfixed_mps_format=true, and rejecting.qps*when the fast reader is selected. A small pair ofEXPECT_THROWcases would lock down the new CLI/API contract.As per coding guidelines, "Add test coverage for edge cases (empty, infeasible, unbounded, degenerate) when adding new code paths."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cpp/tests/linear_programming/parser_test.cpp` around lines 2553 - 2675, Add two negative tests exercising the new fast-reader rejection branches: (1) call read<int,double> (or use dispatch_parse) with fast_experimental enabled while passing fixed_mps_format=true and EXPECT_THROW(std::logic_error) to cover the "reject fast_experimental with fixed_mps_format" guard (reference the read function signature and any CLI flag/parameter used to enable fast_experimental/fixed_mps_format in your API), and (2) attempt to parse a ".qps" (or ".qps.gz"/".qps.bz2") file while forcing the fast reader and EXPECT_THROW(std::logic_error) to cover the "reject .qps* when fast reader selected" branch; add these as new TEST cases next to the existing dispatch tests (e.g., alongside read, qps_extension_dispatches_to_mps_parser) so they run with the other parser dispatch tests.Source: Coding guidelines
cpp/tests/linear_programming/experimental_mps_fast/fast_fp64_parser_test.cpp (1)
117-149: ⚡ Quick winAdd explicit overflow/underflow and subnormal boundary cases.
The current suite never exercises the hardest FP64 paths: overflow, underflow-to-zero, and subnormal boundaries. Because the randomized generator caps exponents at
[-30, 30], it also won’t cover the fallback/equivalence edge cases this parser is most likely to regress on. Please add a few fixed cases like max-finite overflow, min-normal neighbors, and min-subnormal rounding boundaries.Based on learnings: "Tests must validate numerical correctness, not just run without error" and "Add test coverage for edge cases (empty, infeasible, unbounded, degenerate) when adding new code paths."
Also applies to: 168-176
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cpp/include/cuopt/linear_programming/io/parser.hpp`:
- Around line 20-25: The documentation for the experimental fast MPS reader is
incorrect: it claims LP/MIP/QP support but the fast path throws on any .qps*
input when mps_reader_type_t::fast_experimental is selected; update the docs to
accurately reflect supported formats (remove QP/.qps claim) wherever the enum
mps_reader_type_t and the dispatch comment block are defined (the enum
declaration and the doc comment above the MPS reader dispatcher), or
alternatively implement QP/.qps support in the fast reader; pick the
documentation fix unless you also add QP handling.
- Around line 155-180: The code unconditionally accepts and advertises “.lz4”
extensions even when LZ4 support may be compiled out; update the suffix checks
and the supported-extensions error message to be gated by the same compile-time
flag used in the build (e.g., MPS_PARSER_WITH_LZ4/CUOPT_PARSER_WITH_LZ4).
Concretely, wrap the checks that call lower.ends_with(".mps.lz4"), ".qps.lz4",
and ".lp.lz4" and the inclusion of ".mps.lz4/.qps.lz4/.lp.lz4" in the thrown
message with `#ifdef` MPS_PARSER_WITH_LZ4 (or the project’s appropriate macro),
and when the macro is not defined ensure those suffixes are not matched and not
listed in the supported extensions string referenced by
read_mps_fast_experimental, read_mps, and read_lp routing logic.
In `@cpp/src/io/experimental_mps_fast/fast_fp64_parser.hpp`:
- Around line 417-427: parse_fp64_advance currently accepts partially-parsed
numbers because it trusts parse_decimal_advance even when it stops early; change
it so that after a successful parse_decimal_advance(p, end, dec) you verify the
parser consumed the entire token by checking that p == end, and if not return
fallback_strtod(std::string_view(start, (size_t)(p - start))); only when p ==
end should you call assemble_fp64(dec) and return its value (with the existing
NaN check that falls back to fallback_strtod). This uses the existing symbols
parse_fp64_advance, parse_decimal_advance, assemble_fp64, and fallback_strtod.
In `@cpp/src/io/experimental_mps_fast/file_reader.cpp`:
- Around line 41-46: The case-sensitive suffix checks in path_has_suffix (and
the similar checks at 274-296) cause inconsistency with file_to_string.cpp which
lowercases filenames before detecting .lz4/.gz/.bz2; normalize the filename to a
lowercase copy once before calling effective_file_read_method()/before any
suffix checks and use that lowercase string for all suffix comparisons (or
update path_has_suffix to perform case-insensitive comparison), so files like
MODEL.MPS.LZ4 are detected as compressed by parse_mps_fast_file and the legacy
reader alike.
- Around line 97-108: get_file_size(const std::string& path) currently opens fd
then calls get_file_size(fd, path) and closes fd only on the success path,
leaking the descriptor if get_file_size(fd, path) throws; wrap the raw fd in a
small RAII/scoped guard (or use a unique_fd/ScopeExit) immediately after ::open
so the descriptor is closed on all exit paths, then call get_file_size(fd, path)
using the RAII handle and let the guard close the fd when it goes out of scope.
In `@cpp/src/io/experimental_mps_fast/file_reader.hpp`:
- Around line 156-183: The helper parallel_for_indexed currently accepts
thread_count==0 and silently does no work; validate thread_count at the start of
parallel_for_indexed (before reserving/spawning threads) and either clamp it to
at least 1 or throw an exception on zero. Update the function to check the
thread_count parameter (the one passed into parallel_for_indexed) immediately
and then proceed to use the validated value with scoped_thread_group workers,
next, and the existing worker lambda so no silent no-op occurs.
In `@cpp/src/io/experimental_mps_fast/lz4_file_reader.cpp`:
- Around line 536-557: Wrap the entire body of read_window in a try { ... }
catch(...) block so any exception (e.g., from new char[w.size], pread_full, or
other statements) is caught and forwarded by calling
fail_and_notify(std::current_exception()); retain the existing mps_parser_fail
usage for pread errors but remove any paths that let exceptions escape
read_window (ensure all exception flows end up invoking
fail_and_notify(std::current_exception()), using the function names read_window,
pread_full, mps_parser_fail, and fail_and_notify to locate the changes).
In `@cpp/src/io/experimental_mps_fast/mmap_region.hpp`:
- Around line 76-100: anonymous_aligned currently unmaps prefix/suffix using
byte counts which can produce non-page-aligned munmap calls (EINVAL) and
leak/steal pages; fix by making the helper either (A) retain the original raw
mapping and raw_size in mmap_region_t and unmap that entire raw mapping in the
destructor/reset(), or (B) compute prefix and suffix rounded up/down to the
system page size (use sysconf(_SC_PAGESIZE) or getpagesize()) so every munmap
boundary is page-aligned before calling ::munmap; implement RAII by adding
raw_ptr/raw_size members to mmap_region_t, unmapping in its destructor and
reset() and checking munmap return values, and update anonymous_aligned to
populate those members rather than attempting partial unmaps at arbitrary byte
offsets.
In `@cpp/src/io/experimental_mps_fast/mps_section_scanner.cpp`:
- Around line 125-130: mps_phase_registry_t::publish_endata currently overwrites
endata_begin_ and endata_present_ even after endata_ready_ was set, creating
races with readers that read the plain members after an acquire on
endata_ready_; change publish_endata so it performs a single-shot publication:
only update endata_begin_ and endata_present_ when endata_ready_ was not yet set
(use an atomic test-and-set or compare_exchange on endata_ready_ to detect first
publication) and return without mutating the payload if endata_ready_ is already
true; ensure this mirrors the behavior of publish() and apply the same fix to
the other occurrence mentioned (lines ~455-459) so readers of
endata_begin()/endata_present() see a single, immutable publication.
In `@cpp/src/io/utilities/error.hpp`:
- Around line 37-40: mps_parser_throw currently injects msg verbatim into a JSON
string which can produce invalid JSON when msg contains quotes, backslashes or
newlines; add a small JSON-escaping helper (e.g., json_escape(const
std::string&)) that replaces backslash, quote and control chars (e.g., \n, \r,
\t) with their JSON-escaped forms and use it when building the thrown
std::logic_error message (replace std::string(msg) with json_escape(msg));
reference mps_parser_throw and error_to_string when updating the construction so
the thrown payload remains {"MPS_PARSER_ERROR_TYPE": "...", "msg": "escaped
text"}.
In `@cpp/src/utilities/perf_counters.hpp`:
- Around line 11-15: This header is missing direct includes for utilities it
uses: add `#include` <utility> for std::pair, `#include` <cstring> for std::strlen
and std::strncmp, and `#include` <cstdlib> for std::strtol so perf_counters.hpp is
self-contained; update the include block (which currently has <array>, <cerrno>,
<cstdint>, <cstdio>, <vector>) to also include those three headers to satisfy
IWYU and the self-contained-header rule.
In
`@cpp/tests/linear_programming/experimental_mps_fast/fast_parser_edge_test.cpp`:
- Around line 94-129: The test helper check_models_match_reference_bitwise
currently uses EXPECT_EQ to compare floating-point vectors (A_, b_, c_,
variable_lower_bounds_, variable_upper_bounds_, constraint_lower_bounds_,
constraint_upper_bounds_), which compares values not IEEE-754 bit patterns;
change those comparisons to element-wise bit-wise comparisons (use the existing
bits() helper or std::bit_cast<uint64_t> on each double) so each corresponding
element in parser_model_t::A_, mps_data_model_t::A_, and the b_/c_/bound vectors
is compared by its bit representation; update the assertions for A_, b_, c_,
variable_lower_bounds_, variable_upper_bounds_, constraint_lower_bounds_, and
constraint_upper_bounds_ in check_models_match_reference_bitwise to iterate
elements and EXPECT_EQ(bits(ref[i]), bits(fast[i])) (with clear context strings)
instead of EXPECT_EQ on the whole vector.
---
Nitpick comments:
In `@cpp/tests/linear_programming/parser_test.cpp`:
- Around line 2553-2675: Add two negative tests exercising the new fast-reader
rejection branches: (1) call read<int,double> (or use dispatch_parse) with
fast_experimental enabled while passing fixed_mps_format=true and
EXPECT_THROW(std::logic_error) to cover the "reject fast_experimental with
fixed_mps_format" guard (reference the read function signature and any CLI
flag/parameter used to enable fast_experimental/fixed_mps_format in your API),
and (2) attempt to parse a ".qps" (or ".qps.gz"/".qps.bz2") file while forcing
the fast reader and EXPECT_THROW(std::logic_error) to cover the "reject .qps*
when fast reader selected" branch; add these as new TEST cases next to the
existing dispatch tests (e.g., alongside read,
qps_extension_dispatches_to_mps_parser) so they run with the other parser
dispatch tests.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ae230bf3-b5ba-42f6-a034-5f88021e23b3
📒 Files selected for processing (27)
cpp/CMakeLists.txtcpp/cuopt_cli.cppcpp/include/cuopt/linear_programming/io/parser.hppcpp/src/CMakeLists.txtcpp/src/io/CMakeLists.txtcpp/src/io/experimental_mps_fast/fast_fp64_parser.hppcpp/src/io/experimental_mps_fast/fast_parse_primitives.hppcpp/src/io/experimental_mps_fast/fast_parser.cppcpp/src/io/experimental_mps_fast/fast_parser.hppcpp/src/io/experimental_mps_fast/file_reader.cppcpp/src/io/experimental_mps_fast/file_reader.hppcpp/src/io/experimental_mps_fast/hash_table_smallstr.hppcpp/src/io/experimental_mps_fast/lz4_file_reader.cppcpp/src/io/experimental_mps_fast/mmap_region.hppcpp/src/io/experimental_mps_fast/mps_section_scanner.cppcpp/src/io/experimental_mps_fast/mps_section_scanner.hppcpp/src/io/experimental_mps_fast/nvtx_ranges.hppcpp/src/io/file_to_string.cppcpp/src/io/file_to_string.hppcpp/src/io/mps_parser.cppcpp/src/io/parser.cppcpp/src/io/utilities/error.hppcpp/src/utilities/perf_counters.hppcpp/tests/linear_programming/CMakeLists.txtcpp/tests/linear_programming/experimental_mps_fast/fast_fp64_parser_test.cppcpp/tests/linear_programming/experimental_mps_fast/fast_parser_edge_test.cppcpp/tests/linear_programming/parser_test.cpp
|
I can confirm I extensively used this code to parse several big mps instances (> 50GB) and the results were bitwise equal with the original parser. |
|
/ok to test 1990c06 |
|
/ok to test d7358f6 |
|
/ok to test b1abd6f |
This PR introduces an experimental, opt-in fast parser for well-formed MPS files, which takes advantage of available parallelism and SIMD extensions on the target machine. This is mostly intended for huge (>1GB) MPS files, e.g. for distributed solves.
The parser relies on parallel I/O overlapping with MPS section parsing workers which start parsing as soon as read/decoded data is made available, in order to hide latency on slow NFS filesystems.
LZ4 is added as a supported decompression format, which is done in a parallel fashion. LZ4 tends to perform unusually well (~10-30% compression ratios) on MPS data due to their very regular nature and common prefixes in row/column names, and decompresses at a few GB/s.
This feature is exposed via a new CLI flag
--mps-reader, which can take eitherdefault(existing parser), orexperimental-fast(this PR) as its argument.LP, MIP, QP, and MPS SOCP formats are supported.
Results, on a corpus of 42 instances with sizes ranging from 1.55GB to 50.5GB:
Results for some >100GB files, NFS storage, cold cache (reference parser times out at 1 hour)
Description
Issue
Checklist