fix(nuscenes): make lidar model estimation robust for all scenes#148
Conversation
|
CC @OatmealLiu |
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
setting to draft still as I still see some remaining issues - still testing on some more sequences / variants |
c512639 to
111998b
Compare
|
[More scene testing passed; Fix is validated] @janickm I tested the new code, including the optimizer issue fixes for the structured spinning LiDAR model, on 10 nuScenes scenes that previously encountered the following assertion error: File ".../tools/data_converter/structured_lidar_model.py", line 459, in optimize_model
return RowOffsetStructuredSpinningLidarModelParameters(
File ".../ncore/impl/data/types.py", line 727, in __post_init__
assert np.all(np.diff(relative_column_azimuths_rad.relative_angle_rad) > 0), (
AssertionError: Column azimuth angles must be sorted in the spinning direction so the diff
between relative angles of consecutive columns should always be positiveAll 10 scenes passed the test and were successfully converted to NCoreV4. I also eyeballed the LiDAR projection results, and they look good to me. Testing log: |
c466b91 to
cf5e424
Compare
|
Thanks for testing @OatmealLiu - I'm still working on further simplifications of the root / core issue before I can merge this - I'll let you know once it's done |
Sure. Let me know when it's merged so that I can test again and compare. Thanks @janickm ! |
cf5e424 to
0d7f667
Compare
@OatmealLiu The latest version now contains the condensed version of the proper fixes for the core numerical issues - maybe you can run through the previously failing ones? If this is positive we can look into finalizing this PR [I'm just pushing one more docs update in a sec] |
Sure. Did you push already? If so, I can go ahead with the latest branch |
0d7f667 to
598da1a
Compare
|
Yes, the code is updated |
|
I tested the new push of this branch, the same numerical error occurred again, as the failed scenes shown below: where the errorrs are : |
|
Huh, interesting! I was testing with the dust one actually, will double check |
Yeah. Let me know. I will also double check my config file. Maybe I left some customized config as conversion args. Will do some test soon. Have a nice weekend! |
598da1a to
19229e6
Compare
|
@OatmealLiu Thanks for catching that. Your run hit an intermediate push of the branch (the failure was still the column-azimuth monotonicity assert, just relocated to
I reproduced your 5 failing scenes locally ( The branch is updated (force-pushed after rebase). Could you re-run your 10-scene validation against the latest? If it's green we can finalize. |
|
@janickm Sure! I just tested the newest push of the fix branch. All test passed. Fixes are validated for the azimuth-span issue. I paste the testing log below. |
|
Besides, I have completed the full 850 scenes conversion. With the new fix, 849/850 scenes are converted successfully, while only Another different assertion error can be observed fro |
|
Great, thanks @OatmealLiu! so only one more to go, I'll try to check that one also |
19229e6 to
8b34772
Compare
8b34772 to
9a1a931
Compare
|
@OatmealLiu Fixed
Validated: all your scenes (0007/0055/0057/0129/0180/0356) plus the v1.0-mini set convert in both empirical and nominal at 4x, far-range error 0.018-0.058 deg. Unit + lidar integration + ncore suites pass. Branch updated (force-pushed after rebase on |
|
Thanks a lot @janickm ! Yep. I will sweep the 850 scenes using the new branch again to make a final, consistent nuScenes in ncoreV4 format. |
9a1a931 to
3357d34
Compare
The nuScenes conversion crashed during lidar model estimation on several scenes (e.g. trainval scene-0007 and mini scenes 0103/0655/0796/1077) with: AssertionError: Column azimuth angles must be sorted in the spinning direction so the diff between relative angles of consecutive columns should always be positive (ncore/impl/data/types.py) The assertion was masking estimators that produced malformed (non-monotonic / wrapping) column azimuths. Root causes, fixed at source: 1. util.relative_angle reduced the scalar reference with `% 2pi` in float64 while reducing a float32 angle array in float32, so the same value reduced to results ~1 ULP apart. relative_angle(arr[0], arr, ...) therefore did not reliably yield 0 at index 0: for a float32 array whose first element reduces near the +/-pi boundary the self-distance wrapped to ~2*pi, spuriously failing the structured-model monotonicity check for otherwise-valid azimuths. Fixed by subtracting before reducing (a python scalar does not upcast a float32 numpy/torch array, so the whole computation stays in the array dtype); (a - b) mod 2pi == (a mod 2pi - b mod 2pi) mod 2pi, so the result is exact. Adds direct relative_angle tests (the function had none), including a torch case. 2. derive_model_from_decompensated took the column azimuths from a single reference row's raw measurements. That row's individual returns carry gross outliers (spurious points, multipath) -- adjacent columns could jump by ~1.7-2.9 rad (100-170 deg) and reorder. Fixed by estimating each column azimuth as the circular median across all valid rows: every beam in a column fires at nearly the same azimuth, so the median rejects per-beam outliers and yields a clean, already-monotonic sweep. 3. optimize_model applied independent per-(sub-)column median corrections. On a 4x-upsampled grid each sub-column is hit by only a few points, so its median residual was assignment noise that reordered neighbours; the multi-frame data is, however, more than sufficient at coarser granularity (~3900 obs per native column). Fixed by estimating the local correction with adaptive binning (pool >=200 observations per bin, interpolate back): a strictly better estimate that pools all frames, not a smoothing band-aid. A global circular- mean offset is still removed first so a systematic model/data phase offset does not tear the ramp apart. 4. derive_model_from_decompensated assumed a fixed firing-to-row order (reverse the beams). Spinning lidars do not necessarily fire beams in elevation order -- the HDL-32E interleaves its two lowest lasers -- so the reversal left two rows out of descending order, tripping the sibling "Row elevation angles must be sorted in descending order" assertion (scene-0356). Fixed at the source by recovering the beam-to-row mapping from the data: sort beams by measured elevation and apply that permutation to every per-beam quantity (elevations and azimuth offsets). The elevation ramp is then monotonic by construction, with no repair needed. Resolution analysis: model far-range error improves monotonically with column resolution (~0.105 deg at 1x, ~0.056 at 2x, ~0.044 at 4x), so 4x stays the default. The upsampling is also the sole source of the residual azimuth ordering ambiguity: it creates sub-columns finer than the data resolves, so a few adjacent ones can come out marginally out of order. enforce_spinning_monotonic resolves this with the minimal monotonic projection -- measured to touch <=66 of 4340 columns by <=0.4 deg (below the model's own accuracy), and to never fire at native resolution. It unwraps into a continuous ramp (no fragile re-wrap or global shift that could rotate the whole model by 2*pi when element 0 sits on the +/-pi seam), compresses a sweep that marginally overshoots one revolution (a real scan can overlap slightly at the seam), and rejects only gross (>5%) overflow as malformed input. Other changes: - Align NuScenesConverter4Config defaults with the CLI and README so a programmatically-built config matches the command line. - Default --lidar-model-source to empirical (see results below); nominal remains available for the data-independent case. - Document expected accuracy in the README files and the //tools:ncore_evaluate_lidar_model tool. Results (//tools:ncore_evaluate_lidar_model, per-point mean angular error over far-range points > 20 m, HDL-32E at 4x resolution, 1 optimization pass): empirical: 0.02 - 0.04 deg far-range error, |azimuth bias| < 0.005 deg nominal: 0.02 - 0.6 deg far-range error, azimuth bias up to ~0.4 deg scene empirical_far nominal_far 0061 0.033 0.031 0103 0.035 0.141 0553 0.020 0.022 0655 0.037 0.172 0757 0.028 0.027 0796 0.033 0.183 0916 0.033 0.032 1077 0.033 0.141 1094 0.037 0.037 1100 0.022 0.021 0007 0.035 0.577 Empirical is up to ~15x more accurate at far range and never worse, so it is the default. Validation: all 10 v1.0-mini scenes plus trainval scene-0007 convert with both model sources at resolution 4; previously-crashing configs now succeed and previously-working ones are unchanged or improved. ncore data/sensors suites, the structured lidar model unit tests (3.11 + 3.8), and the nuScenes integration test all pass.
3357d34 to
4bff82e
Compare
|
@OatmealLiu I'm going ahead and merge this one to fix the obvious apparent issues in |
|
@janickm Hi! To complete this fix's testing: I re-converted the whole 850 nuScenes' scenes with the newly fixed NCore codebase. All 850 scenes are successfully converted. The fix is validated. Further, I observed an improvement in conversion speed. On the same machine, the per-scene conversion time goes down from ~110s (before fix) to ~95s (after fix) on average, on the same machine with the same resources. Thank you! |
Summary
The nuScenes conversion crashed during lidar model estimation on several scenes (trainval
scene-0007, and v1.0-mini0103/0655/0796/1077) with:The assertion was masking estimators that produced malformed (non-monotonic / wrapping) column azimuths. Rather than repair the symptom downstream, this PR fixes each root cause at source so the estimators produce valid azimuths by construction.
Root causes & fixes
1.
util.relative_anglemixed-precision reduction. It reduced the scalar reference with% 2piin float64 while reducing a float32 array in float32, so the same value reduced to results ~1 ULP apart.relative_angle(arr[0], arr, ...)therefore did not reliably return 0 at index 0 -- for a float32 array whose first element reduces near the +/-pi boundary the self-distance wrapped to ~2*pi, spuriously failing the monotonicity check for otherwise-valid azimuths. Fix: subtract before reducing (a python scalar does not upcast a float32 numpy/torch array, so the computation stays in the array dtype);(a - b) mod 2pi == (a mod 2pi - b mod 2pi) mod 2pi, so it is exact. Adds directrelative_angletests (the function had none), including a torch case.2.
derive_model_from_decompensatedused a single reference row. That row's individual returns carry gross outliers (spurious points, multipath) -- adjacent columns could jump by ~1.7-2.9 rad (100-170 deg) and reorder. Fix: estimate each column azimuth as the circular median across all valid rows. Every beam in a column fires at nearly the same azimuth, so the median rejects per-beam outliers and yields a clean, already-monotonic sweep.3.
optimize_modelapplied independent per-column corrections. On a 4x-upsampled grid, adjacent columns sample different point subsets, so the per-column median correction carried high-frequency noise that exceeded the column spacing and reordered columns; it also left unobserved columns at the nominal value while shifting observed ones under a systematic phase offset. Fix: remove the global offset from every column, then interpolate the per-column correction across unobserved columns and smooth it -- a smooth correction added to a monotonic ramp stays monotonic by construction.With the estimators producing valid azimuths directly,
enforce_spinning_monotonic(generalized forcwandccw) is now only a cheap coordinate-normalization and final guard: it raises on genuinely malformed input (span >= 2*pi) rather than silently reshaping it.Other changes
NuScenesConverter4Configdefaults with the CLI and README so a programmatically-built config matches the command line.--lidar-model-sourcetoempirical(see results);nominalremains available for the data-independent case.//tools:ncore_evaluate_lidar_modeltool.Results
Measured with
//tools:ncore_evaluate_lidar_model(per-point mean angular error over far-range points > 20 m; HDL-32E, 4x resolution, 1 optimization pass), all 10 v1.0-mini scenes plus trainvalscene-0007:Summary: empirical 0.02-0.04 deg far-range, |azimuth bias| < 0.005 deg; nominal 0.02-0.6 deg, azimuth bias up to ~0.4 deg. Empirical is up to ~15x more accurate at far range and never worse, so it is the default.
Validation
scene-0007convert with both model sources at resolution 4; previously-crashing configs now succeed, previously-working ones are unchanged or improved.bazel test //tools/data_converter:pytest_structured_lidar_model_{3_11,3_8}and//ncore/impl/data:pytest_util_{3_11,3_8}pass, with new regression tests for: therelative_angleprecision fix (numpy + torch), the derive outlier rejection, the optimize correction smoothing, andenforce_spinning_monotonic(cw/ccw, malformed-input rejection).NUSCENES_DIR=... bazel test //tools/data_converter/nuscenes:pytest_converter_3_11passes.