Add waterdata.get_nearest_continuous helper#239
Merged
thodson-usgs merged 5 commits intoDOI-USGS:mainfrom Apr 23, 2026
Merged
Conversation
Collaborator
Author
|
I vibe coded this function to return the nearest continuous data given a list of timestamps, which just constructs the filter and passes it through the waterdata api. @ldecicco-USGS suggested this feature could be upstreamed as a new endpoint. How do you feel about that? |
For each target timestamp, fetch the nearest continuous observation in a single round trip. Builds a CQL OR-chain of per-target bracketed windows, pipes it through ``get_continuous`` (which already auto-chunks long filters across multiple sub-requests), then selects the single observation closest to each target client-side. Exists because the Water Data API matches a single-instant ``time=`` parameter exactly (10:30:31 returns zero rows on a 15-minute gauge), does not implement ``sortby`` for arbitrary queryables, and does not expose a ``T_NEAREST`` CQL function. The narrow-window + client-side reduction is the one pattern that works today for multi-target nearest lookups in one API call. Tie handling is configurable via ``on_tie``: - "first" (default): keep the earlier observation - "last": keep the later observation - "mean": average numeric columns; set ``time`` to the target Default ``window="7min30s"`` matches the 15-minute gauge cadence so most targets' windows contain exactly one observation. Users with irregular-cadence gauges or known data gaps can widen to "15min" or "30min" at the cost of more bytes per response. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
``pandas.Timedelta`` already accepts ``"00:07:30"`` identically to ``"7min30s"``, so no behaviour change is needed — just switch the default and the docstring examples to the more readable form. Added a regression test that asserts the two spellings produce the same CQL filter so future refactors can't drift. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch the default from "00:07:30" to "PT7M30S" so the user-visible
contract points at an actual international standard (ISO 8601
duration) rather than a pandas-specific colon form.
``pandas.Timedelta`` still accepts all the other forms users may
already have typed — ISO 8601, HH:MM:SS, shorthand ("7min30s",
"450s"), or a ``pd.Timedelta`` directly — and a parametrized test
now exercises each shape to lock in the "whatever ``pd.Timedelta``
takes" contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Split the helper's body into four private functions so the top-level
flow reads as a short recipe:
- ``_check_nearest_kwargs`` reject kwargs the helper owns
(``time``/``filter``/``filter_lang``);
validate ``on_tie``
- ``_build_window_or_filter`` CQL ``OR``-chain of bracketed time
windows, one per target
- ``_pick_nearest_row`` window → nearest row, with the three
tie-resolution branches isolated
- ``_empty_nearest_result`` empty frame with a ``target_time``
column, used wherever no match lands
Drops the nested ``for site → for target → mask → tie-branch`` loop in
favor of a flat list-comprehension + walrus against the new helper.
Fixes a fragile ``pd.to_datetime(list(targets), utc=True)`` (a numpy
``datetime64`` array would round-trip through ``list`` as tz-stripped
scalars) — now passes the input directly to ``pd.to_datetime`` and
wraps in ``pd.DatetimeIndex``. Swaps ``df = df.copy(); df["time"] =
...`` for ``df.assign(time=...)`` to avoid the full-frame copy.
Also NEWS.md: add a short entry describing the new helper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
b20d08e to
2613792
Compare
Calling ``get_nearest_continuous`` with an empty ``targets`` is almost always a caller bug (an unfiltered frame, a typo, a mis-named column). The previous code papered over it by firing a trivial-range HTTP request (``time=1900-01-01/1900-01-01``) purely so the caller received a real ``BaseMetadata`` object. That pattern wastes a round-trip on a nonsensical input and hides the bug. Raise ``ValueError`` on empty ``targets`` instead. Shrinks the body and makes a caller mistake loud. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
thodson-usgs
added a commit
to thodson-usgs/dataretrieval-python
that referenced
this pull request
Apr 24, 2026
For each target timestamp, fetch the nearest continuous observation in a single round trip. Builds a CQL OR-chain of per-target bracketed windows, pipes it through ``get_continuous`` (which already auto-chunks long filters across multiple sub-requests), then selects the single observation closest to each target client-side. Exists because the Water Data API matches a single-instant ``time=`` parameter exactly (10:30:31 returns zero rows on a 15-minute gauge), does not implement ``sortby`` for arbitrary queryables, and does not expose a ``T_NEAREST`` CQL function. The narrow-window + client-side reduction is the one pattern that works today for multi-target nearest lookups in one API call. Tie handling is configurable via ``on_tie``: - "first" (default): keep the earlier observation - "last": keep the later observation - "mean": average numeric columns; set ``time`` to the target
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds
waterdata.get_nearest_continuous(targets, ...)— for each target timestamp, returns the single continuous observation closest to that timestamp, fetched in one HTTP round-trip (auto-chunked when the CQL filter gets long).Why
The Water Data API's
time=parameter treats a single instant as an exact match, not a nearest-match —time=2023-06-15T10:30:31Zon a 15-minute gauge returns 0 rows. The advertisedsortbyparameter would make "nearest" expressible asfilter=time <= 'target' & sortby=-time & limit=1, butsortbyis per-query, so N targets would mean N HTTP round-trips. There is noT_NEARESTCQL function either.The narrow-window + client-side reduction implemented here is the one pattern that folds N targets into a single request today.
Usage
Knobs:
window="7min30s"— half-window around each target. The default matches the 15-minute continuous cadence so most windows contain exactly one observation. Widen (e.g."15min","30min") for irregular cadences or resilience to data gaps.on_tie="first"— how to resolve ties when a target falls at the midpoint between two grid points (rare but possible). Alternatives:"last"(keep the later observation),"mean"(average numeric columns; settimeto the target).Multi-site calls return one row per
(target, monitoring_location_id)pair. Targets with no observations in their window are silently dropped. Passingtime=,filter=, orfilter_lang=raisesTypeError— the helper builds those itself.Reproduce the live result in one shot
The exact URL the helper generated from the 5-target example in the test plan below — paste into a browser or curl (no API key required; get one here if you want higher rate limits):
Decoded the filter is 5 OR'd ±7min30s windows around each target timestamp. The response should contain one feature per target, all from gauge
USGS-02238500with parameter_code00060, values ≈ 22.4 ft³/s.Relationship to #238
This PR is built on top of #238 (
Add CQL filter passthrough to OGC waterdata getters) and will look lighter once that lands. The helper's core trick — fanning N targets into one request — is only possible because #238 addsfilter=support + automatic URL-length-safe chunking toget_continuous. The branchadd-get-nearest-continuousis stacked onadd-ogc-cql-filter-passthrough, so until #238 merges the diff here will include both changesets; after #238 merges the commits on its branch become common ancestors and this PR's diff reduces to the one commit introducingget_nearest_continuousand its tests.Please merge #238 first.
Test plan
USGS-0223850000060with 5 off-grid targets (10:30:31,14:07:12,03:45:19,18:59:45,06:14:02) — one request, five ±7min30s windows OR'd together, five rows returned with deltas −432s to +58s (all inside the half-window). See the URL above.🤖 Generated with Claude Code