
Add waterdata.get_nearest_continuous helper #239

Merged
thodson-usgs merged 5 commits into DOI-USGS:main from thodson-usgs:add-get-nearest-continuous on Apr 23, 2026

Conversation

@thodson-usgs (Collaborator) commented Apr 23, 2026

Summary

Adds waterdata.get_nearest_continuous(targets, ...) — for each target timestamp, returns the single continuous observation closest to that timestamp, fetched in one HTTP round-trip (auto-chunked when the CQL filter gets long).

Why

The Water Data API's time= parameter treats a single instant as an exact match, not a nearest-match — time=2023-06-15T10:30:31Z on a 15-minute gauge returns 0 rows. The advertised sortby parameter would make "nearest" expressible as filter=time <= 'target' & sortby=-time & limit=1, but sortby is per-query, so N targets would mean N HTTP round-trips. There is no T_NEAREST CQL function either.

The narrow-window + client-side reduction implemented here is the one pattern that folds N targets into a single request today.
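The narrow-window pattern above can be sketched as follows. This is an illustrative reconstruction, not the helper's actual internals: the function name `build_window_filter` is hypothetical, but the clause shape matches the decoded filter shown later in this PR.

```python
import pandas as pd

def build_window_filter(targets, window="PT7M30S"):
    """Build one CQL filter OR-ing a bracketed time window per target.

    Hypothetical sketch: N targets fold into a single filter string,
    so the whole lookup fits in one request.
    """
    half = pd.Timedelta(window)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    clauses = [
        f"(time >= '{(t - half).strftime(fmt)}'"
        f" AND time <= '{(t + half).strftime(fmt)}')"
        for t in pd.to_datetime(targets, utc=True)
    ]
    return " OR ".join(clauses)

targets = pd.to_datetime(["2023-06-15T10:30:31Z", "2023-06-15T14:07:12Z"])
print(build_window_filter(targets))
```

Each clause brackets ±7min30s around one target; the OR-chain grows linearly with the target count, which is why the underlying getter auto-chunks long filters.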

Usage

import pandas as pd
from dataretrieval import waterdata

targets = pd.to_datetime([
    "2023-06-15T10:30:31Z",
    "2023-06-15T14:07:12Z",
    "2023-06-16T03:45:19Z",
])

df, md = waterdata.get_nearest_continuous(
    targets,
    monitoring_location_id="USGS-02238500",
    parameter_code="00060",
)

# df: one row per target, augmented with a `target_time` column
# identifying which target each row corresponds to.

Knobs:

  • window="7min30s" — half-window around each target. The default matches the 15-minute continuous cadence so most windows contain exactly one observation. Widen (e.g. "15min", "30min") for irregular cadences or resilience to data gaps.
  • on_tie="first" — how to resolve ties when a target falls at the midpoint between two grid points (rare but possible). Alternatives: "last" (keep the later observation), "mean" (average numeric columns; set time to the target).

Multi-site calls return one row per (target, monitoring_location_id) pair. Targets with no observations in their window are silently dropped. Passing time=, filter=, or filter_lang= raises TypeError — the helper builds those itself.
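The client-side reduction and the three `on_tie` modes can be sketched like this. The function `pick_nearest` is hypothetical (the helper's real internals may differ), but the tie semantics match the knobs described above:

```python
import pandas as pd

def pick_nearest(df, target, on_tie="first"):
    """From rows inside one target's window, keep the closest row.

    Illustrative sketch of the reduction; assumes a tz-aware ``time``
    column as in the usage example above.
    """
    delta = (df["time"] - target).abs()
    nearest = df[delta == delta.min()]  # >1 row only on an exact tie
    if len(nearest) == 1 or on_tie == "first":
        return nearest.iloc[[0]]
    if on_tie == "last":
        return nearest.iloc[[-1]]
    # "mean": average numeric columns, stamp time with the target itself
    row = nearest.mean(numeric_only=True).to_frame().T
    row["time"] = target
    return row
```

With a target exactly midway between two 15-minute grid points, `"first"` keeps the earlier row, `"last"` the later, and `"mean"` averages the numeric columns and sets `time` to the target.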

Reproduce the live result in one shot

The exact URL the helper generated from the 5-target example in the test plan below — paste it into a browser or curl (no API key is required; a key only raises rate limits):

https://api.waterdata.usgs.gov/ogcapi/v0/collections/continuous/items?monitoring_location_id=USGS-02238500&parameter_code=00060&filter=%28time+%3E%3D+%272023-06-15T10%3A23%3A01Z%27+AND+time+%3C%3D+%272023-06-15T10%3A38%3A01Z%27%29+OR+%28time+%3E%3D+%272023-06-15T13%3A59%3A42Z%27+AND+time+%3C%3D+%272023-06-15T14%3A14%3A42Z%27%29+OR+%28time+%3E%3D+%272023-06-16T03%3A37%3A49Z%27+AND+time+%3C%3D+%272023-06-16T03%3A52%3A49Z%27%29+OR+%28time+%3E%3D+%272023-06-16T18%3A52%3A15Z%27+AND+time+%3C%3D+%272023-06-16T19%3A07%3A15Z%27%29+OR+%28time+%3E%3D+%272023-06-17T06%3A06%3A32Z%27+AND+time+%3C%3D+%272023-06-17T06%3A21%3A32Z%27%29&skipGeometry=False&limit=50000&filter-lang=cql-text

Decoded, the filter is five OR'd ±7min30s windows, one around each target timestamp. The response should contain one feature per target, all from gauge USGS-02238500 with parameter_code 00060, with values ≈ 22.4 ft³/s.
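To read the filter in plain text, percent-decode the query parameter with the stdlib; the string below is a shortened fragment of the URL above:

```python
from urllib.parse import unquote_plus

# First window clause from the URL above, still percent-encoded
encoded = (
    "%28time+%3E%3D+%272023-06-15T10%3A23%3A01Z%27+AND+"
    "time+%3C%3D+%272023-06-15T10%3A38%3A01Z%27%29"
)
print(unquote_plus(encoded))
# → (time >= '2023-06-15T10:23:01Z' AND time <= '2023-06-15T10:38:01Z')
```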

Relationship to #238

This PR is built on top of #238 (Add CQL filter passthrough to OGC waterdata getters) and will look lighter once that lands. The helper's core trick — fanning N targets into one request — is only possible because #238 adds filter= support + automatic URL-length-safe chunking to get_continuous. The branch add-get-nearest-continuous is stacked on add-ogc-cql-filter-passthrough, so until #238 merges the diff here will include both changesets; after #238 merges the commits on its branch become common ancestors and this PR's diff reduces to the one commit introducing get_nearest_continuous and its tests.

Please merge #238 first.

Test plan

  • ruff check / format pass.
  • pytest tests/waterdata_nearest_test.py — 14/14 pass. Covers basic reduction, CQL filter shape, all three tie modes, missing-window drop, multi-site fan-out, empty targets, kwarg validation, and **kwargs forwarding.
  • Full non-live suite — 111/111 pass.
  • Live end-to-end against USGS-02238500 00060 with 5 off-grid targets (10:30:31, 14:07:12, 03:45:19, 18:59:45, 06:14:02) — one request, five ±7min30s windows OR'd together, five rows returned with deltas −432s to +58s (all inside the half-window). See the URL above.

🤖 Generated with Claude Code

@thodson-usgs (Collaborator, Author) commented

@mikemahoney218-usgs,

I vibe coded this function to return the nearest continuous data given a list of timestamps; it just constructs the filter and passes it through the Water Data API. @ldecicco-USGS suggested this feature could be upstreamed as a new endpoint. How do you feel about that?

thodson-usgs and others added 4 commits April 23, 2026 15:23
For each target timestamp, fetch the nearest continuous observation in
a single round trip. Builds a CQL OR-chain of per-target bracketed
windows, pipes it through ``get_continuous`` (which already auto-chunks
long filters across multiple sub-requests), then selects the single
observation closest to each target client-side.

Exists because the Water Data API matches a single-instant ``time=``
parameter exactly (10:30:31 returns zero rows on a 15-minute gauge),
does not implement ``sortby`` for arbitrary queryables, and does not
expose a ``T_NEAREST`` CQL function. The narrow-window + client-side
reduction is the one pattern that works today for multi-target
nearest lookups in one API call.

Tie handling is configurable via ``on_tie``:
  - "first" (default): keep the earlier observation
  - "last":  keep the later observation
  - "mean":  average numeric columns; set ``time`` to the target

Default ``window="7min30s"`` matches the 15-minute gauge cadence so
most targets' windows contain exactly one observation. Users with
irregular-cadence gauges or known data gaps can widen to "15min" or
"30min" at the cost of more bytes per response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
``pandas.Timedelta`` already accepts ``"00:07:30"`` identically to
``"7min30s"``, so no behaviour change is needed — just switch the
default and the docstring examples to the more readable form. Added a
regression test that asserts the two spellings produce the same CQL
filter so future refactors can't drift.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Switch the default from "00:07:30" to "PT7M30S" so the user-visible
contract points at an actual international standard (ISO 8601
duration) rather than a pandas-specific colon form.

``pandas.Timedelta`` still accepts all the other forms users may
already have typed — ISO 8601, HH:MM:SS, shorthand ("7min30s",
"450s"), or a ``pd.Timedelta`` directly — and a parametrized test
now exercises each shape to lock in the "whatever ``pd.Timedelta``
takes" contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
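The equivalence the two commits above describe can be spot-checked directly. This is an illustrative check, not the package's own regression test:

```python
import pandas as pd

# All four window spellings mentioned in the commits parse to the same
# 450-second Timedelta, so the default change is purely cosmetic.
spellings = ["PT7M30S", "00:07:30", "7min30s", "450s"]
for s in spellings:
    assert pd.Timedelta(s) == pd.Timedelta(seconds=450), s
print("all equivalent")
```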
Split the helper's body into four private functions so the top-level
flow reads as a short recipe:

  - ``_check_nearest_kwargs``      reject kwargs the helper owns
                                   (``time``/``filter``/``filter_lang``);
                                   validate ``on_tie``
  - ``_build_window_or_filter``    CQL ``OR``-chain of bracketed time
                                   windows, one per target
  - ``_pick_nearest_row``          window → nearest row, with the three
                                   tie-resolution branches isolated
  - ``_empty_nearest_result``      empty frame with a ``target_time``
                                   column, used wherever no match lands

Drops the nested ``for site → for target → mask → tie-branch`` loop in
favor of a flat list-comprehension + walrus against the new helper.
Fixes a fragile ``pd.to_datetime(list(targets), utc=True)`` (a numpy
``datetime64`` array would round-trip through ``list`` as tz-stripped
scalars) — now passes the input directly to ``pd.to_datetime`` and
wraps in ``pd.DatetimeIndex``. Swaps ``df = df.copy(); df["time"] =
...`` for ``df.assign(time=...)`` to avoid the full-frame copy.

Also NEWS.md: add a short entry describing the new helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs force-pushed the add-get-nearest-continuous branch from b20d08e to 2613792 on April 23, 2026 20:25
Calling ``get_nearest_continuous`` with an empty ``targets`` is
almost always a caller bug (an unfiltered frame, a typo, a mis-named
column). The previous code papered over it by firing a trivial-range
HTTP request (``time=1900-01-01/1900-01-01``) purely so the caller
received a real ``BaseMetadata`` object. That pattern wastes a
round-trip on a nonsensical input and hides the bug.

Raise ``ValueError`` on empty ``targets`` instead. Shrinks the body
and makes a caller mistake loud.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@thodson-usgs thodson-usgs merged commit 33e928d into DOI-USGS:main Apr 23, 2026
8 checks passed
@thodson-usgs thodson-usgs deleted the add-get-nearest-continuous branch April 23, 2026 20:42
thodson-usgs added a commit to thodson-usgs/dataretrieval-python that referenced this pull request Apr 24, 2026