Workflow dep resolution leaves scheduled_at stale, breaking queue delay monitoring #1185

@jguttman94

Description

When WorkflowStageJobs / WorkflowStageJobsByIDMany resolve dependencies and transition a job from pending to available, the scheduled_at column is not updated. It retains its original value from insertion time, which can be hours or months old for long-running workflows.

The UPDATE in both queries only sets state and metadata.workflow_staged_at:

UPDATE river_job
SET
  state = jobs_to_make_available.new_state,
  metadata = jsonb_set(metadata, '{workflow_staged_at}'::text[], $1::jsonb, true)
FROM jobs_to_make_available
WHERE river_job.id = jobs_to_make_available.id

The jobs_to_make_available CTE already reads scheduled_at to decide the target state (available if scheduled_at <= now() + 5s, otherwise scheduled). By the time the UPDATE executes, the original scheduled_at value has served its purpose, so overwriting it for jobs that move to available would not change that decision.
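To make that rule concrete, here is a minimal, self-contained sketch of the state decision as described above. It is not the actual CTE from River Pro, which is not reproduced in this issue; sample_jobs and its rows are hypothetical stand-ins for pending jobs in river_job.

```sql
-- Hypothetical sample rows standing in for pending workflow jobs.
WITH sample_jobs (id, scheduled_at) AS (
  VALUES
    (1, now() - interval '3 hours'),   -- inserted long ago, deps just resolved
    (2, now() + interval '2 seconds'), -- inside the 5 second window
    (3, now() + interval '1 day')      -- genuinely scheduled for the future
)
SELECT
  id,
  scheduled_at,
  CASE
    WHEN scheduled_at <= now() + interval '5 seconds' THEN 'available'
    ELSE 'scheduled'
  END AS new_state
FROM sample_jobs
ORDER BY id;
-- ids 1 and 2 resolve to 'available', id 3 to 'scheduled'
```

Job 1 is the problematic case: it becomes available now but keeps a scheduled_at from three hours ago.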

Impact

Any monitoring that computes NOW() - scheduled_at on available jobs to measure queue delay will report wildly inflated values for dependency-resolved workflow jobs, because the result includes the entire time the job spent waiting on its dependencies. For workflows where deps take hours or months to resolve, this produces false alarms on queue health metrics.
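For reference, the kind of metric affected looks roughly like the query below. This is a simplified sketch rather than our exact production query; the per-queue grouping is an assumption about how such a metric is typically sliced.

```sql
-- Oldest queue delay per queue, measured against scheduled_at.
-- For dependency-resolved workflow jobs this includes the whole
-- time spent in pending, not just the time spent waiting for a worker.
SELECT
  queue,
  MAX(now() - scheduled_at) AS oldest_delay
FROM river_job
WHERE state = 'available'
GROUP BY queue;
```

As a hypothetical illustration, a job inserted six hours ago whose dependencies resolve now would immediately show an oldest_delay of about six hours, even if a worker picks it up within seconds.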

Current workaround

We discovered that workflow_staged_at is already stamped in metadata during dep resolution, so we use it as a fallback in our metrics query:

MAX(
  CASE
    WHEN metadata ? 'workflow_staged_at'
      THEN NOW() - (metadata->>'workflow_staged_at')::timestamptz
    ELSE NOW() - scheduled_at
  END
) as oldest_delay

This works but requires casting a JSONB string to timestamptz in an aggregate query, which is less ergonomic than using the native scheduled_at column directly.
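A slightly more compact form of the same fallback, shown here as a full query, uses COALESCE. This is a sketch that assumes workflow_staged_at is stored as a JSON string that casts cleanly to timestamptz, as in the expression above. Unlike the CASE version, it also falls back to scheduled_at when the key is present but holds a JSON null.

```sql
-- Prefer the staging timestamp from metadata when present,
-- otherwise fall back to scheduled_at.
SELECT
  MAX(
    now() - COALESCE(
      (metadata->>'workflow_staged_at')::timestamptz,
      scheduled_at
    )
  ) AS oldest_delay
FROM river_job
WHERE state = 'available';
```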

Proposed solutions

Either of these would address the problem:

  1. Set scheduled_at = now() in WorkflowStageJobs / WorkflowStageJobsByIDMany when transitioning jobs to available. This makes scheduled_at accurately reflect when the job became eligible for pickup, consistent with how non-workflow jobs behave. For jobs transitioning to scheduled (because their scheduled_at is still in the future), no change is needed, since scheduled_at is already correct.

  2. Add a first-class available_at column to river_job that records when a job entered the available state, regardless of how it got there (direct insert, scheduled time reached, or workflow dep resolution). This would give monitoring queries a reliable, indexed timestamp without relying on scheduled_at semantics or JSONB metadata. It would also benefit non-workflow use cases like jobs inserted with Pending: true that are later moved to available by application code.
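As a rough sketch of option 1, the UPDATE quoted earlier could gain one extra assignment. This is a fragment that assumes the existing jobs_to_make_available CTE is unchanged and exposes new_state as it does today; the real query may prefer an injected timestamp parameter over now(), which is not something this issue can confirm.

```sql
-- Fragment: relies on the existing jobs_to_make_available CTE.
UPDATE river_job
SET
  state = jobs_to_make_available.new_state,
  -- Only refresh scheduled_at for jobs that become available;
  -- jobs moving to scheduled keep their future scheduled_at.
  scheduled_at = CASE
    WHEN jobs_to_make_available.new_state = 'available' THEN now()
    ELSE river_job.scheduled_at
  END,
  metadata = jsonb_set(metadata, '{workflow_staged_at}'::text[], $1::jsonb, true)
FROM jobs_to_make_available
WHERE river_job.id = jobs_to_make_available.id
```

The CASE keeps the change limited to the pending to available transition, so anything that relies on scheduled_at for future-dated jobs is unaffected.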

Environment

  • River Pro v0.22.0
  • PostgreSQL
