Skip to content

feat: add plan execution behavior#791

Open
nielspardon wants to merge 16 commits into
substrait-io:mainfrom
nielspardon:par-execbehavior
Open

feat: add plan execution behavior#791
nielspardon wants to merge 16 commits into
substrait-io:mainfrom
nielspardon:par-execbehavior

Conversation

@nielspardon

@nielspardon nielspardon commented Mar 30, 2026

Copy link
Copy Markdown
Member
  • Updates substrait Git submodule to v0.87.0
  • Adds ExecutionBehavior field to Plan with VariableEvaluationMode enum
  • Enhances SubstraitBuilder with overloaded plan() methods accepting custom execution behavior
  • Updates SqlToSubstrait and Spark converters to set default execution behavior (VARIABLE_EVALUATION_MODE_PER_PLAN)

fixes #801

Comment thread core/src/main/java/io/substrait/dsl/SubstraitBuilder.java

@benbellick benbellick left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it generally looks good, but just the one comment about ensuring that old plans are still allowed through. Thanks!

Comment thread core/src/test/java/io/substrait/plan/PlanValidationTest.java
Comment thread core/src/main/java/io/substrait/plan/ProtoPlanConverter.java
Comment on lines +44 to +46
if (!getExecutionBehavior().isPresent()) {
throw new IllegalArgumentException("ExecutionBehavior is required but was not set");
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the behavior we will eventually want.

However, if I understand correctly, this check will cause substrait-java to reject proto plans which don't contain the new field. Thus, I think this will inadvertently be a breaking change, and I think we should relax this check until the core libraries all have supported this feature for some time.

Perhaps we can for now just set the field if it is absent and then open a ticket for handling enforcement in the future?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get the idea. I vaguely remember we were discussing in one of the past community syncs about backwards compatibility of plans and I thought the consensus was that at least pre-1.0 we are not going to offer extensive backward compatibility. In that spirit I would introduce the exception with the substrait-java release which implements support for this feature. What we could do is to announce in the exception message which Substrait version introduces this requirement.

      throw new IllegalArgumentException("ExecutionBehavior is required since Substrait v0.87.0 but was not set");

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tried to capture the breaking change policy here.

The core bit being:

Breaking changes should be made so that users can update systems independently and asynchronously.

For us, this change as is will cause our systems to break. substrait-go is producing plans upstream which are consumed by substrait-java, so the missing ExecutionBehavior in the substrait-go-produced plan will cause an exception to be thrown. I think the appropriate migration strategy here is to allow consumption of old plans, regardless of if they have ExecutionBehavior or not, but we can produce plans that always have it. Then in the future once all of the core libraries have been consistently producing the new feature for some time, we can then enforce the presence of the field on consumption. This is how the URI -> URN migration was handled.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that sense you would not throw an exception when consuming a plan that has no execution behavior set but only throw an exception when a plan is being produced using substrait-java that does not have it set?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, exactly.

(Until later when all libraries have been producing the new ExecutionBehavior for a bit, at which point we can switch on exception-throwing for consumption of plans missing the new field.)

@nielspardon nielspardon Jun 5, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I tried to express above the practice is different/messy:

Even that is a little murky since we do have Substrait SDKs being updated to the newer spec version and released without all the changes fully reflected in the code, e.g. substrait-java is already producing plans claiming to be v0.87.0 since we already did the submodule bump which are technically invalid v0.87.0 plans. I don't think we can be a 100% clean here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it is messy here. This is why I am apprehensive to use the version as a gate. What are you worried about happening if we take this approach?

Is there anything wrong with passing the value through if it is present, and defaulting to VARIABLE_EVALUATION_MODE_PER_PLAN if it is absent (regardless of version)?

I personally am interpreting this to mean to proto version that plan was produced with, rather than the library's fully supported version. Since no library actually covers all of the spec, a library declaring that it supports the whole spec version is probably impossible at the moment 😅

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally am interpreting this to mean to proto version that plan was produced with, rather than the library's fully supported version. Since no library actually covers all of the spec, a library declaring that it supports the whole spec version is probably impossible at the moment 😅

Yes, that's the reality. I think there is a distinction between features that are opt-in via Substrait dialect like the support for execution context variables is opt-in but something like the Substrait plan version field is not opt-in via a dialect. This is a general requirement.

With the current definition of execution behavior I would argue it is like plan version: if you claim you are producing plans using spec version 0.87.0 then you have to define execution behavior (until we change the spec to relax this requirement).

We can update the spec to say: default is per_plan and then always setting it to per_plan if absent would be strictly compliant with the spec.

My main concern is that this behavior would not be strictly compliant with the current spec and we would be setting a bad example if we started to deviate from what the released version of the spec says.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other concern that feeds into this is that we introduced this in the spec 2+ months ago. If our turnaround for library adoption was shorter than we could react more quickly with spec changes, such findings could be updated in the spec sooner and we could adopt the modifications more quickly in the spec. In the last 2+ months we had multiple other modifications to the spec which are also on the backlog and if we are spending so much time on the adoption of all of the other spec changes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is that this behavior would not be strictly compliant with the current spec and we would be setting a bad example if we started to deviate from what the released version of the spec says.

I see where you're coming from with this, and I agree that it's not totally desirable. It's taking advantage of the fact that if this isn't being set, you as a producer likely don't care about it, but it's not strictly in the spec. The ideal way to have done this would probably have been to include something like: "If this is not set / UNSPECIFIED default to X", which is effectively what we're doing here behind the specs back.

If we don't do some form of compatibility shim though, we end up in a situation where a consumer updates substrait-java and then having every plan produced by a system that has not yet been updated fail the validation.

BREAKING CHANGES: execution behavior is a required field introduced in Substrait v0.87.0

Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
@nielspardon nielspardon requested a review from benbellick June 3, 2026 14:55
Signed-off-by: Niels Pardon <par@zurich.ibm.com>

@benbellick benbellick left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment about the behavior. I think we still want the @Value.check annotation. We can just add the field value explicitly to the POJO Plan when we do FromProto.

Also just a minor aside, it makes it easier to review if you add new commits instead of force-pushing. We squash commit anyways so the history doesn't matter. However, if you force-push, the "show changes since your last review" button doesn't work anymore. Thanks!
image

Comment thread core/src/main/java/io/substrait/plan/Plan.java Outdated
Comment thread core/src/main/java/io/substrait/plan/ProtoPlanConverter.java Outdated
Comment thread core/src/main/java/io/substrait/plan/PlanProtoConverter.java Outdated
Comment thread core/src/main/java/io/substrait/plan/ProtoPlanConverter.java Outdated
Comment on lines +44 to +46
if (!getExecutionBehavior().isPresent()) {
throw new IllegalArgumentException("ExecutionBehavior is required but was not set");
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a cleaner approach would be to keep the @Value.Check validation.

My mental model is that our internal POJO representation should always be whatever the "new" thing is. So in this case, POJOS must have the required field set. Thus, if you construct a plan programmatically, you should have to set an ExecutionBehavior explicitly, or use a helper that supplies the default (which is what you have already done).

However for backwards compatibility, ProtoPlanConverter.from(...) can be the boundary. If an older proto plan does not have execution_behavior, we can set it to VARIABLE_EVALUATION_MODE_PER_PLAN during deserialization.

That keeps the compatibility handling invisible to callers and avoids having Java Plan objects that are only invalid when serialized back to protobuf.

I think defaulting is safe here. If an old plan did not have this field, it could not have been relying on per-record variable evaluation semantics, so defaulting to the existing/per-plan behavior should preserve semantics. Do you agree with that approach?

@nielspardon

nielspardon commented Jun 5, 2026

Copy link
Copy Markdown
Member Author

Also just a minor aside, it makes it easier to review if you add new commits instead of force-pushing. We squash commit anyways so the history doesn't matter. However, if you force-push, the "show changes since your last review" button doesn't work anymore. Thanks!

I have to force push for a rebase, don't I? The commit are all still preservered just have different commit ids since I rebased.

nielspardon and others added 4 commits June 5, 2026 09:09
- "test: update tests"
- "feat: validate only when plan gets serialized to proto"
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Co-authored-by: Ben Bellick <36523439+benbellick@users.noreply.github.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
@benbellick

benbellick commented Jun 5, 2026

Copy link
Copy Markdown
Member

Also just a minor aside, it makes it easier to review if you add new commits instead of force-pushing. We squash commit anyways so the history doesn't matter. However, if you force-push, the "show changes since your last review" button doesn't work anymore. Thanks!

I have to force push for a rebase, don't I? The commit are all still preservered just have different commit ids since I rebased.

You can also merge main into your branch if you have merge conflicts. Yes the changes are still present, but the commit hash changes, which means GH no longer recognizes which commits came before a review and which come after.

@vbarua

vbarua commented Jun 5, 2026

Copy link
Copy Markdown
Member

I think we've drifted into two different discussions here:

  1. Based on the deprecation policy, how should we implement support for ExecutionContextVariable in a backwards compatible way.
  2. What does the version field mean?

For 1, the main constraint is that we want the library to read protobuf plans that don't set the ExecutionContextVariable, as this allows users to update to this version without having to coordinate producer changes. As Ben points out, we can handle this at the protobuf conversion layer by setting a default value. This should be safe, because a system that doesn't set this variable isn't using it yet. Within the POJO code, we can assume that ExecutionContextVariable is set, and we can emit protobuf plans including it as other systems will safely ignore it until they are updated.

2 is a discussion that goes beyond this PR IMO. In observed practice, the plan version set is just the protobuf version and doesn't correspond to specification support at the version. At this point in time we can't use it for anything beyond that.

Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
@nielspardon

Copy link
Copy Markdown
Member Author

2 is a discussion that goes beyond this PR IMO. In observed practice, the plan version set is just the protobuf version and doesn't correspond to specification support at the version. At this point in time we can't use it for anything beyond that.

I think this summary leaves out an important point. Sure, we can say: this is current practice and we just accept it. On the other hand since the project has such a high emphasis on interoperability and portability we should define what is the minimum required compliance with a given Substrait specification version in my opinion.

@nielspardon

Copy link
Copy Markdown
Member Author

I updated the PR to always set the default execution behavior when parsing a Substrait plan from proto.

@vbarua

vbarua commented Jun 8, 2026

Copy link
Copy Markdown
Member

Sure, we can say: this is current practice and we just accept it. On the other hand since the project has such a high emphasis on interoperability and portability we should define what is the minimum required compliance with a given Substrait specification version in my opinion.

Sorry, I didn't mean to imply that we shouldn't work on 2 with my statement. I think it is worth pursuing, I just don't think it's something that we can resolve unilaterally in this PR discussion as there are a ton of moving parts. Filed an issue substrait-io/substrait#1102 for it.

planBuilder.executionBehavior(
ExecutionBehavior.builder()
.variableEvaluationMode(VariableEvaluationMode.VARIABLE_EVALUATION_MODE_PER_PLAN)
.build());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually the only piece we need for backwards compatability IMO.

Comment thread core/src/main/java/io/substrait/plan/ProtoPlanConverter.java Outdated
Comment thread core/src/main/java/io/substrait/plan/Plan.java Outdated
Comment on lines +44 to +46
if (!getExecutionBehavior().isPresent()) {
throw new IllegalArgumentException("ExecutionBehavior is required but was not set");
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is that this behavior would not be strictly compliant with the current spec and we would be setting a bad example if we started to deviate from what the released version of the spec says.

I see where you're coming from with this, and I agree that it's not totally desirable. It's taking advantage of the fact that if this isn't being set, you as a producer likely don't care about it, but it's not strictly in the spec. The ideal way to have done this would probably have been to include something like: "If this is not set / UNSPECIFIED default to X", which is effectively what we're doing here behind the specs back.

If we don't do some form of compatibility shim though, we end up in a situation where a consumer updates substrait-java and then having every plan produced by a system that has not yet been updated fail the validation.

Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
@bestbeforetoday

Copy link
Copy Markdown
Member

I have to force push for a rebase, don't I? The commit are all still preservered just have different commit ids since I rebased.

You can also merge main into your branch if you have merge conflicts. Yes the changes are still present, but the commit hash changes, which means GH no longer recognizes which commits came before a review and which come after.

Just to add another alternative opinion... I have a pet hate for merging main into PR branches. It just ends up with such a messy commit history within the PR branch. I absolutely agree with adding additional commits to address PR comments so the specific changes can be viewed, but I much prefer rebasing to get the branch up-to-date with main and resolve any merge conflicts in the rebase so you only see the incremental changes from main in the PR commit history.

@nielspardon nielspardon requested review from benbellick and vbarua June 11, 2026 07:04
Comment thread core/src/main/java/io/substrait/plan/PlanProtoConverter.java

@vbarua vbarua left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One final comment on the definition of VariableEvaluationMode which deviates from the established pattern of enum definitions in the codebase.

Also left some minor comments for your tests. I don't consider these blocking.

VARIABLE_EVALUATION_MODE_PER_PLAN,
VARIABLE_EVALUATION_MODE_PER_RECORD
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an established pattern for representing these enums in the library that we should follow here for consistency:

/** Defines the type of window bounds (rows or range) for window functions. */
enum WindowBoundsType {
/** Unspecified window bounds type. */
UNSPECIFIED(io.substrait.proto.Expression.WindowFunction.BoundsType.BOUNDS_TYPE_UNSPECIFIED),
/** Window bounds based on row count. */
ROWS(io.substrait.proto.Expression.WindowFunction.BoundsType.BOUNDS_TYPE_ROWS),
/** Window bounds based on value range. */
RANGE(io.substrait.proto.Expression.WindowFunction.BoundsType.BOUNDS_TYPE_RANGE);
private final io.substrait.proto.Expression.WindowFunction.BoundsType proto;
WindowBoundsType(io.substrait.proto.Expression.WindowFunction.BoundsType proto) {
this.proto = proto;
}

/** Defines how aggregation functions are invoked (ALL, DISTINCT). */
enum AggregationInvocation {
/** Unspecified aggregation invocation. */
UNSPECIFIED(AggregateFunction.AggregationInvocation.AGGREGATION_INVOCATION_UNSPECIFIED),
/** Aggregate over all values. */
ALL(AggregateFunction.AggregationInvocation.AGGREGATION_INVOCATION_ALL),
/** Aggregate over distinct values only. */
DISTINCT(AggregateFunction.AggregationInvocation.AGGREGATION_INVOCATION_DISTINCT);
private final io.substrait.proto.AggregateFunction.AggregationInvocation proto;
AggregationInvocation(io.substrait.proto.AggregateFunction.AggregationInvocation proto) {
this.proto = proto;
}

Comment thread core/src/main/java/io/substrait/plan/PlanProtoConverter.java
}

/**
* Test case 6: Conversion from protobuf Plan with UNSPECIFIED mode fails validation.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: none of the others tests you added in this file are numbered.

* execution behavior with PER_PLAN mode.
*/
@Test
void testRoundTripWithExecutionBehaviorPerPlan() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: The to/from tests above feel somewhat redundant given that you also have roundtrip tests. I don't feel strongly enough to ask to keep one or the other, but thought it was worth mentioning at least.

* execution behavior correctly.
*/
@Test
void testRoundTripWithExecutionBehaviorPerRecord() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this test and one above differ only in the VariableEvaluationMode. This feels like a good place to use a parameterized test like:

  @ParameterizedTest
  @EnumSource(
      value = Plan.ExecutionBehavior.VariableEvaluationMode.class,
      names = {"VARIABLE_EVALUATION_MODE_PER_PLAN", "VARIABLE_EVALUATION_MODE_PER_RECORD"})
  void testRoundTripWithExecutionBehaviorPerRecord(
      Plan.ExecutionBehavior.VariableEvaluationMode vem) {

to avoid duplication.

* without throwing any exceptions.
*/
@Test
void testValidExecutionBehavior_PerPlan() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: This, with the test below, also feels like a good place for parameterized tests. Also, the _ in the name doesn't feel very Java and isn't really consistent with this file or your other names.

* successfully converted.
*/
@Test
void testRoundTripEmptyPlanWithExecutionBehavior() {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: I don't if know if this provides any additional value given that you already have roundtrip tests above it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

add new Plan.ExecutionBehavior

4 participants