feat: add plan execution behavior#791
Conversation
93bf3c4 to
782c994
Compare
benbellick
left a comment
There was a problem hiding this comment.
I think it generally looks good, but just the one comment about ensuring that old plans are still allowed through. Thanks!
| if (!getExecutionBehavior().isPresent()) { | ||
| throw new IllegalArgumentException("ExecutionBehavior is required but was not set"); | ||
| } |
There was a problem hiding this comment.
I think this is the behavior we will eventually want.
However, if I understand correctly, this check will cause substrait-java to reject proto plans which don't contain the new field. Thus, I think this will inadvertently be a breaking change, and I think we should relax this check until the core libraries all have supported this feature for some time.
Perhaps we can for now just set the field if it is absent and then open a ticket for handling enforcement in the future?
There was a problem hiding this comment.
I get the idea. I vaguely remember we were discussing in one of the past community syncs about backwards compatibility of plans and I thought the consensus was that at least pre-1.0 we are not going to offer extensive backward compatibility. In that spirit I would introduce the exception with the substrait-java release which implements support for this feature. What we could do is to announce in the exception message which Substrait version introduces this requirement.
throw new IllegalArgumentException("ExecutionBehavior is required since Substrait v0.87.0 but was not set");There was a problem hiding this comment.
We tried to capture the breaking change policy here.
The core bit being:
Breaking changes should be made so that users can update systems independently and asynchronously.
For us, this change as is will cause our systems to break. substrait-go is producing plans upstream which are consumed by substrait-java, so the missing ExecutionBehavior in the substrait-go-produced plan will cause an exception to be thrown. I think the appropriate migration strategy here is to allow consumption of old plans, regardless of if they have ExecutionBehavior or not, but we can produce plans that always have it. Then in the future once all of the core libraries have been consistently producing the new feature for some time, we can then enforce the presence of the field on consumption. This is how the URI -> URN migration was handled.
There was a problem hiding this comment.
in that sense you would not throw an exception when consuming a plan that has no execution behavior set but only throw an exception when a plan is being produced using substrait-java that does not have it set?
There was a problem hiding this comment.
yes, exactly.
(Until later when all libraries have been producing the new ExecutionBehavior for a bit, at which point we can switch on exception-throwing for consumption of plans missing the new field.)
There was a problem hiding this comment.
As I tried to express above the practice is different/messy:
Even that is a little murky since we do have Substrait SDKs being updated to the newer spec version and released without all the changes fully reflected in the code, e.g. substrait-java is already producing plans claiming to be v0.87.0 since we already did the submodule bump which are technically invalid v0.87.0 plans. I don't think we can be a 100% clean here.
There was a problem hiding this comment.
Agreed it is messy here. This is why I am apprehensive to use the version as a gate. What are you worried about happening if we take this approach?
Is there anything wrong with passing the value through if it is present, and defaulting to VARIABLE_EVALUATION_MODE_PER_PLAN if it is absent (regardless of version)?
I personally am interpreting this to mean to proto version that plan was produced with, rather than the library's fully supported version. Since no library actually covers all of the spec, a library declaring that it supports the whole spec version is probably impossible at the moment 😅
There was a problem hiding this comment.
I personally am interpreting this to mean to proto version that plan was produced with, rather than the library's fully supported version. Since no library actually covers all of the spec, a library declaring that it supports the whole spec version is probably impossible at the moment 😅
Yes, that's the reality. I think there is a distinction between features that are opt-in via Substrait dialect like the support for execution context variables is opt-in but something like the Substrait plan version field is not opt-in via a dialect. This is a general requirement.
With the current definition of execution behavior I would argue it is like plan version: if you claim you are producing plans using spec version 0.87.0 then you have to define execution behavior (until we change the spec to relax this requirement).
We can update the spec to say: default is per_plan and then always setting it to per_plan if absent would be strictly compliant with the spec.
My main concern is that this behavior would not be strictly compliant with the current spec and we would be setting a bad example if we started to deviate from what the released version of the spec says.
There was a problem hiding this comment.
The other concern that feeds into this is that we introduced this in the spec 2+ months ago. If our turnaround for library adoption was shorter than we could react more quickly with spec changes, such findings could be updated in the spec sooner and we could adopt the modifications more quickly in the spec. In the last 2+ months we had multiple other modifications to the spec which are also on the backlog and if we are spending so much time on the adoption of all of the other spec changes.
There was a problem hiding this comment.
My main concern is that this behavior would not be strictly compliant with the current spec and we would be setting a bad example if we started to deviate from what the released version of the spec says.
I see where you're coming from with this, and I agree that it's not totally desirable. It's taking advantage of the fact that if this isn't being set, you as a producer likely don't care about it, but it's not strictly in the spec. The ideal way to have done this would probably have been to include something like: "If this is not set / UNSPECIFIED default to X", which is effectively what we're doing here behind the specs back.
If we don't do some form of compatibility shim though, we end up in a situation where a consumer updates substrait-java and then having every plan produced by a system that has not yet been updated fail the validation.
BREAKING CHANGES: execution behavior is a required field introduced in Substrait v0.87.0 Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
1a28255 to
32b8933
Compare
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
benbellick
left a comment
There was a problem hiding this comment.
I left a comment about the behavior. I think we still want the @Value.check annotation. We can just add the field value explicitly to the POJO Plan when we do FromProto.
Also just a minor aside, it makes it easier to review if you add new commits instead of force-pushing. We squash commit anyways so the history doesn't matter. However, if you force-push, the "show changes since your last review" button doesn't work anymore. Thanks!

| if (!getExecutionBehavior().isPresent()) { | ||
| throw new IllegalArgumentException("ExecutionBehavior is required but was not set"); | ||
| } |
There was a problem hiding this comment.
I think a cleaner approach would be to keep the @Value.Check validation.
My mental model is that our internal POJO representation should always be whatever the "new" thing is. So in this case, POJOS must have the required field set. Thus, if you construct a plan programmatically, you should have to set an ExecutionBehavior explicitly, or use a helper that supplies the default (which is what you have already done).
However for backwards compatibility, ProtoPlanConverter.from(...) can be the boundary. If an older proto plan does not have execution_behavior, we can set it to VARIABLE_EVALUATION_MODE_PER_PLAN during deserialization.
That keeps the compatibility handling invisible to callers and avoids having Java Plan objects that are only invalid when serialized back to protobuf.
I think defaulting is safe here. If an old plan did not have this field, it could not have been relying on per-record variable evaluation semantics, so defaulting to the existing/per-plan behavior should preserve semantics. Do you agree with that approach?
I have to force push for a rebase, don't I? The commit are all still preservered just have different commit ids since I rebased. |
- "test: update tests" - "feat: validate only when plan gets serialized to proto"
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Co-authored-by: Ben Bellick <36523439+benbellick@users.noreply.github.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
You can also merge |
|
I think we've drifted into two different discussions here:
For 1, the main constraint is that we want the library to read protobuf plans that don't set the ExecutionContextVariable, as this allows users to update to this version without having to coordinate producer changes. As Ben points out, we can handle this at the protobuf conversion layer by setting a default value. This should be safe, because a system that doesn't set this variable isn't using it yet. Within the POJO code, we can assume that ExecutionContextVariable is set, and we can emit protobuf plans including it as other systems will safely ignore it until they are updated. 2 is a discussion that goes beyond this PR IMO. In observed practice, the plan version set is just the protobuf version and doesn't correspond to specification support at the version. At this point in time we can't use it for anything beyond that. |
This reverts commit 7ed2da4.
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
I think this summary leaves out an important point. Sure, we can say: this is current practice and we just accept it. On the other hand since the project has such a high emphasis on interoperability and portability we should define what is the minimum required compliance with a given Substrait specification version in my opinion. |
|
I updated the PR to always set the default execution behavior when parsing a Substrait plan from proto. |
Sorry, I didn't mean to imply that we shouldn't work on 2 with my statement. I think it is worth pursuing, I just don't think it's something that we can resolve unilaterally in this PR discussion as there are a ton of moving parts. Filed an issue substrait-io/substrait#1102 for it. |
| planBuilder.executionBehavior( | ||
| ExecutionBehavior.builder() | ||
| .variableEvaluationMode(VariableEvaluationMode.VARIABLE_EVALUATION_MODE_PER_PLAN) | ||
| .build()); |
There was a problem hiding this comment.
This is actually the only piece we need for backwards compatability IMO.
| if (!getExecutionBehavior().isPresent()) { | ||
| throw new IllegalArgumentException("ExecutionBehavior is required but was not set"); | ||
| } |
There was a problem hiding this comment.
My main concern is that this behavior would not be strictly compliant with the current spec and we would be setting a bad example if we started to deviate from what the released version of the spec says.
I see where you're coming from with this, and I agree that it's not totally desirable. It's taking advantage of the fact that if this isn't being set, you as a producer likely don't care about it, but it's not strictly in the spec. The ideal way to have done this would probably have been to include something like: "If this is not set / UNSPECIFIED default to X", which is effectively what we're doing here behind the specs back.
If we don't do some form of compatibility shim though, we end up in a situation where a consumer updates substrait-java and then having every plan produced by a system that has not yet been updated fail the validation.
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Signed-off-by: Niels Pardon <par@zurich.ibm.com>
Just to add another alternative opinion... I have a pet hate for merging main into PR branches. It just ends up with such a messy commit history within the PR branch. I absolutely agree with adding additional commits to address PR comments so the specific changes can be viewed, but I much prefer rebasing to get the branch up-to-date with main and resolve any merge conflicts in the rebase so you only see the incremental changes from main in the PR commit history. |
vbarua
left a comment
There was a problem hiding this comment.
One final comment on the definition of VariableEvaluationMode which deviates from the established pattern of enum definitions in the codebase.
Also left some minor comments for your tests. I don't consider these blocking.
| VARIABLE_EVALUATION_MODE_PER_PLAN, | ||
| VARIABLE_EVALUATION_MODE_PER_RECORD | ||
| } | ||
| } |
There was a problem hiding this comment.
There's an established pattern for representing these enums in the library that we should follow here for consistency:
substrait-java/core/src/main/java/io/substrait/expression/Expression.java
Lines 1758 to 1771 in bb8d922
substrait-java/core/src/main/java/io/substrait/expression/Expression.java
Lines 2161 to 2174 in bb8d922
| } | ||
|
|
||
| /** | ||
| * Test case 6: Conversion from protobuf Plan with UNSPECIFIED mode fails validation. |
There was a problem hiding this comment.
minor: none of the others tests you added in this file are numbered.
| * execution behavior with PER_PLAN mode. | ||
| */ | ||
| @Test | ||
| void testRoundTripWithExecutionBehaviorPerPlan() { |
There was a problem hiding this comment.
minor: The to/from tests above feel somewhat redundant given that you also have roundtrip tests. I don't feel strongly enough to ask to keep one or the other, but thought it was worth mentioning at least.
| * execution behavior correctly. | ||
| */ | ||
| @Test | ||
| void testRoundTripWithExecutionBehaviorPerRecord() { |
There was a problem hiding this comment.
minor: this test and one above differ only in the VariableEvaluationMode. This feels like a good place to use a parameterized test like:
@ParameterizedTest
@EnumSource(
value = Plan.ExecutionBehavior.VariableEvaluationMode.class,
names = {"VARIABLE_EVALUATION_MODE_PER_PLAN", "VARIABLE_EVALUATION_MODE_PER_RECORD"})
void testRoundTripWithExecutionBehaviorPerRecord(
Plan.ExecutionBehavior.VariableEvaluationMode vem) {to avoid duplication.
| * without throwing any exceptions. | ||
| */ | ||
| @Test | ||
| void testValidExecutionBehavior_PerPlan() { |
There was a problem hiding this comment.
minor: This, with the test below, also feels like a good place for parameterized tests. Also, the _ in the name doesn't feel very Java and isn't really consistent with this file or your other names.
| * successfully converted. | ||
| */ | ||
| @Test | ||
| void testRoundTripEmptyPlanWithExecutionBehavior() { |
There was a problem hiding this comment.
minor: I don't if know if this provides any additional value given that you already have roundtrip tests above it.
fixes #801